Open to discovery: Refining metadata standards & creating a controlled vocabulary for Open Shelf
By Christine Moffatt, Madelaina DePace, and Tiffany Ribeiro
300, 400, 500 . . . could there be any more tags for Open Shelf articles? After six years of publishing stories, editorials, and podcasts online for the Ontario Library Association, in 2020, the Open Shelf editorial team realized that they had at least as many tags for stories as they had stories! And, although there was a tagging protocol in place, the process was out of control (or at least I was, as I invented tags on the fly) and the usefulness of tags as a discovery tool was questionable.
It was time for a cleanup, a reorg, a return to metadata basics!
So a call went out to Open Shelf readers for a “tagging wrangler.” Fortunately, instead of just one volunteer to help revise and revitalize the tagging process, three volunteers stepped forward to help: Christine Moffatt (who became our metadata editor), Madelaina DePace, and Tiffany Ribeiro.
As you will read, this tireless three-person working group has done extensive work to ensure that the tagging process now works for readers, contributors, and editors alike. The success of their efforts speaks to this repurposing of an old expression: for every minute of choosing a proper tag, an hour of delightful reading is earned. In this article, Christine, Madelaina, and Tiffany explain how the project unfolded, the challenges they faced, and the metadata lessons that they learned.
Martha Attridge Bufton
Editor-in-Chief (out-going), Open Shelf
The need for metadata cleanup
Open Shelf is the official magazine of the Ontario Library Association (OLA). It publishes articles on the WordPress platform that keep people informed on trends or issues that impact libraries and related organizations. To ensure that these articles are searchable by theme or by topic, tags are assigned. Tags are words or phrases that describe the content in a piece of information; they are a kind of metadata that allows the articles to be found through searching or browsing.
To improve the discoverability of and access to content, a controlled vocabulary was needed. Or, at least, the existing vocabulary of 760 tags needed to be overhauled, because such a large vocabulary list could make the article tagging process confusing and overwhelming for both contributors and editors.
In June 2020, our team of three library professionals came together to create a metadata editorial team tasked with reviewing, modifying, and improving this large list of tags. As we will explain, we were responsible for defining the scope of Open Shelf’s first metadata clean-up project, choosing the tools we used on WordPress to do our work, creating our controlled vocabulary, and making sense of what we learned overall from this project.
Defining the project scope
Defining the project scope is an important first step in planning and often determines whether a project will succeed. This stage involves determining and documenting a list of specific goals, deliverables, and tasks for the project, and then establishing the responsibilities for each team member.
We started defining the project scope by:
- analyzing a spreadsheet of all the tags used on Open Shelf
- reviewing the publication’s current guidelines on tagging articles, and
- conducting background research on tagging best practices on WordPress.
Our analysis revealed two main challenges that we would face:
- Tag quantity: As of June 2020, Open Shelf had an index of over 760 tags. Almost 500 of those were “unique tags”, or tags that were only used to describe content in one article. Every time a new tag was created for a post, it also created a new tag archive page on the Open Shelf website. Imagine you have a filing cabinet that contains separate folders for the words volunteer, volunteers, and volunteering—you now have three separate folders with a single sheet of paper in each one despite the fact that each paper contains information on volunteers. It becomes harder to find information on a topic when there are too many folders, and your filing cabinet will become complicated to navigate. The same problem can happen with a website that has too many tag archive pages.
- Tagging responsibility: Open Shelf has a detailed reference guide for editors and authors on how to create tags for the website. In practice, authors were often creating and applying tags to their own posts and were not aware of pre-existing tags on the website. Author-generated keywords are commonly used in research databases to give users additional language to help them find relevant articles; however, these keywords are often used alongside a controlled vocabulary or a standardized list of words or phrases to describe the content of a publication. On a website like Open Shelf, using a controlled vocabulary can affect how search engines like Google will rank your content, and it can streamline your website by eliminating excess tag archive pages.
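The kind of analysis described above can be sketched in a few lines of Python. This is only an illustration, not the process the team actually followed: the sample posts, the function names, and the crude suffix-stripping rule are all assumptions made for the example. It counts how many posts use each tag, lists the tags used only once, and groups tags that look like variants of the same word.

```python
from collections import Counter, defaultdict


def tag_usage(posts):
    """Count how many posts use each tag. `posts` maps a post title to its tags."""
    counts = Counter()
    for tags in posts.values():
        counts.update(set(tags))  # a tag counts once per post
    return counts


def single_use_tags(counts):
    """Tags attached to exactly one post."""
    return sorted(tag for tag, n in counts.items() if n == 1)


def crude_stem(tag):
    """Rough grouping key (an assumption): lowercase, strip a trailing 'ing' or 's'."""
    key = tag.lower().strip()
    for suffix in ("ing", "s"):
        if key.endswith(suffix) and len(key) > len(suffix) + 2:
            return key[: -len(suffix)]
    return key


def near_duplicates(tags):
    """Group tags that share the same crude stem; keep only groups of two or more."""
    groups = defaultdict(list)
    for tag in tags:
        groups[crude_stem(tag)].append(tag)
    return {key: sorted(group) for key, group in groups.items() if len(group) > 1}


# Hypothetical sample data, echoing the filing-cabinet example.
posts = {
    "Post A": ["volunteer"],
    "Post B": ["volunteers"],
    "Post C": ["volunteering", "public libraries"],
    "Post D": ["public libraries"],
}
counts = tag_usage(posts)
print(single_use_tags(counts))  # ['volunteer', 'volunteering', 'volunteers']
print(near_duplicates(counts))  # {'volunteer': ['volunteer', 'volunteering', 'volunteers']}
```

A rule this crude will both miss variants and group unrelated words, so its output is best treated as a list of candidates for a person to review rather than a list of merges to apply.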
With those challenges to overcome, we knew we would need to reduce the overall number of tags by removing duplicate entries and condensing multiple keywords under fewer controlled vocabulary terms.
At this point, we organized our project into multiple stages that would allow us to scaffold our team training from simpler tasks (e.g. deleting tags from posts) to more complicated editing (e.g., merging similar tags together and then redirecting their old tag archive pages to the archive page of the new controlled vocabulary term).
Selecting WordPress plugins
We also had another key task: to select WordPress plugins that would help us with our work. WordPress plugins are PHP (Hypertext Preprocessor) scripts that extend the functionality of a website. These plugins can enhance existing features on a website or add entirely new ones. We needed tools that would help us streamline our work, and we needed them to be free, open source solutions. Choosing appropriate (new) plugins was the answer to this problem.
Our project plugins included:
- WP All Export: This free plugin allowed us to export a complete list of Open Shelf tags into a single CSV file, which we uploaded to Google Drive to work on collaboratively. In a later stage of the project, we were also able to export a CSV file that contained detailed information about all posts on Open Shelf. Unfortunately, a recent update to this plugin now requires us to buy an upgraded version in order to isolate and export lists of tags.
- Redirection: This free plugin allowed us to delete tags while also sending users to the preferred tag archive page. If a user tries to access a deleted webpage, the user will receive a 404 error message—this message lets the user know that, while the website’s server can be reached, the webpage itself is no longer available. If we deleted tags from Open Shelf’s website, we would be left with too many 404 errors, which would hurt its ranking on search engines.
The Redirection plugin allowed us to add 301 Moved Permanently response codes that would automatically redirect users from the deleted webpage to the webpage for our new controlled vocabulary term. It sounds complicated, but it made our work infinitely easier.
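To make the idea of a redirect concrete, here is a minimal Python sketch that turns a list of merge decisions into redirect rules. It is an illustration only and does not reproduce the Redirection plugin's own interface or import format. The `/tag/` prefix is WordPress's default tag base and is assumed here; the function names, the simplified slug rule, and the sample merges are hypothetical.

```python
def slugify(tag):
    """Simplified slug (an assumption): lowercase, whitespace replaced by hyphens."""
    return "-".join(tag.lower().split())


def tag_archive_path(slug, tag_base="tag"):
    """Path of a tag archive page, assuming the default WordPress tag base."""
    return f"/{tag_base}/{slug}/"


def redirect_rules(merges):
    """Map each retired tag's archive page to the preferred tag's page with a 301."""
    rules = []
    for old, new in sorted(merges.items()):
        source = tag_archive_path(slugify(old))
        target = tag_archive_path(slugify(new))
        if source != target:  # a page never needs to redirect to itself
            rules.append((source, target, 301))
    return rules


# Hypothetical merge decisions: retired tag -> controlled vocabulary term.
merges = {"volunteer": "volunteers", "volunteering": "volunteers"}
for rule in redirect_rules(merges):
    print(rule)
# ('/tag/volunteer/', '/tag/volunteers/', 301)
# ('/tag/volunteering/', '/tag/volunteers/', 301)
```

Keeping the decisions in one table like this also makes it easy to check that every retired tag points at a term that still exists.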
Creating a controlled vocabulary
Step one: Our first step in creating a controlled vocabulary for Open Shelf tags was to review the 760 existing tags. To do this review, we had to decide which tags to keep, which to merge with other tags, and which to delete. The original Open Shelf tagging guide (created by Nikolina Likarevic) provided us with some guidance regarding what the tags should look like:
- nouns over adjectives (e.g. digitization rather than digitizing);
- no contractions;
- no hyphens;
- and privileging people over places and things.
For the most part, we embraced these practices. The significant exception to this is our inclusion of places alongside people (e.g. we have a tag for both academic librarians and academic libraries). Though this review was a slow process, we found it crucial to look at each tag one at a time, so that we could make decisions and create a workable controlled vocabulary.
Some of these decisions were easy:
- We deleted tags with typos and tags that only had one post attached to them (or none at all).
- We merged multiple tags under one term. We maintained the practice of tags taking the plural form, so multiple tags that meant the same thing (teacher-librarian and teacher-librarians) were combined under the plural form.
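As a rough sketch of how decisions like these could be applied to a single post, the Python below takes a post's tags, drops the deleted ones, and swaps merged ones for the preferred term. The function name and the sample data (including the misspelled tag) are hypothetical; on Open Shelf this work was done within WordPress rather than with a script like this.

```python
def apply_decisions(tags, merges, deleted):
    """Return a post's tags after merging and deleting, in order, without repeats."""
    result = []
    for tag in tags:
        if tag in deleted:
            continue  # typo tags and other retired tags are dropped
        preferred = merges.get(tag, tag)  # unmerged tags are kept as they are
        if preferred not in result:
            result.append(preferred)
    return result


# Hypothetical decisions: singular form merged into the plural, one typo deleted.
merges = {"teacher-librarian": "teacher-librarians"}
deleted = {"libraires"}

post_tags = ["teacher-librarian", "teacher-librarians", "libraires"]
print(apply_decisions(post_tags, merges, deleted))  # ['teacher-librarians']
```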
Other decisions, on the other hand, were much more difficult. For example, we had to make decisions about tags stemming from identities that are different from our own lived experiences. We created the new tag BIPOC to bring together posts that explored personal experiences of librarians who are Black, Indigenous, and people of colour. However, we worried that individual voices and identities might be lost under this new term. So, we also maintained a separate tag unique to Indigenous peoples (Indigenous), and kept tags like First Nations to respect differences in identity. Of course, we are continuously open to feedback concerning these tags, and welcome further guidance concerning them.
Step two: We took a second pass over all the tags to ensure that we were all making consistent decisions regarding which tags were remaining, which were being deleted, etc. We discussed any discrepancies we came across and made decisions accordingly. An example of this concerned tags pertaining to different types of culture (e.g. fan culture and remix culture). We decided that the general tag culture was unnecessary, but that many of the more specific tags referencing culture could be merged under a single term. As a result, we created a new tag called arts and culture that would be assigned to articles concerning cultural topics and/or visual arts. However, we also decided to retain the fan culture tag, as many posts referred specifically to this topic and it may be a significant topic for future posts.
We also discovered that there were tags that doubled as categories. For example, readers advisory exists as a category (i.e. a regularly occurring column within Open Shelf) and as a tag. We thought this was redundant and chose to delete the readers advisory tag and use other more specific tags for each post within the readers advisory category.
Step three: We went through every article (post) on Open Shelf manually to confirm that appropriate tags had been assigned. This was a necessary step because some posts had not been tagged, and some posts were left tagless through our work deleting and merging tags. We addressed the minor issues that remained (e.g. the reconciliation tag was merged with the broader truth and reconciliation tag) and were able to ensure coherence with the index of tags we had created. From the 760 tags that we began with in June, through this long but necessary process, we whittled the tag index down to just over 200 entries.
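A final check of this kind can also be expressed as a short Python sketch. It is offered as an illustration under stated assumptions, not as the method the team used (the review described here was manual): the function name, the sample vocabulary, and the sample posts are hypothetical. It reports posts with no tags and posts that still carry a tag outside the controlled vocabulary.

```python
def audit_posts(posts, vocabulary):
    """Return (untagged post titles, {title: tags not in the controlled vocabulary})."""
    untagged = sorted(title for title, tags in posts.items() if not tags)
    off_list = {}
    for title, tags in posts.items():
        unknown = sorted(set(tags) - vocabulary)
        if unknown:
            off_list[title] = unknown
    return untagged, off_list


# Hypothetical controlled vocabulary and posts.
vocabulary = {"truth and reconciliation", "fan culture"}
posts = {
    "Post A": ["truth and reconciliation"],
    "Post B": [],
    "Post C": ["reconciliation", "fan culture"],
}

untagged, off_list = audit_posts(posts, vocabulary)
print(untagged)  # ['Post B']
print(off_list)  # {'Post C': ['reconciliation']}
```

A report like this only flags where to look; deciding which term from the vocabulary best describes a post still takes a person reading it.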
Reflecting on lessons learned
Our project took us almost six months to complete and Open Shelf editors and contributors now have access to a smaller controlled vocabulary that can be reviewed and renegotiated on an annual basis.
A few lessons we learned from this process include:
- Guidelines are needed for the creation of tags to avoid redundancy and duplication.
- One person on the editorial team should be responsible for tagging articles (hence the creation of the position of Metadata Editor).
- Good communication amongst members of the metadata team has been key to ensuring the success of the project, i.e., a team needs to discuss and agree upon processes and details to achieve standardization. We were able to avoid major conflicts over the nature of our controlled vocabulary by holding regular Zoom meetings and taking copious notes during the weeding/editing process.
- Good communication with key stakeholders was equally important. We reported regularly to the Open Shelf editor-in-chief and the rest of the editorial team. And, moving forward, the Metadata Editor will coordinate outreach to OLA divisions and committees for feedback. For example, the controlled vocabulary should be shared with the Open Shelf French editor as well as members of the Culture Inclusion and Diversity Task Force and the OLA Advocacy and Research Officer for feedback on tags for BIPOC and Indigenous groups, and French-language tags.
- Metadata projects can take time so it is critical to never underestimate the scope of a project. Our advice to other metadata groups is to “take your time to do it and don’t rush.” We have found that reviewing, revising, and creating metadata is meticulous work that requires care and attention. One of the greatest benefits of being meticulous, however, is that a team is able to familiarize itself with the material on which it is working, down to the smallest of details.
Planning our next steps
In 2021, Christine will continue on as the Metadata Editor for Open Shelf and will oversee new metadata and data clean-up projects on the back-end of the website.
A snapshot of those tasks includes:
- Reviewing categories on Open Shelf: Categories are higher up the taxonomy chain than tags are, but they also help users find related content. Our team analyzed how categories are being used and conducted background research on best practices. We made our decisions on how to organize categories and sub-categories, and we’re set to make those changes.
- Advertising tag collections on social media: Our social media team will be highlighting posts that share similar tags so readers can explore our content!
- Fixing broken links in current posts: Open Shelf authors link to amazing resources through their posts, and sometimes those resources edit or change their links and readers can no longer access that content. This project will require a review of the more than 170 broken links recorded on the website.
- Adding new tags and updating the controlled vocabulary: We encourage Open Shelf readers and OLA members to provide feedback and guidance on tags concerning ethnic and cultural groups.
As library professionals, we are trained to understand the critical role that metadata plays in the discoverability of information in a wide range of contexts. For an online publication such as Open Shelf, the most visible metadata are the tags that are assigned to each story, editorial, and podcast. With a shorter list of tags, the Open Shelf editorial team now has a controlled vocabulary that will enable improved discoverability of the magazine content—and the interesting and valuable contributions that members of our community make to libraries in Ontario (and beyond).
Madelaina DePace is a cataloguer for the Thames Valley District School Board in London, and previously worked cataloguing derived records at the University of Toronto’s Robarts Library as a graduate student. She graduated from the University of Toronto’s MI program in June 2020.
Tiffany Ribeiro has been a Circulation and News Technician at the Ontario Legislative Library for three-and-a-half years after graduating from Seneca College’s Library and Information Technician program. She also volunteers on the Board of Directors for the Ontario Association of Library Technicians (OALT/ABO) as their archivist.
Christine Moffatt is the Engineering and Entrepreneurship Liaison Librarian at the David Library at the University of Waterloo. She is also a Campus Coach with Concept by Velocity, where she teaches students how to find entrepreneurial and interdisciplinary research. She is also the Metadata Editor with Open Shelf, and is building on more than two years of experience in cataloguing and web-based metadata. She graduated from Western University’s MLIS program in April 2020.