This article is one in a series on designing the search experience.
Content creators are the people who publish the content that ends up in the search index. If that content is messy, search will be frustrating: content must be rich in structure and meaning for search to be effective. The old computing adage, "garbage in, garbage out", applies to search results as well. Creators, therefore, play an important part in realising an effective search experience.
It is no surprise that content creators in the enterprise focus on the production of content, not on the consumption of it. They rarely consider who might want their content when, where and why. You can't blame them. This is all they know. It is the corporate habit.
But habits are formed around existing ways of working. What if we changed how things are done? Will habits change? Chances are, they will. Just think about how Uber has changed the way urbanites get around their city. Before Uber, people planned way in advance, diligently noting down bus and train timings. Uber killed the planning habit and introduced spontaneity—get a ride on demand.
So, what change will drive creators to publish search-friendly content? Technology, of course. But before we get into the details, let’s understand the situation we are up against.
Structure and meaning lie on a continuum (see diagram below). On the left, content is a blob of text; on the right, content is well structured and semantically rich, with proper titles and sections, and correct metadata and taxonomy values. The strategy, therefore, is to shift right. The more mindful and skilled creators are in facilitating this shift-right strategy, the better the search experience will be.
The creators in a typical work setting are not professional writers or journalists, adept at using language, structure and classification to craft documents. And expecting them to learn how to do such seemingly irrelevant tasks is not scalable or sustainable. We must, therefore, use technology to rescue the situation.
We are already seeing such a shift in the design world. An approach called Algorithm-driven design is slowly taking over many design tasks, from selecting the right typography to automatically designing homepage layouts. The idea is to use machine learning and good practices to help design novices, who don't necessarily have the chops, to improve their designs. Such a method can also offer design experts alternatives and ideas to experiment with. The endgame is better overall quality.
We can use similar algorithm-driven approaches along with known good practices to help creators publish search-friendly content. Given below are ideas to improve titles, sections and metadata.
The titles of documents (including web pages) have a big impact on findability and use in the enterprise, just as they do on the Internet. Users will ignore documents with vague and confusing titles (it may be more effective to just remove all such documents from the search index).
Machine learning technologies can come to the rescue by offering better ways to write titles. By learning from a database of exemplar titles, the algorithms can provide suggestions or edits to the creator (see diagram below). This way we can get more people to write effective titles, thereby improving the search experience.
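Even before a learned model is in place, some of this guidance can be captured as simple heuristics. The sketch below is a minimal, hypothetical title checker—the vague-word list and rules are illustrative assumptions, not a real product's logic—that flags common problems a suggestion engine might target:

```python
import re

# Illustrative list of words that make titles vague (an assumption, not exhaustive)
VAGUE_WORDS = {"misc", "notes", "document", "final", "new", "untitled", "draft"}

def title_suggestions(title: str) -> list[str]:
    """Return improvement hints for a document title; empty list means no issues found."""
    hints = []
    words = re.findall(r"[A-Za-z]+", title.lower())
    if len(words) < 3:
        hints.append("Title is very short; add the topic and context.")
    if any(w in VAGUE_WORDS for w in words):
        hints.append("Avoid vague words such as 'misc', 'notes' or 'final'.")
    if re.search(r"\bv?\d+(\.\d+)*\b", title):
        hints.append("Move version numbers into metadata, not the title.")
    return hints

print(title_suggestions("Notes final v2"))
print(title_suggestions("Rural transport case study: building a village bus terminus"))
```

A learned model would go further—suggesting concrete rewrites rather than just flagging problems—but even rule-based hints nudge creators toward descriptive, findable titles.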
Sections (both heading and copy) are what make up a document. Section headings can be shown in search results to improve relevancy and help the user jump to the section they want, instead of having to scroll through the content (see screenshot below).
Some document types have common sections. For example, a standard minutes of meeting document has sections like attendees, decisions taken, follow-ups etc. The creation of such document types can be 'templatised'. This means that when someone wants to create a minutes of meeting document, they start out by selecting a 'minutes of meeting template'. The template can also automatically pull relevant metadata such as project name, meeting dates, content type, and some predictable taxonomy tags associated with the template. This templates approach helps writers create search-friendly documents.
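As a rough sketch of how such a template might work, the snippet below models a minutes-of-meeting template that pre-fills sections, content type and taxonomy tags, leaving the writer to supply only the actual content. The field names are illustrative assumptions, not a reference to any particular content management system:

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class MinutesTemplate:
    """Hypothetical 'minutes of meeting' template with pre-assigned metadata."""
    project: str
    meeting_date: date
    content_type: str = "minutes-of-meeting"
    taxonomy_tags: list = field(default_factory=lambda: ["meetings"])
    # Common sections are created empty, ready for the writer to fill in
    sections: dict = field(default_factory=lambda: {
        "Attendees": "", "Decisions taken": "", "Follow-ups": ""})

def new_minutes(project: str) -> MinutesTemplate:
    # Metadata such as the date and content type is filled in automatically,
    # so the writer never has to think about it
    return MinutesTemplate(project=project, meeting_date=date.today())

doc = new_minutes("Bus Terminus")
print(doc.content_type, sorted(doc.sections))
```

Because the metadata and section structure arrive for free, every document created from the template is search-friendly by default.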
Metadata are statements about a page or document that can be beneficial in search. These include date published, department and content type. The good news is much of the metadata can be automatically assigned by an enterprise content management system. But there's a special type of metadata that is difficult to assign automatically. We're talking about taxonomy metadata.
Taxonomy terms describe the topic or subject of a page or document, or important topics mentioned within it. For example, a case study on building a bus terminus in a village may actually belong to the broader concept of ‘Rural Transport’—a term that other people (the target audience) may have an interest in. Assigning the taxonomy term ‘Rural Transport’ to the document makes it available to the people using that term to look for information.
Assigning taxonomy terms is primarily a human task, as it requires experience and sensemaking skills. However, only someone with the diligence of a journalist or librarian will take the time to assign the right terms. Needless to say, most staff will find ways to bypass this task. This is where auto-classification algorithms can help. Here's how it works:
- Auto-classification algorithms are first fed many documents with correctly assigned taxonomy tags.
- The algorithms learn from these exemplar training sets and then are able to predict taxonomy tags for new documents.
The creators now just have to approve or change the suggestions, a lighter task than having to manually identify and select taxonomy terms.
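The two steps above can be sketched in a few lines. The toy classifier below simply learns which words co-occur with which tags in the exemplar documents and scores new documents against those counts—a deliberately simplified stand-in for a real auto-classification algorithm, with made-up example data:

```python
import re
from collections import Counter, defaultdict

def tokens(text):
    return re.findall(r"[a-z]+", text.lower())

class TagSuggester:
    """Toy auto-classifier: learns word/tag co-occurrence from exemplars."""
    def __init__(self):
        self.word_counts = defaultdict(Counter)  # tag -> word frequencies

    def train(self, documents):
        # documents: iterable of (text, tags) pairs with correctly assigned tags
        for text, tags in documents:
            for tag in tags:
                self.word_counts[tag].update(tokens(text))

    def suggest(self, text, top_n=1):
        # Score each tag by how often the document's words appeared under it
        words = tokens(text)
        scores = {tag: sum(counts[w] for w in words)
                  for tag, counts in self.word_counts.items()}
        return sorted(scores, key=scores.get, reverse=True)[:top_n]

exemplars = [
    ("Case study: building a bus terminus in a village", ["Rural Transport"]),
    ("Upgrading rural bus routes and village roads", ["Rural Transport"]),
    ("Quarterly budget review for the finance team", ["Finance"]),
]
suggester = TagSuggester()
suggester.train(exemplars)
print(suggester.suggest("Proposal for new village bus shelters"))
```

A production system would use a proper statistical model and far more training data, but the workflow is the same: train on well-tagged exemplars, then surface suggested tags for the creator to approve or change.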
Sometimes, even with the help of algorithms, you may need manual intervention to get the ball rolling. In one search project, we uncovered ambiguous and confusing page titles. We brought this to the attention of the creators, telling them how this was stopping users from finding their content. We then compared their usage metrics with similar well-structured documents written by their peers. We took this opportunity to train and educate them on publishing well-structured documents and showed them how they could use analytics to view the performance of their pages.
Using this 'bright spots' approach, we were able to bring some order to the chaos and improve the overall quality of the content, thereby enhancing the effectiveness of search.
Ignorance and habit are stopping creators from publishing search-friendly content. Some might argue that tools for automatically extracting entities, facts and topics already exist, so why meddle with the habit?
Well, firstly, the tech is not perfect, and much effort is required to get it right. Secondly, it is not sustainable, as things will only get harder as the content grows. Lastly, why throw away the chance to use effort-reducing smart tech at the authoring stage to upskill your staff? The best recipe is people and technology working together, each optimising the other's strengths.
We believe that by using a mix of algorithms, training and a heavy dose of psychology, we can create a sustainable process for creating search-friendly content at the source.
Special thanks to Patrick Lambe for his feedback on the subject.