About Stemmed Search and How to Enable It

New in 1.22.0

In the context of search engines, “stemming” is a technique for delivering better results for the terms a user enters: results that are more relevant, more to the point, so to speak. How does it work?

Basically, stemming reduces the various forms of nouns, verbs, adjectives and adverbs to their root form, the stem. The search term is reduced in the same way, so it matches even if the searched content contains only other forms of the term and not the term itself. From the perspective of a search engine that applies stemming, “value”, “valued” and “valuable” are the same, just as “tear”, “tearing” and “torn” are. As you can see from these two examples, there is more to stemming than merely removing prefixes and suffixes from words to determine their common stem. If you would like to learn more about the technology behind stemming, take a look at what a search engine like Elasticsearch has to offer.
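To illustrate why suffix removal alone falls short, here is a toy sketch. It is not how Scrivito or Elasticsearch stem words; the suffix list and the naiveStem function are made up for this example. It groups the “value” family correctly but cannot relate “torn” to “tear”.

```javascript
// Toy suffix stripper, for illustration only. Real stemmers use
// language-specific rules and dictionaries of irregular forms.
const SUFFIXES = ["able", "ing", "ed", "e"];

function naiveStem(word) {
  const lower = word.toLowerCase();
  for (const suffix of SUFFIXES) {
    // Keep at least three characters so that short words are left alone.
    if (lower.endsWith(suffix) && lower.length - suffix.length >= 3) {
      return lower.slice(0, -suffix.length);
    }
  }
  return lower;
}

const words = ["value", "valued", "valuable", "tear", "tearing", "torn"];
const stems = words.map(naiveStem);
console.log(stems);
// → ["valu", "valu", "valu", "tear", "tear", "torn"]
```

The irregular form “torn” keeps its own stem here, which is why a search for “tear” would miss it with this approach.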

Of course, stemming is language-specific. So, for a search engine to be able to apply stemming as designed, it needs to take into account the language the searched content is in.

Specifying the language editorially

Scrivito lets editors specify the language of any CMS object, be it a page, an image, or a PDF file, simply by setting the object’s “Language” property. In their daily work, however, editors don’t need to keep track of this setting because new subpages inherit the language from their parent page. And when an editor moves one or several pages to a section with a different language, the editing interface points out the mismatch and optionally adjusts the language setting of the pages.

Setting the language programmatically

Every CMS object has an attribute for maintaining its language, _language. To access it in the browser console, change the context to scrivito_application, and then execute the following:
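A minimal sketch of what such a console call could look like. It assumes that the Scrivito JavaScript SDK is reachable as the global Scrivito in that context and that Scrivito.currentPage(), obj.language() and obj.update() behave as described in the SDK reference. The helper name setCurrentPageLanguage is made up for this example.

```javascript
// Hypothetical helper. `scrivito` is the SDK object, i.e. the global
// `Scrivito` when run in the scrivito_application console context.
function setCurrentPageLanguage(scrivito, language) {
  const page = scrivito.currentPage();

  // Assumption: obj.language() returns the value of _language, or null if unset.
  const previous = page.language();
  console.log("Language before the update:", previous);

  // _language is written like any other attribute.
  page.update({ _language: language });
  return previous;
}

// In the browser console:
// setCurrentPageLanguage(Scrivito, "en");
```

Since the call changes content, it presumably needs a working copy to be selected, just like any other edit.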

If your content includes a large number of pages that don’t have a language set, assigning one to each of them individually through the editing interface can be cumbersome, so let’s make life a bit easier for your editors. All we need is an attribute that uniquely indicates the target language of the pages concerned, e.g. their _path or _siteId. Based on the attribute value, we can then use a search to find these pages and update their _language accordingly. Here, we use the “/lang/de” path prefix to identify German-language pages and set their _language to “de”. As before, run this in the browser console, after creating a working copy:
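A minimal sketch of such a script, wrapped in a helper so that the prefix and language can be swapped. It assumes that Scrivito.load(), Scrivito.Obj.where() with the "startsWith" operator, search.toArray() and obj.update() are available as in the SDK reference. The helper name assignLanguageByPath is made up for this example.

```javascript
// Hypothetical helper. `scrivito` is the SDK object, i.e. the global
// `Scrivito` when run in the scrivito_application console context.
async function assignLanguageByPath(scrivito, pathPrefix, language) {
  // Scrivito.load() resolves once the data needed by the callback is loaded.
  const objs = await scrivito.load(() =>
    scrivito.Obj.where("_path", "startsWith", pathPrefix).toArray()
  );

  // Set _language on every matching object in the current working copy.
  objs.forEach((obj) => obj.update({ _language: language }));
  return objs.length;
}

// In the browser console, with a working copy selected:
// assignLanguageByPath(Scrivito, "/lang/de", "de").then((count) =>
//   console.log(`Updated ${count} objects`)
// );
```

The same pattern should work for other selectors, e.g. searching by _siteId instead of _path.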

That’s it! After running this small script and publishing the working copy, stemming is instantly applied to your visitors’ searches in the website’s German-language section.