The days of a single, English-only version of your content are quickly coming to an end. Not only do you need to publish your content in multiple languages, your users also need to be able to search that multilingual content quickly and accurately.

Version 7 of the Rosette Linguistics Platform from Basis Technology won't help you with publishing, but it is a leading candidate to meet your multilingual text analytics and search needs.

How does it work?


  1. The process begins with unstructured text: email, web pages, legacy databases.
  2. The language and encoding of the input text are detected automatically. Rosette 7 supports 55 languages.
  3. The input text is converted to Unicode to ensure correct display of the processed text. Rosette 7 converts 168 legacy encodings to Unicode.
  4. Arabic, Asian and European text is analyzed morphologically (for example, split into words and reduced to base forms) and tagged accordingly.
  5. Names, dates, places and other entities are identified within the text, a step known as entity extraction.
  6. English or foreign names are matched against a local database.
  7. Names from foreign languages are translated into English.
  8. The process completes with the output of normalized, tagged and structured data ready for publishing or further analysis.
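Steps 2 and 3 can be illustrated with a minimal Python sketch. This is not Rosette's API: it assumes a hypothetical shortlist of candidate encodings, keeps the first one that decodes without error, and then applies Unicode NFC normalization so that composed and decomposed characters compare equal. Trial decoding like this is ambiguous (many byte sequences are valid in several single-byte encodings), which is why real language and encoding identifiers generally rely on statistical models instead.

```python
import unicodedata

# Hypothetical shortlist of encodings to try, in order.
# latin-1 accepts every byte sequence, so it must stay last as the fallback.
CANDIDATE_ENCODINGS = ["utf-8", "shift_jis", "cp1256", "cp1252", "latin-1"]


def to_unicode(raw: bytes) -> tuple[str, str]:
    """Return (normalized_text, encoding) for the first candidate that decodes cleanly."""
    for enc in CANDIDATE_ENCODINGS:
        try:
            text = raw.decode(enc)  # strict mode: raises on invalid bytes
        except UnicodeDecodeError:
            continue  # not this encoding; try the next candidate
        # NFC composes e.g. "e" + U+0301 into the single code point U+00E9.
        return unicodedata.normalize("NFC", text), enc
    raise ValueError("no candidate encoding matched")


if __name__ == "__main__":
    print(to_unicode("日本語".encode("shift_jis")))
```

Ordering matters here: stricter multi-byte encodings are tried before permissive single-byte ones, because a single-byte code page will happily "decode" almost anything.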

Top 5 Use Cases

1) Search Engines

Rosette 7 integrates natively with Apache Lucene and Solr to enable enterprises to find and retrieve documents and data in multiple languages.

2) Legal e-Discovery

Legal teams can search across multiple languages during the identification, processing, review and analysis phases of the Electronic Discovery Reference Model (EDRM).

3) Financial Compliance

Financial institutions can screen more accurately and efficiently, with fewer false positives, during anti-money laundering and counter-terrorist financing initiatives.

4) Anti-Terrorism

Watch list accuracy improves when documents are screened in their original language rather than in translation.
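To make watch-list screening concrete, here is a minimal sketch using Python's standard-library difflib. It is a stand-in technique, not Rosette's name matcher: names are casefolded and stripped of accents, then scored by character similarity against a hypothetical watch list, and only scores at or above a hypothetical 0.85 threshold are reported. Raising the threshold trades missed matches for fewer false positives. Plain string similarity like this cannot match a name written in Arabic or Cyrillic script against its Latin spelling, which is the gap that cross-lingual name matching is meant to close.

```python
import unicodedata
from difflib import SequenceMatcher

# Hypothetical watch list, for illustration only.
WATCH_LIST = ["José Martínez", "John Smith"]


def normalize_name(name: str) -> str:
    """Casefold and strip diacritics so "José" and "JOSE" compare equal."""
    decomposed = unicodedata.normalize("NFKD", name.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def screen(name: str, watch_list: list[str], threshold: float = 0.85) -> list[tuple[str, float]]:
    """Hypothetical helper: return (entry, score) pairs at or above threshold, best first."""
    query = normalize_name(name)
    hits = []
    for entry in watch_list:
        # ratio() is 2 * matching_chars / total_chars, in the range [0, 1].
        score = SequenceMatcher(None, query, normalize_name(entry)).ratio()
        if score >= threshold:
            hits.append((entry, round(score, 2)))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    print(screen("Jose Martines", WATCH_LIST))
```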

5) Unstructured Data Mining

Businesses, both large and small, can process unstructured data -- which makes up the majority of data within an organization -- looking for trends, issues, and opportunities.

Do I Need Multilingual Search?

If your enterprise manages content in multiple languages, either internally for employees or externally for customers, then the short answer is yes.

Think about how frustrating it is to be unable to locate a document when you are only dealing with a single language. Now multiply that feeling by the number of languages your company supports or plans to support. Getting it now?

Is your company global? If so, how do you handle multilingual search? Let us know in the comments.