Classify text using IAB-2 or Document taxonomies in C#

A taxonomy, or classification scheme, is a systematic way to identify and organize text. When you work with a large volume of text or documents, finding a specific topic is hard unless that data has been classified. Classifying text therefore lets you retrieve information quickly.

GroupDocs.Classification for .NET

GroupDocs offers a programmable document and text classification API for .NET developers. You only need to add a single DLL (GroupDocs.Classification for .NET) as a reference in your .NET project. The API lets developers use two taxonomies: IAB-2 (Interactive Advertising Bureau) and the Documents taxonomy.

IAB-2 text classification

IAB-2 categorizes text into multiple topics and identifies the category of a text based on the depth level. Call the Classify method with the text as a parameter to perform classification.

This text will be classified as Healthy_Living (IAB-2). Some more examples:

  • Sooner or later technology will overcome labor work – Technology_&_Computing (IAB-2)
  • This game has better graphics on Xiaomi Note 8 pro mobile – Video_Gaming (IAB-2)
  • We need groceries for the next month – Shopping (IAB-2)
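The text classification call described above can be sketched roughly as follows, using the example sentences from the list as input. This is a minimal sketch rather than the definitive usage: it assumes a parameterless Classifier constructor, a Classify overload that takes a single string, and BestClassName / BestClassProbability properties on the response. Check these names against the API reference for your version of the library.

```csharp
using System;
using GroupDocs.Classification; // assumed namespace of the Classifier class

class Iab2TextExample
{
    static void Main()
    {
        // Assumption: Classifier can be created without arguments.
        var classifier = new Classifier();

        // Sample sentences taken from the list above.
        string[] samples =
        {
            "Sooner or later technology will overcome labor work",
            "This game has better graphics on Xiaomi Note 8 pro mobile",
            "We need groceries for the next month"
        };

        foreach (var text in samples)
        {
            // Assumption: Classify(string) classifies raw text with the default taxonomy.
            var response = classifier.Classify(text);

            // Assumption: the response exposes the best class name and its probability.
            Console.WriteLine($"{text} -> {response.BestClassName} ({response.BestClassProbability})");
        }
    }
}
```

The categories reported for these sentences in the list above are the ones you would expect to see printed, although exact probabilities may differ between library versions.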

Document taxonomy

The Documents taxonomy identifies document classes such as invoices, CVs, forms, and emails. To classify the “document.pdf” file in the current directory with the IAB-2 taxonomy and return the 2 best results, call the Classify method.
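A hedged sketch of that call is shown here. The overload shape (file name, data directory, number of best classes, taxonomy) and the BestResults, Name, and Probability members are assumptions based on the description above, so verify them in the API reference before relying on them.

```csharp
using System;
using System.IO;
using GroupDocs.Classification; // assumed namespace of Classifier and Taxonomy

class Iab2DocumentExample
{
    static void Main()
    {
        var classifier = new Classifier();

        // Assumed overload: Classify(fileName, dataDir, bestClassesCount, taxonomy).
        var response = classifier.Classify(
            "document.pdf",
            Directory.GetCurrentDirectory(), // the file is expected in the current directory
            2,                               // return the 2 best results
            Taxonomy.Iab2);

        // Assumption: BestResults holds the top classes, each with a Name and a Probability.
        foreach (var result in response.BestResults)
        {
            Console.WriteLine($"{result.Name}: {result.Probability}");
        }
    }
}
```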

Call the Classify method for the “document.doc” file with the Documents taxonomy, set the precision/recall balance to “Precision”, and return the 4 best results.
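The same kind of call with the Documents taxonomy might look like the sketch below. The PrecisionRecallBalance.Precision value, the parameter order, and the response members are assumptions, and the data directory is a placeholder because the description does not say where “document.doc” is stored.

```csharp
using System;
using System.IO;
using GroupDocs.Classification; // assumed namespace of Classifier, Taxonomy and PrecisionRecallBalance

class DocumentsTaxonomyExample
{
    static void Main()
    {
        var classifier = new Classifier();

        // Hypothetical location of the file; adjust to where document.doc is stored.
        string dataDir = Directory.GetCurrentDirectory();

        // Assumed overload: Classify(fileName, dataDir, bestClassesCount, taxonomy, precisionRecallBalance).
        var response = classifier.Classify(
            "document.doc",
            dataDir,
            4,                                 // return the 4 best results
            Taxonomy.Documents,
            PrecisionRecallBalance.Precision); // favor precision over recall

        // Assumption: BestResults holds the top classes, each with a Name and a Probability.
        foreach (var result in response.BestResults)
        {
            Console.WriteLine($"{result.Name}: {result.Probability}");
        }
    }
}
```

Favoring precision should make the classifier more conservative about the classes it reports, which is usually what you want when a wrong label is more costly than a missed one.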

The API also supports classification of password-protected documents.

Below are some helpful resources for you

We recommend that you explore these resources and evaluate the API. If you run into any issue, you can raise it on the forum.