Classify text using IAB-2 or Document taxonomies in C#

Classification is the systematic identification and organization of text. Taxonomy is the practice and science of such classification, including the principles that underlie it. When you work with a large volume of text data or documents, it is hard to find a given topic unless the data has been classified or organized. Classifying text therefore lets you retrieve information quickly.

GroupDocs.Classification for .NET

GroupDocs offers a programmable document and text classification API for .NET developers. You only have to install the NuGet package (GroupDocs.Classification for .NET) or add a single DLL as a reference in your .NET project. The API lets developers use four different taxonomies: IAB-2 (Interactive Advertising Bureau), Documents, Sentiment, and Sentiment3.

IAB-2 text classification

The IAB-2 taxonomy categorizes text into multiple topics and identifies each text based on the depth level. Call the Classify method with the text as a parameter to perform classification.

This text will be classified as Healthy_Living (IAB-2). Some more examples:

  • Sooner or later technology will overcome labor work – Technology_&_Computing (IAB-2)
  • This game has better graphics on Xiaomi Note 8 pro mobile – Video_Gaming (IAB-2)
  • We need groceries for the next month – Shopping (IAB-2)
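
The sample sentences above can be passed to the classifier in a loop. The following is a minimal sketch, not a definitive implementation. It assumes the `Classifier` class in the `GroupDocs.Classification` namespace, a `Classify(string)` overload that defaults to the IAB-2 taxonomy, and `BestClassName` / `BestClassProbability` properties on the response; check these names against the API reference for your installed version. The class name `Iab2TextExample` is hypothetical.

```csharp
using System;
using GroupDocs.Classification; // assumption: namespace of the NuGet package

class Iab2TextExample // hypothetical name, for illustration only
{
    static void Main()
    {
        // Sample sentences taken from the list above.
        string[] texts =
        {
            "Sooner or later technology will overcome labor work",
            "This game has better graphics on Xiaomi Note 8 pro mobile",
            "We need groceries for the next month"
        };

        // Assumption: parameterless constructor; Classify(string)
        // uses the IAB-2 taxonomy when no taxonomy is given.
        var classifier = new Classifier();

        foreach (var text in texts)
        {
            var response = classifier.Classify(text);

            // Assumption: the response exposes the top class name and its probability.
            Console.WriteLine($"{text} -> {response.BestClassName} ({response.BestClassProbability})");
        }
    }
}
```

Running this requires the GroupDocs.Classification package to be referenced by the project.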

Documents taxonomy

The Documents taxonomy is used to identify different document classes, such as invoices, CVs, forms, and emails. Call the Classify method on the “document.pdf” file in the current directory with the IAB-2 taxonomy to return the 2 best results.
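
A minimal sketch of that call might look like the following. The overload shape (file name, directory, then the named arguments `bestClassesCount` and `taxonomy`), the `Taxonomy.Iab2` enum member, and the `BestResults`, `Name`, and `Probability` members are assumptions recalled from the product documentation, so verify them against the API reference before relying on them. The class name `Iab2DocumentExample` is hypothetical.

```csharp
using System;
using GroupDocs.Classification; // assumption: namespace of the NuGet package

class Iab2DocumentExample // hypothetical name, for illustration only
{
    static void Main()
    {
        var classifier = new Classifier();

        // Assumption: Classify accepts a file name and a directory,
        // followed by optional named arguments.
        var response = classifier.Classify(
            "document.pdf",
            Environment.CurrentDirectory, // the file is in the current directory
            bestClassesCount: 2,          // return the 2 best results
            taxonomy: Taxonomy.Iab2);

        // Assumption: BestResults holds one entry per requested class.
        foreach (var result in response.BestResults)
        {
            Console.WriteLine($"{result.Name}: {result.Probability}");
        }
    }
}
```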

Call the Classify method on the “document.doc” file with the Documents taxonomy, set the precision/recall balance to “Precision”, and return the 4 best results.
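
As a hedged sketch of the Documents taxonomy call: the named arguments `bestClassesCount`, `taxonomy`, and `precisionRecallBalance`, the enum members `Taxonomy.Documents` and `PrecisionRecallBalance.Precision`, and the `BestResults` collection are assumptions recalled from the product documentation and should be confirmed in the API reference. The source does not say where “document.doc” lives, so the current directory is used here purely as a placeholder. The class name `DocumentsTaxonomyExample` is hypothetical.

```csharp
using System;
using GroupDocs.Classification; // assumption: namespace of the NuGet package

class DocumentsTaxonomyExample // hypothetical name, for illustration only
{
    static void Main()
    {
        var classifier = new Classifier();

        // Assumption: same file-based overload as for IAB-2, with an
        // extra argument that controls the precision/recall trade-off.
        var response = classifier.Classify(
            "document.doc",
            Environment.CurrentDirectory, // placeholder directory, not from the source
            bestClassesCount: 4,          // return the 4 best results
            taxonomy: Taxonomy.Documents,
            precisionRecallBalance: PrecisionRecallBalance.Precision);

        foreach (var result in response.BestResults)
        {
            Console.WriteLine($"{result.Name}: {result.Probability}");
        }
    }
}
```

Favoring precision should return fewer but more reliable class matches, which suits cases where a wrong label costs more than a missed one.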

The API also supports classification of password-protected documents.

Below are some helpful resources for you

We recommend exploring these resources and evaluating the API. If you run into any issue, you can raise it on the forum.