Repetition can diminish the value of your content. As a writer, you should follow the DRY (don’t repeat yourself) principle. Statistics such as the total word count or the number of occurrences of each word help you analyze content, but gathering them manually across multiple documents is tedious. This article demonstrates how to programmatically count the words, and the occurrences of each word, in PDF, Word, Excel, PowerPoint, eBook, markup, and email document formats using C#.

.NET API to Count Words & Occurrences

GroupDocs.Parser provides a document parsing solution for developers. To extract text from documents and count word occurrences, we will use GroupDocs.Parser for .NET. The API also extracts images and metadata from a long list of supported document formats, including word-processing documents, presentations, spreadsheets, emails, databases, eBooks, and many others.

You can download the DLLs or MSI installer from the downloads section or install the API by adding its package to your .NET application via NuGet.

PM> Install-Package GroupDocs.Parser

Count Words using C#

To count words, the first task is to parse the document and extract its whole content as text. Once the text is extracted, we can split it into a collection of sentences or words. The following steps count the words within a document using C#.

  • Load the document using the Parser class.
  • Fetch the text of the loaded document into a TextReader.
  • Read the text of the document from the TextReader as a string.
  • Split the text into words and save them into a string array.
  • Take the length of the array as the word count.

The following C# source code counts the number of words in a document.
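A minimal sketch of these steps might look like the following. It assumes the GroupDocs.Parser NuGet package is installed. The class name `WordCountExample`, the separator list, and the fallback path `path/document.pdf` are illustrative choices rather than part of the API, and what counts as a "word" depends entirely on the separators you pick.

```csharp
using System;
using System.IO;
using GroupDocs.Parser;

public static class WordCountExample
{
    // Characters treated as word boundaries. This list is an assumption;
    // extend it to suit the punctuation found in your documents.
    private static readonly char[] Separators =
        { ' ', '\t', '\r', '\n', '.', ',', ';', ':', '!', '?', '(', ')', '"' };

    // Splits plain text on the separators and returns the number of non-empty tokens.
    public static int CountWords(string text)
    {
        if (string.IsNullOrWhiteSpace(text)) return 0;
        return text.Split(Separators, StringSplitOptions.RemoveEmptyEntries).Length;
    }

    // Extracts the document text with GroupDocs.Parser and counts its words.
    // Returns -1 when text extraction is not supported for the document format.
    public static int CountWordsInDocument(string filePath)
    {
        using (Parser parser = new Parser(filePath))
        using (TextReader reader = parser.GetText())
        {
            // GetText() returns null if the format does not support text extraction.
            if (reader == null) return -1;
            return CountWords(reader.ReadToEnd());
        }
    }

    public static void Main(string[] args)
    {
        // Placeholder path; pass a real document path as the first argument.
        string path = args.Length > 0 ? args[0] : "path/document.pdf";
        int count = CountWordsInDocument(path);
        Console.WriteLine(count < 0
            ? "Text extraction is not supported for this document."
            : $"Total words: {count}");
    }
}
```

Keeping the splitting logic in `CountWords` separate from the parsing step makes it easy to try different separator sets on plain strings before running against real documents.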

Count Word Occurrences in C#

Similarly, we can count how many times each word is used in the document. With these counts, you can spot and avoid the excessive repetition of any word within an article. The following steps count the occurrences of each word used in a document.

  • Load the document using the Parser class.
  • Retrieve the text of the loaded document into a TextReader.
  • Read and split the whole text into a word collection.
  • Traverse the word collection and tally the occurrences of each word.

The following C# code snippet counts the occurrences of each unique word within the document.
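As a hedged sketch of the steps above, the tally can be kept in a dictionary keyed by word. It assumes the GroupDocs.Parser NuGet package is installed. The class name `WordOccurrenceExample`, the separator list, the case-insensitive comparison, and the fallback path are illustrative assumptions, not something the API prescribes.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using GroupDocs.Parser;

public static class WordOccurrenceExample
{
    // Characters treated as word boundaries (an assumption; adjust as needed).
    private static readonly char[] Separators =
        { ' ', '\t', '\r', '\n', '.', ',', ';', ':', '!', '?', '(', ')', '"' };

    // Returns a case-insensitive tally of how often each word appears in the text.
    public static Dictionary<string, int> CountOccurrences(string text)
    {
        var counts = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        if (string.IsNullOrWhiteSpace(text)) return counts;

        foreach (string word in text.Split(Separators, StringSplitOptions.RemoveEmptyEntries))
        {
            // TryGetValue leaves 'current' at 0 when the word has not been seen yet.
            counts.TryGetValue(word, out int current);
            counts[word] = current + 1;
        }
        return counts;
    }

    public static void Main(string[] args)
    {
        // Placeholder path; pass a real document path as the first argument.
        string path = args.Length > 0 ? args[0] : "path/document.pdf";

        using (Parser parser = new Parser(path))
        using (TextReader reader = parser.GetText())
        {
            // GetText() returns null if the format does not support text extraction.
            if (reader == null)
            {
                Console.WriteLine("Text extraction is not supported for this document.");
                return;
            }

            // Print the most frequent words first.
            foreach (var pair in CountOccurrences(reader.ReadToEnd()).OrderByDescending(p => p.Value))
            {
                Console.WriteLine($"{pair.Key}: {pair.Value}");
            }
        }
    }
}
```

The case-insensitive comparer means "The" and "the" are counted together, with the first-seen spelling used as the key. Switch to `StringComparer.Ordinal` if case should be significant.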

The following is the output of the above code:

Word Occurrence Count

Get a Free API License

You can get a free temporary license to use the API without the evaluation limitations.

Conclusion

To sum up, you learned how to count the words in a document using C#, and how to get the occurrence count of each word in the document. Try developing your own online word counter .NET application. To learn more about the API, visit the documentation. For queries, contact us via the forum.