Writing is not an easy task for everyone. Good writing avoids repeating the same words and phrases over and over. In today’s world of content optimization, you often need to count words and phrases and then limit how often they repeat. This article shows how to programmatically count the words in a document, and the occurrences of each word, in Java.

Java API to Count Words & Occurrences

GroupDocs.Parser offers a document parsing solution for developers. I will use its Java API, GroupDocs.Parser for Java, to extract text from documents and then count the words and their occurrences. The API also supports extracting images and metadata from a large list of supported document formats, such as word-processing documents, presentations, spreadsheets, emails, databases, eBooks, and many others.

Download and Configure

Get the library from the downloads section or, for a Maven-based Java application, add the following configuration to your pom.xml. After that, you can run the examples in this article as well as the many other examples available on GitHub. For details, see the API Reference.

<repositories>
	<repository>
		<id>GroupDocsJavaAPI</id>
		<name>GroupDocs Java API</name>
		<url>https://repository.groupdocs.com/repo/</url>
	</repository>
</repositories>
<dependencies>
	<dependency>
		<groupId>com.groupdocs</groupId>
		<artifactId>groupdocs-parser</artifactId>
		<version>22.3</version>
	</dependency>
</dependencies>

Count Words in Document using Java

First, the whole content of the document has to be parsed and extracted accurately before the words can be counted. Once the text is extracted, it can easily be split into a collection of words. The following steps show how to count the words in a document using Java.

  • Load the document using the Parser class.
  • Fetch the text of the loaded document using TextReader.
  • Split the text into words using delimiters.
  • Perform word count.

The following Java source code counts the number of words in a document.
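A minimal sketch of the splitting and counting steps, written in plain Java so it runs without the library. It assumes the document text has already been extracted into a `String`; the GroupDocs.Parser calls for that step (`new Parser(path)`, `getText()`, `readToEnd()`) appear only in a comment. The class name `WordCounter`, its methods, and the delimiter pattern are illustrative choices, not part of the GroupDocs API.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical helper; the text is assumed to come from GroupDocs.Parser, e.g.:
//   try (Parser parser = new Parser("sample.docx");
//        TextReader reader = parser.getText()) {
//       // getText() returns null if text extraction isn't supported for the format
//       String text = reader == null ? "" : reader.readToEnd();
//       System.out.println("Words: " + WordCounter.countWords(text));
//   }
class WordCounter {
    // Delimiter: any run of characters other than letters, digits and apostrophes
    // (straight or typographic), so "isn't" counts as one word.
    private static final Pattern DELIMITERS = Pattern.compile("[^\\p{L}\\p{N}'\\u2019]+");

    // Splits the text into words, dropping empty tokens (a leading delimiter
    // makes Pattern.split return an empty first element).
    static String[] splitWords(String text) {
        if (text == null || text.trim().isEmpty()) {
            return new String[0];
        }
        return Arrays.stream(DELIMITERS.split(text))
                .filter(word -> !word.isEmpty())
                .toArray(String[]::new);
    }

    // The word count is simply the number of non-empty tokens.
    static int countWords(String text) {
        return splitWords(text).length;
    }

    public static void main(String[] args) {
        String sample = "Lorem ipsum dolor sit amet, consectetur adipiscing elit.";
        System.out.println("Words: " + countWords(sample)); // prints "Words: 8"
    }
}
```

Treating every non-letter, non-digit character as a delimiter keeps punctuation such as commas and periods out of the count; if your documents contain hyphenated terms you want to keep whole, add `-` to the character class.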

Count Words Occurrences in Java

Likewise, we can count how many times a particular word or phrase, or each unique word, appears in the document. This makes it easy to spot and avoid repeated words in an article. The following steps count the occurrences of each word in the document using Java.

  • Load the document using the Parser class.
  • Retrieve the text of the loaded document using TextReader.
  • Read the whole text and split it into a collection of words.
  • Traverse the collection to count the occurrences of each word.

The following Java code snippet counts the occurrence of each unique word within the document.
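A minimal sketch of the counting step, again in plain Java and assuming the text has already been extracted with GroupDocs.Parser (`parser.getText()` followed by `readToEnd()`). Words are lower-cased so that "Lorem" and "lorem" count as the same word, which is consistent with the lower-case words in the output shown in this section. The names `WordOccurrences` and `countOccurrences` are hypothetical, and `TreeMap` is used only to get a deterministic, alphabetical order.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

// Hypothetical helper: counts how often each unique word occurs in text that is
// assumed to have been extracted beforehand (e.g. with parser.getText().readToEnd()).
class WordOccurrences {
    // Runs of characters other than letters, digits and apostrophes separate words.
    private static final Pattern DELIMITERS = Pattern.compile("[^\\p{L}\\p{N}'\\u2019]+");

    static Map<String, Integer> countOccurrences(String text) {
        // TreeMap keeps the words sorted alphabetically, so the output is deterministic.
        Map<String, Integer> counts = new TreeMap<>();
        if (text == null) {
            return counts;
        }
        for (String word : DELIMITERS.split(text)) {
            if (word.isEmpty()) {
                continue; // skip the empty token produced by a leading delimiter
            }
            // Lower-case so "Lorem" and "lorem" are counted as the same word.
            counts.merge(word.toLowerCase(Locale.ROOT), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String sample = "Lorem ipsum dolor sit amet. Lorem ipsum, lorem!";
        countOccurrences(sample).forEach((word, count) -> System.out.println(word + ": " + count));
    }
}
```

For the sample string, this prints `amet: 1`, `dolor: 1`, `ipsum: 2`, `lorem: 3` and `sit: 1`, one per line. Replace `TreeMap` with `HashMap` if order does not matter, or with `LinkedHashMap` to list words in order of first appearance.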

The following is the output of the above code:

lorem: 6
ipsum: 2
eleifend: 2
integer: 1
augue: 3
aliquet: 1
ligula: 1
dolor: 1
venenatis: 2
viverra: 1
amet: 2
urna: 1
senectus: 2
lectus: 2
volutpat: 1
massa: 1
blandit: 1
dapibus: 1
habitant: 2
pharetra: 2
...
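This section also mentions counting a particular phrase rather than single words. A minimal sketch of one way to do that, assuming the document text has already been extracted and that both arguments are non-null: the text and the phrase are tokenized the same way, and every position where the phrase's words appear in sequence is counted. `PhraseCounter` and `countPhrase` are hypothetical names, not GroupDocs.Parser APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical helper for counting a particular multi-word phrase in extracted text.
class PhraseCounter {
    // Runs of characters other than letters, digits and apostrophes separate words.
    private static final Pattern DELIMITERS = Pattern.compile("[^\\p{L}\\p{N}'\\u2019]+");

    // Lower-cases the text and splits it into words, dropping empty tokens.
    private static List<String> tokenize(String text) {
        List<String> words = new ArrayList<>();
        for (String word : DELIMITERS.split(text.toLowerCase(Locale.ROOT))) {
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        return words;
    }

    // Counts occurrences of the phrase as a sequence of whole words, ignoring case
    // and punctuation between words. Overlapping matches are counted.
    static int countPhrase(String text, String phrase) {
        List<String> words = tokenize(text);
        List<String> target = tokenize(phrase);
        if (target.isEmpty()) {
            return 0;
        }
        int count = 0;
        // Slide a window of target.size() words across the text.
        for (int i = 0; i + target.size() <= words.size(); i++) {
            if (words.subList(i, i + target.size()).equals(target)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String sample = "Lorem ipsum dolor sit amet. Lorem, ipsum! Lorem dolor.";
        System.out.println(countPhrase(sample, "lorem ipsum")); // prints 2
    }
}
```

Because punctuation is ignored, a match can span a sentence boundary (for example, "dolor. Lorem" matches the phrase "dolor lorem"); splitting the text into sentences first would avoid that if it matters for your documents.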

Get a Free API License

You can get a free temporary license in order to use the API without the evaluation limitations.

Conclusion

To conclude, you have learned how to count the words in a document using Java. We also discussed how to count the occurrences of each word used in the document. Try building your own online word counter application in Java. To learn more about the API, visit the documentation. For queries, contact us via the forum.

See Also