We are about to release GroupDocs.Search for Java, a full-featured back-end API that Java developers can easily use in their projects. It is a fascinating document search API that extracts text and metadata from documents. Furthermore, it performs advanced indexing and searching operations based on fuzzy and synonym algorithms. The API also supports full-text search.
Features Offered by GroupDocs.Search for Java
GroupDocs will keep updating this API with new features. The initial list of features offered by the API will be:
Searching Features
Search Queries
- Simple Queries
- Boolean Queries
- Regular Expression Queries
- Faceted Search Queries
- Case Sensitive Search Queries
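To give a feel for what Boolean queries from the list above involve, here is a minimal sketch in plain Java. It does not use the GroupDocs.Search API, whose classes are not shown in this announcement; the class name BooleanQuerySketch and its methods are hypothetical and exist only for illustration. It assumes a simple inverted index that maps each word to the set of document IDs containing it, so that AND, OR and NOT become set operations.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; not part of GroupDocs.Search.
final class BooleanQuerySketch {
    // Inverted index: lower-cased word -> IDs of documents containing it.
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Splits the text on non-word characters and records each token.
    void add(int docId, String text) {
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Returns a copy of the document IDs for a single word.
    Set<Integer> term(String word) {
        return new TreeSet<>(
                postings.getOrDefault(word.toLowerCase(), Collections.emptySet()));
    }

    // AND: documents present in both sets.
    static Set<Integer> and(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<>(a);
        result.retainAll(b);
        return result;
    }

    // OR: documents present in either set.
    static Set<Integer> or(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<>(a);
        result.addAll(b);
        return result;
    }

    // NOT: documents in the first set that are absent from the second.
    static Set<Integer> not(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<>(a);
        result.removeAll(b);
        return result;
    }
}
```

With three documents indexed, a query such as "annual AND sales" would be evaluated as and(index.term("annual"), index.term("sales")). A production search engine would add query parsing, ranking and persistence on top of this idea.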
Advanced Search
- Fuzzy Search
- Synonym Search
- Date Range Search
- Numeric Range Search
- Password Protected Documents Search
- Search using Morphological Word Forms
- Spelling Corrector
- Keyboard Layout Corrector
- Exact Phrase Search
- Specify Number of Search Threads
- Cancel Search Operation
- Search by Parts
- Get Search Report
- Highlight Results in Text
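As a rough illustration of the idea behind the fuzzy search feature listed above, the sketch below matches a misspelled query against indexed terms using Levenshtein edit distance. This is a common general technique, and it is an assumption that it resembles what the API does internally; the announcement does not describe the actual algorithm. The names FuzzySketch, editDistance and fuzzyMatch are hypothetical and are not GroupDocs.Search identifiers.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of GroupDocs.Search.
final class FuzzySketch {

    // Levenshtein distance: the minimum number of single-character
    // insertions, deletions and substitutions that turn a into b.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1),
                        prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows instead of a full matrix
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns the terms within maxEdits of the query, ignoring case.
    static List<String> fuzzyMatch(String query, List<String> terms, int maxEdits) {
        List<String> hits = new ArrayList<>();
        String q = query.toLowerCase();
        for (String term : terms) {
            if (editDistance(q, term.toLowerCase()) <= maxEdits) {
                hits.add(term);
            }
        }
        return hits;
    }
}
```

Calling fuzzyMatch("serch", terms, 1) would accept "search", because one inserted letter is enough to correct the typo. Scanning every term is fine for a sketch, but a real index would use a more efficient lookup structure.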
Other Features
- Get total hit counts for a search query
- Limit the number of search results
- Get matched words in the found documents
- Warn the user about unsupported settings
- Support different search features in a single search query
- Define table discrete function as a step function
- Save encodings automatically
Indexing Features
- Create Index
- Update Index
- Load Index
- Add Documents to Index
Other Features
- Index metadata of documents
- Merge indexes
- Track all changes to files in the index folder
- View progress percentage of indexing or updating
- Prevent unnecessary file indexing
- Subscribe to events
- Extract the list of indexed documents
- Extract document text
- Compact Indexing
- Multithreaded Indexing
- Accent-insensitive indexing
- Detect encoding automatically
- Cancel indexing, updating and merging operations
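Accent-insensitive indexing, mentioned in the list above, usually means that terms are normalized before they are stored, so that a query without accents still finds accented words. The sketch below shows one common way to do this with the standard java.text.Normalizer class. It is an assumption that the API behaves similarly, and the class name AccentFoldingSketch and the method fold are hypothetical.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical illustration only; not part of GroupDocs.Search.
final class AccentFoldingSketch {

    // Decomposes accented characters into base letter plus combining marks
    // (NFD), removes the combining marks, then lower-cases the result.
    static String fold(String term) {
        String decomposed = Normalizer.normalize(term, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "").toLowerCase(Locale.ROOT);
    }
}
```

If both the indexed terms and the query pass through fold, the accented and unaccented spellings of a word map to the same index key. Letters that have no canonical decomposition, such as "ø", are left unchanged by this approach and would need an explicit mapping.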
The API will initially support the following document types for text extraction:
- Word Processing Documents
- Spreadsheet Documents
- Presentation Documents
- OpenOffice Presentation Documents
- Email Messages
- PDF Documents
- Text Documents
- Electronic Publication Documents
- FictionBook Documents
- Microsoft Compiled HTML Help
- OneNote Documents
- ZIP Archives