Archive formats such as ZIP, RAR, and TAR are commonly used to store multiple files and folders in a single container. Archives are also used to reduce total file size through compression, whether built into the format (ZIP, RAR) or applied by a compressor such as GZIP or BZIP2. Just as you can parse and extract data from documents of various file formats, you can treat archive files in the same way: you can extract the text, images, and even metadata from the files compressed within an archive. In this article, we will discuss how to extract data from ZIP archives using C# in your .NET applications.

.NET API to Extract ZIP File Data

GroupDocs.Parser provides a document parsing solution for developers. I will be using its .NET API to extract ZIP file data in the C# examples of this article. The API also allows extraction of text, images, and metadata from a long list of supported document formats, including word-processing documents, presentations, spreadsheets, emails, databases, eBooks, and many others.

You can download the DLLs or MSI installer from the downloads section or install the API by adding its package to your .NET application via NuGet.

PM> Install-Package GroupDocs.Parser

How to Extract ZIP File Data in C#

GroupDocs.Parser for .NET supports data extraction from various archive and compression formats, including ZIP, RAR, TAR, BZIP2, and GZIP. After retrieving the collection of files from the compressed file, you can extract further data from each file.

The following steps show how to extract data from a ZIP file and retrieve text from each enclosed file in C#.

  • Load the ZIP archive using the Parser class.
  • Obtain the attachments using the GetContainer method.
  • Traverse the collection of attachments.
  • For each attachment, extract the kind of data you need using the respective methods of the Parser class.

The source code shows how to extract ZIP file data using C#. In this example, I extract the whole text from all the files within the ZIP archive.
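The steps listed earlier can be sketched roughly as follows. This is a minimal sketch, not necessarily the exact listing that originally accompanied this article: it assumes the GroupDocs.Parser NuGet package is installed, the archive path is a hypothetical placeholder, and the class name ZipTextExtractor is made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using GroupDocs.Parser;
using GroupDocs.Parser.Data;
using GroupDocs.Parser.Exceptions;

class ZipTextExtractor
{
    static void Main()
    {
        // Hypothetical path; replace it with the location of your own archive.
        const string archivePath = "path/compressedFile.zip";

        // Step 1: load the ZIP archive using the Parser class.
        using (Parser parser = new Parser(archivePath))
        {
            // Step 2: obtain the files stored in the archive.
            IEnumerable<ContainerItem> attachments = parser.GetContainer();

            // A null result means container extraction is not supported for this file.
            if (attachments == null)
            {
                Console.WriteLine("Container extraction isn't supported.");
                return;
            }

            // Step 3: traverse the collection of attachments.
            foreach (ContainerItem item in attachments)
            {
                Console.WriteLine("-----------------------------------");
                Console.WriteLine("Name: " + item.Name);
                Console.WriteLine("File Size: " + item.Size + " Bytes");
                Console.WriteLine("-----------------------------------");

                try
                {
                    // Step 4: open a parser for the enclosed file and read its text.
                    using (Parser attachmentParser = item.OpenParser())
                    using (TextReader reader = attachmentParser.GetText())
                    {
                        // A null reader means text extraction is not supported for this file.
                        Console.WriteLine(reader == null
                            ? "Text extraction isn't supported."
                            : reader.ReadToEnd());
                    }
                }
                catch (UnsupportedDocumentFormatException)
                {
                    // The enclosed file is in a format the parser cannot handle.
                    Console.WriteLine("This file format isn't supported.");
                }
            }
        }
    }
}
```

Catching UnsupportedDocumentFormatException per item, rather than around the whole loop, lets the traversal continue when the archive contains a file the parser cannot open.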

The output of the above source code shows the text retrieved from one of the PDF files within the ZIP file.

 -----------------------------------
 Name: sample.pdf
 File Size: 33370 Bytes
 -----------------------------------

 Heading

 This is the first paragraph of the sample document that contains some sample
 text, bulleted list, numbered list and more.

    •  Bullet Item 1
    •  Bullet Item 2
    •  Bullet Item 3
 
 This is the second paragraph of the sample document and after this, there is a
 numbered list: 

    1. Numbered Item 1
    2. Numbered Item 2
    3. Numbered Item 3 
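Text is only one of the kinds of data mentioned earlier. As a hedged sketch, the same loop could be adapted to print the metadata of each enclosed file instead. It assumes the GetMetadata method and the MetadataItem type of GroupDocs.Parser behave as I recall them; the archive path and the class name ZipMetadataExtractor are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using GroupDocs.Parser;
using GroupDocs.Parser.Data;
using GroupDocs.Parser.Exceptions;

class ZipMetadataExtractor
{
    static void Main()
    {
        // Hypothetical path; replace it with the location of your own archive.
        using (Parser parser = new Parser("path/compressedFile.zip"))
        {
            IEnumerable<ContainerItem> attachments = parser.GetContainer();

            // A null result means container extraction is not supported for this file.
            if (attachments == null)
            {
                Console.WriteLine("Container extraction isn't supported.");
                return;
            }

            foreach (ContainerItem item in attachments)
            {
                Console.WriteLine("Name: " + item.Name);

                try
                {
                    using (Parser attachmentParser = item.OpenParser())
                    {
                        IEnumerable<MetadataItem> metadata = attachmentParser.GetMetadata();

                        // A null result means metadata extraction is not supported for this file.
                        if (metadata == null)
                        {
                            Console.WriteLine("  Metadata extraction isn't supported.");
                            continue;
                        }

                        // Print each metadata entry as "name: value".
                        foreach (MetadataItem entry in metadata)
                        {
                            Console.WriteLine("  " + entry.Name + ": " + entry.Value);
                        }
                    }
                }
                catch (UnsupportedDocumentFormatException)
                {
                    Console.WriteLine("  This file format isn't supported.");
                }
            }
        }
    }
}
```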

Get a Free API License

You can get a free temporary license in order to use the API without the evaluation limitations.

Conclusion

To sum up, you learned how to extract data from ZIP archives using C# within your .NET application. The same approach applies to the other supported formats, so you can now extract data from ZIP, RAR, TAR, GZIP, and BZIP2 files. You can even build your own data extraction .NET application for compressed files. To learn more about the API, visit the documentation. For queries, contact us via the forum.
