CSV files are widely used for sharing large data sets in a compact form. They contain comma-separated values, which are not very human-readable, although they are easy to edit in spreadsheet editors such as Microsoft Excel, OpenOffice Calc, or LibreOffice. In some scenarios we need to compare two large CSV files and find their differences, and this is best done programmatically. Earlier, we discussed comparing CSV files in Java. This article covers the basic way to compare two CSV files using C# within a .NET application.
.NET API for Comparing CSV Files
GroupDocs provides its document comparison solution for various file formats. We will use its .NET API to compare CSV files within the application. It allows comparing two or more CSV files for differences. It further supports comparing password-protected CSV files, accepting and rejecting the discovered changes and much more.
You can download the DLLs or MSI installer from the downloads section or install the API by adding its package to your .NET application via NuGet.
PM> Install-Package GroupDocs.Comparison
Running examples for its features are also available at GitHub. Visit its documentation and API Reference for guidance.
How to Compare CSV Files using C#
Let’s move on to our objective and perform a basic comparison. You just load the source file, add the target file, and call the Compare method to get the comparison results. The example in this article compares two CSV files, a source and a target.
The following are the steps to compare two CSV files for differences using C#:
- First, load the CSV file using the Comparer class.
- Then, add the second CSV file using the respective Add method.
- Finally, compare both the files using the Compare method.
The following C# code compares the CSV files and provides the differences in a CSV output within the .NET application.
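A minimal sketch of these three steps, assuming the GroupDocs.Comparison package is installed and that two files named source.csv and target.csv exist in the working directory (the file names are placeholders, not taken from the article):

```csharp
using GroupDocs.Comparison;

// Placeholder file names; replace them with your own paths.
string sourcePath = "source.csv";
string targetPath = "target.csv";
string resultPath = "result.csv";

// Step 1: load the source CSV file.
using (Comparer comparer = new Comparer(sourcePath))
{
    // Step 2: add the target CSV file to compare against.
    comparer.Add(targetPath);

    // Step 3: run the comparison and save the differences to the result file.
    comparer.Compare(resultPath);
}
```

The using block disposes the Comparer once the result file has been written.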
The output result is as follows:
How Changes Are Displayed in the Resulting CSV
Unlike formats such as DOCX or XLSX, CSV files do not support text formatting (colors, bold, etc.), so changes cannot be highlighted with color. To keep the resulting CSV readable, GroupDocs.Comparison marks changes with plain text markers directly inside the cell value:
- Inserted text is wrapped in parentheses, for example (new value)
- Deleted text is wrapped in square brackets, for example [old value]
- Unchanged text is left as is
For example, if the source CSV contains the value Infrastructure Assessment in a cell and the target CSV contains Infra Assessment, the resulting CSV will contain Infra[structure] Assessment in that cell. This way, you can easily see what was removed and what remained — directly within the CSV output, without any additional tooling.
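As a rough illustration of how these markers could be read back programmatically, the sketch below rebuilds the source-side and target-side text from a single marked cell value. The helper names SourceValue and TargetValue are hypothetical and not part of GroupDocs.Comparison, and the sketch assumes that the original data contains no parentheses or square brackets of its own and that markers never nest.

```csharp
using System;
using System.Text.RegularExpressions;

// Patterns for the two marker kinds (assumption: markers never nest).
const string InsertedPattern = @"\(([^()]*)\)";   // (inserted text)
const string DeletedPattern = @"\[([^\[\]]*)\]";  // [deleted text]

// Hypothetical helper: source-side value = drop inserted text, unwrap deleted text.
string SourceValue(string cell) =>
    Regex.Replace(Regex.Replace(cell, InsertedPattern, ""), DeletedPattern, "$1");

// Hypothetical helper: target-side value = drop deleted text, unwrap inserted text.
string TargetValue(string cell) =>
    Regex.Replace(Regex.Replace(cell, DeletedPattern, ""), InsertedPattern, "$1");

string marked = "Infra[structure] Assessment";
Console.WriteLine(SourceValue(marked)); // Infrastructure Assessment
Console.WriteLine(TargetValue(marked)); // Infra Assessment
```

If the data can legitimately contain brackets or parentheses, this text-based approach becomes ambiguous, and reading the change list from the API is the safer route.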
Export CSV Comparison Changes to JSON in C#
In addition to producing a result CSV file, GroupDocs.Comparison gives you programmatic access to the list of detected changes via the Changes collection of the result Document. Each ChangeInfo entry contains the change type and the source and target text. For formats handled by the Cells comparer (CSV, XLSX, ODS, etc.), it also contains the row index, the column index, and the column header taken from the first row of the file.
This makes it easy to export the comparison report to JSON, where each change is mapped to a named field of your data:
using GroupDocs.Comparison;
using GroupDocs.Comparison.Options;
using System.IO;
using System.Linq;
using System.Text.Json;

string source = "source.csv";
string target = "target.csv";
string outFilePath = "result.csv";
string outFilePathJson = "result.json";

// Load the source file, add the target file, and run the comparison.
using (var comparer = new Comparer(source))
{
    comparer.Add(target);
    var doc = comparer.Compare(outFilePath, new CompareOptions());

    // Project each detected change onto a plain object for serialization.
    var changes = doc.Changes;
    var json = changes.Select(c => new
    {
        id = c.Id,
        type = c.Type.ToString(),
        componentType = c.ComponentType,
        row = c.Row,
        column = c.Column,
        columnHeader = c.ColumnHeader,
        sourceText = c.SourceText,
        targetText = c.TargetText,
        text = c.Text
    });

    // Write the change list as indented JSON.
    File.WriteAllText(outFilePathJson,
        JsonSerializer.Serialize(json, new JsonSerializerOptions { WriteIndented = true }));
}
Each entry in the produced JSON file will look similar to this:
{
  "id": 0,
  "type": "Deleted",
  "componentType": "Run",
  "row": 1,
  "column": 0,
  "columnHeader": "Service",
  "sourceText": "Infrastructure Assessment",
  "targetText": "Infra Assessment",
  "text": "structure"
}
With the columnHeader property, you immediately know which field of the CSV record was affected, without having to parse the original file separately to map column indexes back to header names.
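One possible use of the exported report is to count how many changes touched each column. The sketch below assumes the report is a JSON array of entries shaped like the one above. The CountByHeader helper is hypothetical, and the inline sample data, including the Price column, is made up for illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Made-up sample report; in practice this could come from File.ReadAllText("result.json").
string reportJson = @"[
  { ""id"": 0, ""type"": ""Deleted"",  ""row"": 1, ""column"": 0, ""columnHeader"": ""Service"" },
  { ""id"": 1, ""type"": ""Inserted"", ""row"": 2, ""column"": 0, ""columnHeader"": ""Service"" },
  { ""id"": 2, ""type"": ""Inserted"", ""row"": 2, ""column"": 1, ""columnHeader"": ""Price"" }
]";

// Hypothetical helper: counts the changes per columnHeader value.
Dictionary<string, int> CountByHeader(string json)
{
    var counts = new Dictionary<string, int>();
    using JsonDocument parsed = JsonDocument.Parse(json);
    foreach (JsonElement change in parsed.RootElement.EnumerateArray())
    {
        string header = change.GetProperty("columnHeader").GetString() ?? "";
        counts[header] = counts.TryGetValue(header, out int n) ? n + 1 : 1;
    }
    return counts;
}

foreach (var pair in CountByHeader(reportJson))
{
    Console.WriteLine($"{pair.Key}: {pair.Value} change(s)");
}
```

Grouping by header name rather than by column index keeps the summary readable even if the column order differs between exports.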
Get a Free API License
You can get a free temporary license in order to use the API without the evaluation limitations.
Conclusion
To conclude, we have learned how to compare two CSV files within a .NET application using C#. The API lets you find the differences between any two large CSV files, and the comparison summary also provides the number of differences found in the compared files. Using these features, you can build your own online CSV comparison application in .NET.
For more details and to learn about the API, visit its documentation. For queries, contact us via the forum.