Techemistry Blog

Combine Excel Documents Using C#

3/15/2013 12:47:00 PM -- Ted Krapf

A client yesterday had a common problem, "Ted I have 100+ Excel documents that I need to be able to quickly search through, and I'd like to combine all those work books into a single .xlsx file so that I can just search in one document."

Sounds simple enough right?  The first thing we tried together was scouring the web for a VBA script to do this.  Seems simple, get a list of the files in a directory, open each doc, and copy and paste its contents into the main document.  Well after an hour of stumbling through VBA syntax, help files, blogs, etc, we just couldn't find one that would work.  The best we found that worked would combine the all the worksheets of all the other documents into the main workbook as worksheets -- still leaving us having to manually copy and paste 100's of worksheets into one.

We had also found an array of different commercial 'combine' tools that ranged from dollars to $1,000's.  Needless to say, being a programmer, I wanted to see how I could do this myself and so I could share the solution with you!

I had forgot that months ago I discovered the goodness that is LinqToExcel at CodePlex.  I had found and used LinqToExcel for some basic Excel import functions I wanted to write.  

I had also tried numerous solutions for exporting to Excel from C#, and ended building up a simple OpenXML project/dll to handle my exports.  Most of it was referenced from this great OpenXML article on the MSDN Blogs. (Thanks again Chris!)

So back to the combine solution -- I created a simple Windows form app project in Visual studio that just had a textbox for copy/pasting in the folder of xlsx files that client wanted to combine, and a button to start the function.

The trick to this combine method working is ensuring that the excel documents you are going to combine are:

1.) In the same directory
2.) Have the exact same columns
3.) Have the exact same column headings (caps, spaces, etc count)
4.) The worksheet in each workbook is named the same

First I create a class in my code that matches the columns of the excel document.  

protected class ReconcillationReport
{
            public string ID { get; set; }
            public string Date_Submitted { get; set; }
            public string Commission { get; set; }
            public string First_Name { get; set; }
            public string Last_Name { get; set; }
            public string Home_Phone_Num { get; set; }
            public string Distributor_Name { get; set; }
            public string Account_Num { get; set; }
            public string Status { get; set; }
            public string Code { get; set; }
            public string START_DATE { get; set; }
            public string END_DATE { get; set; }
}

Next I created the button click event, retrieve the source directory from the textbox (txtRecDir.Text), and copy the file names in that directory to a List<>.  I then create a new ReconcillationReport List<>.  Next I loop through all the files in the directory listing (skipping any files called "WOOT" in case a combine was already performed on these files.

Then using LinqToExcel's ExcelQueryFactory, I import each workbook to be combine one at a time and serialize them into a ReconcillationReport object, which I then add to the List of objects.

I then simply convert the List of reports into a DataSet table, and use my export library to export the final combined Excel document.

Here is the button click event

private void button1_Click(object sender, EventArgs e)
        {
            string path = txtRecDir.Text;
            List files = Directory.GetFiles(path, "*.xlsx").ToList();

            List rr = new List();
            foreach (string f in files.Where(c => !c.Contains("WOOT")))
            {
                var excelDoc = new ExcelQueryFactory(f);
                List r = (from c in excelDoc.Worksheet("Reconciliation") select c).ToList();
                rr.AddRange(r);
            }

            DataSet ds = new DataSet();
            ds.Tables.Add(Utils.ToDataTable(rr, "Combined"));
            ExcelExport.CreateExcelFileOriginal.CreateExcelDocument(ds, path + "\\" +  "WOOT.xlsx");

            btnRecCombind.Text = "DONE";
        }

Obviously there is no error checking, and note that this methodology requires advanced knowledge of the structure of the excel documents, but I still thought I'd share it in case it is useful to you.  I'm sure as the client's request evolves I'll probably add the flexibility of combining dissimilar excel files to each other, but initially this simple code snippet saved the day.  My client has cut 2 hours of excel wrangling out of every business day now -- what's that worth? 

Till next time,

TK




[return to articles list]