CSV files are perhaps the most popular way of storing and sharing data. Our hard drives are filled with them, whether they come from a website or a colleague. Every day, we generate a huge amount of data from production systems such as websites, trading apps, mobile games, and online ads. When data is pulled from these applications, it usually arrives in a flat-file format. To work with these CSV files, we rely on a variety of tools, such as a spreadsheet program, a relational database, or a shared folder.
When there are only a few files and none of them is too large, you can open each one manually in a spreadsheet program. However, if you have more than a few dozen files, the process becomes laborious and time-consuming. In addition, the total number of rows may exceed the limit of your spreadsheet program. For example, Excel can store at most 1,048,576 rows in a single worksheet.
Luckily, there are some useful methods that can help you combine these CSV files more efficiently and automatically.
Option 1: Command Prompt
If you are a Windows user, you can use the built-in Command Prompt to combine CSV files. Command Prompt is a text interface for your computer. You can type simple commands to merge files.
First, put all of your CSV files in a folder and copy the full path of that folder. Then, open Command Prompt and type the following commands in order.
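1. Navigate to the folder where your CSV files are stored. You can change the current directory with the cd command followed by the folder path.
2. Then, type another command to combine the CSV files, for example copy *.csv all-csv-files.csv. This command will merge all CSV files in the folder into a single CSV file called all-csv-files.csv.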
After running the commands above, you will find the output file in the same folder.
Option 2: Programming
If you’re proficient with programming, you can certainly write a script in a language such as Python to combine CSV files. Most languages have built-in functions or packages for processing CSV files. Here is some sample code using Python as an example.
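As a minimal sketch, assuming every file is UTF-8 encoded, starts with the same header row, and sits directly in one folder, the merge can be done with Python's standard csv module. The function name combine_csv_files is only illustrative; the default output name all-csv-files.csv mirrors the one used in Option 1.

```python
import csv
from pathlib import Path


def combine_csv_files(folder, output_name="all-csv-files.csv"):
    """Append every CSV in `folder` into one file, keeping a single header row.

    Assumes (hypothetically) that all files are UTF-8 and share one header.
    Returns the number of data rows written.
    """
    folder = Path(folder)
    output_path = folder / output_name
    # Skip the output file itself so a rerun does not merge it into itself.
    sources = sorted(p for p in folder.glob("*.csv") if p.name != output_name)

    header = None
    rows_written = 0
    with open(output_path, "w", newline="", encoding="utf-8") as out_file:
        writer = csv.writer(out_file)
        for source in sources:
            with open(source, newline="", encoding="utf-8") as in_file:
                reader = csv.reader(in_file)
                file_header = next(reader, None)
                if file_header is None:
                    continue  # empty file, nothing to copy
                if header is None:
                    # The first non-empty file defines the header of the output.
                    header = file_header
                    writer.writerow(header)
                elif file_header != header:
                    raise ValueError(
                        f"{source.name} has a different header: {file_header}"
                    )
                # Copy the data rows one at a time instead of loading the file.
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

Calling combine_csv_files with the path of your folder writes the merged file into that folder and returns the number of data rows copied. Raising an error on a mismatched header is a design choice for this sketch; it stops the merge early rather than silently producing misaligned columns.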
In addition to merging CSV files, Python also allows you to see what your data looks like and thus perform data cleaning or analysis. However, if your output file is very large, say over 10 GB, Python may not be able to load it using your local computing resources. Because the script runs on your own machine, loading the whole file into memory requires at least as much RAM as the file size, so roughly 10 GB or more in this case.
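The memory concern mainly applies when a whole file is loaded at once. As a hedged sketch, the snippet below looks at a file without loading it fully: it reads the header and the first few rows, then counts the remaining rows one at a time. It assumes a UTF-8 file with a header row, and preview_csv is a hypothetical helper name.

```python
import csv
from itertools import islice


def preview_csv(path, n_rows=5):
    """Return the header, the first `n_rows` data rows, and the total data row count."""
    with open(path, newline="", encoding="utf-8") as csv_file:
        reader = csv.reader(csv_file)
        header = next(reader, None)
        if header is None:
            return None, [], 0  # the file is empty
        first_rows = list(islice(reader, n_rows))
        # Keep iterating to count the remaining rows; they are discarded
        # as they are read, so the full file never sits in memory.
        total_rows = len(first_rows) + sum(1 for _ in reader)
    return header, first_rows, total_rows
```

The returned row count can also be compared with a spreadsheet limit, such as Excel's 1,048,576 rows per worksheet, before you try to open the merged file there.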
Option 3: Cloud data management tools
If you are not familiar with programming, or your files may be too big to open locally, you can try a cloud data management tool such as Acho. These platforms let you manage your CSV files in a cloud database and combine data without writing any code. They can process data very quickly and open CSV files without size limitations. Additionally, you can store data in the cloud without taking up space on your computer. Lastly, they offer various features for transforming and analyzing your dataset after all CSV files are merged.
Upload multiple CSV files
First, go to the resource page and choose Batch CSV. Then, upload all of your CSV files, and the system will combine them into a single table automatically. Make sure that all files have the same schema, that is, the same number of columns and the same data type for each column.
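If you want to check the schema before uploading, a small local script can help. The sketch below is not part of Acho; it is a hypothetical helper, find_schema_mismatches, that compares only the header row of each file (column names, order, and count) against the first non-empty file in a folder. It assumes UTF-8 files with a header row and does not verify the data type of each column.

```python
import csv
from pathlib import Path


def find_schema_mismatches(folder):
    """Compare each CSV header in `folder` with the first non-empty file's header.

    Returns (reference_header, mismatches), where mismatches maps a file name
    to its differing header, or to None when the file is empty.
    """
    reference = None
    mismatches = {}
    for path in sorted(Path(folder).glob("*.csv")):
        with open(path, newline="", encoding="utf-8") as csv_file:
            # Only the first row is read, so large files are cheap to check.
            header = next(csv.reader(csv_file), None)
        if header is None:
            mismatches[path.name] = None  # empty file, no header at all
        elif reference is None:
            reference = header  # first non-empty file sets the expected schema
        elif header != reference:
            mismatches[path.name] = header
    return reference, mismatches
```

An empty mismatches dictionary suggests the files line up by column name; any file listed there is worth fixing or leaving out before the batch upload.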
Add new CSV files to Batch CSV
You can manage your Batch CSV by clicking it to open File Management, which lists the files included in your Batch CSV. You can also add more files there, and they will be merged into the current Batch CSV.
Open your Batch CSV in a project
You can import the Batch CSV into a project. This lets you see the entire table, which includes all records from the individual CSV files. From there, you can explore the dataset or build a data pipeline to clean data or generate reports.
>> Related article: How to bulk edit CSV files?
Update your table with new data
When you add new files to your Batch CSV, you can update your table by clicking the Sync button. It replaces your current table with the latest version and reruns the entire data pipeline automatically. Thus, if you have already set up data transformation or cleaning steps, you don't need to rebuild them.
Overall, if you are good at programming and your CSV files are not too large, you can use the Command Prompt or a programming language to merge files. However, if you need a system to manage and analyze your CSV files in a structured way, cloud data management tools are more suitable for you.
In this article, we only discuss how to combine datasets with the same schema. If some of your files have different schemas, such as different numbers of columns, or you want to combine datasets horizontally, you should consider other methods to do so. (Related article: How to combine dozens of big datasets together in a minute?)