For smaller projects, CSV is a great format for storing data. But what if you have a lot of them? Storing them all on your local computer can be a hassle. First, they can take up a lot of space. Second, they will need to be stored orderly with a naming convention. More importantly, many of these data files should be combined into one giant table. What is the best practice for this?

In essence, you should build a database for these files. A relational database offers a more robust way to organize and manage your data. In this post, I will show you how you can transform your CSV files into a SQL database in three simple steps. I’ll also discuss some of the advantages of using a relational database over CSV files.

A relational database is a database that divides data into linked tables that have shared data points. These shared data points allow us to combine information and create novel tables or views with one simple query operation. Popular types of relational databases include MySQL and PostgreSQL.

Building a robust relational database takes time and effort. But you can do it within minutes on a cloud database management system such as Acho Studio.

Upload multiple .CSV files as a resource

  • Upload multiple .csv files as a MCSV resource on Acho
  • Make sure the column names are the same

First, import your files to Acho Studio as a resource. Here I have a folder called “acho_export_historical_stock_price”. It contains some historical stock price records for 5 years. There are 10 files in this folder. All of them have the same column name.

csv_to_database_on_acho
Build a database with csv files on acho studio

Then go on to Acho Studio’s resource page and select “Multiple CSV” for importing your resource. Be aware that only .csv, .txt, and .txt files are supported at the moment.

import_multiple_csv_files_as_a_resource
import multiple csv files as a resource

If they all have the same column name and correct values for each column, you should see a preview page with a sample of your files like this.

preview_for_multiple_csv_file
multiple csv file preview

Import your resource to a project

  • Create a project based on the resource
  • Check the total row number to make sure there’s no missing data

Once you have successfully uploaded the .csv files as a resource, you can now create a project to access them.

As you can see, all records from the 10 .csv files are now stored in one table.

acho_project_csv_file
acho project for csv files

Update your table with new .CSV files

  • Upload a new table as a resource
  • Import it to the same project
  • Use “union” action to combine two or more tables
  • Use “join” action to combine two more tables based on a key column

One simple way to update your table is by using the “union” action. By uploading the new .csv files to your resource page, you can import them to the same project. The “union” action will help you aggregate all the records based on the column names. There is no limit on how many rows you can add to each table. All the records are stored in a relational database on the cloud, so it’s pretty easy to transform, cleanse and update them.

acho_project_csv_file
acho project for csv files

Another way to update this table is by using the “join” action. It will help you combine two or more tables horizontally based on a key column. Check out how the “join” action works here.

Why do you need to build a database with .csv files?

First, a relational database helps you to see the relationship between different parts of your data more easily. Data is divided into tables and can be combined (“joined” or “unioned”) using SQL-based actions that include simple math operations.

Second, a relational database offers superior data management with greater consistency and flexibility. A CSV or other types of flat file can become unmanageable, and inaccessible. A relational database allows you to enforce data types using schemas thus reducing many potential errors. Relational databases also offer greater flexibility since you can generate more complex data from relatively simple tables using complex queries.

Third, a relational database solves the issue of data redundancy: a good relational database is designed around minimizing data duplication and inconsistent dependencies, through a process called “normalization”. Normalization is the process of organizing the data in your database, and involves both creating tables and defining the relations between them.

Fourth,  a relational database have become the industry standard and database management systems (DBMS) typically offer a more advanced set of features that you’d want for any database such as backups, failover, and access management.

Hope this article helps. If you need a build a database for your CSV files, sign up Acho and give it a try. Should you have any question, chat with us or email us at contact@acho.io