What are CSV files?
A comma-separated values (CSV) file is a delimited text file that uses a comma to separate values. Often referred to as a “flat file”, it may contain one or more rows of data records. Each record in the meantime may consist of one or more fields.
By design, it stores data records in a tabular format. Often there are other variations such as Tab-separated values (TSV), Pipe-separated values (PSV), and Delimiter-separated values (DSV).
The Structure of a CSV File
A CSV file has a fairly simple structure. For example, let’s say you have a few contacts in a contact manager, and you export them as a CSV file. You’d get a file containing text like this:
Aaron Smith,firstname.lastname@example.org,123-456-7890,123 Apple Street
Jason Jones,email@example.com,998-765-4321,321 Apple Avenue
They can be more complicated than this and contain thousands of lines, more entries on each line, or long strings of text. Some CSV files may not even have the headers at the top, and some may use quotation marks to surround each bit of data, but that’s the basic format.
That simplicity is a feature. CSV files are designed to be a way to easily export data and import it into other programs. The resulting data is human-readable and can be easily viewed with a text editor like Notepad or a spreadsheet program like Microsoft Excel.
Why is CSV so popular?
CSV, despite its popularity, is not the most optimal way to store data. It has a lot of wasted bytes for delimiters - each field separator is 2 or 3 bytes. When you have data records of hundreds of thousands of lines, they really add up.
However, it can store as much as data your file system can generate. For example, you can export a giant CSV file of more than 1 Terabyte from a Hadoop database. As long as your servers permit, there really isn’t a limit.
Since a CSV can contain millions of lines of records, it becomes a convenient way of sharing a huge dataset. Reading these huge files, however, is another story.
As a human-readable format, a CSV is a very simple representation of the table, with a header, data separated by a delimiter (usually a comma), and rows separated by row terminator. This means that many applications can read a CSV file. Whether it be an Email Relay Service, a CRM, or ERP application, it’s typically relatively easy to upload a clean, structured and small CSV file into a system.
However, a CSV file can’t be very messy as well. Check out this article on why some CSV files are harder to access. When a CSV file contains a messy format, missing column headers, corrupted values, it may require data cleaning to onboard it into a system.
A flat CSV file is perhaps the easiest way to share tabular data. When a CSV file is shared, the recipient can use a variety of applications to open it. Since the format is simple and human-readable, there’s minimum information asymmetry when sharing a CSV file. The alternatives such as a SQL database and XML/JSON may require specific technical skills to access the data.
What should you open it with?
As previously discussed, the CSV format has a long legacy of being the default way of storing tabular data. It is supported by many applications.
One of the simplest applications to open a CSV file is Notepad. Initially released in 1983, it serves as a plan-text reader and editor for Windows. Other Operating Systems have similar programs as well such as TextEdit, and TextWrangler. These applications are great for opening a CSV file if there is no need for processing. Since they only display plain text, there’s also no formatting options available.
An Integrated Development Environment (IDE) such as a Visual Studio, or PyCharm works similarly to a Notepad when it comes to opening a CSV file. It’s a quick way to open a CSV file. But again, it won’t help you make significant changes to the data.
Excel is perhaps the most popular program for opening a CSV file. It offers a suite of data processing features that are rather intuitive to learn. Since the program has been around since the introduction of personal computers, it has a huge community of users that offer great technical advice and support. However, some of the more advanced functions such as Pivot Table, VBA & Macros still take a bit of time to master.
Being a spreadsheet program, Excel is great for manual data entry and data modeling. It unfortunately has a lot of limitations as well. For example, Excel doesn’t work with large datasets very well. A lot of the data entry, modeling, and manipulation tasks are very manual. It’s hard to connect it with an application. It’s hard to collaborate with your teammate on a spreadsheet. This leaves room for a lot of online spreadsheets to grow.
Google Sheets is undoubtedly the most popular online spreadsheet program. It works very similarly to Excel. Many of its features are even identical to Excel so it won’t take too much time to familiarize if you already know how to operate spreadsheets.
What Google Sheets adds to your spreadsheet workflow are two main advantages. One is that Google Sheets is much more connected to your apps being an online spreadsheet application. Another advantage is it helps you and your teammates work on a spreadsheet so much better.
Sharing Google Sheets is also a lot easier since it has an online URL and it can grant access levels to your teammates.
Relational Databases such as MySQL, PostgreSQL and MariaDb are in a lot of ways completely different from spreadsheets. First, it stores your data with schema and table relationships so it can support large amounts of data that may also update. Second, it is robust and maintainable, which means they can support applications of large scale.
Most larger CSV files are stored or read into a SQL database. Since a CSV file is relatively structured in its format, a relational database can not only store but also process it in large volume. For some of the complex calculations, a database can also help support a “pipeline” so the calculations can be split into steps.
A relational database however is difficult to master. There are millions of people dedicated their daily work to maintaining, processing, analyzing and securing relational databases. Many of these jobs require advanced degrees and extensive technical knowledge. This makes it difficult for non-technical users to open a CSV file with it.
Acho is essentially an online database that your team can access. It’s designed to onboard large amounts of data from CSV files. Different from a relational database, it does not require in-depth technical knowledge in database administration. Rather, it offers a simple GUI for onboarding, storing, processing and distributing flat file data.
Since it is built on a cloud database, it can process large volumes of data in size and frequency. There’s practically no limit on how much data you can store and process.
Regarding usability, and accessibility, it helps to know SQL when it comes to using Acho, but it is not required.
When to use Excel and when not?
Excel is by far the most popular way to open a CSV file. However, it has many limitations as you can see from the charts above.
A CSV file is too large
When a CSV file exceeds 1 million rows, it may become inaccessible. A larger table with more than a few hundred thousand rows will slow down your computer significantly as well. Even when your computer is exceedingly powerful, larger CSV files may take up a lot of your hard drive space.
There are too many CSV files
When you receive CSV files regularly from other colleagues or online systems, these files may become unmanageable very quickly. If there is a need to combine them, Excel probably is not the best solution.
The CSV file may need to be updated
Like previously mentioned, an Excel spreadsheet isn’t designed to connect different systems. When you have files that need to be regularly updated, manual data entry may be inevitable. An Excel file may also change very easily in its format, and schema., and this make it difficult to update as well.
Open a CSV on an Online Database System
An Online Database System contrary to a Desktop Application utilizes one or many dedicated servers to process your CSV data. This typically helps you process large amounts of data much faster than relying on your personal computer. When you have a lot of similar CSV files that need to be combined or joined, an online database system such as Acho is much better suited.
In addition to processing power, you may also share the file much more efficiently. Instead of sending your CSV file via an online service and having your recipient download a copy, you can invite the recipient to access the file directly on the web.