Data Analyst, Data Engineer, and Data Scientist

Ever wonder what you’re expected to deliver before accepting a new job as a data analyst? There are significant differences between the roles of a data analyst, a data engineer, and a data scientist.

1. Data Scientist

“Data scientists, on the other hand, estimate the unknown by asking questions, writing algorithms, and building statistical models.”

Back in 2012, Harvard Business Review called Data Scientist “the sexiest job of the 21st century”, and the role has not changed much since. Supporting business analysts and product managers, a data scientist (DS) is expected to understand sophisticated statistical models and mathematical algorithms and, more importantly, to help business stakeholders make well-informed decisions.

Skills: Statistics/Mathematics, Distributed Computing, Business Domain Knowledge

Tools: R, Python, Tableau, and analytics libraries such as pandas, NumPy, scikit-learn, and NLTK

2. Data Engineer

Data Engineer is a relatively new profession. It branches out from the software development role, with a stronger emphasis on data cleaning, data wrangling, and post-processing. Although often confused with a Data Scientist, a Data Engineer is much less involved in data interpretation and more responsible for the data pipeline and the back-end infrastructure.

Skills: Hadoop, MapReduce, Hive, Pig, Data streaming, NoSQL, SQL, programming.

Tools: DashDB, MySQL, MongoDB, Cassandra, Acho Studio
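
As a rough illustration of the cleaning-and-loading work described above, the sketch below normalizes a few raw records and loads them into a database. Everything here is assumed for the example: the record fields, the payments table, and the clean helper are hypothetical, and an in-memory SQLite database stands in for whichever production store a team actually uses.

```python
import sqlite3

# Hypothetical raw records, as they might arrive from an upstream system.
raw_rows = [
    {"user_id": "1", "email": " Ann@Example.com ", "amount": "19.99"},
    {"user_id": "2", "email": "bob@example.com", "amount": ""},       # missing amount
    {"user_id": "1", "email": "ann@example.com", "amount": "19.99"},  # duplicate after cleaning
]


def clean(row):
    """Normalize types and whitespace; return None for unusable rows."""
    if not row["amount"]:
        return None
    return (int(row["user_id"]), row["email"].strip().lower(), float(row["amount"]))


# Drop unusable rows, then remove duplicates while preserving order.
cleaned = list(dict.fromkeys(r for r in map(clean, raw_rows) if r is not None))

# Load the cleaned rows into an in-memory SQLite table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (user_id INTEGER, email TEXT, amount REAL)")
conn.executemany("INSERT INTO payments VALUES (?, ?, ?)", cleaned)
conn.commit()

loaded = conn.execute("SELECT user_id, email, amount FROM payments").fetchall()
print(loaded)
```

Note that nothing in this sketch interprets the data; it only makes the data reliable for whoever analyzes it next.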

3. Data/Business Analyst

A Data/Business Analyst is usually closely involved in a business’s day-to-day operations. The analytical layer of most businesses revolves around a variety of software tools such as SAP and Oracle. Mastering these tools and reporting timely insights is critical for an analyst.

Skills: Data Analysts need a baseline understanding of several core skills: statistics, data munging, data visualization, and exploratory data analysis.

Tools: Microsoft Excel, Acho Studio, SPSS, SPSS Modeler, SAS, SAS Miner, SQL, Microsoft Access, Tableau, SSAS.
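
For a sense of what exploratory data analysis looks like in its simplest form, here is a small sketch that summarizes orders by region. The order data and region names are invented for the example, and plain Python stands in for the spreadsheet or BI tools listed above, where the same summary would typically be a pivot table or a SQL GROUP BY.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical order data: (region, order value in dollars).
orders = [("East", 120), ("West", 80), ("East", 200), ("West", 100), ("East", 130)]

# Group order values by region.
by_region = defaultdict(list)
for region, value in orders:
    by_region[region].append(value)

# Compute a few descriptive statistics per region.
summary = {
    region: {
        "orders": len(values),
        "total": sum(values),
        "mean": mean(values),
        "median": median(values),
    }
    for region, values in by_region.items()
}

for region, stats in summary.items():
    print(region, stats)
```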