
Data Wrangling

What is Data Wrangling?

Data Wrangling is the process of cleaning and unifying messy, complex data sets so that they are easy to access and analyze. It transforms data from its original “raw” form into a more digestible format and organizes data sets from various sources into a single coherent whole for further processing.

How does Data Wrangling work?

Data Wrangling is one of those technical terms that are more or less self-descriptive. The operation consists of the following sequence of steps:

Preprocessing: the initial stage, which takes place right after the data is acquired.
Standardizing data into an understandable format. For example, you have a record of user profile events and need to sort it by event type and timestamp.
Cleaning data of noise and of missing or erroneous elements.
Consolidating data from various sources or data sets into a coherent whole. For example, you have an affiliate advertising network, and you need to gather performance statistics for the current stage of the marketing campaign.
Matching data with existing data sets. For example, you already have user data for a certain period and merge newly collected data with it to form a more comprehensive set.
Filtering data according to predefined settings before processing.
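
The steps listed above can be sketched in a few lines of Python. This is only a minimal illustration using the standard library. The record layout, the sample data, and the function names (standardize, clean, wrangle) are assumptions made for this example and are not taken from any particular tool.

```python
from datetime import datetime

# Hypothetical raw event records from two sources (illustrative data only).
source_a = [
    {"user": "u1", "event": "Login", "ts": "2024-01-02 10:00:00"},
    {"user": "u2", "event": "purchase", "ts": "2024-01-01 09:30:00"},
    {"user": "u3", "event": "login", "ts": None},  # missing timestamp
]
source_b = [
    {"user": "u1", "event": "LOGOUT", "ts": "2024-01-02 11:15:00"},
    {"user": "u2", "event": "purchase", "ts": "2024-01-01 09:30:00"},  # duplicate
]


def standardize(record):
    """Normalize the event name and parse the timestamp into a datetime."""
    ts = record["ts"]
    return {
        "user": record["user"],
        "event": record["event"].strip().lower(),
        "ts": datetime.strptime(ts, "%Y-%m-%d %H:%M:%S") if ts else None,
    }


def clean(records):
    """Drop records with a missing timestamp and exact duplicates."""
    seen = set()
    result = []
    for r in records:
        key = (r["user"], r["event"], r["ts"])
        if r["ts"] is None or key in seen:
            continue
        seen.add(key)
        result.append(r)
    return result


def wrangle(*sources, event_types=None):
    """Consolidate sources, then standardize, clean, filter, and sort them."""
    combined = [standardize(r) for src in sources for r in src]
    cleaned = clean(combined)
    if event_types is not None:
        # Filtering: keep only the requested event types.
        cleaned = [r for r in cleaned if r["event"] in event_types]
    # Sort by event type first, then by timestamp.
    return sorted(cleaned, key=lambda r: (r["event"], r["ts"]))


events = wrangle(source_a, source_b)
for e in events:
    print(e["event"], e["ts"], e["user"])
```

In this sketch the record with the missing timestamp and the duplicated purchase are dropped during cleaning, and the remaining records come out sorted by event type and timestamp. Real data sets usually need more careful rules, for instance deciding whether a missing value should be dropped or filled in.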

Example

If you want to visualize the number of customers by city, you need to ensure that there is only one row per city before visualizing the data. If two rows, such as Munchen and Munich, represent the same city, the results could be wrong. The data analyst has to change one of the rows manually, typically by creating a mapping on the fly in the visualization tool. The mapping is applied to every row of data, which helps to detect more such issues, and the process is repeated for other cities.
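
Outside a visualization tool, the same kind of mapping could be written as a small lookup table. The sketch below assumes hypothetical customer rows and a hand-maintained alias dictionary (CITY_ALIASES). It illustrates the idea rather than how any specific tool implements it.

```python
from collections import Counter

# Hypothetical customer rows (illustrative data only).
customers = [
    {"name": "Anna", "city": "Munich"},
    {"name": "Ben", "city": "Munchen"},
    {"name": "Cara", "city": "Berlin"},
    {"name": "Dan", "city": " munich "},
]

# Hand-maintained mapping from known spelling variants to one canonical name.
CITY_ALIASES = {
    "munchen": "Munich",
    "münchen": "Munich",
    "munich": "Munich",
}


def canonical_city(raw):
    """Return the canonical city name, or the trimmed input if no alias is known."""
    key = raw.strip().lower()
    return CITY_ALIASES.get(key, raw.strip())


# Count customers per canonical city so each city appears only once.
counts = Counter(canonical_city(row["city"]) for row in customers)
print(counts)
```

Cities that are not in the alias table pass through unchanged, so reviewing the resulting counts is one way to spot further variants that still need a mapping entry.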