top_interview_questions_for_data_analyst
top interview questions for data analyst

The demand for data analysts has been steadily growing across industries. Since more businesses are relying on data to optimize their processes and performances, job openings have been increasing in functions such as digital marketing, sales operation, supply chain, human resources and more.

If you’re interested in pursuing a career in data analytics, knowing the job description can be helpful. A data analyst is typically responsible for gathering, cleaning, and studying data to help solve problems or answer a specific question. They can apply their knowledge in almost every industry, from business, finance, medicine, and science, to even government affairs and criminal justice. While that might seem a very promising career, the competition for these job openings is getting more fierce.

So, what can you do to stand out from the crowd?

For starters, a data analytics position would require extensive technical background such as programming, statistical modeling, and a good understanding of relevant tools. Knowing SQL, R, SAS or Python would help you land many jobs. If you have experience working with ETL, business intelligence, and database management tools, it’d be a big bonus.

Like most technical interviews, a good amount of preparation will typically work well. Based on research, we listed some of the most important analytical, business, and skill-related questions that are often asked in data analyst job interviews, along with plausible answers to help you ace that interview!

Analytical Questions

Analytical questions help the interviewer judge your logical reasoning and critical thinking skills, so they know how well you will perform with hard problems sometimes under pressure. What would you do if an unexpected situation arose? How do you 'think'?

Here are some sample questions to get you thinking!

#1 How can you unify data from different sources?

You can tackle data unification problems by identifying similar data records and combining them into one record containing all the valuable attributes but without any redundancy. You can also facilitate schema integration via schema restructuring.

This question is to test how you can tackle the situation when you have data from multiple sources. You can use the tool that you are familiar with to explain how you will unify data. For example, in SQL, you can use Join to combine two tables based on their shared columns. In Tableau, you can use data blending or join to unify data from multiple sources, so you can put all data in a single dashboard.

An example can help your interviewer understand how well you know the idea.

#2 What should you do with missing or erroneous data?

It is a very practical question since analysts often deal with missing or erroneous values in data. The first step should be to understand why these problematic values happen. You can use some techniques, such as charts,  to examine if there is any pattern in missing or erroneous data. For example, missing values may only happen on weekends instead of weekdays.

Once you know why they happen, you can choose a method to deal with them. There are several ways that you can handle them.

  1. Keep all records
  2. Filter out records with missing or erroneous values
  3. Impute missing values with some statistical methods, such as mean

#3 What problems can arise during data analysis?

This is an open question but companies sometimes ask to understand your experiences. Here’re some examples,

  • Incomplete data which could lead to faulty results
  • Poor quality data from unreliable sources which warrants data cleansing
  • Spelling mistakes or duplicate entries that decrease data quality
  • Variation in data from multiple sources which needs to be cleansed and reorganized to generate a single record with all the useful attributes but no redundancy or variation

#4 When should a model be rebuilt or re-assessed?

Even though a business is ever-changing, its model doesn’t change often. But if a business operation experiences a sudden rise or fall when entering a new market, we will have to re-build the analytic model. Simply put, a model needs to be retrained based on the changing consumer behavior which alters business dynamics.

#5 Suppose you have ten stacks of 10 coins each, and each coin weighs 10 grams. However, one of the stacks is defective, so it contains coins that are only 9 grams each. What is the minimum number of weights needed to identify the defective stack

Although it might seem daunting at first, the solution to this question is fairly simple. You will need to pick one coin from the 1st stack, two from the 2nd, three from the 3rd, and so on until you reach the 10th stack. At this point, you have a total number of 55 coins. If none of the coins were defective, the weight would be 550 grams (5*10=550). But if the first stack is defective, the total weight would be one less than 550, so 549 grams. But if the second stack is defective, the total weight of the coins would be two less than 550, which is 548 grams. Similarly, if the 3rd stack were defective, the weight of the coins would amount to 547 grams, which is three less than the total 550. Thus, we would easily be able to identify the defective stack using this technique.

Technical Questions

When someone chooses to hire you, they need to be sure that you know what you’re talking about. Thus, they will ask questions about your specific industry to ensure that you know your industry well enough to outperform other candidates.

Here are some business domain questions to help you prepare:

#1 What is data cleansing, and what are the best ways to practice it?

This is one of the most frequently asked questions in job interviews for data analysts, but the answer is quite simple. Data cleansing refers to identifying and correcting inconsistencies and errors to improve data quality, and the best ways to practice it are:

You can mention some steps such as:

  • Remove irrelevant data, such as unnecessary columns
  • Examine and correct the data type for each column
  • Deal with syntax errors, such as typos or white spaces at the beginning or the end
  • Remove duplicated rows or columns
  • Handle missing or erroneous data
  • Standardize or normalize the scale of the columns

#2 What is Time Series Analysis?

Time series is a sequence of data points that are collected over a period of time, such as a company’s stock prices, or a country’s population over time. Time series analysis is to analyze these data points to find some patterns such as seasonality or trends, and then further determine the future trend via forecasting models.

#3 Explain what logistic regression is with an example.

Logistic regression is a statistical method for examining a dataset in which there are one or more independent variables that define an outcome.

Example 1:  Imagine that we are interested in the factors that influence a political candidate’s chances of winning an election.  The outcome (response) variable is binary (0/1);  win or lose. The predictor variables of interest are the amount of money spent on the campaign, the amount of time spent campaigning negatively and whether or not the candidate is an incumbent.

Example 2: When a credit card transaction happens, the bank makes a note of several factors. For instance, the date of the transaction, amount, place, type of purchase, etc. Based on these factors, they develop a Logistic Regression model of whether or not the transaction is a fraud.

For instance, if the amount is too high and the bank knows that the concerned person never makes purchases that high, they may label it as a fraud.

#4 Explain what data normalization is.

Database normalization is the process of structuring a database, whether it is relational or non-relational. Based on similar fields and records, it helps increase data consistency across tables and improve data integrity.

#5 What do you understand about cascading referential integrity?

The Cascading Referential Integrity Constraints in SQL Server are the foreign key constraints that tell SQL Server to perform certain actions whenever a user attempts to delete or update a primary key to an existing foreign key point.

In order to tell the SQL Server what actions to perform whenever a user is trying to delete or update a primary key value to which existing foreign key points, we can use SQL functions such as SET NULL, CASCADE, SET, DEFAULT, NO ACTION.

Skillset & Tools

Questions related to tools and software will help the interviewers assess your skills and level of expertise, which will help them determine if you are cut out for the job. They need to know that you are not just a newbie but somebody who actually possesses the skills they claim to have.

Here are some examples of the types of questions you may be asked:

#1 What are the tools you have used for data reporting?

This is a common question because it reflects your experience in the field. You can answer by simply naming the best tools like Tableau, Looker, Sisense etc.

Generally, in the job description, the company lists some reporting tools that it expects the candidates are able to use. You can pick up some of them from the list and talk about how you used them to make reports before. Remember not to mention any tools that you never used because the interviewers may ask further questions about these tools.

Here are some common reporting tools for analysts:

  • Spreadsheets: Excel, Google Sheets, Airtable, etc.
  • BI tools: Tableau, Power BI, Looker, etc.
  • Script Languages: Python, R, etc.
  • Statistical software: Statas, SAS, etc.

#2 Name the statistical methods that are often used in data analysis?

The most beneficial statistical methods are the Markov process, imputation, mathematical optimization, simple algorithm, Bayesian method, spatial and cluster techniques, outliers detection, percentile, and rank statistics.

The most commonly used techniques in data analysis are descriptive statistics and inferential statistics. Descriptive statistics is used for exploring a dataset and understanding the distribution of the data, whereas inferential statistics utilize some methods to allow you to make inferences from the data.

Here are some examples of statistical concepts in two categories:

  • Descriptive statistics, such as mean, median, standard deviation, range, percentiles, IQR.
  • Inferential statistics, such as hypothesis testing, linear or logistic regression

If you have experience in building statistical models, you can mention some of the techniques you have used:

  • Predictive models, such as KNN, Random Forest
  • Forecasting models, such as ARIMA, exponential smoothing

#3 How can you highlight cells with negative values in an Excel sheet?

You can use conditional formatting to highlight the cells containing negative values. For this, you will have to select the cells with negative values, choose ‘conditional formatting’ from the Home tab, go to ‘Highlight cells rules’ and select the ‘less than’ option. Finally, you will enter 0 in the dialog box that appears- and voila!

#4 How can you validate data?

Common methods include field validation, form-level validation, data saving validation, and search criteria validation. In field-level validation, real-time validation is done, and errors are corrected as you enter the data, while in form validation, errors are highlighted after you submit the form. Also, we use data savings validation when multiple entry forms have to be validated. As for search criterion validation, it ensures that a user’s searched keywords or phrases will return the most relevant results.

#5 Name the qualities of a good data model.

We can consider data good and developed if it shows predictable performance or accurate estimation. It should also be responsive and adaptive to changes to accommodate the changing needs of a growing business. Plus, we must be able to scale in proportion to changes in the data. And of course, it should be consumable to give our clients profitable results.

What’s the Takeaway?

Interviews can be stressful as it is, but when you know you’re up against tough competition, you might feel just a tad more pressure. But don’t panic because we’ve got you covered with our top data analyst questions. Once you’re through with these, you can rest assured that you are well-prepared for your interview. Now, all you need to do is stay calm and answer all questions appropriately and confidently because you’ve got this!