Crack Any Data Analyst Interview Bhavesh Arora Complete Data Analyst Interview Questions
Cracking the Interview for
DATA ANALYSTS
Featuring 200+ Real Interview Questions from:
American Express, PhysicsWallah, Zepto, EY, Genpact, Walmart
and many more…
Curated by
SAURABH G
Founder at DataNiti
6+ Years of Experience | Senior Data Engineer
LinkedIn: [Link]/in/saurabhgghatnekar
BHAVESH ARORA
Senior Data Analyst at Delight Learning Services
[Link] – IIT Jodhpur | 3+ Years of Experience
LinkedIn: [Link]/in/bhavesh-arora-11b0a319b
Nervous about your Data Analyst Interview?
Your preparation starts here.
“This book gave me clarity on the kind of questions top companies ask and how to
crack them with confidence.”
— Ritika, Data Analyst at a FinTech Startup
Let’s embark on this journey together and make your dreams
a reality, starting today.
1. Advanced SQL
• Explain the concept of window functions and provide examples of
their usage.
• How would you optimize a complex SQL query for performance?
• Describe the differences between OLAP and OLTP databases.
• What are common pitfalls when using GROUP BY with aggregate
functions?
• How do you implement recursive queries in SQL?
• Explain the use of Common Table Expressions (CTEs) and their
advantages.
• What strategies can you employ to handle and query large datasets
efficiently?
• Discuss the implications of indexing on query performance.
• How do you detect and resolve deadlocks in a database?
• Explain the concept of database normalization and its trade-offs.
• What are the differences between clustered and non-clustered
indexes?
• How would you approach migrating data between two different
database systems?
• Describe the ACID properties in the context of database
transactions.
• How can you implement and manage database partitioning?
• What are materialized views, and when would you use them?
• Explain the concept of sharding in databases.
• How do you handle dynamic pivot tables in SQL?
• Discuss the use and benefits of stored procedures.
• What are the considerations for designing a schema for a new
application?
• How do you ensure data integrity across multiple tables?
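For the recursive-query question above, one possible illustration is a recursive CTE that walks a reporting hierarchy. This is only a sketch: the employees table, its columns, the sample names, and the helper reporting_chain are all hypothetical, and SQLite (through Python's built-in sqlite3 module) stands in for whatever database the interviewer has in mind.

```python
import sqlite3


def reporting_chain(rows, root_name):
    """Return (name, depth) for root_name and everyone reporting under them.

    rows: iterable of (id, name, manager_id); manager_id is None at the top.
    The table layout is a hypothetical example, not a standard schema.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER)"
    )
    conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", rows)
    query = """
        WITH RECURSIVE chain(id, name, depth) AS (
            -- Anchor member: the person we start from.
            SELECT id, name, 0 FROM employees WHERE name = ?
            UNION ALL
            -- Recursive member: direct reports of anyone already in the chain.
            SELECT e.id, e.name, c.depth + 1
            FROM employees AS e
            JOIN chain AS c ON e.manager_id = c.id
        )
        SELECT name, depth FROM chain ORDER BY depth, name
    """
    result = conn.execute(query, (root_name,)).fetchall()
    conn.close()
    return result


# Hypothetical sample data: Asha manages Bilal and Chen; Bilal manages Divya.
SAMPLE = [
    (1, "Asha", None),
    (2, "Bilal", 1),
    (3, "Chen", 1),
    (4, "Divya", 2),
]

if __name__ == "__main__":
    for name, depth in reporting_chain(SAMPLE, "Asha"):
        print("  " * depth + name)
```

The anchor member selects the starting row, and the recursive member keeps joining back to the rows found so far until no new reports appear. In an interview it is worth mentioning that cyclic data (an employee who is indirectly their own manager) would make this loop forever unless a depth limit or cycle check is added.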
2. Data Visualization Tools (Power BI, Tableau)
• How do you optimize dashboard performance in Tableau/Power BI?
• Explain the process of creating custom calculated fields and their
use cases.
• What are the differences between blending and joining data
sources?
• How do you implement row-level security in your reports?
• Describe the steps to create a dynamic parameter in Tableau.
• How can you use DAX functions to create complex measures in
Power BI?
• What are the best practices for designing intuitive and effective
dashboards?
• How do you handle real-time data streaming into your
visualizations?
• Explain the use of LOD (Level of Detail) expressions in Tableau.
• What strategies do you use to ensure data accuracy in your reports?
• How do you implement drill-through actions in Power BI?
• Discuss the advantages and limitations of using custom visuals.
• How do you manage and version control your BI reports?
• What are the considerations for mobile-friendly dashboard design?
• How do you integrate Python or R scripts into your visualizations?
• Explain the process of setting up alerts based on data thresholds.
• How do you handle multi-language support in your reports?
• What are the challenges of working with large datasets in BI tools,
and how do you address them?
• How do you ensure your dashboards are accessible to users with
disabilities?
• Discuss the importance of storytelling in data visualization.
3. Python for Data Analysis (Pandas, NumPy)
• How do you handle missing data in Pandas DataFrames?
• Explain the difference between .apply(), .map(), and .applymap()
functions.
• What are the advantages of using vectorized operations in NumPy?
• How do you merge multiple DataFrames with different join types?
• Describe the process of pivoting and melting DataFrames.
• How can you optimize memory usage when working with large
datasets?
• Explain the use of multi-indexing in Pandas.
• How do you perform time series analysis with Pandas?
• What are the differences between deep and shallow copies of
DataFrames?
• How do you implement custom aggregation functions with
groupby()?
• Explain the use of broadcasting in NumPy arrays.
• How do you handle categorical data in Pandas?
• What strategies do you use for efficient string manipulation in
DataFrames?
• How can you parallelize operations to speed up data processing?
• Describe the process of reading and writing data to various file
formats.
• How do you ensure reproducibility in your data analysis workflows?
• What are the best practices for debugging and testing data analysis
code?
• How do you integrate Pandas with SQL databases?
• Explain the use of window functions in Pandas.
• How do you visualize data directly from Pandas using built-in
plotting functions?
4. Advanced Statistics and Machine Learning
• Explain the assumptions underlying linear regression models.
• How do you assess the goodness-of-fit for a regression model?
• What is the difference between parametric and non-parametric
tests?
• How do you handle multicollinearity in predictive models?
• Explain the concept of bootstrapping and its applications.
• What are the differences between bagging and boosting algorithms?
• How do you evaluate the performance of a classification model?
• Describe the process of feature selection and its importance.
• What is the curse of dimensionality, and how do you address it?
• How do you handle imbalanced datasets in machine learning?
• Explain the concept of regularization and its types.
• What are the trade-offs between bias and variance in model
performance?
• How do you interpret the coefficients of a logistic regression
model?
• What is the role of cross-validation in model evaluation?
• How do you implement a recommendation system?
• Explain the differences between K-Means and hierarchical
clustering.
• What are the advantages and disadvantages of using deep learning
models?
• How do you ensure that your machine learning model generalizes
well to new data?
• Describe the process of hyperparameter tuning and its significance.
• What are the ethical considerations when deploying machine
learning models?
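For the question on evaluating a classification model, a minimal sketch of the usual confusion-matrix metrics may help when practising. The function name classification_metrics and the sample labels are illustrative assumptions; in real work a library such as scikit-learn would normally be used instead of hand-rolled code.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for a binary classifier.

    y_true and y_pred are equal-length sequences of labels; `positive`
    names the label treated as the positive class.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and the same length")

    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    accuracy = (tp + tn) / len(y_true)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical labels, only to exercise the function.
    truth = [1, 0, 1, 1, 0, 0, 1, 0]
    preds = [1, 0, 0, 1, 0, 1, 1, 0]
    print(classification_metrics(truth, preds))
```

On imbalanced data, accuracy alone can look good while recall on the minority class is poor, which is why precision, recall, and F1 are usually reported alongside it.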
5. Business Acumen and Case Studies
• How would you assess the impact of a new product launch on
company revenue?
• Describe a time when you used data to influence business
decision-making.
• How do you prioritize competing analytical projects with limited
resources?
• What metrics would you consider to evaluate customer satisfaction?
• How would you identify potential market segments for a new
service?
• Explain how you would conduct a SWOT analysis using data.
• How do you measure the success of a marketing campaign using
data?
• What KPIs would you track to monitor an e-commerce website's
performance?
• If revenue suddenly drops, how would you investigate the root
cause using data?
• How would you help a business reduce customer churn?
• Design an A/B test to evaluate a new landing page’s effectiveness.
• How would you estimate the lifetime value of a customer?
• What data would you analyze to improve delivery time in a logistics
company?
• If a company wants to expand into a new region, what data would
you analyze?
• How would you use data to help HR reduce employee attrition?
• How do you balance short-term vs long-term goals in business
analytics?
• What are the most important financial metrics for a SaaS business?
• How do you handle stakeholder disagreements about what the data
“means”?
• Walk me through how you’d build a data-driven strategy for
product development.
6. Data Engineering + Data Architecture
• Explain the differences between ETL and ELT.
• How would you handle schema evolution in a data pipeline?
• What are the trade-offs between batch and streaming data
processing?
• How do you design a fault-tolerant data pipeline?
• Compare data warehouses (like Redshift, BigQuery) vs data lakes.
• What’s the role of Apache Airflow in data engineering?
• How do you manage and monitor data pipeline failures in
production?
• Explain the use of partitioning and bucketing in big data
frameworks.
• Describe the CAP theorem and its relevance to distributed systems.
• How do you optimize joins in Spark or Hadoop for large datasets?
• Explain how Kafka fits into a real-time data stack.
• What’s the difference between star schema and snowflake schema?
• How do you choose between relational and NoSQL databases?
• What security best practices do you follow for sensitive data?
• Explain how you would implement GDPR compliance in a data
system.
• What is data lineage and why is it important?
• How do you ensure data quality and consistency in large pipelines?
• How would you architect a scalable analytics platform for a growing
startup?
• What’s the role of a data catalog in a modern data ecosystem?
• Describe a time you designed or optimized a complex data system.
7. Analytical Thinking & Product Metrics
• How would you measure user engagement for a mobile app?
• What metrics would you use to track product adoption?
• Explain north star metrics. Can you give one for Uber? For Spotify?
• Design a metric to evaluate the success of a new feature.
• How would you detect anomalies in key business metrics?
• What is cohort analysis, and how would you apply it in a SaaS
company?
• How do you approach defining KPIs for a new product?
• How would you analyze funnel drop-offs in an e-commerce app?
• What are lagging and leading indicators? Give examples.
• Describe how you’d calculate Net Promoter Score (NPS).
• If DAU is going up but retention is dropping, what might be
happening?
• What would you do if two teams disagreed on which KPI to
prioritize?
• How do you design dashboards for senior leadership vs product
teams?
• How do you distinguish correlation from causation in user behavior
data?
• If users are spending more time on your app, is that good? Always?
• How would you use data to decide whether to sunset a feature?
• What metrics would you use to optimize the onboarding process?
• Describe a trade-off between short-term wins and long-term
metrics.
• How do you evaluate the impact of pricing changes on revenue and
churn?
• How do you forecast product usage or growth?
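For the Net Promoter Score question in this list, the standard calculation can be sketched in a few lines. It assumes the conventional 0 to 10 survey scale, with scores of 9 and 10 counted as promoters and 0 through 6 as detractors; the function name and the sample scores are hypothetical.

```python
def net_promoter_score(scores):
    """Return NPS on the usual -100..100 scale.

    Assumes each score is an integer from 0 to 10:
    promoters score 9-10, passives 7-8, detractors 0-6.
    """
    if not scores:
        raise ValueError("at least one survey response is required")
    if any(s < 0 or s > 10 for s in scores):
        raise ValueError("scores must be between 0 and 10")

    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    # Passives count toward the total but not toward either percentage.
    return 100.0 * (promoters - detractors) / len(scores)


if __name__ == "__main__":
    # Hypothetical responses: 4 promoters, 3 passives, 3 detractors.
    responses = [10, 9, 8, 7, 6, 3, 10, 9, 5, 8]
    print(net_promoter_score(responses))
```

A useful follow-up point in an interview is that NPS hides the distribution: two products with the same score can have very different shares of passives, so it is usually read together with response counts and trends over time.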
8. Case Study-Based Data Analyst Questions
• Your product’s conversion rate dropped 20% last week. What’s your
approach to find the root cause?
• Design a dashboard for a food delivery app like Swiggy/Zomato.
What KPIs would you track?
• Your marketing team wants to reduce CAC by 15%. How would you
help with data?
• Suppose app uninstall rates are spiking. Walk me through your
analysis.
• You’ve been asked to evaluate a new pricing strategy. What data
would you need?
• You are working with Product to launch a new feature. How would
you measure success?
• Customer retention has decreased. How would you diagnose this?
• You have incomplete data in a time series. How do you impute and
validate it?
• The client says their churn is increasing, but data shows otherwise.
What do you do?
• Sales dropped after a UI redesign. How would you confirm if it's
causally linked?
• You’re assigned to reduce app crashes using analytics. What’s your
first move?
• Design an A/B test for changing the 'Buy Now' button on an
e-commerce platform.
• Build a cohort retention table for a subscription app. What insights
can you draw?
• Your manager says revenue is healthy but profit isn’t. What would
you check?
• Your competitor launched a new feature. What competitive metrics
would you track?
• Design an alert system for unusual user behavior using SQL and
Python.
• You’re given 2 months of raw order data. Tell us everything useful
you can extract.
• You have a CSV with 1M rows and no documentation. How do you
proceed?
• Build a model to predict which leads are likely to convert. What
features would you use?
• You’re the first data analyst at a startup. What would your
30/60/90-day plan be?
9. Excel / Google Sheets for Data Analysts
• What’s the difference between VLOOKUP, INDEX-MATCH, and
XLOOKUP? When would you use each?
• Write a formula to calculate Year-over-Year growth with missing
months.
• How would you detect and remove duplicates using formulas only?
• How do you calculate a rolling 7-day average with Excel functions?
• Create a dynamic dashboard using slicers, pivot tables, and charts.
• Explain the use of INDIRECT, OFFSET, and MATCH with examples.
• What’s the fastest way to filter the top 10% of performers by a
metric?
• How do you calculate retention rates from cohort-based raw data in
Excel?
• Create a Pareto chart from a list of product issues and frequency.
• Use a formula to return the second highest value in a dataset with
ties.
• Explain conditional formatting with custom formulas (e.g., based on
column values).
• How would you handle time-series forecasting in Excel?
• Write a formula that counts unique values with conditions.
• How do you split full names into First, Middle, and Last using
formulas?
• How can you use ARRAYFORMULA and QUERY in Google Sheets for
real-time dashboards?
• Explain data validation with dependent dropdowns.
• Build a mini KPI dashboard for daily revenue vs target.
• How do you find outliers using formulas?
• What’s the best way to track daily revenue trends over time with
dynamic ranges?
• Can you simulate an A/B test in Excel using RAND()?
10. Probability & Statistics – 20 Questions
• What is the Central Limit Theorem, and why is it important?
• Explain p-value. Can a small p-value guarantee a good model?
• What’s the difference between Type I and Type II errors?
• How do you interpret a 95% confidence interval?
• What does it mean if two variables are statistically independent?
• Explain the difference between correlation and causation.
• What is Bayes’ Theorem, and how can it be applied in spam
detection?
• You flip a coin 10 times and get 8 heads. Is the coin biased? Justify
statistically.
• What is A/B/n testing? When do you stop running the test?
• What’s the probability of drawing two aces in a row from a deck
without replacement?
• What’s the difference between uniform, binomial, and normal
distributions?
• How do you identify if your data follows a normal distribution?
• Explain the law of large numbers with an example.
• Why do we use z-scores, and how are they calculated?
• What is Simpson’s Paradox? Can you give a real-life example?
• What’s the difference between population vs sample statistics?
• When would you use a t-test vs z-test?
• You observe a lift in conversions after an ad campaign. How do you
validate it?
• What’s the meaning of statistical power in hypothesis testing?
• What are confidence intervals vs prediction intervals?
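Two of the questions above (two aces in a row, and 8 heads in 10 flips) have exact answers that can be checked with a short script. This is one way to work them, using exact fractions; the function names are illustrative, and the coin question is treated here as an exact two-sided binomial test against a fair coin, which is an assumption about how the interviewer wants it framed.

```python
from fractions import Fraction
from math import comb


def prob_two_aces():
    """P(first card is an ace) * P(second is an ace | first was), no replacement."""
    return Fraction(4, 52) * Fraction(3, 51)


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_two_sided_p(observed, n, p=Fraction(1, 2)):
    """Exact two-sided p-value: total probability of all outcomes that are
    no more likely than the observed one under the null success rate p."""
    threshold = binomial_pmf(observed, n, p)
    return sum(
        binomial_pmf(k, n, p)
        for k in range(n + 1)
        if binomial_pmf(k, n, p) <= threshold
    )


if __name__ == "__main__":
    print("Two aces in a row:", prob_two_aces())
    p_value = binomial_two_sided_p(8, 10)
    print("8 heads in 10 flips, two-sided p-value:", p_value, float(p_value))
```

The two-ace probability works out to 1/221. For the coin, the two-sided p-value is 7/64 (about 0.109), so at the common 0.05 level 8 heads in 10 flips is not enough evidence to call the coin biased; a larger sample would be needed.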
11. Python Coding / Data Structures – 20 Questions
• Write a Python function to calculate moving averages over a sliding
window.
• How would you merge multiple CSVs from a directory using pandas?
• How do you handle missing values using multiple strategies in
pandas?
• What is the time complexity of searching in a list vs a set?
• Implement a basic caching mechanism in Python using a dictionary.
• Write a function to find the most frequent element in a list.
• How do you pivot and unpivot dataframes in pandas?
• When should you use groupby().agg() vs apply()?
• Explain vectorization and how it improves performance in NumPy.
• Use itertools to generate all permutations of a string.
• Parse a large JSON file and extract nested keys efficiently.
• Detect duplicate rows in a large dataset without using pandas.
• Write a Python function to flatten a nested list.
• How would you implement a queue using two stacks?
• What are generators, and how do they help with big data?
• How do you remove outliers using the IQR method in pandas?
• Describe when you'd use list comprehension vs map/filter.
• How do you merge two dictionaries with value aggregation in Python?
• Write a SQL-like join in pure Python between two lists of
dictionaries.
• Build a command-line tool to summarize CSV files (mean, median,
missing counts).
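As practice material for three of the coding questions above (moving average over a sliding window, flattening a nested list, and a queue built from two stacks), here is one possible set of answers. These are sketches, not the only accepted solutions; all function and class names are illustrative, and the flatten helper assumes nesting is done with plain Python lists only.

```python
from collections import deque


def moving_average(values, window):
    """Return the averages of each full sliding window of length `window`."""
    if window <= 0:
        raise ValueError("window must be a positive integer")
    averages = []
    buffer = deque()
    running_total = 0.0
    for value in values:
        buffer.append(value)
        running_total += value
        if len(buffer) > window:
            # Drop the oldest value so the total always covers `window` items.
            running_total -= buffer.popleft()
        if len(buffer) == window:
            averages.append(running_total / window)
    return averages


def flatten(nested):
    """Recursively flatten arbitrarily nested lists into one flat list."""
    flat = []
    for item in nested:
        if isinstance(item, list):
            flat.extend(flatten(item))
        else:
            flat.append(item)
    return flat


class TwoStackQueue:
    """FIFO queue built from two LIFO stacks (plain Python lists)."""

    def __init__(self):
        self._inbox = []   # receives new items
        self._outbox = []  # holds items in dequeue order

    def enqueue(self, item):
        self._inbox.append(item)

    def dequeue(self):
        if not self._outbox:
            # Reversing the inbox puts the oldest item on top of the outbox.
            while self._inbox:
                self._outbox.append(self._inbox.pop())
        if not self._outbox:
            raise IndexError("dequeue from empty queue")
        return self._outbox.pop()


if __name__ == "__main__":
    print(moving_average([1, 2, 3, 4, 5], 3))
    print(flatten([1, [2, [3, 4]], 5]))
    q = TwoStackQueue()
    for n in (1, 2, 3):
        q.enqueue(n)
    print(q.dequeue(), q.dequeue(), q.dequeue())
```

The moving average keeps a running total so each step is O(1) rather than re-summing the window. In the two-stack queue, each item is moved between stacks at most once, so dequeue is O(1) amortized even though a single call can be O(n).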