Data Analysis vs. Data Mining Explained

The document outlines key concepts and methodologies related to Data Analysis and Data Mining, highlighting their differences, processes, and tools. It covers various topics such as data validation, cleaning, profiling, and analysis techniques, along with challenges faced by data analysts and the skills required for the role. Additionally, it discusses statistical methodologies, clustering algorithms, and the importance of version control in data projects.

Uploaded by

abaliji
Copyright
© All Rights Reserved

1: What are the key differences between Data Analysis and Data Mining?

Data analysis involves the process of cleaning, organizing, and using data to produce meaningful insights. Data mining is
used to search for hidden patterns in the data.

Data analysis produces results that are far more comprehensible by a variety of audiences than the results from data
mining.

2: What is Data Validation?

Data validation, as the name suggests, is the process that involves determining the accuracy of data and the quality of the
source as well. There are many processes in data validation but the main ones are data screening and data verification.

• Data screening: Using a variety of models to ensure that the data is accurate and that no redundancies are
present.

• Data verification: If a redundancy is found, it is evaluated in multiple steps, and a decision is then made on whether
to keep the data item.

3: What is Data Analysis, in brief?

Data analysis is the structured procedure that involves working with data by performing activities such as ingestion,
cleaning, transforming, and assessing it to provide insights, which can be used to drive revenue.

Data is collected, to begin with, from varied sources. Since the data is a raw entity, it has to be cleaned and processed to fill
out missing values and to remove any entity that is out of the scope of usage.

After pre-processing the data, it can be analyzed with the help of models, which use the data to perform some analysis on it.

The last step involves reporting and ensuring that the data output is converted to a format that can also cater to a non-
technical audience, alongside the analysts.

4: How to know if a data model is performing well or not?

This question is subjective, but certain simple assessment points can be used to assess the accuracy of a data model. They
are as follows:

• A well-designed model should offer good predictability, that is, it should make it easy to predict
future insights when needed.

• A well-rounded model adapts easily to any change made to the data or the pipeline.

• The model should be able to cope if there is an immediate requirement to scale up the data.

• The model should be easy to work with and easy for clients to understand, so that they can derive the
required results.

5: Explain Data Cleaning in brief.

Data Cleaning is also called Data Wrangling. As the name suggests, it is a structured way of finding erroneous content in data
and safely removing it to ensure that the data is of the utmost quality. Here are some of the ways to clean data:

• Removing a data block entirely

• Finding ways to fill blank data in, without causing redundancies

• Replacing data with its mean or median values

• Making use of placeholders for empty spaces
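
Two of the techniques listed above, removing redundant entries and replacing missing values with the median, can be sketched as follows. This is a minimal illustration, not a prescribed method: the `records` list, the field names, and the helper functions are all hypothetical.

```python
from statistics import median

# Hypothetical records; None marks a missing value.
records = [
    {"id": 1, "age": 34},
    {"id": 2, "age": None},
    {"id": 3, "age": 29},
    {"id": 3, "age": 29},   # exact duplicate of the previous record
    {"id": 4, "age": 41},
]

def drop_duplicates(rows):
    """Keep the first occurrence of each identical record."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

def fill_with_median(rows, field):
    """Replace missing values in `field` with the median of the known values."""
    known = [r[field] for r in rows if r[field] is not None]
    fill = median(known)
    return [{**r, field: fill if r[field] is None else r[field]} for r in rows]

cleaned = fill_with_median(drop_duplicates(records), "age")
```

Removing duplicates before imputing keeps the repeated record from skewing the median.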


6: What are some of the problems that a working Data Analyst might encounter?

There can be many issues that a Data Analyst might face when working with data. Here are some of them:

• The accuracy of the model in development will be low if there are multiple entries of the same entity, spelling
errors, or incorrect data.

• If the data is ingested from an unverified source, it might require a lot of cleaning and
preprocessing before the analysis can begin.

• The same goes for extracting data from multiple sources and merging it for use.

• The analysis will be set back if the data obtained is incomplete or inaccurate.

7: What is Data Profiling?

Data profiling is a methodology that involves analyzing all entities present in data to a greater depth. The goal here is to
provide highly accurate information based on the data and its attributes such as the datatype, frequency of occurrence, and
more.
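
As a rough sketch of what profiling one attribute might look like, the snippet below reports the datatypes seen, the number of missing entries, the number of distinct values, and the most frequent value. The `profile_column` helper and the sample column are assumptions made for illustration.

```python
from collections import Counter

def profile_column(values):
    """Summarize one column: types seen, missing count, distinct count, top value."""
    present = [v for v in values if v is not None]
    counts = Counter(present)
    return {
        "types": sorted({type(v).__name__ for v in present}),
        "missing": len(values) - len(present),
        "distinct": len(counts),
        "most_common": counts.most_common(1)[0] if counts else None,
    }

city = ["Pune", "Delhi", "Pune", None, "Mumbai", "Pune"]  # made-up column
summary = profile_column(city)
```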

8: What are the scenarios that could cause a model to be retrained?

Data is never a stagnant entity. If there is an expansion of business, this could cause openings of sudden opportunities that
might call for a change in the data. Furthermore, assessing the model to check its standing can help the Analyst analyze
whether the model is to be re-trained or not.

However, the general rule of thumb is to ensure that the models are re-trained when there is a change in the business
protocols and offerings.

9: What are the prerequisites to become a Data Analyst?

There are many skills that a budding Data Analyst needs. Here are some of them:

• Being well-versed in technologies such as XML, JavaScript, and ETL frameworks

• Proficiency in SQL and in databases such as MongoDB

• Ability to effectively collect and analyze data

• Knowledge of database designing and data mining

• Having the ability/experience of working with large datasets

10: What are the top tools used to perform Data Analysis?

There is a wide spectrum of tools that can be used in the field of data analysis. Here are some of the popular ones:

• Google Search Operators

• RapidMiner

• Tableau

• KNIME

• OpenRefine

11: What is an outlier?

An outlier is a value in a dataset that lies far from the mean of the feature it belongs to.
There are two types of outliers: univariate and multivariate.

12: How can we deal with problems that arise when the data flows in from a variety of sources?

There are many ways to go about dealing with multi-source problems. However, these are done primarily to solve the
problems of:

• Identifying the presence of similar/same records and merging them into a single record

• Re-structuring the schema to ensure there is good schema integration

13: What are some of the popular tools used in Big Data?

There are multiple tools that are used to handle Big Data. Some of the most popular ones are as follows:

• Hadoop

• Spark

• Scala

• Hive

• Flume

• Mahout

14: What is the use of a Pivot table?

Pivot tables are one of the key features of Excel. They allow a user to view and summarize the entirety of large datasets
simply. Most of the operations with Pivot tables involve drag-and-drop operations that aid in the quick creation of reports.

15: Explain the KNN imputation method, in brief.

KNN is the method that requires the selection of several nearest neighbors and a distance metric at the same time. It can
predict both discrete and continuous attributes of a dataset.

A distance function is used here to find the similarity of two or more attributes, which will help in further analysis.
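
A minimal sketch of the idea, assuming Euclidean distance and a plain mean over the neighbours (neither choice is specified above): a missing value is filled from the k complete rows that are closest on the remaining columns. The `knn_impute` function and the sample rows are hypothetical.

```python
import math

def knn_impute(rows, target, k=2):
    """Fill missing values (None) in column `target` with the mean of that
    column over the k nearest complete rows, using Euclidean distance on
    the remaining columns."""
    complete = [r for r in rows if r[target] is not None]
    filled = []
    for row in rows:
        if row[target] is not None:
            filled.append(list(row))
            continue

        def distance(other):
            # Compare only the columns other than the one being imputed.
            return math.sqrt(sum((a - b) ** 2
                                 for i, (a, b) in enumerate(zip(row, other))
                                 if i != target))

        neighbours = sorted(complete, key=distance)[:k]
        estimate = sum(n[target] for n in neighbours) / len(neighbours)
        filled.append([estimate if i == target else v for i, v in enumerate(row)])
    return filled

# Hypothetical rows: [height_cm, weight_kg]; the third weight is missing.
data = [[170, 65.0], [172, 68.0], [171, None], [190, 95.0]]
imputed = knn_impute(data, target=1, k=2)
```

The same scheme works for a discrete attribute if the mean is swapped for a majority vote among the neighbours.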

16: What are the top Apache frameworks used in a distributed computing environment?

MapReduce and Hadoop are considered to be the top Apache frameworks when the situation calls for working with a huge
dataset in a distributed working environment.

17: What is Hierarchical Clustering?

Hierarchical clustering, or hierarchical cluster analysis, is an algorithm that groups similar objects into common groups called
clusters. The goal is to create a set of clusters, where each cluster is different from the other and, individually, they contain
similar entities.

18: What are the steps involved when working with a Data Analysis project?

Many steps are involved when working end-to-end on a data analysis project. Some of the important steps are as
mentioned below:

• Problem statement

• Data cleaning/preprocessing

• Data exploration

• Modeling

• Data validation

• Implementation

• Verification

19: Can you name some of the statistical methodologies used by Data Analysts?

There are many statistical techniques that are very useful when performing data analysis. Here are some of the important
ones:

• Markov process

• Cluster analysis

• Imputation techniques

• Bayesian methodologies

• Rank statistics

20: What is Time Series Analysis?

Time series analysis, or TSA, is a widely used statistical technique when working with trend analysis and time-series data in
particular. The time-series data involves the presence of the data at particular intervals of time or set periods.

21: Where is Time Series Analysis used?

Since time series analysis (TSA) has a wide scope of usage, it can be used in multiple domains. Here are some of the places
where TSA plays an important role:

• Statistics

• Signal processing

• Econometrics

• Weather forecasting

• Earthquake prediction

• Astronomy

• Applied science

22: What are some of the properties of clustering algorithms?

Any clustering algorithm, when implemented, will have the following properties:

• Flat or hierarchical

• Iterative

• Disjunctive

23: What is Collaborative Filtering?

Collaborative filtering is an algorithm used to create recommendation systems mainly considering the behavioral data of a
customer or a user.

For example, when browsing through e-commerce sites, a section called ‘Recommended for you’ is present. This is done
using the browsing history, alongside analyzing the previous purchases and collaborative filtering.
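
One common variant, user-based filtering with cosine similarity, can be sketched as below. The variant, the `ratings` data, and the function names are assumptions chosen for illustration; production recommenders are considerably more involved.

```python
import math

# Hypothetical ratings: user -> {item: rating}
ratings = {
    "ana":  {"laptop": 5, "mouse": 4, "desk": 1},
    "ben":  {"laptop": 4, "mouse": 5, "monitor": 5},
    "cara": {"desk": 5, "lamp": 4},
}

def cosine(u, v):
    """Cosine similarity between two sparse rating dicts."""
    shared = set(u) & set(v)
    dot = sum(u[i] * v[i] for i in shared)
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0

def recommend(user, ratings, top_n=1):
    """Score items the user has not rated by similarity-weighted ratings of other users."""
    scores = {}
    for other, their in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], their)
        for item, rating in their.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

top = recommend("ana", ratings)
```

Here "ana" shares more highly rated items with "ben" than with "cara", so the item only "ben" has rated ranks first.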

24: Which are the types of Hypothesis Testing used today?

There are many types of hypothesis testing. Some of them are as follows:

• Analysis of variance (ANOVA): Here, the mean values of multiple groups are compared.

• T-test: This form of testing is used when the population standard deviation is not known and the sample size is relatively small.

• Chi-square test: This kind of hypothesis testing is used when there is a requirement to find out the level of
association between the categorical variables in a sample.

25: What are some of the data validation methodologies used in Data Analysis?

Many types of data validation techniques are used today. Some of them are:

• Field-level validation: Validation is done across each of the fields to ensure that there are no errors in the data
entered by the user.

• Form-level validation: Here, validation is done when the user completes working with the form but before the
information is saved.

• Data saving validation: This form of validation takes place when the file or the database record is being saved.

• Search criteria validation: This kind of validation is used to check whether valid results are returned when the user
is looking for something.

26: What is K-means algorithm?

K-means algorithm clusters data into different sets based on how close the data points are to each other. The number of
clusters is indicated by ‘k’ in the k-means algorithm. It tries to maintain a good amount of separation between each of the
clusters.

However, since it works in an unsupervised nature, the clusters will not have any sort of labels to work with.
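
The assign-then-update loop behind k-means can be sketched on one-dimensional data. This is a simplified illustration: the starting centroids are fixed by hand so the run is repeatable, whereas real implementations usually pick them randomly, and the `kmeans_1d` name and sample points are made up.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Lloyd's algorithm on 1-D data, starting from the given centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Assign each point to its nearest centroid.
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

points = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]   # made-up values
centroids, clusters = kmeans_1d(points, centroids=[0.0, 5.0])
```

With k = 2, the two returned clusters carry no labels; they are simply groups of nearby points.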

27: What is the difference between the concepts of recall and the true positive rate?

Recall and the true positive rate are identical. Here is the formula:

Recall = (True positive)/(True positive + False negative)
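
The formula can be checked on a small set of labels. The label lists below are invented for illustration, with 1 marking the positive class.

```python
def recall(y_true, y_pred):
    """Recall (true positive rate) = TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0

y_true = [1, 1, 1, 0, 0, 1]   # actual labels
y_pred = [1, 0, 1, 0, 1, 1]   # predicted labels
score = recall(y_true, y_pred)  # 3 true positives, 1 false negative
```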

28: What are the ideal situations in which t-test or z-test can be used?

It is a standard practice that a t-test is used when there is a sample size less than 30 and the z-test is considered when the
sample size exceeds 30 in most cases.
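
For a small sample, the one-sample t statistic is computed from the sample mean and the sample standard deviation. The sketch below assumes a hypothesized mean `mu0` and uses made-up measurements; it computes only the statistic, not the p-value.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(sample, mu0):
    """t = (sample mean - mu0) / (sample standard deviation / sqrt(n))."""
    n = len(sample)
    return (mean(sample) - mu0) / (stdev(sample) / sqrt(n))

sample = [5.1, 4.9, 5.3, 5.2, 5.0]   # hypothetical measurements, n = 5
t_stat = one_sample_t(sample, mu0=5.0)
```

A z-test uses the same form but replaces the sample standard deviation with a known population standard deviation.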

29: Why is Naive Bayes called ‘naive’?

It is called naive because it assumes that all the features are equally important and
independent of each other. This rarely holds true in a real-world scenario.

30: What is the simple difference between standardized and unstandardized coefficients?

Standardized coefficients are interpreted in units of standard deviation, whereas
unstandardized coefficients are measured in the actual units of the values present in the dataset.

31: How are outliers detected?

Multiple methodologies can be used for detecting outliers, but the two most commonly used methods are as follows:

• Standard deviation method: Here, a value is considered an outlier if it lies more than three
standard deviations below or above the mean.

• Box plot method: Here, a value is considered an outlier if it lies more than 1.5 times the interquartile
range (IQR) below the first quartile or above the third quartile.
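
Both rules can be sketched with the standard library, assuming the usual box plot fences at Q1 − 1.5 × IQR and Q3 + 1.5 × IQR. The sample values and function names are illustrative only.

```python
from statistics import mean, pstdev, quantiles

def outliers_by_std(values, threshold=3.0):
    """Values more than `threshold` standard deviations away from the mean."""
    mu, sigma = mean(values), pstdev(values)
    return [v for v in values if abs(v - mu) > threshold * sigma]

def outliers_by_iqr(values, factor=1.5):
    """Values below Q1 - factor*IQR or above Q3 + factor*IQR."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - factor * iqr, q3 + factor * iqr
    return [v for v in values if v < low or v > high]

values = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 95]   # made-up data
by_std = outliers_by_std(values)
by_iqr = outliers_by_iqr(values)
```

The standard deviation rule is itself sensitive to extreme values, because a large outlier inflates both the mean and the standard deviation; the IQR rule is more robust on small samples.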

32: Why is KNN preferred when determining missing numbers in data?

K-Nearest Neighbour (KNN) is preferred here because of the fact that KNN can easily approximate the value to be
determined based on the values closest to it.

33: How can one handle suspicious or missing data in a dataset while performing analysis?

If there are any discrepancies in data, a user can go on to use any of the following methods:

• Creation of a validation report with details about the data in discussion

• Escalating the same to an experienced Data Analyst to look at it and take a call

• Replacing the invalid data with a corresponding valid and up-to-date data

• Using many strategies together to find missing values and using approximation if needed

34: What is the simple difference between Principal Component Analysis (PCA) and Factor Analysis (FA)?

Among many differences, the major difference between PCA and FA lies in the fact that PCA aims to explain as much of the
total variance in the variables as possible, whereas factor analysis aims to explain the covariance between the variables
through a smaller set of underlying factors.

35: How is it beneficial to make use of version control?

There are numerous benefits of using version control as shown below:

• Establishes an easy way to compare files, identify differences, and merge if any changes are done

• Creates an easy way to track the life cycle of an application build, including every stage in it such as development,
production, testing, etc.

• Brings about a good way to establish a collaborative work culture

• Ensures that every version and variant of code is kept safe and secure

36: What are the future trends in Data Analysis?

With this question, the interviewer is trying to assess your grip on the subject and your research in the field. Make sure to
state valid facts and respective validation for sources to add positivity to your candidature. Also, try to explain how Artificial
Intelligence is making a huge impact on data analysis and its potential in the same.

37: Why are you applying for the Data Analyst role in our company?

Here, the interviewer is trying to see how well you can convince them regarding your proficiency in the subject, alongside
the need for data analysis at the firm you’ve applied for. It is always an added advantage to know the job description in
detail, along with the compensation and the details of the company.

38: Can you rate yourself on a scale of 1–10 depending on your proficiency in Data Analysis?

With this question, the interviewer is trying to grasp your understanding of the subject, your confidence, and your
spontaneity. The most important thing to note here is that you answer honestly based on your capacity.

39: Has your college degree helped you with Data Analysis in any way?

This is a question that relates to the latest program you completed in college. Do talk about the degree you have obtained,
how it was useful, and how you plan on putting it to full use in the coming days after being recruited in the company.

40: What is your plan after taking up this Data Analyst role?

While answering this question, make sure to keep your explanation concise on how you would bring about a plan that works
with the company set-up and how you would implement the plan, ensuring that it works by performing performance
validation testing on the same. Do highlight how it can be made better in the coming days with further iteration.

41: What are the disadvantages of Data Analytics?

Compared to the plethora of advantages, there are very few disadvantages when considering Data Analytics. Some of the
disadvantages are listed below:

• Data Analytics can cause a breach in customer privacy and their information such as transactions, purchases, and
subscriptions.

• Some of the tools are complex and require prior training.

• It takes a lot of skills and expertise to select the right analytics tool every time.

42: What skills should a successful Data Analyst possess?

This is a descriptive question that is highly dependent on how analytical your thinking skills are. There are a variety of tools
that a Data Analyst must have expertise in. Programming languages such as Python, R, and SAS, probability, statistics,
regression, correlation, and more are the primary skills that a Data Analyst should possess.

43: Why do you think you are the right fit for this Data Analyst role?

With this question, the interviewer is trying to gauge your understanding of the job description and where you’re coming
from, with respect to your knowledge of Data Analysis. Be sure to answer this in a concise yet detailed manner by explaining
your interests, goals, and visions and how these match with the company’s substructure.

44: Can you please talk about your past Data Analysis work?

This is a very commonly asked question in a data analysis interview. The interviewer will be assessing you for your clarity in
communication, actionable insights from your work experience, your debating skills if questioned on the topics, and how
thoughtful you are in your analytical skills.

45: Can you please explain how you would estimate the number of visitors to the Taj Mahal in November 2019?

This is a classic behavioral question. This is to check your thought process without making use of computers or any sort of
dataset. You can begin your answer using the below template:

‘First, I would gather some data. To start with, I’d like to find out the population of Agra, where the Taj Mahal is located. The
next thing I would take a look at is the number of tourists that came to visit the site during that time. This is followed by the
average length of their stay that can be further analyzed by considering factors such as age, gender, and income, and the
number of vacation days and bank holidays there are in India. I would also go about analyzing any sort of data available from
the local tourist offices.’

46: Do you have any experience working in the same industry as ours before?

This is a very straightforward question. This aims to assess if you have the industry-specific skills that are needed for the
current role. Even if you do not possess all of the skills, make sure to thoroughly explain how you can still make use of the
skills you’ve obtained in the past to benefit the company.

47: Have you earned any sort of certifications to boost your opportunities as a Data Analyst aspirant?

As always, interviewers look for candidates who are serious about advancing their career options by making use of
additional tools like certifications. Certificates are strong proof that you have put in all the efforts to learn new skills, master
them, and put them to use to the best of your capacity. List the certifications, if you have any, and do talk about them in
brief, explaining what all you learned from the program and how it’s been helpful to you so far.

48: What tools do you prefer to use in the various phases of Data Analysis?

This again is a question to check what tools you think are useful for their respective tasks. Do talk about how comfortable
you are with the tools you mention and about their popularity in the market today.

49: Which step of a Data Analysis project do you like the most?

Do know that it is completely normal to have a predilection toward certain tools and tasks over others. However, while
performing data analysis, you will always be expected to deal with the entirety of the analytics life cycle, so make sure not to
speak negatively about any of the tools or of the steps in the process of data analysis.

50: How good are you in terms of explaining technical content to a non-technical audience with respect to Data Analysis?

This is another classic question asked in most of the Data Analytics interviews. Here, it is extremely vital that you talk about
your communication skills in terms of delivering the technical content, your level of patience, and your ability to break
content into smaller chunks to help the audience understand better.

It is always advantageous to show the interviewer that you are very well capable of working effectively with people from a
variety of backgrounds who may or may not be technical.

1. Mention the differences between Data Mining and Data Profiling?

Data Mining:

• Data mining is the process of discovering relevant information that has not yet been identified before.

• In data mining, raw data is converted into valuable information.

Data Profiling:

• Data profiling is done to evaluate a dataset for its uniqueness, logic, and consistency.

• It cannot identify inaccurate or incorrect data values.

2. Define the term 'Data Wrangling' in Data Analytics.


Data Wrangling is the process wherein raw data is cleaned, structured, and enriched into a desired usable format for better
decision making. It involves discovering, structuring, cleaning, enriching, validating, and analyzing data. This process can turn
and map out large amounts of data extracted from various sources into a more useful format. Techniques such as merging,
grouping, concatenating, joining, and sorting are used to analyze the data. Thereafter it gets ready to be used with another
dataset.

3. What are the various steps involved in any analytics project?


This is one of the most basic data analyst interview questions. The various steps involved in any common analytics projects
are as follows:

Understanding the Problem

Understand the business problem, define the organizational goals, and plan for a lucrative solution.

Collecting Data

Gather the right data from various sources and other information based on your priorities.

Cleaning Data

Clean the data to remove unwanted, redundant, and missing values, and make it ready for analysis.

Exploring and Analyzing Data

Use data visualization and business intelligence tools, data mining techniques, and predictive modeling to analyze data.

Interpreting the Results

Interpret the results to find out hidden patterns, future trends, and gain insights

4. What are the common problems that data analysts encounter during analysis?

The common problems that data analysts encounter during analysis are:

• Handling duplicates

• Collecting the right, meaningful data at the right time

• Handling data purging and storage problems

• Making data secure and dealing with compliance issues


5. Which are the technical tools that you have used for analysis and presentation purposes?

As a data analyst, you are expected to know the tools mentioned below for analysis and presentation purposes. Some of the
popular tools you should know are:

• MS SQL Server, MySQL: For working with data stored in relational databases

• MS Excel, Tableau: For creating reports and dashboards

• Python, R, SPSS: For statistical analysis, data modeling, and exploratory analysis

• MS PowerPoint: For presentation, displaying the final results and important conclusions

6. What are the best methods for data cleaning?


• Create a data cleaning plan by understanding where the common errors take place and keep all the communications
open.

• Before working with the data, identify and remove the duplicates. This will lead to an easy and effective data analysis
process.

• Focus on the accuracy of the data. Set cross-field validation, maintain the value types of data, and provide mandatory
constraints.

• Normalize the data at the entry point so that it is less chaotic. You will be able to ensure that all information is
standardized, leading to fewer errors on entry.

7. What is the significance of Exploratory Data Analysis (EDA)?

• Exploratory data analysis (EDA) helps to understand the data better.

• It helps you obtain confidence in your data to a point where you’re ready to engage a machine learning algorithm.

• It allows you to refine your selection of feature variables that will be used later for model building.

• You can discover hidden trends and insights from the data.

8. Explain descriptive, predictive, and prescriptive analytics.

Descriptive:

• It provides insights into the past to answer “what has happened”

• Uses data aggregation and data mining techniques

• Example: An ice cream company can analyze how much ice cream was sold, which flavors were sold, and whether more or less ice cream was sold than the day before

Predictive:

• Understands the future to answer “what could happen”

• Uses statistical models and forecasting techniques

• Example: An ice cream company can analyze how much ice cream was sold, which flavors were sold, and whether more or less ice cream was sold than the day before

Prescriptive:

• Suggests various courses of action to answer “what should you do”

• Uses simulation algorithms and optimization techniques to advise possible outcomes

• Example: Lower prices to increase the sale of ice creams, produce more/fewer quantities of a specific flavor of ice cream

9. What are the different types of sampling techniques used by data analysts?

Sampling is a statistical method to select a subset of data from an entire dataset (population) to estimate the characteristics
of the whole population.

There are five major types of sampling methods:

• Simple random sampling

• Systematic sampling

• Cluster sampling

• Stratified sampling

• Judgmental or purposive sampling
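
Three of these methods can be sketched with the standard `random` module. The population of 100 numeric IDs, the two strata, and the fixed seed are assumptions made so the example is repeatable.

```python
import random

population = list(range(1, 101))      # hypothetical population of 100 IDs
rng = random.Random(42)               # fixed seed so the run is repeatable

# Simple random sampling: every unit has the same chance of selection.
simple = rng.sample(population, 10)

# Systematic sampling: pick a random start, then take every k-th unit.
step = len(population) // 10
start = rng.randrange(step)
systematic = population[start::step]

# Stratified sampling: sample separately within each stratum.
strata = {"low": population[:50], "high": population[50:]}
stratified = {name: rng.sample(group, 5) for name, group in strata.items()}
```

Cluster sampling would instead pick whole groups at random and keep every unit inside the chosen groups, while judgmental sampling relies on the analyst's choice rather than on randomness.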

10. Describe univariate, bivariate, and multivariate analysis.


Univariate analysis is the simplest and easiest form of data analysis where the data being analyzed contains only one
variable.

Example - Studying the heights of players in the NBA.

Univariate analysis can be described using Central Tendency, Dispersion, Quartiles, Bar charts, Histograms, Pie charts, and
Frequency distribution tables.

Bivariate analysis involves the analysis of two variables to find causes, relationships, and correlations between the
variables.

Example – Analyzing the sale of ice creams based on the temperature outside.

Bivariate analysis can be explained using Correlation coefficients, Linear regression, Logistic regression, Scatter plots,
and Box plots.

Multivariate analysis involves the analysis of three or more variables to understand the relationship of each variable
with the other variables.

Example – Analyzing Revenue based on expenditure.

Multivariate analysis can be performed using Multiple regression, Factor analysis, Classification & regression trees, Cluster
analysis, Principal component analysis, Dual-axis charts, etc.

11. How can you handle missing values in a dataset?


This is one of the most frequently asked data analyst interview questions, and the interviewer expects you to give a detailed
answer here, and not just the name of the methods. There are four methods to handle missing values in a dataset.

Listwise Deletion

In the listwise deletion method, an entire record is excluded from analysis if any single value is missing.

Average Imputation

Take the average value of the other participants' responses and fill in the missing value.

Regression Substitution

You can use multiple-regression analyses to estimate a missing value.


Multiple Imputations

It creates plausible values based on the correlations for the missing data and then averages the simulated datasets by
incorporating random errors in your predictions.
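
Listwise deletion and regression substitution can be contrasted on a tiny dataset. The sketch below is a simplification: it fits a regression on a single predictor rather than a multiple regression, and the `rows` data and function names are hypothetical.

```python
def listwise_delete(rows):
    """Drop every record that has at least one missing value."""
    return [r for r in rows if None not in r]

def regression_substitute(rows):
    """Fit y = a + b*x on the complete rows by least squares,
    then predict y wherever it is missing."""
    complete = listwise_delete(rows)
    n = len(complete)
    mean_x = sum(x for x, _ in complete) / n
    mean_y = sum(y for _, y in complete) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in complete)
         / sum((x - mean_x) ** 2 for x, _ in complete))
    a = mean_y - b * mean_x
    return [(x, a + b * x if y is None else y) for x, y in rows]

rows = [(1, 2.0), (2, 4.0), (3, None), (4, 8.0)]   # made-up (x, y) pairs
deleted = listwise_delete(rows)
substituted = regression_substitute(rows)
```

Deletion shrinks the dataset, while substitution keeps every record at the cost of relying on the fitted relationship.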

12. Explain the term Normal Distribution.

Normal Distribution refers to a continuous probability distribution that is symmetric about the mean. In a graph, normal
distribution will appear as a bell curve.

• The mean, median, and mode are equal

• All of them are located in the center of the distribution

• 68% of the data falls within one standard deviation of the mean

• 95% of the data falls within two standard deviations of the mean

• 99.7% of the data falls within three standard deviations of the mean
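
The three percentages can be checked numerically with `statistics.NormalDist` from the Python standard library. The `share_within` helper is a name introduced here for illustration.

```python
from statistics import NormalDist

standard = NormalDist(mu=0.0, sigma=1.0)   # standard normal distribution

def share_within(k):
    """Probability mass within k standard deviations of the mean."""
    return standard.cdf(k) - standard.cdf(-k)

# Percentage of data within 1, 2, and 3 standard deviations.
shares = {k: share_within(k) * 100 for k in (1, 2, 3)}
```

The computed shares come out at roughly 68%, 95%, and 99.7%, matching the rule of thumb above.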

13. What is Time Series analysis?

Time Series analysis is a statistical procedure that deals with the ordered sequence of values of a variable at equally spaced
time intervals. Time series data are collected at adjacent periods. So, there is a correlation between the observations. This
feature distinguishes time-series data from cross-sectional data.

[Figure: example of time-series data on coronavirus cases and its graph]


14. How is Overfitting different from Underfitting?

This is another frequently asked data analyst interview question, and you are expected to cover all the given differences!

Overfitting:

• The model fits the training data well.

• The performance drops considerably over the test set.

• Happens when the model learns the random fluctuations and noise in the training dataset in detail.

Underfitting:

• Here, the model neither fits the training data well nor can generalize to new data.

• Performs poorly both on the train and the test set.

• This happens when there is too little data to build an accurate model and when we try to develop a linear model using
non-linear data.

15. How do you treat outliers in a dataset?

An outlier is a data point that is distant from other similar points. They may be due to variability in the measurement or may
indicate experimental errors.

[Figure: graph showing three outliers in the dataset]

To deal with outliers, you can use the following four methods:

• Drop the outlier records

• Cap your outlier data

• Assign a new value

• Try a new transformation
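
Capping can be sketched as clamping every value to a lower and an upper fence. The fences used here, Q1 − 1.5 × IQR and Q3 + 1.5 × IQR, are one common assumption; percentile-based caps are another option. The data and the `cap_outliers` name are illustrative.

```python
from statistics import quantiles

def cap_outliers(values, factor=1.5):
    """Clamp values to the range [Q1 - factor*IQR, Q3 + factor*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - factor * iqr, q3 + factor * iqr
    return [min(max(v, low), high) for v in values]

values = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 95]   # made-up data
capped = cap_outliers(values)
```

Unlike dropping records, capping keeps the row count unchanged, which matters when other columns of the same record are still valid.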

17. What are the different types of Hypothesis testing?

Hypothesis testing is the procedure used by statisticians and scientists to accept or reject statistical hypotheses. There are
mainly two types of hypothesis testing:

• Null hypothesis: It states that there is no relation between the predictor and outcome variables in the population. It is
denoted by H0.

Example: There is no association between a patient’s BMI and diabetes.

• Alternative hypothesis: It states that there is some relation between the predictor and outcome variables in the
population. It is denoted by H1.

Example: There could be an association between a patient’s BMI and diabetes.

18. Explain the Type I and Type II errors in Statistics?

In Hypothesis testing, a Type I error occurs when the null hypothesis is rejected even if it is true. It is also known as a false
positive.

A Type II error occurs when the null hypothesis is not rejected, even if it is false. It is also known as a false negative.
Excel Data Analyst Interview Questions

19. In Microsoft Excel, a numeric value can be treated as a text value if it is preceded by what?

An apostrophe ('). Typing an apostrophe before a number makes Excel store the entry as text.

20. What is the difference between COUNT, COUNTA, COUNTBLANK, and COUNTIF in Excel?

• COUNT function returns the count of numeric cells in a range

• COUNTA function counts the non-blank cells in a range

• COUNTBLANK function gives the count of blank cells in a range

• COUNTIF function returns the count of values by checking a given condition

21. How do you make a dropdown list in MS Excel?

• First, click on the Data tab that is present in the ribbon.

• Under the Data Tools group, select Data Validation.

• Then navigate to Settings > Allow > List.

• Select the source you want to provide as a list array.

22. Can you provide a dynamic range in “Data Source” for a Pivot table?

Yes, you can provide a dynamic range in the “Data Source” of a pivot table. To do that, create a named range using the
OFFSET function, and then base the pivot table on that named range.

17. What is the function to find the day of the week for a particular date value?

To get the day of the week, you can use the WEEKDAY() function.

The above function will return 6 as the result, i.e., 17th December is a Saturday.
18. How does the AND() function work in Excel?

AND() is a logical function that checks multiple conditions and returns TRUE or FALSE based on whether the conditions are
met.

Syntax: AND(logical1, [logical2], [logical3], ...)

In the below example, we are checking if the marks are greater than 45. The result will be TRUE if the mark is >45;
otherwise it will be FALSE.

19. Explain how VLOOKUP works in Excel?

VLOOKUP is used when you need to find things in a table or a range by row.

VLOOKUP accepts the following four parameters:

lookup_value - The value to look for in the first column of a table

table - The table from where you can extract value

col_index - The column from which to extract value

range_lookup - [optional] TRUE = approximate match (default). FALSE = exact match

Let’s understand VLOOKUP with an example.

If you wanted to find the department to which Stuart belongs, you could use the VLOOKUP function as shown below:

=VLOOKUP(A11, A2:E7, 3, 0)

Here, cell A11 has the lookup value, A2:E7 is the table array, 3 is the column index number with information about
departments, and 0 is the range lookup.

If you hit Enter, it will return “Marketing”, indicating that Stuart is from the marketing department.

17. What function would you use to get the current date and time in Excel?

In Excel, you can use the TODAY() function to get the current date and the NOW() function to get the current date and time.

18. Using the below sales table, calculate the total quantity sold by sales representatives whose name starts with A,
and the cost of each item they have sold is greater than 10.

You can use the SUMIFS() function to find the total quantity.

For the Sales Rep column, you need to give the criteria as “A*” - meaning the name should start with the letter “A”. For the
Cost each column, the criteria should be “>10” - meaning the cost of each item is greater than 10.

The result is 13.


19. Using the data given below, create a pivot table to find the total sales made by each sales representative for each
item. Display the sales as % of the grand total.

• Select the entire table range, click on the Insert tab and choose PivotTable

• Select the table range and the worksheet where you want to place the pivot table

• Drag Sale Total onto Values, and Sales Rep and Item onto Row Labels. It will give the sum of sales made by each
representative for every item they have sold.

• Right-click on “Sum of Sale Total” and expand Show Values As to select % of Grand Total.

• Below is the resultant pivot table.

1) What is Python?

Python was created by Guido van Rossum, and released in 1991.


It is a general-purpose, high-level, object-oriented programming language that runs on different platforms such as
Windows, Linux, UNIX, and Macintosh. Its high-level built-in data structures, combined with dynamic typing and dynamic
binding, make it well suited to rapid application development. It is widely used in the data science, machine learning, and
artificial intelligence domains.

It is easy to learn and requires less code to develop applications. It is widely used for:

o Web development (server-side).

o Software development.

o Mathematics.

o System scripting.

2) Why Python?

o Python is an interpreted, object-oriented, high-level programming language with dynamic semantics.

o Python is compatible with different platforms like Windows, Mac, Linux, Raspberry Pi, etc.

o Python has a simple syntax as compared to other languages.

o Python allows a developer to write programs with fewer lines than some other programming languages.

o Python runs on an interpreter system, meaning that the code can be executed as soon as it is written. It helps to
provide a prototype very quickly.

o Python can be written in a procedural, an object-oriented, or a functional style.

o The Python interpreter and the extensive standard library are available in source or binary form without charge for
all major platforms, and can be freely distributed.

3) What are the applications of Python?

Python is used in various software domains. Some application areas are given below.

o Web and Internet Development

o Games

o Scientific and computational applications

o Language development

o Image processing and graphic design applications

o Enterprise and business applications development

o Operating systems

o GUI based desktop applications

Python provides various web frameworks to develop web applications. The popular python web frameworks are Django,
Pyramid, Flask.

Python's standard library supports e-mail processing, FTP, IMAP, and other Internet protocols. Python's SciPy and NumPy
libraries help in scientific and computational application development.

Python's Tkinter library supports creating desktop-based GUI applications.

4) What are the advantages of Python?


Advantages of Python are:

o Python is Interpreted language

Interpreted: Python is an interpreted language. It does not require prior compilation of code and executes instructions
directly.

o It is Free and open source

Free and open source: It is an open-source project which is publicly available to reuse. It can be downloaded free of cost.

o It is Extensible

Extensible: It is very flexible and can be extended with additional modules.

o Object-oriented

Object-oriented: Python allows us to implement object-oriented concepts to build application solutions.

o It has Built-in data structure

Built-in data structure: Tuple, List, and Dictionary are useful integrated data structures provided by the language.

o Readability

o High-Level Language

o Cross-platform

Portable: Python programs can run on multiple platforms without affecting their performance.

5) What is PEP 8?

PEP stands for Python Enhancement Proposal. PEP 8 is the document that provides guidelines on how to write Python code.
It is basically a set of rules that specify how to format Python code for maximum readability.
It was written by Guido van Rossum, Barry Warsaw, and Nick Coghlan in 2001.

6) What do you mean by Python literals?

Literals are fixed values written directly in the source code, typically assigned to a variable or constant. Python supports
the following literals:

String Literals

String literals are formed by enclosing text in single or double quotes. Triple quotes allow a literal to span several lines.

Example:

# in single quotes
single = 'AdnanManna'
# in double quotes
double = "AdnanManna"
# multi-line string
multi = '''Adnan
_
Manna'''

print(single)
print(double)
print(multi)

Output:

AdnanManna
AdnanManna
Adnan
_
Manna

Numeric Literals

Python supports three types of numeric literals: integer, float, and complex.

Example:

# Integer literal
a = 10
# Float literal
b = 12.3
# Complex literal
x = 3.14j

print(a)
print(b)
print(x)

Output:

10
12.3
3.14j

Boolean Literals

Boolean literals are used to denote Boolean values. A Boolean literal is either True or False. In arithmetic, True behaves
as 1 and False as 0.

Example:

p = (1 == True)
q = (1 == False)
r = True + 3
s = False + 7

print("p is", p)
print("q is", q)
print("r:", r)
print("s:", s)

Output:

p is True
q is False
r: 4
s: 7

Special literals

Python contains one special literal, None. It is used to represent the absence of a value (a null variable). Comparing None
for equality with anything other than None returns False.

Example:

word = None
print(word)

Output:

None

7) Explain Python Functions?

A function is a block of code that is written once and can be executed whenever required in the program. It is a
self-contained block of statements with a valid name, a parameter list, and a body. Functions make programs more modular
and reusable. Python provides several built-in functions and also allows users to create new functions.

There are three types of functions:

o Built-in functions: len(), print(), and sum() are some of the built-in functions.

o User-defined functions: Functions which are defined by a user are known as user-defined functions.

o Anonymous functions: These functions are also known as lambda functions because they are not declared with the
standard def keyword.

Example: The general syntax of a user-defined function is given below.

def function_name(parameter_list):
    # --- statements ---
    return a_value
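
For comparison, here is a short sketch of a user-defined function next to an equivalent anonymous (lambda) function. The names square and square_lambda are illustrative only.

```python
def square(x):
    """User-defined function declared with def."""
    return x * x

# Anonymous function: a single expression, written without def
square_lambda = lambda x: x * x

print(square(4), square_lambda(4))

# Lambdas are handy as short inline arguments, for example as a sort key
by_length = sorted(["pear", "fig", "banana"], key=lambda s: len(s))
print(by_length)
```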

8) What is zip() function in Python?

Python's zip() function returns a zip object that groups the elements found at the same index of multiple containers. It
takes one or more iterables and aggregates their elements into tuples, returning an iterator of tuples.

Signature

zip(iterable1, iterable2, iterable3, ...)

Parameters

iterable1, iterable2, iterable3: These are the iterable objects that are joined together.

Return

It returns an iterator of tuples built from the iterables passed in.
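
A small sketch of zip() in use; the sample lists are made up for illustration.

```python
names = ["Asha", "Ben", "Chen"]      # hypothetical sample data
scores = [82, 91, 77]
passed = [True, True, False]

# zip() pairs up the items that share the same index
rows = list(zip(names, scores, passed))
print(rows)

# When the inputs differ in length, zip() stops at the shortest one
short = list(zip([1, 2, 3], "ab"))
print(short)
```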

9) What is Python's parameter passing mechanism?

Python is often described in terms of two parameter passing mechanisms:

o Pass by reference

o Pass by value

Strictly speaking, Python uses neither in its pure form. Arguments are passed by object reference (also called pass by
assignment): the parameter name inside the function is bound to the same object that the caller passed in.

If that object is mutable, such as a list, changes made to it inside the function are visible in the calling code as well, which
looks like pass by reference. If the parameter is rebound to a new object, or the object is immutable, the caller's variable is
unaffected, which looks like pass by value. For example, if a variable is declared as a = 10 and passed to a function that
assigns 20 to its parameter, a is still 10 after the call.

Python also supports default argument values, which let a function be called with fewer arguments than it declares, and
*args and **kwargs, which accept an arbitrary number of arguments.
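
The behaviour is easy to check by experiment. The sketch below (the function names are illustrative) passes an immutable integer and a mutable list to a function: rebinding the parameter leaves the caller's variable alone, while mutating the object is visible to the caller.

```python
def rebind(n):
    n = 20            # rebinds the local name only; the caller's object is untouched
    return n

def mutate(items):
    items.append(99)  # changes the list object that the caller also refers to

a = 10
rebind(a)
print(a)

nums = [1, 2, 3]
mutate(nums)
print(nums)
```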

10) How to overload constructors or methods in Python?

Python's constructor, __init__(), is the initializer method of a class. Whenever we instantiate an object, __init__() is
automatically invoked by Python to initialize the members of the object. We can't overload constructors or methods in
Python. If a class defines the same method twice, the later definition silently replaces the earlier one, and calling it with the
earlier signature raises a TypeError.

Example:

class student:
    def __init__(self, name):
        self.name = name

    # This second definition replaces the first one
    def __init__(self, name, email):
        self.name = name
        self.email = email

# This line will generate an error (TypeError: missing argument 'email')
# st = student("rahul")

# This line will call the second constructor
st = student("rahul", "rahul@[Link]")
print("Name: ", st.name)
print("Email id: ", st.email)

11) What is the difference between remove() function and del statement?

The remove() method deletes the first occurrence of a specific value from a list.

Example:

list_1 = [3, 5, 7, 3, 9, 3]
print(list_1)
list_1.remove(3)
print("After removal: ", list_1)

Output:

[3, 5, 7, 3, 9, 3]
After removal:  [5, 7, 3, 9, 3]

If you want to delete an object at a specific location (index) in the list, you can either use del or pop.

Example:

list_1 = [3, 5, 7, 3, 9, 3]
print(list_1)
del list_1[2]
print("After deleting: ", list_1)

Output:

[3, 5, 7, 3, 9, 3]
After deleting:  [3, 5, 3, 9, 3]

We cannot use these methods with a tuple because a tuple, unlike a list, is immutable.

12) What is swapcase() function in the Python?

It is a string method which converts all uppercase characters into lowercase and vice versa. It is used to alter the existing
case of the string. The method returns a copy of the string with the case of every letter swapped. If the string is in
lowercase, it generates an uppercase string and vice versa. Non-alphabetic characters are left unchanged. See an example
below.

Example:

string = "IT IS IN LOWERCASE."
print(string.swapcase())

string = "it is in uppercase."
print(string.swapcase())

Output:

it is in lowercase.
IT IS IN UPPERCASE.

13) How to remove whitespaces from a string in Python?

To remove leading and trailing whitespace from a string, Python provides the built-in string method strip([chars]). It
returns a copy of the string with the whitespace removed; if there is none, the returned string equals the original.

Example:

string = "  AdnanManna  "
string2 = "    AdnanManna    "
string3 = "      AdnanManna      "
print(string)
print(string2)
print(string3)
print("After stripping all have placed in a sequence:")
print(string.strip())
print(string2.strip())
print(string3.strip())

After stripping, all three calls print AdnanManna with no surrounding spaces.

14) How to remove leading whitespaces from a string in the Python?

To remove leading characters from a string, we can use the lstrip() method. It is a Python string method that takes an
optional chars parameter. If the parameter is provided, it removes any leading characters that appear in it. Otherwise, it
removes all the leading whitespace from the string.

Example:

string = "  AdnanManna  "
string2 = "    AdnanManna    "
print(string)
print(string2)
print("After stripping all leading whitespaces:")
print(string.lstrip())
print(string2.lstrip())

After stripping, the leading whitespace is removed from both strings, while any trailing whitespace is kept.

15) Why do we use join() function in Python?

join() is a string method that returns a string. It concatenates the elements of an iterable, inserting the string it is called
on between them. It provides a flexible way to concatenate strings. See an example below.

Example:

separator = "Rohan"
text = "ab"
# Calling function: "Rohan" is placed between 'a' and 'b'
result = separator.join(text)
# Displaying result
print(result)

Output:

aRohanb

16) Give an example of shuffle() method?

This method shuffles a mutable sequence such as a list in place, randomizing the order of its items. Strings and tuples are
immutable, so they cannot be shuffled directly. This method is present in the random module, so we need to import it
before calling it. Each call reorders the elements, so the output generally differs from run to run.

Example:

# import the random module
import random

# declare a list
sample_list1 = ['Z', 'Y', 'X', 'W', 'V', 'U']
print("Original LIST1: ")
print(sample_list1)

# first shuffle
random.shuffle(sample_list1)
print("\nAfter the first shuffle of LIST1: ")
print(sample_list1)

# second shuffle
random.shuffle(sample_list1)
print("\nAfter the second shuffle of LIST1: ")
print(sample_list1)

The original list prints as ['Z', 'Y', 'X', 'W', 'V', 'U']; the two shuffled orders vary from run to run.

17) What is the use of break statement?

The break statement is used to terminate the execution of the current loop. It stops the loop immediately and transfers
control to the statement that follows the loop. If the break is in a nested loop, it exits only from the innermost loop.

Example:

list_1 = ['X', 'Y', 'Z']
list_2 = [11, 22, 33]
for i in list_1:
    for j in list_2:
        print(i, j)
        if i == 'Y' and j == 33:
            print('BREAK')
            break
    else:
        # runs only when the inner loop finished without break
        continue
    break

Output:

X 11
X 22
X 33
Y 11
Y 22
Y 33
BREAK

Python Break statement flowchart.

18) What is tuple in Python?

A tuple is a built-in data collection type. It allows us to store values in a sequence. It is immutable, so its elements cannot
be changed, added, or removed after creation. It uses () parentheses rather than [] square brackets. We can look up
elements by index, and negative indexing lets us traverse elements from the end. Built-in functions such as max(), sum(),
sorted(), and len() accept tuples.

To create a tuple, we can declare it as below.

Example:

# Declaring tuple
tup = (2, 4, 6, 8)
# Displaying value
print(tup)

# Displaying single value
print(tup[2])

Output:

(2, 4, 6, 8)
6

It is immutable, so updating a tuple will lead to an error.

Example:

# Declaring tuple
tup = (2, 4, 6, 8)
# Displaying value
print(tup)

# Displaying single value
print(tup[2])

# Updating by assigning new value
tup[2] = 22
# Displaying single value (never reached)
print(tup[2])

Output:

(2, 4, 6, 8)
6
TypeError: 'tuple' object does not support item assignment

19) Which are the file related libraries/modules in Python?

Python provides libraries/modules that enable you to manipulate text files and binary files on the file system. They help to
create files, update their contents, copy, and delete files. The libraries are os, os.path, and shutil.

Here, the os and os.path modules include functions for accessing the filesystem, while the shutil module enables you to
copy and delete files.

20) What are the different file processing modes supported by Python?

Python provides four basic modes to open files: read-only (r), write-only (w), read-write (r+), and append (a). If the mode
is not specified, the file opens in read-only mode by default. Note that 'rw' is not a valid mode string; open() raises a
ValueError for it.

o Read-only mode (r): Open a file for reading. It is the default mode.

o Write-only mode (w): Open a file for writing. If the file contains data, the data is lost. Otherwise, a new file is created.

o Read-write mode (r+): Open an existing file for both reading and writing. It is the updating mode.

o Append mode (a): Open for writing and append to the end of the file if the file exists; otherwise a new file is created.
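
The modes can be tried out safely in a temporary directory. This is a minimal sketch; the file name demo.txt is made up, and the read-write mode is spelled 'r+' because that is the string open() accepts.

```python
import os
import tempfile

# Work in a temporary directory so nothing on disk is overwritten
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")   # hypothetical file name

    with open(path, "w") as f:     # 'w' creates the file or truncates it
        f.write("first line\n")

    with open(path, "a") as f:     # 'a' appends to the end
        f.write("second line\n")

    with open(path, "r+") as f:    # 'r+' reads and writes an existing file
        f.write("FIRST")           # overwrites the first five characters

    with open(path) as f:          # the mode defaults to 'r'
        content = f.read()

print(content)
```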

21) What is an operator in Python?

An operator is a symbol that is applied to one or more values and produces a result. An operator works on
operands. Operands are literals or variables which hold some values.

Operators can be unary, binary, or ternary. An operator which requires a single operand is known as a unary operator, one
which requires two operands is known as a binary operator, and one which requires three operands is called a ternary
operator.

Example:

# Unary operator
A = 12
B = -(A)
print(B)

# Binary operator
A = 12
B = 13
print(A + B)
print(B * A)

# Ternary operator
A = 12
B = 13
smaller = A if A < B else B

print(smaller)

Output:

-12
25
156
12

22) What are the different types of operators in Python?

Python uses a rich set of operators to perform a variety of operations. Some operators, like the membership and
identity operators, are less familiar but are just as useful.

o Arithmetic Operators

o Relational Operators

o Assignment Operators

o Logical Operators

o Membership Operators

o Identity Operators

o Bitwise Operators

Arithmetic operators perform basic arithmetic operations. For example, "+" is used for addition and "-" is used for
subtraction.

Example:

# Adding two values
print(12 + 23)   # 35
# Subtracting two values
print(12 - 23)   # -11
# Multiplying two values
print(12 * 23)   # 276
# Dividing two values
print(12 / 23)   # approximately 0.5217

Relational operators are used to compare values. These operators test a condition and return a Boolean value, either
True or False.

Example:

# Examples of relational operators
a, b = 10, 12
print(a == b)   # False
print(a < b)    # True
print(a <= b)   # True
print(a != b)   # True

Assignment operators are used to assign values to variables. See the examples below.

Example:

# Examples of assignment operators
a = 12
print(a)   # 12
a += 2
print(a)   # 14
a -= 2
print(a)   # 12
a *= 2
print(a)   # 24
a **= 2
print(a)   # 576

Logical operators are used to perform the logical operations and, or, and not. See the example below.

Example:

# Logical operator examples
a = True
b = False
print(a and b)   # False
print(a or b)    # True
print(not b)     # True

Membership operators are used to check whether an element is a member of a sequence or collection (list, dictionary,
tuple) or not. Python has two membership operators, in and not in. See an example.

Example:

# Membership operator examples
numbers = [2, 4, 6, 7, 3, 4]
print(5 in numbers)              # False
cities = ("india", "delhi")
print("tokyo" not in cities)     # True

The identity operators (is and is not) check whether two names refer to the same object in memory. Two variables that
are equal are not necessarily identical. See the following examples.

Example:

# Identity operator example
a = 10
b = 12
print(a is b)       # False
print(a is not b)   # True
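
The difference between equality and identity is easiest to see with two lists that hold the same values. A minimal sketch with illustrative variable names:

```python
a = [1, 2, 3]
b = [1, 2, 3]   # equal contents, but a separate list object
c = a           # a second name for the same object

print(a == b)   # compares values
print(a is b)   # compares identity
print(a is c)
```

Here a == b is True because the contents match, a is b is False because they are two distinct objects, and a is c is True because both names refer to one object.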

Bitwise operators are used to perform operations on the individual bits of integers. The operators &, |, ^, and ~ work on
bits. See the example below.

Example:

# Bitwise operator example
a = 10
b = 12
print(a & b)   # 8
print(a | b)   # 14
print(a ^ b)   # 6
print(~a)      # -11

23) How to create a Unicode string in Python?

In Python 3, the old unicode type has been replaced by the str type, and every string is Unicode by default. To obtain the
UTF-8 encoded bytes of a string, call its encode("utf-8") method.

Example:

unicode_1 = ("\u0123", "\u2665", "\U0001f638", "\u265E", "\u265F", "\u2168")
print(unicode_1)

24) Is Python an interpreted language?

Python is an interpreted language. A Python program runs directly from its source code: in the standard CPython
implementation, the interpreter first compiles the source into an intermediate bytecode, which the Python virtual machine
then executes.

Unlike Java or C, Python does not require a separate compilation step before execution.

25) How is memory managed in Python?

Memory is managed in Python in the following ways:

o Memory management in python is managed by Python private heap space. All Python objects and data structures
are located in a private heap. The programmer does not have access to this private heap. The python interpreter takes care
of this instead.

o The allocation of heap space for Python objects is done by Python's memory manager. The core API gives access to
some tools for the programmer to code.

o Python also has an inbuilt garbage collector, which recycles all the unused memory so that it can be made
available to the heap space.

26) What is the Python decorator?


Decorators are a powerful and useful tool in Python that allows programmers to add functionality to existing
code. This is also called metaprogramming because one part of the program modifies another part of the program when
the decorated function is defined. A decorator lets the user wrap another function to extend the behaviour of the wrapped
function without permanently modifying it.

Example: The building block of a decorator is a function that returns another function.

def function_is_called():
    def function_is_returned():
        print("AdnanManna")
    return function_is_returned

new_1 = function_is_called()
# Outputs "AdnanManna"
new_1()

Output:

AdnanManna

Functions vs. Decorators

A function is a block of code that performs a specific task whereas a decorator is a function that modifies other functions.
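
A function becomes a decorator when it takes a function and returns a wrapped replacement for it. The sketch below shows the @ syntax with a hypothetical log_calls decorator that counts and prints each call; the names are made up for illustration.

```python
import functools

def log_calls(func):
    """Decorator (hypothetical name) that records each call to func."""
    @functools.wraps(func)           # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        print(f"calling {func.__name__} with {args}")
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper

@log_calls                           # same as: add = log_calls(add)
def add(a, b):
    return a + b

result = add(2, 3)
print(result)
```

The body of add is untouched; the extra behaviour lives entirely in the wrapper, which is what "without permanently modifying it" means in practice.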

27) What are the rules for a local and global variable in Python?

Global Variables:

o Variables declared outside a function or in global space are called global variables.

o If a variable is assigned a new value inside a function, it is implicitly local. To assign to a global variable from inside a
function, declare it with the global keyword.

o Global variables are accessible anywhere in the program, and any function can read their values.

Example:

A = "AdnanManna"

def my_function():
    print(A)

my_function()

Output:

AdnanManna

Local Variables:

o Any variable declared inside a function is known as a local variable. This variable is present in the local space and not
in the global space.

o If a variable is assigned a new value anywhere within the function's body, it is assumed to be local.

o Local variables are accessible only within the function body.

Example:

def my_function2():
    K = "AdnanManna Local"
    print(K)

my_function2()

Output:

AdnanManna Local

28) What is the namespace in Python?

The namespace is a fundamental idea to structure and organize the code that is more useful in large projects. However, it
could be a bit difficult concept to grasp if you're new to programming. Hence, we tried to make namespaces just a little
easier to understand.

A namespace is defined as a simple system to control the names in a program. It ensures that names are unique and won't
lead to any conflict.

Also, Python implements namespaces in the form of dictionaries and maintains name-to-object mapping where names act
as keys and the objects as values.
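
The dictionary view of namespaces can be inspected directly with the built-in globals() and locals() functions. A minimal sketch with illustrative names:

```python
x = "global x"

def show():
    x = "local x"        # lives in the function's local namespace
    return locals()      # the local namespace as a dictionary

local_ns = show()
print(local_ns)
print(globals()["x"])    # the module-level namespace is a dictionary too
print(len("abc"))        # the name len is found in the built-in namespace
```

The two variables named x do not conflict because each lives in its own namespace.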

29) What are iterators in Python?

In Python, iterators are used to step through the elements of a container such as a list. Containers like lists, tuples, and
dictionaries are iterables; calling iter() on one returns an iterator. A Python iterator implements the __iter__() and
__next__() methods to traverse the stored elements. In practice, we generally use for loops to iterate over collections
(list, tuple).

In simple words: Iterators are objects which can be traversed through or iterated upon.
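
To make the iterator protocol concrete, here is a minimal sketch of a hand-written iterator. The Countdown class is a made-up example, not part of any library.

```python
class Countdown:
    """Hypothetical iterator that counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self                  # an iterator returns itself

    def __next__(self):
        if self.current <= 0:
            raise StopIteration      # signals that the items are exhausted
        value = self.current
        self.current -= 1
        return value

counted = list(Countdown(3))
print(counted)

# Built-in containers are iterables; iter() hands back an iterator over them
it = iter(["a", "b"])
print(next(it), next(it))
```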

30) What is a generator in Python?

In Python, a generator is a simple way to implement an iterator. It is written like a normal function, except that it uses a
yield expression to produce values. We do not have to implement the __iter__() and __next__() methods ourselves;
Python supplies them automatically, which reduces overhead.

If a function contains at least one yield statement, it becomes a generator function. The yield keyword pauses execution,
saving the function's state, and resumes from the same point when the next value is requested.
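
A short sketch of a generator function; the name countdown is illustrative.

```python
def countdown(start):
    """Generator that yields start, start - 1, ..., 1 (illustrative name)."""
    while start > 0:
        yield start      # pause here and hand back a value
        start -= 1       # execution resumes from this line on the next request

gen = countdown(3)
first = next(gen)        # runs the body up to the first yield
rest = list(gen)         # consumes the remaining values
print(first, rest)
```

Values are produced one at a time on demand, so a generator does not need to hold the whole sequence in memory.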

31) What is slicing in Python?

Slicing is a mechanism used to select a range of items from a sequence type like a list, tuple, or string. It uses a : (colon)
to separate the start and end index of the range; the element at the end index is not included. Indexing with a single
position returns only one element, whereas slicing returns a group of elements.

Example:

Q = "AdnanManna, Python Interview Questions!"
print(Q[2:25])

Output:

nanManna, Python Interv

32) What is a dictionary in Python?

The Python dictionary is a built-in data type. It maps keys to values: each element is stored as a key-value pair. The keys
are unique, whereas values can be duplicated. Elements are accessed by key rather than by position.

Example:

The following example contains the keys Country, Hero, and Cartoon. Their corresponding values are India, Modi, and
Rahul, respectively.

info = {'Country': 'India', 'Hero': 'Modi', 'Cartoon': 'Rahul'}
print("Country: ", info['Country'])
print("Hero: ", info['Hero'])
print("Cartoon: ", info['Cartoon'])

Output:

Country:  India
Hero:  Modi
Cartoon:  Rahul

33) What is Pass in Python?

The pass statement is a Python statement that performs no operation. It is a placeholder in a compound statement. If we
want to create an empty class or function, the pass keyword lets the code run without error.

Example:

class Student:
    pass   # placeholder for an empty class body

# Student is redefined here with an empty method
class Student:
    def info(self):
        pass   # placeholder for an empty method body

34) Explain docstring in Python?

The Python docstring is a string literal that occurs as the first statement in a module, function, class, or method definition. It
provides a convenient way to associate the documentation.

String literals occurring immediately after a simple assignment at the top are called "attribute docstrings".

String literals occurring immediately after another docstring are called "additional docstrings". Python uses triple quotes to
create docstrings even though the string fits on one line.

Docstring phrase ends with a period (.) and can be multiple lines. It may consist of spaces and other special chars.

Example:

# One-line docstrings
def hello():
    """A function to greet."""
    return "hello"

35) What is a negative index in Python and why are they used?

Sequences in Python are indexed with both positive and negative numbers. Positive indexes start at 0 for the first
element, 1 for the second, and so on.

Negative indexes start at -1, which refers to the last element in the sequence, with -2 referring to the penultimate
element, and so on.

Negative indexes are handy for working from the end of a sequence. For example, S[:-1] returns the string without its last
character, which is a common way to drop a trailing newline. Negative indexes can also be used in slices to pick out
elements counted from the end, such as S[-3:] for the last three characters.

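
A brief sketch of negative indexing on a string and a list; the sample values are made up.

```python
s = "hello\n"
last_char = s[-1]        # the trailing newline
trimmed = s[:-1]         # everything except the last character
tail = trimmed[-3:]      # the last three characters

items = [10, 20, 30, 40]
print(repr(last_char), repr(trimmed), tail)
print(items[-1], items[-2])
```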
36) What is pickling and unpickling in Python?

The Python pickle module accepts almost any Python object and converts it into a byte stream. It writes the object to a
file using the dump() function; this process is called pickling.

The process of retrieving the original Python objects from the stored byte stream is called unpickling.
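
A minimal round-trip sketch using the in-memory variants pickle.dumps() and pickle.loads(); pickle.dump() and pickle.load() do the same with a file object. The record is made-up sample data.

```python
import pickle

record = {"name": "Asha", "scores": [82, 91]}   # hypothetical object

data = pickle.dumps(record)       # pickling: object -> bytes
restored = pickle.loads(data)     # unpickling: bytes -> object

print(type(data).__name__)
print(restored == record, restored is record)
```

The restored object is an equal copy, not the same object. Pickled data should only be loaded from trusted sources, because unpickling can execute arbitrary code.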

Java and Python are both object-oriented programming languages. Let's compare them on the criteria given below:

Criteria                                         Java           Python

Ease of use                                      Good           Very Good

Coding speed                                     Average        Excellent

Data types                                       Static type    Dynamic type

Data Science and Machine learning application    Average        Very Good

38) What is the usage of help() and dir() function in Python?

The help() and dir() functions are both accessible from the Python interpreter and are used to inspect objects, including
the built-in functions.

help() function: The help() function displays the documentation string and lets us see the help related
to modules, keywords, and attributes.

dir() function: The dir() function displays the names (symbols) defined in the current scope or on the object passed to it.
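
A small sketch of inspecting the built-in str type. It reads the docstring directly rather than calling help(), since help() prints a full page of text.

```python
names = dir(str)                      # attribute and method names defined on str
swap_like = [n for n in names if n.startswith("sw")]
print(swap_like)

doc = str.swapcase.__doc__            # the docstring that help(str.swapcase) displays
print(doc.splitlines()[0])
```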

39) What are the differences between Python 2.x and Python 3.x?

Python 2.x is an older version of Python, and it is now legacy. Python 3.x is the newer version and the
present and future of the language.

The most visible difference between Python 2 and Python 3 is in the print statement (function). In Python 2, it looks like
print "Hello", and in Python 3, it is print("Hello").

Strings in Python 2 are ASCII implicitly, and in Python 3 they are Unicode.

The xrange() function has been removed in Python 3; range() now behaves the way xrange() did. In exception handling,
the as keyword is required, as in except ValueError as e.

40) How Python does Compile-time and Run-time code checking?

In Python, some checking is done at compile time, but most checks, such as type and name checks, are postponed until the code executes. Consequently, if the Python code references a user-defined function that does not exist, the code will still compile successfully. It fails with an exception only when execution actually reaches the line that references the missing function.
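
A short sketch of this behavior, using a hypothetical function name, undefined_helper, that is deliberately never defined:

```python
def run(flag):
    if flag:
        # undefined_helper does not exist; this is only noticed at run time.
        return undefined_helper()
    return "ok"

# The function definition above compiled without complaint.
result = run(False)  # this path never touches the missing name

try:
    run(True)        # this path does, and fails with NameError
    raised = False
except NameError:
    raised = True

print(result, raised)  # ok True
```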

41) What is the shortest method to open a text file and display its content?

The shortest way to open a text file is by using the "with" statement in the following manner:

Example:

with open("FILE NAME", "r") as fp:
    fileData = fp.read()

# To print the contents of the file
print(fileData)

42) What is the usage of enumerate () function in Python?

The enumerate() function is used to iterate through the sequence and retrieve the index position and its corresponding
value at the same time.

Example:

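list_1 = ["A", "B", "C"]
s_1 = "Javatpoint"

# creating enumerate objects
object_1 = enumerate(list_1)
object_2 = enumerate(s_1)

print("Return type:", type(object_1))
print(list(enumerate(list_1)))
print(list(enumerate(s_1)))

Output:

Return type: <class 'enumerate'>
[(0, 'A'), (1, 'B'), (2, 'C')]
[(0, 'J'), (1, 'a'), (2, 'v'), (3, 'a'), (4, 't'), (5, 'p'), (6, 'o'), (7, 'i'), (8, 'n'), (9, 't')]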

43) Give the output of this example: A[3] if A=[1,4,6,7,9,66,4,94].

Since indexing starts from zero, the element at index 3 is 7. So, the output is 7.

44) What is type conversion in Python?

Type conversion refers to the conversion of one data type into another.

int() - converts a compatible value, such as a numeric string or a float, into an integer

float() - converts a compatible value into a floating-point number

ord() - converts a character into its integer code point

hex() - converts an integer to a hexadecimal string

oct() - converts an integer to an octal string

tuple() - converts an iterable to a tuple

set() - converts an iterable to a set

list() - converts an iterable to a list

dict() - converts a sequence of (key, value) pairs into a dictionary

str() - converts a value, such as an integer, into a string

complex(real, imag) - converts real numbers into the complex number real + imag*j
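
The conversions listed above can be tried out as follows; the input values are arbitrary examples:

```python
as_int = int("42")                    # 42
as_float = float("3.5")               # 3.5
code_point = ord("A")                 # 65
as_hex = hex(255)                     # '0xff'
as_oct = oct(8)                       # '0o10'
as_tuple = tuple([1, 2, 3])           # (1, 2, 3)
as_set = set([1, 1, 2])               # {1, 2}
as_list = list("abc")                 # ['a', 'b', 'c']
as_dict = dict([("a", 1), ("b", 2)])  # {'a': 1, 'b': 2}
as_str = str(42)                      # '42'
as_complex = complex(2, 3)            # (2+3j)

print(as_int, as_hex, as_oct, as_dict, as_complex)
```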

45) How to send an email in Python Language?

To send an email, Python provides the smtplib and email modules. Import these modules into the mail script and send the mail after authenticating a user.

The smtplib module has a class SMTP(smtp-server, port). It requires two parameters to establish the SMTP connection.

A simple example to send an email is given below.

Example:

import smtplib

# Calling SMTP; replace "smtp-server" with the host name of your mail server
s = smtplib.SMTP("smtp-server", 587)

# TLS for network security
s.starttls()

# User email authentication
s.login("sender@email_id", "sender_email_id_password")

# Message to be sent
message = "Message_sender_need_to_send"

# Sending the mail
s.sendmail("sender@email_id", "receiver@email_id", message)

# Closing the connection
s.quit()

46) What is the difference between Python Arrays and lists?

Arrays and lists in Python store data in a similar way. However, an array can hold elements of only a single data type, whereas a list can hold elements of any data type.

Example:

import array as arr

User_Array = arr.array('i', [1, 2, 3, 4])
User_list = [1, 'abc', 1.20]
print(User_Array)
print(User_list)

Output:

array('i', [1, 2, 3, 4])
[1, 'abc', 1.2]

47) What is lambda function in Python?

An anonymous function in Python is a function that is defined without a name. Normal functions are defined using the def keyword, whereas anonymous functions are defined using the lambda keyword. Anonymous functions are therefore also called lambda functions.

48) Why do lambda forms in Python not have the statements?

Lambda forms in Python do not contain statements because a lambda is itself an expression: it creates a new function object and returns it at runtime, and its body is limited to a single expression.
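
A brief sketch of both points, with made-up names and data: a lambda bound to a name, and a lambda passed inline where a function object is expected.

```python
# A lambda body is a single expression; its value is returned implicitly.
square = lambda x: x * x

# Lambdas are often passed inline, here as the sort key.
words = ["pear", "fig", "banana"]
by_length = sorted(words, key=lambda word: len(word))

print(square(4))   # 16
print(by_length)   # ['fig', 'pear', 'banana']
```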

49) What are functions in Python?

A function is a block of code which is executed only when it is called. To define a Python function, the def keyword is used.

Example:

def New_func():
    print("Hi, Welcome to JavaTpoint")

New_func()  # calling the function

Output:

Hi, Welcome to JavaTpoint

50) What is __init__?

__init__ is the initializer method (commonly called the constructor) in Python. It is called automatically to initialize the attributes of a new object/instance when a class is instantiated; the memory itself is allocated beforehand by __new__. Every class has an __init__ method, inherited from object if the class does not define its own.

Example:

class Employee_1:
    def __init__(self, name, age, salary):
        self.name = name
        self.age = age
        self.salary = salary

E_1 = Employee_1("pqr", 20, 25000)
# E_1 is an instance of the class Employee_1.
# __init__ initializes the attributes of E_1.
print(E_1.name)
print(E_1.age)
print(E_1.salary)

Output:

pqr
20
25000

51) What is self in Python?

self refers to the instance (object) of a class on which a method is called. In Python, it is explicitly included as the first parameter of every instance method. This is not the case in Java, where the equivalent this reference is implicit and optional to write. It helps to distinguish the attributes and methods of the instance from local variables.

In the __init__ method, self refers to the newly created object, while in other methods it refers to the object whose method was called.

52) How can you generate random numbers in Python?

The random module is the standard module used to generate random numbers. It is used as follows:

import random
random.random()

The random.random() function returns a floating-point number in the range [0, 1). The module-level functions are bound methods of a hidden instance of the random.Random class. You can also create your own Random instances, for example one per thread in a multi-threaded program, so that each has its own generator. Other functions in the module include:

randrange(a, b): returns a randomly selected integer from the range [a, b). It doesn't build a range object.

uniform(a, b): returns a random floating-point number between a and b.

normalvariate(mean, sdev): returns a value drawn from the normal distribution, where mean is the mean (mu) and sdev is the standard deviation (sigma).

Instantiating the Random class creates multiple independent random number generators.
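
A minimal sketch of these functions on a dedicated Random instance. The seed value 42 and the ranges are arbitrary choices for illustration:

```python
import random

# An independent generator with a fixed seed, so runs are repeatable.
rng = random.Random(42)

x = rng.random()                  # float in [0, 1)
n = rng.randrange(1, 7)           # integer in [1, 7), like a die roll
u = rng.uniform(2.0, 5.0)         # float between 2.0 and 5.0
g = rng.normalvariate(0.0, 1.0)   # draw from a normal distribution

# A second instance with the same seed reproduces the same first value.
other = random.Random(42)
same_first = other.random() == x

print(x, n, u, g, same_first)
```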

53) What is PYTHONPATH?

PYTHONPATH is an environment variable that is consulted when a module is imported. The directories it lists are added to the module search path (sys.path), so whenever a module is imported, those directories are also searched for it. The interpreter uses this search path to determine which module to load.
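
To see what the interpreter is working with, a small sketch like the following prints the directories named in PYTHONPATH (if the variable is set) and the start of the resulting search path:

```python
import os
import sys

# PYTHONPATH may be unset; fall back to an empty string in that case.
raw = os.environ.get("PYTHONPATH", "")

# Entries are separated by os.pathsep (":" on Unix, ";" on Windows).
extra_dirs = [d for d in raw.split(os.pathsep) if d]

print(extra_dirs)
print(sys.path[:3])  # the first few entries of the module search path
```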

54) What are python modules? Name some commonly used built-in modules in Python?

Python modules are files containing Python code. This code can define functions, classes, or variables. A Python module is a .py file containing executable code.

Some of the commonly used built-in modules are:

o os

o sys

o math

o random

o datetime

o json

55) What is the difference between range & xrange?

In Python 2, xrange and range are almost the same in terms of functionality. Both provide a way to generate a sequence of integers for you to use however you please. The only difference is that range returns a Python list object and xrange returns an xrange object.

This means that xrange doesn't actually build a static list at run time as range does. It produces the values as you need them (lazy evaluation), much as a generator yields values one at a time. So if you need a really gigantic range, say one billion values, xrange is the function to use.

This is especially true on a memory-constrained system such as a cell phone, because range will use as much memory as it needs to build the full list of integers, which can raise a MemoryError and crash your program.

In Python 3, xrange was removed and range itself behaves lazily.
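
The sketch below shows the same lazy behavior in Python 3, where range plays the role that xrange played in Python 2. The sizes reported by sys.getsizeof are implementation details and vary between interpreters, so no exact values are given:

```python
import sys

# A range of one billion values is created instantly; nothing is stored.
lazy = range(1_000_000_000)

# Even a small materialized list occupies far more memory.
small_list = list(range(1000))

lazy_size = sys.getsizeof(lazy)
list_size = sys.getsizeof(small_list)

print(lazy_size < list_size)  # True on CPython
print(len(lazy), lazy[-1])    # 1000000000 999999999
```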

56) What advantages do NumPy arrays offer over (nested) Python lists?

o Python's lists are efficient general-purpose containers. They support (fairly) efficient insertion, deletion, appending,
and concatenation, and Python's list comprehensions make them easy to construct and manipulate.

o They have certain limitations: they don't support "vectorized" operations like elementwise addition and
multiplication, and the fact that they can contain objects of differing types mean that Python must store type information
for every element, and must execute type dispatching code when operating on each element.

o NumPy is not just more efficient; it is also more convenient. We get a lot of vector and matrix operations for free,
which sometimes allow one to avoid unnecessary work. And they are also efficiently implemented.

o NumPy arrays are faster, and a lot comes built in with NumPy: FFTs, convolutions, fast searching, basic statistics, linear algebra, histograms, etc.
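
The list-side limitation mentioned above can be seen with built-in lists alone. This sketch does not use NumPy; it only shows that + and * on lists do not act elementwise, so elementwise arithmetic needs an explicit loop or comprehension:

```python
a = [1, 2, 3]
b = [10, 20, 30]

# On lists, + concatenates and * repeats; neither works elementwise.
concatenated = a + b
repeated = a * 2

# Elementwise addition has to be spelled out element by element.
elementwise = [x + y for x, y in zip(a, b)]

print(concatenated)  # [1, 2, 3, 10, 20, 30]
print(repeated)      # [1, 2, 3, 1, 2, 3]
print(elementwise)   # [11, 22, 33]
```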

57) Mention what the Django templates consist of.

The template is a simple text file. It can generate any text-based format such as XML, CSV, or HTML. A template contains variables ({{ variable }}), which are replaced with values when the template is evaluated, and tags ({% tag %}), which control the logic of the template.

58) Explain the use of session in Django framework?

Django provides a session that lets the user store and retrieve data on a per-site-visitor basis. Django abstracts the process
of sending and receiving cookies, by placing a session ID cookie on the client side, and storing all the related data on the
server side.

So, the data itself is not stored client side. This is good from a security perspective.

1. How do you subset or filter data in SQL?

To subset or filter data in SQL, we use WHERE and HAVING clauses. Consider the following movie table.

Using this table, let’s find the records for movies that were directed by Brad Bird.

Now, let’s filter the table for directors whose movies have an average duration greater than 115 minutes.
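
Both filters can be tried with Python's built-in sqlite3 module. This is only a sketch: the table name movies, its columns, and every row in it are hypothetical stand-ins for the movie table referred to above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (title TEXT, director TEXT, duration INTEGER)")
conn.executemany(
    "INSERT INTO movies VALUES (?, ?, ?)",
    [
        ("Movie A", "Brad Bird", 115),   # hypothetical rows
        ("Movie B", "Brad Bird", 125),
        ("Movie C", "Director X", 100),
        ("Movie D", "Director X", 110),
    ],
)

# WHERE filters individual rows before any grouping happens.
by_director = conn.execute(
    "SELECT title FROM movies WHERE director = ? ORDER BY title",
    ("Brad Bird",),
).fetchall()

# HAVING filters whole groups after aggregation.
long_avg = conn.execute(
    "SELECT director, AVG(duration) FROM movies "
    "GROUP BY director HAVING AVG(duration) > 115"
).fetchall()

print(by_director)  # [('Movie A',), ('Movie B',)]
print(long_avg)     # [('Brad Bird', 120.0)]
```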

2. What is the difference between a WHERE clause and a HAVING clause in SQL?

When this data analyst interview question is asked, cover all of the differences below and give the syntax for each to show the interviewer your thorough knowledge.

WHERE                                          HAVING

The WHERE clause operates on row data.         The HAVING clause operates on aggregated data.

In the WHERE clause, the filter occurs         HAVING is used to filter values from a group.
before any groupings are made.

Aggregate functions cannot be used.            Aggregate functions can be used.

Syntax of WHERE clause:

SELECT column1, column2, ...
FROM table_name
WHERE condition;

Syntax of HAVING clause:

SELECT column_name(s)
FROM table_name
WHERE condition
GROUP BY column_name(s)
HAVING condition
ORDER BY column_name(s);

1. Is the below SQL query correct? If not, how will you rectify it?

The query stated above is incorrect as we cannot use the alias name while filtering data using the WHERE clause. It will
throw an error.

2. How are Union, Intersect, and Except used in SQL?

The Union operator combines the output of two or more SELECT statements and removes duplicate rows. Syntax:

SELECT column_name(s) FROM table1
UNION
SELECT column_name(s) FROM table2;

Let’s consider the following example, where there are two tables - Region 1 and Region 2.

To get the unique records, we use Union.

The Intersect operator returns the common records from the results of two or more SELECT statements. Syntax:

SELECT column_name(s) FROM table1
INTERSECT
SELECT column_name(s) FROM table2;

The Except operator returns the records from the first SELECT statement that do not appear in the results of the second. Syntax:

SELECT column_name(s) FROM table1
EXCEPT
SELECT column_name(s) FROM table2;

Below is the SQL query to return uncommon records from region 1.
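
The three operators can be compared side by side with Python's built-in sqlite3 module. In this sketch the tables region1 and region2, the city column, and the values in them are all hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE region1 (city TEXT);
    CREATE TABLE region2 (city TEXT);
    INSERT INTO region1 VALUES ('A'), ('B'), ('C');
    INSERT INTO region2 VALUES ('B'), ('C'), ('D');
""")

def cities(operator):
    """Combine the two SELECTs with the given set operator."""
    sql = (
        f"SELECT city FROM region1 {operator} "
        "SELECT city FROM region2 ORDER BY city"
    )
    return [row[0] for row in conn.execute(sql)]

unique_records = cities("UNION")      # every city once
common_records = cities("INTERSECT")  # cities present in both regions
only_region1 = cities("EXCEPT")       # cities in region1 but not in region2

print(unique_records)  # ['A', 'B', 'C', 'D']
print(common_records)  # ['B', 'C']
print(only_region1)    # ['A']
```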

3. What is a Subquery in SQL?

A Subquery in SQL is a query within another query. It is also known as a nested query or an inner query. Subqueries are used to restrict or enhance the data to be queried by the main query.

There are two types of subquery: correlated and non-correlated.

Below is an example of a subquery that returns the name, email id, and phone number of an employee from Texas city.

SELECT name, email, phone
FROM employee
WHERE emp_id IN (
    SELECT emp_id FROM employee WHERE city = 'Texas');
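
To illustrate the two types, the sketch below runs a non-correlated and a correlated subquery through Python's built-in sqlite3 module. The salary column and all rows are hypothetical additions for the sake of the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (emp_id INTEGER, name TEXT, city TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [
        (1, "Ann", "Texas", 50),   # hypothetical rows
        (2, "Bob", "Texas", 70),
        (3, "Cy", "Ohio", 60),
        (4, "Di", "Ohio", 40),
    ],
)

# Non-correlated: the inner query runs once, independently of the outer row.
non_correlated = [row[0] for row in conn.execute(
    "SELECT name FROM employee WHERE emp_id IN "
    "(SELECT emp_id FROM employee WHERE city = 'Texas') ORDER BY name"
)]

# Correlated: the inner query refers to the outer row (e.city), so it is
# evaluated for each outer row. Here: employees paid above their city average.
correlated = [row[0] for row in conn.execute(
    "SELECT name FROM employee AS e WHERE salary > "
    "(SELECT AVG(salary) FROM employee WHERE city = e.city) ORDER BY name"
)]

print(non_correlated)  # ['Ann', 'Bob']
print(correlated)      # ['Bob', 'Cy']
```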

4. Using the product_price table, write an SQL query to find the record with the fourth-highest market price.

Fig: Product Price table

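First, select the top four records in descending order of mkt_price:

SELECT TOP 4 * FROM product_price ORDER BY mkt_price DESC;

Now, from the above result, select the top one in ascending order of mkt_price. That record has the fourth-highest market price.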
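
The same two-step idea can be sketched with Python's built-in sqlite3 module. Note the assumptions: SQLite has no TOP keyword, so LIMIT is used in its place, and the product_id column and all rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product_price (product_id INTEGER, mkt_price REAL)")
conn.executemany(
    "INSERT INTO product_price VALUES (?, ?)",
    [(1, 10.0), (2, 20.0), (3, 30.0), (4, 40.0), (5, 50.0), (6, 60.0)],
)

# Inner query: the four highest prices. Outer query: the lowest of those four.
fourth_highest = conn.execute(
    "SELECT * FROM "
    "(SELECT * FROM product_price ORDER BY mkt_price DESC LIMIT 4) "
    "ORDER BY mkt_price ASC LIMIT 1"
).fetchone()

print(fourth_highest)  # (3, 30.0)
```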

5. From the product_price table, write an SQL query to find the total and average market price for each currency where the average market price is greater than 100, and the currency is in INR or AUD.

The SQL query is as follows:

The output of the query is as follows:
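
One way such a query could look is sketched below with Python's built-in sqlite3 module. The column names currency and mkt_price and all rows are assumptions, since the table is not reproduced here:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product_price (currency TEXT, mkt_price REAL)")
conn.executemany(
    "INSERT INTO product_price VALUES (?, ?)",
    [
        ("INR", 150.0), ("INR", 250.0),  # hypothetical rows
        ("AUD", 50.0), ("AUD", 90.0),
        ("USD", 500.0),
    ],
)

# WHERE narrows the rows to the two currencies; HAVING then keeps only
# the groups whose average price exceeds 100.
totals = conn.execute(
    "SELECT currency, SUM(mkt_price), AVG(mkt_price) "
    "FROM product_price "
    "WHERE currency IN ('INR', 'AUD') "
    "GROUP BY currency "
    "HAVING AVG(mkt_price) > 100"
).fetchall()

print(totals)  # [('INR', 400.0, 200.0)]
```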

6. Using the product and sales order detail table, find the products with total units sold greater than 1.5 million.

Fig: Products table


Fig: Sales order detail table

We can use an inner join to get records from both the tables. We’ll join the tables based on a common key column, i.e.,
ProductID.

The result of the SQL query is shown below.
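
A sketch of such a join, run through Python's built-in sqlite3 module. Only ProductID comes from the text above; the table names product and sales_order_detail, the Name and OrderQty columns, and all rows are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (ProductID INTEGER, Name TEXT)")
conn.execute("CREATE TABLE sales_order_detail (ProductID INTEGER, OrderQty INTEGER)")
conn.executemany(
    "INSERT INTO product VALUES (?, ?)", [(1, "Widget"), (2, "Gadget")]
)
conn.executemany(
    "INSERT INTO sales_order_detail VALUES (?, ?)",
    [(1, 900000), (1, 700000), (2, 400000)],
)

# Join on the common key, total the units per product, then keep only
# the products whose total exceeds 1.5 million.
best_sellers = conn.execute(
    "SELECT p.Name, SUM(s.OrderQty) AS total_units "
    "FROM product AS p "
    "INNER JOIN sales_order_detail AS s ON p.ProductID = s.ProductID "
    "GROUP BY p.Name "
    "HAVING SUM(s.OrderQty) > 1500000"
).fetchall()

print(best_sellers)  # [('Widget', 1600000)]
```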

7. How do you write a stored procedure in SQL?

Prepare for this question thoroughly before your next data analyst interview. A stored procedure is a saved SQL script that can be run repeatedly to perform a task.

Let’s look at an example to create a stored procedure to find the sum of the first N natural numbers' squares.

• Create a procedure by giving a name, here it’s squaresum1

• Declare the variables

• Write the formula using the set statement

• Print the values of the computed variable

• To run the stored procedure, use the EXEC command


Output: Display the sum of the square for the first four natural numbers
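
The expected result can be checked outside the database. The Python sketch below is not a stored procedure; it only verifies the arithmetic, using the closed-form formula n(n + 1)(2n + 1) / 6 for the sum of the first n squares. The function name is hypothetical.

```python
def sum_of_squares(n):
    """Sum of the squares of the first n natural numbers (closed form)."""
    return n * (n + 1) * (2 * n + 1) // 6

# Cross-check the formula against a direct sum for n = 4: 1 + 4 + 9 + 16.
direct = sum(k * k for k in range(1, 5))

print(sum_of_squares(4), direct)  # 30 30
```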

8. Write an SQL stored procedure to find the total number of even numbers between two user-supplied numbers.

Here is the output to print all even numbers between 30 and 45.
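
For reference, the expected values can be worked out with a few lines of Python. This is a check of the logic rather than a stored procedure, and it assumes that both end points are included in the range:

```python
def evens_between(low, high):
    """Even numbers from low to high, both ends included."""
    return [n for n in range(low, high + 1) if n % 2 == 0]

evens = evens_between(30, 45)

print(evens)       # [30, 32, 34, 36, 38, 40, 42, 44]
print(len(evens))  # 8
```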

Tableau Data Analyst Interview Questions

9. How is joining different from blending in Tableau?

Data Joining                                     Data Blending

Data joining can only be carried out when        Data blending is used when the data is from
the data comes from the same source.             two or more different sources.

E.g.: Combining two or more worksheets from      E.g.: Combining the Oracle table with SQL
the same Excel file or two tables from the       Server, or combining Excel sheet and Oracle
same database.                                   table or two sheets from Excel.

All the combined sheets or tables contain a      Meanwhile, in data blending, each data source
common set of dimensions and measures.           contains its own set of dimensions and measures.

10. What do you understand by LOD in Tableau?

LOD in Tableau stands for Level of Detail. It is an expression that is used to execute complex queries involving many dimensions at the data source level. Using LOD expressions, you can find duplicate values, synchronize chart axes, and create bins on aggregated data.

11. What are the different connection types in Tableau Software?

There are mainly 2 types of connections available in Tableau.

Extract: An extract is a snapshot of the data that is extracted from the data source and placed into the Tableau repository. This snapshot can be refreshed periodically, either fully or incrementally.

Live: The live connection makes a direct connection to the data source. The data will be fetched straight from tables. So,
data is always up to date and consistent.

12. What are the different joins that Tableau provides?

Joins in Tableau work similarly to the SQL join statement. Below are the types of joins that Tableau supports:

• Left Outer Join

• Right Outer Join

• Full Outer Join

• Inner Join

13. What is a Gantt Chart in Tableau?

A Gantt chart in Tableau depicts the progress of a value over a period, i.e., it shows the duration of events. It consists of bars along a time axis. The Gantt chart is mostly used as a project management tool where each bar measures the duration of a task in the project.

14. Using the Sample Superstore dataset, create a view in Tableau to analyze the sales, profit, and quantity sold across
different subcategories of items present under each category.

• Load the Sample - Superstore dataset

• Drag Category and Subcategory columns into Rows, and Sales on to Columns. It will result in a horizontal bar chart.

• Drag Profit on to Colour, and Quantity on to Label. Sort the Sales axis in descending order of the sum of sales within
each sub-category.

16. Create a dual-axis chart in Tableau to present Sales and Profit across different years using the Sample Superstore dataset.

• Drag the Order Date field from Dimensions on to Columns, and convert it into continuous Month.

• Drag Sales on to Rows, and Profit to the right corner of the view until you see a light green rectangle.

• Synchronize the right axis by right-clicking on the profit axis.

• Under the Marks card, change SUM(Sales) to Bar and SUM(Profit) to Line and adjust the size.

17. Design a view in Tableau to show State-wise Sales and Profit using the Sample Superstore dataset.

• Drag the Country field on to the view section and expand it to see the States.

• Drag the Sales field on to Size, and Profit on to Colour.

• Increase the size of the bubbles, add a border, and halo color.
