
Data Science Coding Interview Questions

48. What will be the output of the following R programming code?

var2<- c("I","Love,"ProjectPro")

var2

It will give a syntax error: the closing quotation mark after Love is missing, so R
cannot parse the vector.

49. Find the First Unique Character in a String.

def frstuniquechar(strng: str) -> int:
    # Lowercase the string so the comparison is case-insensitive
    strng = strng.lower()

    # Dictionary that maps each letter to its count
    c = {}

    # Iterate over every letter in the string
    for letter in strng:
        # If the letter is not in the dictionary yet, add it with a count of 1
        if letter not in c:
            c[letter] = 1
        # If the letter is already in the dictionary, add 1 to its count
        else:
            c[letter] += 1

    # Iterate over the index positions of the string
    for i in range(len(strng)):
        # If this letter occurs only once
        if c[strng[i]] == 1:
            # Return the index position
            return i

    # No unique character found
    return -1

# Test cases
for s in ['Hello', 'Hello ProjectPro!', 'Thank you for visiting.']:
    print(f"Index: {frstuniquechar(strng=s)}")
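The same two-pass idea can be written more compactly with `collections.Counter` from the standard library. This is only a sketch of an alternative, not a replacement for the version above; the function name `first_unique_index` is an illustrative assumption.

```python
from collections import Counter


def first_unique_index(text: str) -> int:
    """Return the index of the first character that occurs exactly once, or -1."""
    # Lowercase so that 'H' and 'h' count as the same letter, as in the original.
    text = text.lower()
    # First pass: count every character.
    counts = Counter(text)
    # Second pass: the first character with a count of 1 wins.
    for index, char in enumerate(text):
        if counts[char] == 1:
            return index
    return -1


for sample in ["Hello", "aabb", ""]:
    print(f"{sample!r}: {first_unique_index(sample)}")
```

Both versions run in time proportional to the string length; `Counter` only replaces the hand-written counting loop.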

50. Write the code to calculate the factorial of a number using recursion.

def fact(num):
    # Edge cases
    if num < 0:
        return -1
    if num == 0:
        return 1

    # Exit condition: num == 1
    if num == 1:
        return num
    else:
        # Recursive call
        return num * fact(num - 1)

# Test cases
for num in [1, 3, 5, 6, 8, -10]:
    print(f"{num}! = {fact(num=num)}")
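Recursion in Python is bounded by the interpreter's recursion limit (1000 frames by default in CPython), so a loop-based variant is a common follow-up. Below is a minimal sketch; the name `fact_iterative` is hypothetical, and it keeps the same convention of returning -1 for negative input.

```python
import math


def fact_iterative(num: int) -> int:
    """Iterative factorial; returns -1 for negative input, like fact() above."""
    if num < 0:
        return -1
    result = 1
    # Multiply 2 * 3 * ... * num; the loop body never runs for 0 or 1.
    for factor in range(2, num + 1):
        result *= factor
    return result


# Cross-check against the standard library for a few non-negative values.
for n in [0, 1, 5, 8]:
    assert fact_iterative(n) == math.factorial(n)
    print(f"{n}! = {fact_iterative(n)}")
```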

Statistics Data Science Interview Questions


1. Out of L1 and L2 regularizations, which one causes parameter sparsity and
why?
2. List the differences between Bayesian Estimate and Maximum Likelihood
Estimation (MLE).
3. Differentiate between cluster sampling and systematic sampling.
4. How will you prevent overfitting when creating a statistical model?

Python Data Science Questions for Interview


1. Explain the range function.
2. How can you freeze an already built machine learning model for later use?
What command would you use?
3. Differentiate between func and func().
4. Write the command to import a decision tree classification algorithm using
the sklearn library.
5. What do you understand by pickling in Python?
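
A few of the questions above can be illustrated with nothing but the standard library. The sketch below is not a model answer; the `greet` function and the `model` dictionary (a stand-in for a trained estimator) are hypothetical. For question 4, the usual scikit-learn import is `from sklearn.tree import DecisionTreeClassifier`, left as a comment so the block runs without that package.

```python
import pickle

# Question 1: range produces a lazy sequence of integers (start, stop, step).
print(list(range(5)))         # 0 up to, but not including, 5
print(list(range(2, 10, 3)))  # start at 2, step by 3, stop before 10


# Question 3: func is the function object; func() calls it.
def greet():
    return "hello"


alias = greet          # no call: alias now refers to the same function object
print(alias is greet)  # identity check on the function object
print(greet())         # calling it runs the body and returns its result

# Questions 2 and 5: pickling serializes an object to bytes so it can be
# stored and restored later. A plain dict stands in for a trained model.
model = {"weights": [0.4, 1.7], "bias": -0.2}
blob = pickle.dumps(model)
restored = pickle.loads(blob)
print(restored == model)

# Question 4 (requires scikit-learn, so not executed here):
# from sklearn.tree import DecisionTreeClassifier
```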

Entry-Level Data Scientist Interview Questions


1. What are some common data preprocessing techniques used in data
science?
2. What is the difference between bias and variance in the context of machine
learning models?
3. What is the purpose of feature scaling in machine learning? Name a few
techniques used for feature scaling.
4. How would you handle missing values in a dataset? What are some
common imputation techniques?
5. What is the role of feature selection in machine learning? Discuss some
approaches to select relevant features from a large feature space.

Senior Data Scientist Interview Questions

1. How do you handle the issue of model interpretability and explainability?
2. Can you discuss a time when you had to work with a large dataset that
exceeded the memory capacity of your machine? What strategies or tools
did you use to handle the data and perform analysis efficiently?
3. How do you approach the process of evaluating and selecting the most
appropriate evaluation metrics for a machine-learning model? Can you
provide an example of a project where you had to make decisions about
evaluation metrics?
4. Explain the concept of bias and variance in machine learning models. How
do you balance these two factors to ensure the optimal performance of a
model?

Suggested Answers by Data Scientists for Open-Ended Data Science Interview Questions

1. How can you ensure that you don’t analyze something that
ends up producing meaningless results?

First, check whether the chosen model is appropriate. Retrace the analysis from the
point where you did univariate or bivariate analysis, examined the distribution of
the data and the correlation between variables, and built the linear model. For its
standard inference to be valid, linear regression assumes that the errors
(residuals) are approximately normally distributed. If the residuals depart clearly
from normality, the p-values and confidence intervals from the regression are
unreliable, and a transformation or a different model may be needed. Checking this
assumption is one way to find out whether an analysis based on linear regression
will yield meaningless results.
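
One rough way to act on this is to fit the line, compute the residuals, and look at how far their shape departs from a normal distribution. The sketch below uses only the standard library and made-up data; sample skewness and excess kurtosis (both close to 0 for normal residuals) stand in for a formal normality test, and the helper names are illustrative assumptions.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def skew_and_excess_kurtosis(values):
    """Moment-based skewness and excess kurtosis (population formulas)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    z = [(v - mean) / sd for v in values]
    skew = statistics.fmean([v ** 3 for v in z])
    kurt = statistics.fmean([v ** 4 for v in z]) - 3.0
    return skew, kurt


# Synthetic data (an assumption for illustration): linear trend plus Gaussian noise.
rng = random.Random(0)
xs = [i / 10 for i in range(500)]
ys = [2.0 + 0.5 * x + rng.gauss(0, 1) for x in xs]

intercept, slope = fit_line(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
skew, kurt = skew_and_excess_kurtosis(residuals)
print(f"slope={slope:.3f}, skewness={skew:.3f}, excess kurtosis={kurt:.3f}")
```

Values far from 0 for either statistic would be a prompt to inspect a residual plot or run a formal test before trusting the regression output.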

Another way is to resample the data into training and test sets multiple times, fit
the model on each training set, and predict on each test set to determine whether
the resulting models are similar and perform well.
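
The repeated-sampling check can be sketched as follows, using the standard library and synthetic data. Each round draws a new random train/test split, fits a simple least-squares line on the training part, and records the slope and the test error; a small spread across rounds suggests the model is stable. The data, the 80/20 split, and the helper names are assumptions for illustration.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rmse(xs, ys, intercept, slope):
    """Root mean squared error of the fitted line on the given points."""
    squared = [(y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)]
    return statistics.fmean(squared) ** 0.5


# Synthetic data: y = 1 + 3x plus Gaussian noise with standard deviation 2.
rng = random.Random(42)
data = [(i / 5, 1.0 + 3.0 * (i / 5) + rng.gauss(0, 2)) for i in range(200)]

slopes, test_errors = [], []
for round_number in range(20):
    shuffled = data[:]
    rng.shuffle(shuffled)
    cut = int(0.8 * len(shuffled))  # 80% train, 20% test
    train, test = shuffled[:cut], shuffled[cut:]
    intercept, slope = fit_line([x for x, _ in train], [y for _, y in train])
    slopes.append(slope)
    test_errors.append(
        rmse([x for x, _ in test], [y for _, y in test], intercept, slope)
    )

print(f"slope: mean={statistics.fmean(slopes):.3f}, sd={statistics.stdev(slopes):.3f}")
print(f"test RMSE: mean={statistics.fmean(test_errors):.3f}, "
      f"sd={statistics.stdev(test_errors):.3f}")
```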

By looking at the p-values, the R-squared value, and the fit of the function, and by
analyzing how the treatment of missing values could have affected the outcome, data
scientists can judge whether an analysis will produce meaningless results.
