13 Must-Know Data Scientist Interview Questions

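Hi everyone, I will keep updating this list of 100 must-know interview questions for CS data roles. Today covers questions 21-33. If you are applying for data positions, come take a look.
The English version suits international students who are job hunting, as well as anyone in China hoping to move to a major tech company abroad.
If you have other questions, feel free to ask me, and I will keep digging up useful material for you!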
21) What is the difference between Supervised Learning and Unsupervised Learning?
If an algorithm learns from labeled training data (inputs paired with a known response variable) so that the knowledge can be applied to the test data, it is referred to as supervised learning. Classification is an example of supervised learning. If there is no response variable, so the algorithm has to find structure in the data without labels, it is referred to as unsupervised learning. Clustering is an example of unsupervised learning.
22) Explain the use of Combinatorics in data science.
23) Why is vectorization considered a powerful method for optimizing numerical code?
24) What is the goal of A/B Testing?
A/B testing is statistical hypothesis testing for a randomized experiment with two variants, A and B. The goal of A/B testing is to identify whether a change to a web page improves an outcome of interest. An example is comparing the click-through rate of two versions of a banner ad.
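As a rough illustration of the banner ad example, here is a minimal sketch of a two-proportion z-test on click-through rates. The function name and the click counts are made up for illustration, and the z-test is only one common choice of test, assuming large samples and random assignment of visitors.

```python
import math

def two_proportion_z_test(clicks_a, views_a, clicks_b, views_b):
    """Compare the click-through rates of variants A and B (hypothetical helper)."""
    rate_a = clicks_a / views_a
    rate_b = clicks_b / views_b
    # Pooled rate under the null hypothesis that A and B perform the same.
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Illustrative numbers only: 100 clicks out of 1000 views for A, 150 out of 1000 for B.
z, p_value = two_proportion_z_test(100, 1000, 150, 1000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the difference in click-through rate is unlikely to be due to chance alone.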
25) What are Eigenvalues and Eigenvectors?
Eigenvectors are used for understanding linear transformations. In data analysis, we usually calculate the eigenvectors of a correlation or covariance matrix. Eigenvectors are the directions along which a particular linear transformation acts purely by flipping, compressing or stretching, without rotating. The eigenvalue is the strength of the transformation in the direction of its eigenvector, that is, the factor by which the stretching or compression occurs; a negative eigenvalue corresponds to a flip.
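To make this concrete, the sketch below computes the eigenvalues and eigenvectors of a 2x2 symmetric matrix in closed form, using only the standard library. The helper name and the example matrix are assumptions for illustration; in practice a linear algebra library would normally be used.

```python
import math

def eigen_symmetric_2x2(a, b, d):
    """Eigenpairs of the symmetric matrix [[a, b], [b, d]], largest eigenvalue first."""
    if b == 0:
        # Already diagonal: the coordinate axes are the eigenvectors.
        pairs = [(a, (1.0, 0.0)), (d, (0.0, 1.0))]
        return sorted(pairs, key=lambda pair: pair[0], reverse=True)
    mean = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    pairs = []
    for lam in (mean + radius, mean - radius):
        # (b, lam - a) solves (A - lam * I) v = 0 when b != 0.
        vx, vy = b, lam - a
        norm = math.hypot(vx, vy)
        pairs.append((lam, (vx / norm, vy / norm)))
    return pairs

# Hypothetical covariance matrix [[2, 1], [1, 2]].
pairs = eigen_symmetric_2x2(2.0, 1.0, 2.0)
for lam, (vx, vy) in pairs:
    # Applying the matrix to v should give the same direction scaled by lam.
    transformed = (2.0 * vx + 1.0 * vy, 1.0 * vx + 2.0 * vy)
    print(lam, (vx, vy), transformed)
```

For a covariance matrix, the eigenvector with the largest eigenvalue is the direction of greatest variance in the data.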
26) What is Gradient Descent?
27) How can outlier values be treated?
Outlier values can be identified by using univariate analysis or any other graphical analysis method. If there are only a few outlier values, they can be assessed individually, but for a large number of outliers the values can be substituted with either the 99th or the 1st percentile values. Not all extreme values are outliers. The most common ways to treat outlier values are:
1) Change the value to bring it within a range. 2) Simply remove the value.
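Both treatments can be sketched with the standard library. The function names, the percentile cut-offs as defaults and the sample data are assumptions for illustration, and the "inclusive" interpolation method is just one of several ways to define a percentile.

```python
import statistics

def percentile_bounds(values, lower_pct=1, upper_pct=99):
    """Return the lower and upper percentile cut points of the data."""
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    return cuts[lower_pct - 1], cuts[upper_pct - 1]

def cap_outliers(values, lower_pct=1, upper_pct=99):
    """Treatment 1: pull extreme values back to the percentile bounds."""
    low, high = percentile_bounds(values, lower_pct, upper_pct)
    return [min(max(v, low), high) for v in values]

def drop_outliers(values, lower_pct=1, upper_pct=99):
    """Treatment 2: remove values outside the percentile bounds."""
    low, high = percentile_bounds(values, lower_pct, upper_pct)
    return [v for v in values if low <= v <= high]

# Made-up sample: the integers 1..100 plus one extreme value.
sample = list(range(1, 101)) + [10_000]
capped = cap_outliers(sample)
kept = drop_outliers(sample)
print(max(sample), max(capped), len(kept))
```

Capping keeps the sample size unchanged, while dropping shrinks it, so the choice depends on how much data can be spared.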
28) How can you assess a good logistic model?
There are various methods to assess the results of a logistic regression analysis:
• Using a classification matrix to look at the true and false positives and negatives.
• Concordance, which helps identify the ability of the logistic model to differentiate between the event happening and not happening.
• Lift, which helps assess the logistic model by comparing it with random selection.
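The three checks above can be sketched in plain Python as follows. The function names, the 0.5 threshold, the half-credit for tied pairs and the tiny labeled sample are all assumptions for illustration, not a standard API.

```python
def classification_matrix(y_true, y_prob, threshold=0.5):
    """Count true/false positives and negatives at a probability threshold."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for actual, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1:
            counts["tp" if actual == 1 else "fp"] += 1
        else:
            counts["tn" if actual == 0 else "fn"] += 1
    return counts

def concordance(y_true, y_prob):
    """Share of (event, non-event) pairs where the event scores higher."""
    events = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_events = [p for y, p in zip(y_true, y_prob) if y == 0]
    score = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                score += 1.0
            elif e == n:
                score += 0.5  # assumption: ties count as half
    return score / (len(events) * len(non_events))

def lift_at(y_true, y_prob, fraction):
    """Event rate in the top-scored fraction divided by the overall event rate."""
    ranked = sorted(zip(y_prob, y_true), reverse=True)
    k = max(1, int(len(ranked) * fraction))
    top_rate = sum(y for _, y in ranked[:k]) / k
    base_rate = sum(y_true) / len(y_true)
    return top_rate / base_rate

# Made-up labels and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0]
y_prob = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
matrix = classification_matrix(y_true, y_prob)
conc = concordance(y_true, y_prob)
lift = lift_at(y_true, y_prob, 0.5)
print(matrix, conc, lift)
```

A lift above 1 means the model's top-ranked cases contain more events than a random selection of the same size would.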
29) What are various steps involved in an analytics project?
• Understand the business problem.
• Explore the data and become familiar with it.
• Prepare the data for modelling by detecting outliers, treating missing values, transforming variables, etc.
• After data preparation, start running the model, analyse the result and tweak the approach. This is an iterative step till the best possible outcome is achieved.
• Validate the model using a new data set.
• Start implementing the model and track the result to analyse the performance of the model over time.
30) How can you iterate over a list and also retrieve element indices at the same time?
This can be done using Python's built-in enumerate function, which takes an iterable such as a list and yields each element paired with its index, with the index placed just before the element.
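A short example, using a made-up list, shows the (index, element) pairs that enumerate yields. The optional start argument changes the first index.

```python
fruits = ["apple", "banana", "cherry"]

# Each iteration yields a pair: the index first, then the element.
for index, fruit in enumerate(fruits):
    print(index, fruit)

# Indices start at 0 by default; pass start=1 to count from one.
numbered = list(enumerate(fruits, start=1))
print(numbered)
```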
31) During analysis, how do you treat missing values?
First identify the variables with missing values, then determine the extent of the missing values in each. If any patterns are identified, the analyst has to concentrate on them, as they could lead to interesting and meaningful business insights. If no patterns are identified, the missing values can be substituted with mean or median values (imputation), or they can simply be ignored. There are various factors to consider when answering this question:
Understand the problem statement and the data before giving the answer; getting into the data is important. One option is to assign a default value, which can be the mean, minimum or maximum value.
If it is a categorical variable, the missing value is assigned a default value.
If the data follow a normal distribution, substitute the mean value.
Whether to treat missing values at all is another important point to consider. If 80% of the values for a variable are missing, you can answer that you would drop the variable instead of treating the missing values.
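The rules of thumb above can be sketched as a small function. The function name, the use of None for a missing value, the "Unknown" default for categorical variables and the sample columns are all assumptions for illustration; only the 80% drop rule and the mean substitution come from the answer.

```python
import statistics

def treat_missing(columns, drop_threshold=0.8, categorical_default="Unknown"):
    """Drop mostly-missing variables and fill the rest (None marks a missing value)."""
    treated = {}
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        missing_share = (len(values) - len(present)) / len(values)
        if not present or missing_share >= drop_threshold:
            continue  # too sparse: drop the variable instead of treating it
        if all(isinstance(v, (int, float)) for v in present):
            fill = statistics.mean(present)  # numeric: substitute the mean
        else:
            fill = categorical_default  # categorical: assign a default value
        treated[name] = [fill if v is None else v for v in values]
    return treated

# Made-up data set with one numeric, one categorical and one mostly-empty variable.
raw = {
    "age": [30, None, 50],
    "city": ["NYC", None, "LA"],
    "fax": [None, None, None, None, "555-0100"],
}
clean = treat_missing(raw)
print(clean)
```

For skewed numeric data, swapping statistics.mean for statistics.median would be a reasonable variation.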
32) Explain the Box-Cox transformation in regression models.
33) Can you use machine learning for time series analysis?
Yes, it can be used, but whether it is suitable depends on the application.

北美求职小助手
