30 Must-Ask Data Scientist Interview Questions

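Hi everyone, I will keep updating this list of 100 must-ask interview questions for CS data roles. If you are applying for data-related positions, come take a look.
The English version suits international students applying for jobs, as well as anyone in China hoping to move to a big tech company abroad.
If you have other questions, feel free to ask me, and I will keep digging up useful material for you!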
40) What is the curse of dimensionality?
41) How do you decide whether your linear regression model fits the data? What is the difference between squared error and absolute error?
What is Machine Learning?
The simplest way to answer this question is: we give the machine the data and the form of an equation, then ask it to look at the data and identify the coefficient values in that equation.
For example, for the linear regression y = mx + c, we give the data for the variables x and y, and the machine learns the values of m and c from the data.
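As a rough illustration of "learning m and c from the data", here is a minimal sketch in plain Python. It assumes ordinary least squares as the fitting rule (the text above does not name one), and the function name fit_line and the sample data are made up for this example.

```python
def fit_line(xs, ys):
    """Estimate m and c in y = m*x + c by ordinary least squares (assumed fitting rule)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and co-variation of x with y
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx            # slope
    c = mean_y - m * mean_x  # intercept
    return m, c


# Hypothetical data generated from y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
m, c = fit_line(xs, ys)
print(m, c)
```

We never tell the machine that the slope is 2 or the intercept is 1; it recovers both values from the (x, y) pairs alone.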
42) How are confidence intervals constructed and how will you interpret them?
43) How will you explain logistic regression to an economist, physician scientist and biologist?
44) How can you overcome Overfitting?
45) What is the difference between wide and tall data formats?
46) Is Naïve Bayes bad? If yes, in what respects?
47) How would you develop a model to identify plagiarism?
48) How will you define the number of clusters in a clustering algorithm?
49) Is it better to have too many false negatives or too many false positives?
50) Is it possible to perform logistic regression with Microsoft Excel?
51) What do you understand by fuzzy merging? Which language will you use to handle it?
52) What is the difference between skewed and uniform distribution?
53) You created a predictive model of a quantitative outcome variable using multiple regression. What are the steps you would follow to validate the model?
54) What do you understand by Hypothesis in the context of Machine Learning?
55) What do you understand by Recall and Precision?
56) How will you find the right K for K-means?
57) Why does L1 regularization cause parameter sparsity whereas L2 regularization does not?
58) How can you deal with different types of seasonality in time series modelling?
59) In experimental design, is it necessary to do randomization? If yes, why?
60) What do you understand by conjugate-prior with respect to Naïve Bayes?
61) Can you cite some examples where a false positive is more important than a false negative?
62) Can you cite some examples where a false negative is more important than a false positive?
63) Can you cite some examples where false positives and false negatives are equally important?
64) Can you explain the difference between a Test Set and a Validation Set?
A validation set can be considered part of the training process, since it is used for hyperparameter selection and to avoid overfitting the model being built. A test set, on the other hand, is held out and used only to evaluate the performance of the trained machine learning model.
In simple terms, the differences can be summarized as:
The training set is used to fit the parameters, i.e. the weights.
The test set is used to assess the performance of the model, i.e. its predictive power and generalization.
The validation set is used to tune the hyperparameters.
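The three roles above can be sketched as a simple random split. This is only an illustration in plain Python: the function name three_way_split, the 60/20/20 proportions and the fixed seed are assumptions made for the example, not recommendations from the text.

```python
import random


def three_way_split(rows, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle rows and cut them into (train, validation, test) lists."""
    if val_frac < 0 or test_frac < 0 or val_frac + test_frac >= 1:
        raise ValueError("fractions must be non-negative and sum to less than 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]                # touched only once, for the final evaluation
    val = shuffled[n_test:n_test + n_val]   # used to compare settings during tuning
    train = shuffled[n_test + n_val:]       # used to fit the weights
    return train, val, test


# Hypothetical dataset of 10 row ids
train, val, test = three_way_split(range(10))
print(len(train), len(val), len(test))
```

The point of keeping the test rows separate is that no tuning decision ever sees them, so the score measured on them remains an honest estimate of generalization.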
65) What makes a dataset gold standard?
66) What do you understand by statistical power of sensitivity and how do you calculate it?
67) What is the importance of having a selection bias?
68) Give some situations where you will use an SVM over a RandomForest Machine Learning algorithm and vice-versa.
SVM and Random Forest are both used in classification problems.
a) If you are sure that your data is clean and free of outliers, go for SVM. Conversely, if your data might contain outliers, Random Forest would be the better choice.
b) Generally, SVM consumes more computational power than Random Forest, so if you are constrained on computational resources such as memory, go for Random Forest.
69) What do you understand by feature vectors?
70) How do data management procedures like missing data handling make selection bias worse?

北美求职小助手
