Machine Learning Interview Questions Answers 2020: Latest Interview Questions PDF
If you love technology and are looking for a job that has to do with the science of data, then you've probably heard of machine learnin...
In this tutorial, we will go through some of the most popular machine learning interview questions. We will cover both basic and advanced topics, so get ready for a learning adventure. Let's get started!
- 1.1 Question 1: Describe 'machine learning'.
- 1.2 Question 2: What is 'deep learning'?
- 1.3 Question 3: What is the difference between 'type 1' and 'type 2' errors?
- 1.4 Question 4: What is 'data augmentation'?
- 1.5 Question 5: Why is the 'Naive Bayes' classifier so called?
- 1.6 Question 6: Which are better - deep or shallow networks?
- 1.7 Question 7: What is the 'Fourier transform'?
- 1.8 Question 8: What is a 'convolutional network'?
- 1.9 Question 9: What should we know about the relationship between the 'true positive rate' and 'recall'?
- 1.10 Question 10: What is the 'backward propagation of errors'?
- 1.11 Question 11: What happens if we only use the 'validation set', without applying the 'test set'?
2. Advanced: Questions in a job interview
- 2.1 Question 1: What is the difference between the 'generative' and the 'discriminative' model?
- 2.2 Question 2: Explain the differences between 'cross validation' and 'stratified cross validation'.
- 2.3 Question 3: In which cases are the regressions 'Lasso' and 'Ridge' used?
- 2.4 Question 4: What is 'F1'?
- 2.5 Question 5: In most cases, which scores higher: ensemble models or individual models?
- 2.6 Question 6: What is the difference between 'correlation' and 'covariance'?
- 2.7 Question 7: Describe an 'unbalanced data set'.
- 2.8 Question 8: What is 'data normalization'?
- 2.9 Question 9: Could you capture the correlation between categorical and continuous variables?
- 2.10 Question 10: What is the activation function used for?
Question 1: Describe 'Machine learning'.
There is no way around this question, right?
Most employers will probably open with something similar. This is due to a number of reasons.
First, your interviewers cannot ask other questions about deep learning without making sure you know what 'machine learning' is. In addition, the way you respond will show how well you can develop your own definitions or, in other words, how well you can explain a complicated topic in an easy and understandable way. If you only recite what you memorized from a scientific journal the night before, it will probably give you less credibility than if you explained it in your own words.
So what is machine learning?
The easiest and most understandable way to describe machine learning is to call it a specific philosophy of artificial intelligence development. It is a scientific field concerned with building machines that can learn from the data provided to them, without being explicitly programmed for the task.
Question 2: What is 'deep learning'?
Since deep learning is closely linked to machine learning, this is one of the questions you will almost surely receive in your job interview.
Deep learning is a branch of machine learning. It focuses on artificial neural networks with many layers, whose structure is loosely inspired by the human brain.
Question 3: What is the difference between 'type 1' and 'type 2' errors?
A type 1 error (a false positive) claims that something has happened when in reality it has not. A type 2 error (a false negative) does the opposite: it claims that nothing has happened when in fact something has.
These types of questions can be a bit confusing, but there are some simple tricks for remembering them.
For example, here is a way to remember the difference between the two types of errors: a type 1 error is like telling your dog that it is a cat (asserting something that is not true), while a type 2 error is like telling the same dog that dogs can't bark (denying something that is true).
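To make the two error types concrete, here is a minimal sketch that counts them for a binary classifier. The function name `count_errors` and the label lists are made up for illustration; `True` is assumed to mean "the event happened".

```python
def count_errors(actual, predicted):
    """Return (type_1, type_2) error counts for boolean labels."""
    # Type 1: predicted that something happened when it did not.
    type_1 = sum(1 for a, p in zip(actual, predicted) if not a and p)
    # Type 2: predicted that nothing happened when something did.
    type_2 = sum(1 for a, p in zip(actual, predicted) if a and not p)
    return type_1, type_2


# Hypothetical ground truth and model predictions.
actual = [True, False, False, True, False]
predicted = [True, True, False, False, False]
type_1, type_2 = count_errors(actual, predicted)
print(type_1, type_2)
```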
Question 4: What is 'data augmentation'?
This is one of the simpler questions. Data augmentation, also referred to here as data enrichment, is a way to create new data by modifying existing data. The modification either leaves the target label unchanged or changes it in a known way.
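As an illustration, one widely used way to create new data from old data is to mirror images while keeping their labels. The sketch below assumes an image is a plain list of pixel rows; `flip_horizontal` and `augment` are hypothetical helper names, not part of any library.

```python
def flip_horizontal(image):
    """Mirror an image (a list of pixel rows) left to right."""
    return [row[::-1] for row in image]


def augment(dataset):
    """Return the original (image, label) pairs plus a mirrored copy of each.

    The label is left unchanged: a mirrored cat is still a cat.
    """
    augmented = list(dataset)
    for image, label in dataset:
        augmented.append((flip_horizontal(image), label))
    return augmented


# Hypothetical 2x2 "image" with a label.
dataset = [([[1, 2], [3, 4]], "cat")]
bigger_dataset = augment(dataset)
```

The data set doubles in size without any new labelling work, which is the point of the technique.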
Question 5: Why is the 'Naive Bayes' classifier so called?
The Naive Bayes classifier is called naive because of the assumption it makes: it treats every feature in the data set as equally important and independent of the others, given the class. Obviously, that is almost never the case in everyday situations.
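The naive assumption is what makes the arithmetic simple: the score of a class is just its prior probability multiplied by one likelihood per feature. Here is a small sketch with invented probabilities for a two-word spam filter; the function name and all numbers are assumptions for illustration only.

```python
import math


def naive_bayes_score(prior, likelihoods):
    """Unnormalised posterior: P(class) times the product of P(feature | class)."""
    return prior * math.prod(likelihoods)


# Hypothetical probabilities: P(spam) = 0.4, P(word1 | spam) = 0.8, and so on.
spam_score = naive_bayes_score(0.4, [0.8, 0.5])
ham_score = naive_bayes_score(0.6, [0.1, 0.3])

# Normalise the two scores to get the probability that the message is spam.
p_spam = spam_score / (spam_score + ham_score)
```

Multiplying the per-feature likelihoods is only valid if the features are independent given the class, which is exactly the assumption that earns the classifier its name.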
Question 6: Which are better - deep or shallow networks?
You could classify this as a comparison question: you have to know a little about both kinds of network and be able to point out a clear difference between them.
Deep networks are generally considered the better alternative, simply because they consist of more layers, most of which are hidden. This helps deep networks extract and build better features.
Question 7: What is the 'Fourier transform'?
The Fourier transform is a method for decomposing a generic function into a superposition of simpler periodic functions (sines and cosines of different frequencies). If this is one of the questions you want to go deeper into, you could compare it to being given a car and taking it apart to see all the different components and parts that make it up.
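To see the "taking apart" idea in action, here is a minimal sketch of a discrete Fourier transform written directly from its definition. It is deliberately naive (quadratic time) and only meant for illustration; the signal, which completes two cycles over eight samples, is an invented example.

```python
import cmath
import math


def dft(signal):
    """Naive discrete Fourier transform of a list of real samples."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


n = 8
# A sine wave that completes exactly 2 cycles over the 8 samples.
signal = [math.sin(2 * math.pi * 2 * t / n) for t in range(n)]

magnitudes = [abs(c) for c in dft(signal)]
# Look only at the first half of the spectrum (the second half mirrors it).
dominant = max(range(n // 2), key=lambda k: magnitudes[k])
```

The transform recovers the one component the signal was built from: `dominant` is the frequency index with the largest magnitude.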
Question 8: What is a 'convolutional network'?
Standard networks use fully connected layers to do their processing. Convolutional networks use convolutional layers instead.
The main reason people prefer convolutional networks over standard, fully connected ones is that convolutional layers need far fewer parameters.
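The difference in parameter count is easy to check with a little arithmetic. The sketch below compares one fully connected layer with one convolutional layer; the layer sizes (a 28x28 grayscale input, 32 outputs, 3x3 kernels) are assumptions chosen for illustration.

```python
def dense_params(n_inputs, n_outputs):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_outputs + n_outputs


def conv_params(kernel_h, kernel_w, in_channels, out_channels):
    """Weights plus biases of a convolutional layer (kernels are shared across positions)."""
    return kernel_h * kernel_w * in_channels * out_channels + out_channels


# Hypothetical input: a 28x28 grayscale image.
dense = dense_params(28 * 28, 32)  # 25120 parameters
conv = conv_params(3, 3, 1, 32)    # 320 parameters
```

The convolutional layer is smaller because each 3x3 kernel is reused at every position of the image, instead of learning a separate weight for every pixel.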
Question 9: What should we know about the relationship between the 'true positive rate' and 'recall'?
Although this sounds like one of the more advanced interview questions, the answer is quite simple: both metrics are identical. We can see it from the formula: TP / (TP + FN).
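The formula is short enough to write out directly. In this sketch, `tp` is the number of true positives and `fn` the number of false negatives; the function name and the zero-division guard are choices made for this example.

```python
def recall(tp, fn):
    """True positive rate: the share of actual positives the model found."""
    if tp + fn == 0:
        return 0.0  # No actual positives at all; 0.0 is a convention assumed here.
    return tp / (tp + fn)


# Hypothetical counts: 8 positives found, 2 missed.
score = recall(tp=8, fn=2)
```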
Question 10: What is the 'backward propagation of errors'?
Although the term sounds intimidating, the backward propagation of errors (backpropagation) is simply a training method for multi-layer neural networks. We train the network by taking the error at the output and propagating it backwards, attributing a share of it to each weight in the network. Each weight is then adjusted so that the error shrinks.
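As a rough illustration, here is backpropagation on about the smallest network possible: one hidden unit with a tanh activation and one output weight. The starting weights, the learning rate and the single training example are all assumptions made for this sketch.

```python
import math


def train_step(w1, w2, x, y, lr):
    """One forward and backward pass; returns updated weights and the loss."""
    # Forward pass: input -> hidden unit -> output.
    h = math.tanh(w1 * x)
    y_hat = w2 * h
    error = y_hat - y
    loss = 0.5 * error ** 2

    # Backward pass: push the output error back through each weight (chain rule).
    grad_w2 = error * h
    grad_h = error * w2
    grad_w1 = grad_h * (1 - h ** 2) * x  # derivative of tanh is 1 - tanh^2

    # Gradient descent update.
    return w1 - lr * grad_w1, w2 - lr * grad_w2, loss


w1, w2 = 0.5, 0.5
losses = []
for _ in range(200):
    w1, w2, loss = train_step(w1, w2, x=1.0, y=0.5, lr=0.1)
    losses.append(loss)
```

Real networks do the same thing with matrices instead of single numbers, but the principle of assigning part of the output error to every weight is unchanged.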
Question 11: What happens if we only use the 'validation set', without applying the 'test set'?
It is the ideal question to finish the list of basic machine learning questions, because it is a bit more complicated.
Basically, if you only use the validation set, you will not get an accurate estimate of how well the model performs. The validation set is used to tune the model, so the model ends up partly fitted to it; the test set is used to check how the model will work on examples it has not seen before. Therefore, if you remove the test set, you compromise the possibility of obtaining valid test results.
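In practice this means setting the test set aside before any tuning starts. A minimal sketch of a three-way split follows; the function name, the 20% fractions and the fixed seed are assumptions for illustration.

```python
import random


def three_way_split(items, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle the items and split them into train, validation and test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)

    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)

    test = shuffled[:n_test]                 # touched only once, at the very end
    val = shuffled[n_test:n_test + n_val]    # used while tuning the model
    train = shuffled[n_test + n_val:]        # used to fit the model
    return train, val, test


train, val, test = three_way_split(range(10))
```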
Now for the advanced questions. But do not be fooled: your employers will surely not ask you to create a self-sufficient artificial intelligence system or write a three-hundred-page book about all the ways you can examine deep learning. In this context, "advanced" simply means that the questions will be a little more difficult. They might ask you to go deeper in your answers, give examples, and so on. So don't worry, relax and let's start.
Question 1: What is the difference between the 'generative' and the 'discriminative' model?
Although it may sound like a trick question, your employers just want to know how these models handle the data.
The generative model, as its name implies, makes the extra effort of learning how the data in each category is distributed. In contrast, a discriminative model only learns the differences (the boundaries) between the categories.
Developers and engineers generally prefer discriminative models for classification, because they tend to handle such tasks more efficiently.
Question 2: Explain the differences between 'cross validation' and 'stratified cross validation'.
Cross-validation splits the data at random into training and validation sets. Stratified cross-validation does the same, but instead of splitting purely at random it preserves the ratio of the classes in both the training and the validation sets. This is one of the questions that could confuse you, so be careful!
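Here is a simplified sketch of how a stratified split can keep the class ratio the same in every fold. It deals the indices of each class out to the folds in turn and, to keep it short, leaves out the shuffling a real implementation would do first. The function name and the labels are invented for the example.

```python
from collections import defaultdict


def stratified_folds(labels, k):
    """Assign every index to one of k folds, keeping class ratios similar."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        # Deal the indices of this class out to the folds one at a time.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


# Hypothetical data set: class "a" is twice as common as class "b".
labels = ["a"] * 8 + ["b"] * 4
folds = stratified_folds(labels, k=4)
```

Every fold ends up with the same two-to-one ratio as the full data set, which plain random splitting does not guarantee.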
Question 3: In which cases are the regressions 'Lasso' and 'Ridge' used?
This falls into the category of advanced questions because you have to reason carefully about both types of regression to give a valid answer.
Lasso regression can perform both variable selection and parameter shrinkage, while Ridge regression can only be used for the latter. With that in mind, you will most likely use Lasso regression when only a few variables have a large effect. Ridge regression, on the other hand, should be used when there are many variables with small effects.
This is an excellent example of a machine learning question where you can give a detailed answer rather than just recite something memorized.
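The difference between the two is easiest to see in a textbook simplification. If the input variables are orthonormal, each penalised coefficient can be computed directly from the ordinary least-squares coefficient `b`. The sketch below assumes that special case, with the Lasso penalty written as `lam * |beta|` and the Ridge penalty as `(lam / 2) * beta**2`; the coefficient values are invented.

```python
def ridge_shrink(b, lam):
    """Ridge scales every coefficient towards zero but never reaches it."""
    return b / (1.0 + lam)


def lasso_shrink(b, lam):
    """Lasso subtracts lam from the magnitude and cuts small coefficients to zero."""
    magnitude = max(abs(b) - lam, 0.0)
    return magnitude if b >= 0 else -magnitude


# Hypothetical least-squares coefficients: one large effect, two small ones.
ols = [3.0, -0.2, 0.1]
ridge = [ridge_shrink(b, 0.5) for b in ols]
lasso = [lasso_shrink(b, 0.5) for b in ols]
```

Lasso sets the two small coefficients to exactly zero, which is the variable selection mentioned above, while Ridge keeps all three and only makes them smaller.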
Question 4: What is 'F1'?
No, it is not a key on your keyboard that you can press to get the answer.
The F1 score is a measure of how well your model is doing: it is the harmonic mean of precision and recall. Any value close to 1 is excellent; anything under 0.5 should be reviewed.
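The F1 score is commonly defined as the harmonic mean of precision and recall. A minimal sketch of the calculation from raw counts follows; the function name, the zero guard and the example counts are assumptions made here.

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, computed from raw counts."""
    if tp == 0:
        return 0.0  # No true positives: precision and recall are both zero.
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for two models.
balanced = f1_score(tp=8, fp=2, fn=2)   # precision 0.8, recall 0.8
lopsided = f1_score(tp=1, fp=0, fn=9)   # precision 1.0, recall 0.1
```

Because it is a harmonic mean, the score is pulled towards the weaker of the two numbers, so the second model scores poorly despite its perfect precision.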
Question 5: In most cases, which scores higher: ensemble models or individual models?
Ensembles are generally the ones that score higher. This is simply because they combine different models to predict a single result. The more models there are, the more individual errors they can correct, and the better the final prediction tends to be.
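A simple way to combine models is majority voting. The sketch below uses three invented models that each make one mistake on a different example; all predictions and the ground truth are made up for illustration.

```python
from collections import Counter


def majority_vote(votes):
    """Return the most common prediction among the models."""
    return Counter(votes).most_common(1)[0][0]


truth = [1, 0, 1, 1]
# Hypothetical predictions: each model is wrong on exactly one example.
model_outputs = [
    [0, 0, 1, 1],  # model A, wrong on the first example
    [1, 1, 1, 1],  # model B, wrong on the second example
    [1, 0, 0, 1],  # model C, wrong on the third example
]

# zip(*...) groups the three predictions made for each example.
ensemble = [majority_vote(votes) for votes in zip(*model_outputs)]
```

The vote fixes every mistake here only because the models err on different examples. If they all made the same mistakes, combining them would not help.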
Question 6: What is the difference between 'correlation' and 'covariance'?
This could be a rather tricky question if you don't know the correlation between the two (no pun intended).
If you do know, though, the answer is easy: covariance becomes correlation once it is standardized, that is, divided by the product of the two standard deviations.
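The standardization step can be written out in a few lines. This sketch uses the population formulas (dividing by n) and invented data in which `ys` is simply `xs` multiplied by ten.

```python
import math


def covariance(xs, ys):
    """Population covariance of two equally long lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / len(xs)


def correlation(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    # covariance(v, v) is the variance of v.
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


xs = [1, 2, 3, 4]
ys = [10, 20, 30, 40]
cov = covariance(xs, ys)
corr = correlation(xs, ys)
```

Covariance depends on the units of the data, so rescaling `ys` changes it. Correlation is unit-free and always lies between -1 and 1.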
Question 7: Describe an 'unbalanced data set'.
An unbalanced data set is one in which, for a classification task, far more than half of the examples belong to only one class.
How do you deal with this? There are a couple of simple solutions: try a different algorithm, or collect more data so that the classes even out.
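Before choosing a remedy it helps to measure how lopsided the data actually is. A minimal sketch, with an invented label list and a hypothetical helper name:

```python
from collections import Counter


def majority_share(labels):
    """Fraction of the examples that belong to the most common class."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


# Hypothetical fraud data set: 9 normal transactions for every fraudulent one.
labels = ["ok"] * 9 + ["fraud"]
share = majority_share(labels)
```

A model that always answers "ok" would be right on that same share of examples, which is why plain accuracy is misleading on unbalanced data.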
Question 8: What is 'data normalization'?
Do you remember when we talked about the 'backward propagation of errors'? Data normalization is a preprocessing step that rescales values into a common range so that backpropagation converges better. Bringing the features onto comparable scales prevents features with large raw values from dominating the weight updates.
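One common form of data normalization is min-max scaling, which maps every value of a feature into the range from 0 to 1. The sketch below is a minimal version; the function name and the handling of a constant column are choices made for this example.

```python
def min_max_scale(values):
    """Rescale a list of numbers linearly into the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


# Hypothetical feature measured in large raw units.
salaries = [10, 20, 30]
scaled = min_max_scale(salaries)
```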
Question 9: Could you capture the correlation between categorical and continuous variables?
It is possible, but you should use the method known as analysis of covariance (ANCOVA). Using it, you can capture the relationship between the two kinds of variable.
Question 10: What is the activation function used for?
The activation function introduces non-linearity into your network. This is what allows the network to learn complicated, non-linear relationships rather than only straight-line ones.
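A short sketch shows why the non-linearity matters. Two stacked linear layers collapse into a single linear function, while inserting an activation (ReLU here, chosen only as a familiar example) between them does not. The weights and biases are arbitrary illustrative values.

```python
def linear(w, b, x):
    """A one-input, one-output linear layer."""
    return w * x + b


def relu(z):
    """Rectified linear unit: passes positives through, clips negatives to zero."""
    return max(0.0, z)


def two_linear_layers(x):
    # 2 * (3x - 1) + 1 simplifies to 6x - 1: still a straight line.
    return linear(2.0, 1.0, linear(3.0, -1.0, x))


def with_activation(x):
    # The ReLU in between bends the line, so it no longer simplifies.
    return linear(2.0, 1.0, relu(linear(3.0, -1.0, x)))


inputs = [-2.0, -1.0, 0.0, 1.0, 2.0]
plain = [two_linear_layers(x) for x in inputs]
bent = [with_activation(x) for x in inputs]
```

However many linear layers are stacked, the result stays a single straight line. The activation is what gives depth any extra expressive power.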
Whether you're looking for a job as an IT specialist or as an expert in artificial intelligence and machine learning, remember to review these questions and answers. Sure, we've only covered the tip of the iceberg, but if you learn these questions and understand their answers, you'll have a clear idea of what to expect in the interview.