1.What are Different Types of Machine Learning algorithms?
Ans :
Machine learning algorithms fall into a few broad types: supervised learning (learning from labelled data), unsupervised learning (finding structure in unlabelled data), semi-supervised learning (a small amount of labelled data combined with a large amount of unlabelled data), and reinforcement learning (learning by trial and error from rewards received while interacting with an environment).

2.What is supervised learning?
Ans :
Supervised learning is when we teach or train the machine using data that is well labelled, meaning each example is already tagged with the correct answer. The machine is then given a new set of examples (data) so that the supervised learning algorithm, having analysed the training data (the set of training examples), produces the correct output for the new, unseen data.
Ex:- Suppose you are given a basket filled with different kinds of fruit. The first step is to train the machine on each type of fruit, one by one, like this:
If the object is rounded with a depression at the top and red in colour, it is labelled an Apple.
If the object is a long curving cylinder with a green-yellow colour, it is labelled a Banana.
Now suppose that, after training, you take a new fruit from the basket, say a banana, and ask the machine to identify it.
Since the machine has already learned from the previous data, it uses that knowledge: it classifies the fruit by its shape and colour, confirms its name as Banana, and puts it in the Banana category. Thus the machine learns from training data and then applies that knowledge to the test data (the new fruit).
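A minimal supervised-learning sketch in Python (assuming scikit-learn is installed); the fruit features, their encoding, and the values below are made up purely to mirror the example above.

```python
# Hypothetical fruit data: each row is [is_round, has_top_depression, colour]
# where colour is 0 = red and 1 = green-yellow. Labels are the "correct answers".
from sklearn.tree import DecisionTreeClassifier

X_train = [[1, 1, 0],   # apple
           [1, 1, 0],   # apple
           [0, 0, 1],   # banana
           [0, 0, 1]]   # banana
y_train = ["Apple", "Apple", "Banana", "Banana"]

model = DecisionTreeClassifier().fit(X_train, y_train)

# A new, unlabelled fruit: long, no depression, green-yellow -> expected "Banana"
print(model.predict([[0, 0, 1]]))
```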
3.What is Unsupervised Learning?
Ans:
Unsupervised learning is where you only have input data (X) and no corresponding output variables.
The goal of unsupervised learning is to model the underlying structure or distribution of the data in order to learn more about it.
It is called unsupervised learning because, unlike supervised learning above, there are no correct answers and there is no teacher. Algorithms are left to their own devices to discover and present the interesting structure in the data.
Ex:- Suppose the unsupervised learning algorithm is given an input dataset containing images of different types of cats and dogs.
The algorithm is never trained on the given dataset, which means it has no prior idea about the features of the dataset.
The task of the unsupervised learning algorithm is to identify the image features on its own. It will do this by clustering the image dataset into groups according to the similarities between images.
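A minimal unsupervised-learning sketch (assuming scikit-learn): k-means groups unlabelled points purely by similarity. Real cat/dog images would first need feature extraction; the 2-D points below are only an illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Unlabelled data with two natural groups; no "correct answers" are provided.
X = np.array([[1.0, 1.1], [0.9, 1.0], [1.2, 0.8],
              [8.0, 8.2], [7.9, 8.1], [8.3, 7.8]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)   # cluster assignments discovered without any labels
```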

4. What is the difference between Data Mining and Machine Learning?
Ans:
Data mining is the process of extracting knowledge or interesting unknown patterns from (usually structured) data; machine learning algorithms are often used as part of this process.
Machine learning is the study, design, and development of algorithms that give computers the ability to learn without being explicitly programmed.
5.What is ‘Overfitting’?
Ans:
Overfitting occurs when a statistical model describes random error or noise instead of the underlying relationship. It is normally observed when a model is excessively complex, for example when it has too many parameters relative to the number of training observations. An overfitted model performs well on the training data but poorly on new, unseen data.
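A quick sketch of how overfitting shows up in practice (assuming scikit-learn): an unconstrained decision tree can fit the training data almost perfectly yet score noticeably lower on held-out data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier().fit(X_tr, y_tr)   # no depth limit
print("train accuracy:", tree.score(X_tr, y_tr))  # typically ~1.0
print("test accuracy :", tree.score(X_te, y_te))  # noticeably lower -> overfitting
```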
6.How would you handle an imbalanced dataset?
Ans:
An imbalanced dataset is one where, for example, in a classification task 90% of the data belongs to a single class. That leads to problems: an accuracy of 90% can be misleading if the model has no predictive power on the other class! A few tactics to get over the hump are: collect more data for the minority class, resample the dataset (oversample the minority class, for instance with SMOTE, or undersample the majority class), use class weights so that errors on the minority class cost more, and evaluate with metrics such as precision, recall, F1 score, or AUC instead of plain accuracy.
What's important here is that you have a keen sense of what damage an imbalanced dataset can cause, and how to correct for it.
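One of the tactics above, sketched with scikit-learn (an illustration, not the only option): class weighting makes errors on the minority class more costly during training.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Roughly 90% of the samples fall in class 0 and 10% in class 1.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

# class_weight="balanced" reweights classes inversely to their frequency.
clf = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X, y)
print(clf.score(X, y))
```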
7.What are Different Kernels in SVM?
Ans :
There are several kernels used in SVM; six commonly cited ones are the linear kernel, the polynomial kernel, the Gaussian (RBF) kernel, the Laplace RBF kernel, the sigmoid (hyperbolic tangent) kernel, and the ANOVA radial basis kernel.
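A small sketch (assuming scikit-learn) showing that the kernel is simply a parameter of the same SVM estimator:

```python
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
# Four of the kernels scikit-learn supports out of the box.
for kernel in ["linear", "poly", "rbf", "sigmoid"]:
    clf = SVC(kernel=kernel).fit(X, y)
    print(kernel, clf.score(X, y))
```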
8.What is Cross-Validation?
Ans :
Cross-validation is a method for assessing how well a model generalises by repeatedly splitting the data into training and testing parts. In k-fold cross-validation the data is split into k subsets (folds) and the model is trained on k-1 of them, while
the remaining subset is held out for testing. This is repeated so that each fold serves as the test set exactly once. Finally, the scores from all k folds are averaged to produce the final score.
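A minimal k-fold cross-validation sketch with scikit-learn; each fold is held out once while the model trains on the remaining k-1 folds, and the k scores are averaged.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)  # k = 5
print(scores)          # one score per fold
print(scores.mean())   # final cross-validated score
```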
9.How to Handle Outlier Values?
Ans :
An outlier is an observation in the dataset that lies far away from the other observations. Common tools used to discover outliers are box plots, scatter plots, the Z-score, and the interquartile range (IQR).
10.Simple strategies to tackle outliers?
Ans:
Common strategies are to drop the outliers, cap them at a threshold (winsorising), apply a transformation such as the logarithm, treat them separately as their own segment, or use models that are robust to outliers.
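A sketch of one detection method named above, the IQR rule (NumPy assumed; the data values are made up):

```python
import numpy as np

data = np.array([10, 12, 11, 13, 12, 95])      # 95 is an obvious outlier
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # the usual 1.5*IQR fences
print(data[(data < lower) | (data > upper)])   # flagged outliers
```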
11.What is Clustering?
Ans :
Clustering is the process of grouping a set of objects into a number of groups. Objects should be similar to one another within the same cluster and dissimilar to those in other clusters.
A few types of clustering are hierarchical clustering, partitioning clustering such as K-means, density-based clustering such as DBSCAN, and fuzzy clustering.
12.How can you select K for K-means Clustering?
Ans:
There are two kinds of methods: direct methods, such as the elbow method and the silhouette method, and statistical testing methods, such as the gap statistic.
The silhouette method is the most frequently used when determining the optimal value of k.
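A sketch of the silhouette approach with scikit-learn (the data is synthetic): compute the silhouette score for several candidate values of k and prefer the highest.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(k, silhouette_score(X, labels))   # higher is better
```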
13.Explain Correlation and Covariance?
Ans :
Correlation measures and estimates the quantitative relationship between two variables, i.e. how strongly they are related; examples include income and expenditure, or demand and supply.
Covariance is a simpler way to measure how two variables vary together, but covariance values are hard to compare across datasets because they are not normalised; correlation is the normalised version.
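A small NumPy sketch (the income/expenditure numbers are invented) showing that correlation is covariance normalised by the two standard deviations, so it always lies in [-1, 1]:

```python
import numpy as np

income = np.array([30, 40, 50, 60, 70])
spend  = np.array([25, 33, 41, 48, 55])

print(np.cov(income, spend)[0, 1])        # covariance: scale-dependent
print(np.corrcoef(income, spend)[0, 1])   # correlation: unit-free, close to +1 here
```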
14.What is P-value?
Ans :
P-values are used to make a decision about a hypothesis test. The p-value is the smallest significance level at which you can reject the null hypothesis; the lower the p-value, the stronger the evidence against the null hypothesis.
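A sketch using SciPy (the sample values are invented): a two-sample t-test returns a p-value that is compared against a chosen significance level such as 0.05.

```python
from scipy import stats

group_a = [2.1, 2.5, 2.3, 2.8, 2.6]
group_b = [3.0, 3.2, 2.9, 3.4, 3.1]

t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(p_value)   # if p_value < 0.05, reject the null hypothesis at the 5% level
```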
15.What are the common ways to handle missing data in a dataset?
Ans:
Missing data is one of the standard issues when working with data and is considered one of the greatest challenges faced by data analysts. There are many ways to impute missing values; common methods include deleting the affected rows, replacing missing values with the mean/median/mode, predicting the missing values, assigning them a unique category, and using algorithms that natively support missing values.
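A pandas sketch of a few of these options (the "age" column is a made-up example):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [25, 30, np.nan, 40, np.nan]})

dropped       = df.dropna()                      # delete rows with missing values
mean_filled   = df.fillna(df["age"].mean())      # replace with the mean
median_filled = df.fillna(df["age"].median())    # replace with the median
print(mean_filled)
```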
16.What is Confusion Matrix?
Ans:
A confusion matrix is a table that is used for summarizing the performance of a classification algorithm. It is also known as the error matrix.

For a binary classifier it has the following layout:

                     Predicted Positive     Predicted Negative
Actual Positive      TP (True Positive)     FN (False Negative)
Actual Negative      FP (False Positive)    TN (True Negative)
17.True Positive, True Negative, False Positive, and False Negative in Confusion Matrix with an example.
Ans:
When a model correctly predicts the positive class, it is said to be a true positive.
For example, Umpire gives a Batsman NOT OUT when he is NOT OUT.
When a model correctly predicts the negative class, it is said to be a true negative.
For example, Umpire gives a Batsman OUT when he is OUT.
When a model incorrectly predicts the positive class, it is said to be a false positive. It is also known as ‘Type I’ error.
For example, Umpire gives a Batsman NOT OUT when he is OUT.
When a model incorrectly predicts the negative class, it is said to be a false negative. It is also known as ‘Type II’ error.
For example, Umpire gives a Batsman OUT when he is NOT OUT.
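A sketch with scikit-learn that recovers all four counts from predictions (labels are illustrative: 1 = OUT, 0 = NOT OUT):

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # actual outcomes
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # the model's (umpire's) decisions

# For binary labels, ravel() returns the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("TP:", tp, "TN:", tn, "FP:", fp, "FN:", fn)
```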
18.What’s the F1 score? How would you use it?
Ans:
The F1 score is a measure of a model’s performance. It is a weighted average of the precision and recall of a model, with results tending to 1 being the best, and those tending to 0 being the worst. You would use it in classification tests where true negatives don’t matter much.
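A quick sketch (scikit-learn assumed): the F1 score is the harmonic mean of precision and recall, F1 = 2 * precision * recall / (precision + recall).

```python
from sklearn.metrics import f1_score, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

p = precision_score(y_true, y_pred)
r = recall_score(y_true, y_pred)
print(f1_score(y_true, y_pred), 2 * p * r / (p + r))   # both expressions agree
```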
19.What is Collaborative Filtering? And Content-Based Filtering?
Ans:
Collaborative filtering is a proven technique for personalized content recommendations. Collaborative filtering is a type of recommendation system that predicts new content by matching the interests of the individual user with the preferences of many users.
Content-based recommender systems are focused only on the preferences of the user. New recommendations are made to the user from similar content according to the user’s previous choices.
20.What is Bagging and Boosting?
Ans:
Bagging (bootstrap aggregating) trains several models in parallel on different bootstrap samples of the training data and combines their predictions by voting or averaging; it mainly reduces variance (Random Forest is a well-known example). Boosting trains models sequentially, with each new model focusing on the examples the previous models got wrong, and combines them into a weighted ensemble; it mainly reduces bias (AdaBoost and Gradient Boosting are well-known examples).
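A side-by-side sketch with scikit-learn ensembles (synthetic data):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=0)

# Bagging: parallel models on bootstrap samples; Boosting: sequential models.
for model in (BaggingClassifier(random_state=0), AdaBoostClassifier(random_state=0)):
    print(type(model).__name__, cross_val_score(model, X, y, cv=5).mean())
```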
21.What are the performance metrics that can be used to estimate the efficiency of a linear regression model?
Ans:
Commonly used performance metrics for a linear regression model are Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), the coefficient of determination (R²), and adjusted R².
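A sketch computing these metrics with scikit-learn on synthetic regression data:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

X, y = make_regression(n_samples=200, n_features=3, noise=10, random_state=0)
pred = LinearRegression().fit(X, y).predict(X)

print("MAE :", mean_absolute_error(y, pred))
print("MSE :", mean_squared_error(y, pred))
print("RMSE:", mean_squared_error(y, pred) ** 0.5)
print("R^2 :", r2_score(y, pred))
```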
22.What are the benefits of pruning?
Ans:
Pruning helps in the following ways: it reduces the size and complexity of a decision tree, it reduces overfitting by removing branches that capture noise, and it usually improves accuracy on unseen (test) data.
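A sketch of cost-complexity pruning in scikit-learn (the ccp_alpha value is an arbitrary illustration): the pruned tree ends up with far fewer leaves.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

full   = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=0.02).fit(X_tr, y_tr)

print("unpruned:", full.get_n_leaves(), "leaves, test accuracy", full.score(X_te, y_te))
print("pruned  :", pruned.get_n_leaves(), "leaves, test accuracy", pruned.score(X_te, y_te))
```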
23.What is a normal distribution?
Ans:
A distribution with the following properties is called a normal distribution: it is bell-shaped and symmetric about the mean; the mean, median, and mode are equal and lie at the centre; it is completely described by its mean and standard deviation; the total area under the curve is 1; and roughly 68%, 95%, and 99.7% of the values lie within one, two, and three standard deviations of the mean, respectively.
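A quick empirical check of the 68-95-99.7 property using NumPy samples from a standard normal distribution:

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.normal(loc=0.0, scale=1.0, size=100_000)

for k in (1, 2, 3):
    # Fraction of samples within k standard deviations of the mean.
    print(k, np.mean(np.abs(samples) <= k))   # roughly 0.68, 0.95, 0.997
```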
24.Which techniques are used to find similarities in the recommendation system?
Ans:
Pearson correlation and cosine similarity are the techniques commonly used to find similarities in recommendation systems.
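A NumPy sketch of both measures on two made-up user-rating vectors:

```python
import numpy as np

user_a = np.array([5, 3, 4, 4, 2])
user_b = np.array([4, 2, 5, 3, 1])

pearson = np.corrcoef(user_a, user_b)[0, 1]
cosine  = user_a @ user_b / (np.linalg.norm(user_a) * np.linalg.norm(user_b))
print(pearson, cosine)
```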
25.Explain the difference between Normalization and Standardization.
Ans:
Normalization and Standardization are two very popular methods of feature scaling. Normalization re-scales values to fit into the range [0, 1]. Standardization re-scales data to have a mean of 0 and a standard deviation of 1 (unit variance). Normalization is useful when all features need to share an identical positive scale, but it is sensitive to outliers, since extreme values compress the remaining data into a narrow band. Hence, standardization is recommended for most applications.
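A sketch of both scalers in scikit-learn (the column of values is invented):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0], [5.0], [10.0], [50.0]])

print(MinMaxScaler().fit_transform(X).ravel())     # rescaled into [0, 1]
print(StandardScaler().fit_transform(X).ravel())   # mean 0, standard deviation 1
```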