【Topic】Machine Learning Final Exam Review Materials
Machine Learning Final Exam Review Materials (Question Bank)
Link: https://blog.csdn.net/Pqf18064375973/article/details/148105494?sharetype=blogdetail&sharerId=148105494&sharerefer=PC&sharesource=Pqf18064375973&sharefrom=mp_from_link
【Test】
-
Artificial intelligence is a broad area of computer science in which —— have the ability to learn and think as a human would. ()
A.machines
B.humans
C.mobiles
D.television -
A —— algorithm enables it to identify patterns in observed data, build models that explain the world, and predict things without having explicit pre-programmed rules and models. ()
A.Artificial Intelligence
B.Machine Learning
C.Deep Learning
D.None of the Above -
Using past purchase behavior data, —— can help to discover data trends that can be used to develop more effective cross-selling strategies. ()
A.Unsupervised learning
B.Supervised learning
C.Reinforcement Learning
D.Classification Model -
Find the correct statement :
Statement A: Duplicate or missing values may give an incorrect view of the overall statistics of data
Statement B: Outliers and inconsistent data points often tend to disturb the model’s overall learning, leading to false predictions
A.Only Statement A is true
B.Only Statement B is true
C.Both Statement A and Statement B are true
D.Both Statement A and Statement B are false -
Identify the correct code to import a dataset named 'nba.csv'. ()
A.df = pd.read-csv("CSV_Data/nba.csv")
B.df = pd.read.csv("CSV_Data/nba.csv")
C.df = pd.read_csv("CSV_Data/nba.csv")
D.df = pd.readcsv("CSV_Data/nba.csv") -
Which of the following refers to the process of removing unwanted variables and values from your dataset and getting rid of any irregularities in it?
A.Data Cleaning
B.Univariate Analysis
C.Bivariate Analysis
D.None of the Above -
Identify the code that replaces null values in a column with the mean of that same column. ()
A.df['Salary'].replace(np.NaN, df['Salary'].mean()).head(10)
B.df['Salary'].mean(np.NaN, df['Salary'].replace()).head(10)
C.df['Salary'].replacenull(np.NaN, df['Salary'].mean()).head(10)
D.df['Salary'].replace_null(np.NaN, df['Salary'].mean()).head(10) -
—— is the process of gathering, sorting, and transforming data from an original “raw” format, in order to prepare it for analysis and other downstream processes. ()
A.Data Acquisition
B.Exploratory Data Analysis
C.Data Wrangling
D.Data Manipulation -
—— is defined as a process that enables users in data organization in order to make reading or interpret the insights from the data and comprises of having better design. ()
A.Data Acquisition
B.Exploratory Data Analysis
C.Data Wrangling
D.Data Manipulation -
Find the correct statement :
Statement A: The value of an independent variable does not change based on the effects of other variables.
Statement B: The value of a dependent variable changes when there is any change in the values of the independent variables, as mentioned before
A.Only Statement A is true
B.Only Statement B is true
C.Both Statement A and Statement B are true
D.Both Statement A and Statement B are false -
When a model has not learned the patterns in the training data well and is unable to generalize well on the new data, it is known as ——
A.Best Fit
B.Data Fitting
C.Underfitting
D.Overfitting -
Dimensionality reduction refers to techniques that reduce the number of input —— in a dataset.
A.variables
B.columns
C.rows
D.dataset -
Which of the following are basic assumptions of factor analysis? ()
A.There are no outliers in data.
B.Sample size should be greater than the number of factors.
C.There should not be perfect multicollinearity.
D.All of the Above -
Which of the following is used to standardize features by removing the mean and scaling to unit variance? ()
A.Bartlett’s Test
B.StandardScaler()
C.Kaiser-Meyer-Olkin (KMO) Test
D.Communalities -
Which of the following functions is used to find the amount of variance explained by each factor? ()
A.StandardScaler()
B.Communalities
C.loadings_
D.get_factor_variance() -
Which of the following is/are used to express the correlation between any two or more attributes in a multidimensional dataset? ()
A.Standardization
B.Covariance Matrix
C.Eigen Vectors and Eigen Values
D.Feature Vectors -
Which of the following is a supervised machine learning technique used to find a linear combination of features that separates two or more classes of objects or events? ()
A.Factor Analysis (FA)
B.Principal Component Analysis (PCA)
C.Linear Discriminant Analysis(LDA)
D.All of the Above -
Find the correct statement:
Statement A: Linear discriminant analysis is a supervised dimensionality reduction technique that also achieves classification of the data simultaneously.
Statement B: Principal component analysis is an unsupervised dimensionality reduction technique; it ignores the class label.
A.Only Statement A is true
B.Only Statement B is true
C.Both Statement A and Statement B are true
D.Both Statement A and Statement B are false -
Which of the following is not a classification algorithm? ()
A.Decision Tree
B.Random Forest
C.Naive Bayes
D.Logistic regression -
Which among these is not a disadvantage of using the decision tree algorithm for classification? ()
A.Overfitting
B.High Variance
C.Low Biased Tree
D.Little data preparation -
Which of the following in a decision tree carries the final results and cannot be split any further?
A.Root Node
B.Decision Node
C.Leaf Node
D.Split Node -
Find the correct statement :
Statement A: Random Forest is a learning method that operates by constructing multiple decision trees.
Statement B: The final decision is made based on the majority of the trees and is chosen by the random forest.
A.Only Statement A is true
B.Only Statement B is true
C.Both Statement A and Statement B are true
D.Both Statement A and Statement B are false -
According to Bayes' theorem:
what does P(A|B) denote? ()
A.Conditional Probability of A given B
B.Conditional Probability of B given A
C.Probability of Event A
D.Probability of Event B -
Identify the correct steps for the below statement: A Machine Learning system learns from historical data then it- ()
【C】
A. 1. receives new data 2. builds the prediction models 3. predicts the output for it
B. 1. builds the prediction models 2. predicts the output for it 3. receives new data
C. 1. builds the prediction models 2. receives new data 3. predicts the output for it
D. None of the Above
-
Identify the correct supervised learning algorithm for the statement below.
In this you divide your customers based on common characteristics, such as demographics or behaviors, so you can market to those customers more effectively.
A.Predicting housing prices
B.Text categorization
C.Face Detection
D.Customer Segmentation -
—— is a data analytics process to understand the data in depth and learn the different data characteristics, often with visual means.
A.Data Acquisition
B.Exploratory Data Analysis
C.Data Wrangling
D.Data Manipulation -
You need to check the number of missing values in each column and the percentage of missing values they contribute to the dataset. Identify the correct code to achieve that. ()
【B】
A. total=df.isnull().sum().sort_values(ascending=False); missing_data=pd.concat([total,percent],axis=1,keys=['Total','Percent'])
B. total=df.isnull().sum().sort_values(ascending=False); percent=(df.isnull().sum()/df.isnull().count()).sort_values(ascending=False); missing_data=pd.concat([total,percent],axis=1,keys=['Total','Percent'])
C. total=df.isnull().sum().sort_values(ascending=False); percent=(df.isnull().concat()/df.isnull().count()).sort_values(ascending=False); missing_data=pd.sum([total,percent],axis=1,keys=['Total','Percent'])
D. total=df.isnull().sort_values(ascending=False); percent=(df.isnull().concat()/df.isnull().count()).sort_values(ascending=False); missing_data=pd.sum([total,percent],axis=1,keys=['Total','Percent'])
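The pattern this question is built around can be tried on a small table. The sketch below assumes pandas and NumPy are installed; the DataFrame, its column names, and its values are made up for illustration and are not from the course material.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "College": ["X", None, None, "Y"],
    "Salary": [100.0, np.nan, 300.0, 400.0],
})

# Number of missing values per column, largest first.
total = df.isnull().sum().sort_values(ascending=False)

# Share of rows that are missing in each column (a fraction between 0 and 1).
percent = (df.isnull().sum() / df.isnull().count()).sort_values(ascending=False)

# Put both measures side by side, one row per original column.
missing_data = pd.concat([total, percent], axis=1, keys=["Total", "Percent"])
print(missing_data)
```

Note that `percent` here is a fraction; multiply it by 100 if a true percentage is wanted.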
-
Identify the correct code where a column has missing value and you want to replace it with a new category.
A.df['College'].fill('u').head(10)
B.df['College'].fill_na('u').head(10)
C.df['College'].fillna('u').head(10)
D.df['College'].fillnull('u').head(10) -
How can we access the data values fitted in a particular row or column based on the index value passed to the function? ()
A.Using loc() function
B.Using filter() function
C.Using groupby() function
D.None of the Above -
Find the correct statement :
Statement A: Classification (supervised learning) is used when the output variable is a real or continuous value.
Statement B: Regression is used when the output variable is categorical.
A.Only Statement A is true
B.Only Statement B is true
C.Both Statement A and Statement B are true
D.Both Statement A and Statement B are false -
In the equation y = m * x + c, which term denotes the slope of the line? ()
A.y
B.m
C.x
D.c
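To make the roles of the terms in y = m * x + c concrete, here is a minimal sketch that recovers both coefficients from a few points using ordinary least squares. The function name `fit_line` and the sample points are assumptions made for this illustration.

```python
def fit_line(xs, ys):
    """Least-squares estimates of m and c for the line y = m * x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx            # change in y per unit change in x
    c = mean_y - m * mean_x  # value of y when x is 0
    return m, c


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # generated from y = 2 * x + 1
m, c = fit_line(xs, ys)
print(m, c)
```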
-
Which of the following refers to techniques that are used to calibrate machine learning models in order to minimize the adjusted loss function and prevent overfitting or underfitting? ()
A.Regularization
B.Logistic regression
C.Confusion matrix
D.Root Mean Squared Error -
Which of the following is not considered an advantage of dimensionality reduction?
A.Fewer features mean less complexity
B.You will need less storage space because you have fewer data
C.Many features require less computation time
D.Model accuracy improves due to less misleading data -
A —— is a latent variable which describes the association among the number of observed variables.
A.Factor
B.Factor loading
C.Eigenvalues
D.Communalities -
When using the loading function, the loading scores will range between: ()
A.1,0
B.-1,0
C.-1,1
D.0,-1 -
Which straight lines capture most of the variance of the data and have a direction and magnitude?
A.Principal Components
B.Build Covariance Matrix
C.Eigen Vectors and Eigen Values
D.Feature Vectors -
Which of the following is/are the mathematical values that are extracted from the covariance table and are responsible for generating a new set of variables from the old set of variables, which in turn leads to the construction of principal components?
A.Standardization
B.Covariance Matrix
C.Eigen Vectors and Eigen Values
D.Feature Vectors -
Which of the following is a limitation of Logistic Regression? ()
A.Two-Class Problems
B.Unstable with Well Separated Classes
C.Unstable with Few Examples
D.All of the Above -
Find the correct statement :
Statement A: Principal component analysis focuses on finding a feature subspace that maximizes the separability between the groups
Statement B: Linear discriminant analysis focuses on capturing the direction of maximum variation in the data set.
A.Only Statement A is true
B.Only Statement B is true
C.Both Statement A and Statement B are true
D.Both Statement A and Statement B are false -
Classification algorithms are used to classify the data into ——
A.group
B.groups and categories
C.categories
D.dataset
【After-class exercises】
-
Which among these does not belong to artificial intelligence?
a. natural language processing
b. autonomous vehicles
c. accounting
d. image recognition
-
—— concept pertains to a machine being more intelligent than a human being.
a. Artificial Narrow Intelligence
b. Artificial General Intelligence
c. Artificial Super Intelligence
d. None of the Above
-
A _____________ algorithm enables it to identify patterns in observed data, build models that explain the world, and predict things without having explicit pre-programmed rules and models.
a. Artificial Intelligence
b. Machine Learning
c. Deep Learning
d. None of the Above
-
Which of the following algorithms is used for visual perception tasks, such as object recognition?
a. Supervised Machine Learning algorithms.
b. Unsupervised Machine Learning algorithms.
c. Reinforcement Learning
d. Classification Model
-
—— is a real-time machine learning application that determines the emotion or opinion of the speaker or the writer.
a. Product Recommendations
b. Image recognition
c. Sentiment Analysis
d. Language Translation
-
Which of the following is a stage in data preprocessing?
a. Exploratory Data Analysis
b. Data Wrangling
c. Data Manipulation
d. All of the Above
-
Which of the following libraries is a Python 2D plotting library that is used to plot any type of chart in Python?
a. Numpy
b. Pandas
c. Matplotlib
d. Seaborn
-
Identify the correct option for the df.shape command.
a. It will return the number of columns
b. It will return the number of rows
c. It will return the number of rows and columns
d. It will return the datatype of column
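A quick way to check what `df.shape` gives back is to try it on a tiny table. This sketch assumes pandas is installed; the DataFrame contents are invented for the example.

```python
import pandas as pd

# Hypothetical table with 3 rows and 2 columns.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# shape is an attribute (no parentheses) holding a (rows, columns) tuple.
n_rows, n_cols = df.shape
print(n_rows, n_cols)
```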
-
—— is defined as a process that enables users in data organization in order to make reading or interpret the insights from the data and comprises of having better design
a. Exploratory Data Analysis
b. Data Wrangling
c. Data Acquisition
d. Data Manipulation
-
Which of the following is not a supervised learning algorithm?
a. k-means clustering
b. Linear Regression
c. Logistic Regression
d. Support Vector Machine
-
—— is used when the output variable is a real or continuous value. In this case, there is a relationship between two or more variables i.e., a change in one variable is associated with a change in the other variable.
a. Regression
b. Classification
c. k-means clustering
d. Support Vector Machine
-
—— modifies over-fitted or under-fitted models by adding a penalty equivalent to the sum of the squares of the magnitudes of the coefficients.
a. R-squared
b. Adjusted R-squared
c. Ridge Regularization
d. Lasso Regularization
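The two penalty terms named in the options differ only in how they measure coefficient size. The following is a minimal sketch with hand-written helper functions; the names `ridge_penalty`, `lasso_penalty`, `penalized_loss`, and the strength parameter `alpha` are assumptions for illustration, not a library API.

```python
def squared_error(ys, preds):
    """Sum of squared residuals, the unpenalized part of the loss."""
    return sum((y - p) ** 2 for y, p in zip(ys, preds))


def ridge_penalty(coefs, alpha):
    """L2 penalty: alpha times the sum of squared coefficients."""
    return alpha * sum(w ** 2 for w in coefs)


def lasso_penalty(coefs, alpha):
    """L1 penalty: alpha times the sum of absolute coefficients."""
    return alpha * sum(abs(w) for w in coefs)


def penalized_loss(ys, preds, coefs, alpha, penalty):
    """Data-fit term plus the chosen penalty term."""
    return squared_error(ys, preds) + penalty(coefs, alpha)


coefs = [3.0, -4.0]
alpha = 0.5
print(ridge_penalty(coefs, alpha))  # 0.5 * (9 + 16)
print(lasso_penalty(coefs, alpha))  # 0.5 * (3 + 4)
```

Because the squared penalty grows faster for large coefficients, it shrinks them strongly, while the absolute-value penalty can push small coefficients all the way to zero.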
-
Find the correct statement:
Statement A: The value of an independent variable does not change based on the effects of other variables.
Statement B: The value of dependent variable changes when there is any change in the values of the independent variables, as mentioned before.
a. Only Statement A is true
b. Only Statement B is true
c. Both Statement A and Statement B are true
d. Both Statement A and Statement B are false
-
The process of plotting a series of data points and drawing the best fit line to understand the relationship between the variables is called ____________.
a. Underfitting
b. Overfitting
c. Data Fitting
d. None of the Above
-
Which of the following is a dimensionality reduction technique?
a. Factor Analysis (FA)
b. Principal Component Analysis (PCA)
c. Linear Discriminant Analysis (LDA)
d. All of the above
-
—— are the sum of the squared loadings for each variable and it represents the common variance.
a. Factor
b. Factor loading
c. Eigenvalues
d. Communalities
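The "sum of the squared loadings for each variable" can be computed by hand from a loading matrix. In the sketch below the loading values are invented, and the layout (one row per observed variable, one column per factor) is an assumption of the example.

```python
# Hypothetical loading matrix: one row per observed variable, one column per factor.
loadings = [
    [0.8, 0.1],
    [0.6, 0.3],
    [0.2, 0.9],
]

# Per-variable quantity: squared loadings summed across the factors (row-wise).
communalities = [sum(value ** 2 for value in row) for row in loadings]

# Per-factor quantity: squared loadings summed across the variables (column-wise).
n_factors = len(loadings[0])
ss_loadings = [sum(row[j] ** 2 for row in loadings) for j in range(n_factors)]

print(communalities)
print(ss_loadings)
```

The row-wise sums describe how much of each variable's variance the factors share, while the column-wise sums describe how much variance each factor accounts for.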
-
—— use factor analysis to identify price-sensitive customers and brand features that influence consumer choice, and to understand the selection criteria for distribution channels.
a. Advertisers
b. Market researchers
c. Psychologist
d. None of the Above
-
Values close to —— indicate that the factor has an influence on these variables.
a. 0 or -1
b. -1 or 1
c. 0 or 1
d. -1 or 0
-
Which of the following is/are used to express the correlation between any two or more attributes in a multidimensional dataset?
a. Standardization
b. Build Covariance Matrix
c. Eigen Vectors and Eigen Values
d. Feature Vectors
-
Which of the following is not a classification algorithm?
a. Decision Tree
b. Random Forest
c. Naive Bayes
d. Logistic regression
-
Which of the following is used to determine the correct variable for splitting nodes?
a. Entropy
b. Information Gain
c. Gini Index
d. Root Node
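The three impurity-related measures in the options are easy to confuse, so a small worked sketch may help. The helper names and the toy labels below are assumptions for illustration; entropy uses base-2 logarithms.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini index: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


parent = ["yes", "yes", "no", "no"]
split = [["yes", "yes"], ["no", "no"]]  # a split that separates the classes perfectly

print(entropy(parent), gini(parent), information_gain(parent, split))
```

A candidate split is usually compared with others by how much it reduces impurity, which is exactly what `information_gain` measures here.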
-
Which of the following is not an advantage of using random forest algorithm?
a. No Overfitting
b. High Accuracy
c. Estimate Missing Data
d. High Variance
-
Support Vectors are data points that are —— the hyperplane and influence the position and orientation of the hyperplane.
a. closer to
b. far from
c. in between
d. None of the Above
-
Which of the following is a kernel SVM classifier?
a. Linear Kernel
b. Polynomial Kernel
c. RBF Kernel
d. All of the Above
-
Which of the following is not a type of Clustering?
a. Agglomerative
b. Divisive
c. K-Means
d. Factor Analysis
-
Identify the formula of Squared Euclidean distance when we are deciding the closeness of two clusters (a, b).
a. $\|a-b\|_2 = \sqrt{\sum_i (a_i-b_i)^2}$
b. $\|a-b\|_2^2 = \sum_i (a_i-b_i)^2$
c. $\|a-b\|_1 = \sum_i |a_i-b_i|$
d. $\|a-b\|_\infty = \max_i |a_i-b_i|$
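The four distance measures listed in the options can be compared on one pair of points. This is a minimal sketch using plain Python lists; the function names and the sample points are chosen for the example.

```python
from math import sqrt


def euclidean(a, b):
    """Straight-line distance: square root of the summed squared differences."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def squared_euclidean(a, b):
    """Same as Euclidean distance but without the square root."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def max_norm(a, b):
    """Largest absolute coordinate difference."""
    return max(abs(x - y) for x, y in zip(a, b))


a = [1.0, 2.0]
b = [4.0, 6.0]
print(euclidean(a, b), squared_euclidean(a, b), manhattan(a, b), max_norm(a, b))
```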
-
Which of the following is used to measure the distance in an ordinary straight line?
a. Euclidean Distance Measure
b. Squared Euclidean Distance Measure
c. Manhattan Distance Measure
d. Cosine Distance Measure
-
—— is a certain sequence of data observations that a system collects within specific periods of time.
a. Time Series
b. Regularization
c. Logistic regression
d. Confusion Matrix
-
In time series analysis the data has to be stationary. The stationarity of a time series depends on ——, variance, and covariance.
a. Mean
b. Median
c. Mode
d. None of the Above
-
Which of the following models predict future behavior using past behavior where there is some correlation between past and future data?
a. Auto Regressive (AR) Model
b. Moving Average (MA)
c. Auto Regressive Moving Average (ARMA)
d. Auto Regressive Integrated Moving Average (ARIMA)
-
Which of the following includes the modeling of exogenous variables?
a. ARMA
b. ARIMA
c. SARIMA
d. SARIMAX
-
Which of the following is an example of Ensemble Learning?
a. Regularization
b. Logistic regression
c. Confusion Matrix
d. Random Forest
-
Which of the following is not a simple Ensemble Learning Method?
a. Mode
b. Mean/Average
c. Weighted Average
d. Boosting
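The simple combination rules named in the options each reduce several model outputs to one. The sketch below uses made-up predictions from three hypothetical models; the weights are an assumption and are chosen to sum to 1.

```python
from statistics import mean, mode

# Hypothetical outputs of three models for the same sample.
class_votes = ["spam", "ham", "spam"]  # predicted labels
scores = [0.2, 0.5, 0.8]               # predicted probabilities
weights = [0.5, 0.3, 0.2]              # assumed trust in each model, summing to 1

# Mode: the label predicted by most models (majority vote).
majority = mode(class_votes)

# Mean: every model counts equally.
average = mean(scores)

# Weighted average: models with larger weights count more.
weighted_average = sum(w * s for w, s in zip(weights, scores))

print(majority, average, weighted_average)
```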
-
Which of the following is not an advantage of Bagging in Machine Learning?
a. Minimizes the overfitting of data.
b. Improves the model’s accuracy.
c. Deals with higher dimensional data efficiently
d. Determines the worst, best, and expected values for several scenarios
-
Which of the following boosting techniques is also known as XGBoost?
a. Adaptive boosting
b. Gradient boosting
c. Extreme Gradient Boosting
d. None of the Above
-
Which of the following is not a Cross Validation model?
a. K-fold
b. K-means
c. Leave One Out
d. Stratified K-fold
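To show what a K-fold split does, here is a hand-rolled sketch that partitions sample indices into contiguous folds without shuffling. The function `k_fold_indices` is a hypothetical helper written for this example and is not a library implementation.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k contiguous folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(6, 3))
for train, test in folds:
    print("train:", train, "test:", test)
```

Setting `k` equal to the number of samples makes every test fold a single sample, which corresponds to Leave One Out.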
-
Which of the following is not a type of recommendation system?
a) Content-Based Recommendation
b) Collaborative Filtering
c) Hybrid
d) Rating-Based Recommendation
-
Find the correct statement:
Statement A: Content-Based Recommendation recommends items based on similarity measures between users and/or items.
Statement B: Each method has its strength. It would be best if we can combine all those strengths and provide a better recommendation. This idea leads us to another improvement of the recommendation, which is the hybrid method.
a) Only Statement A is true
b) Only Statement B is true
c) Both Statement A and Statement B are true
d) Both Statement A and Statement B are false
-
—— finds interesting associations and relationships among large sets of data items. This technique shows how frequently an itemset occurs in a transaction.
a) Association rule mining
b) Apriori Algorithm
c) User-based nearest neighbor
d) Item-based nearest neighbor
-
Which of the following is an implication expression of the form X -> Y, where X and Y are any 2 item sets?
a) Support Count
b) Frequent Item set
c) Association Rule
d) None of the Above
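The terms in these options can be made concrete on a handful of transactions. The sketch below is an illustration only: the basket contents and the helper names `support_count`, `support`, and `confidence` are assumptions made for the example.

```python
# Hypothetical market-basket data: each transaction is a set of items.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support_count(itemset):
    """Number of transactions that contain every item in the itemset."""
    return sum(1 for t in transactions if itemset <= t)


def support(itemset):
    """Fraction of all transactions that contain the itemset."""
    return support_count(itemset) / len(transactions)


def confidence(x, y):
    """For a rule X -> Y: how often Y appears in transactions that contain X."""
    return support_count(x | y) / support_count(x)


x, y = {"bread"}, {"milk"}
print(support_count(x | y), support(x | y), confidence(x, y))
```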
-
Which of the following text mining terms refers to the automatic extraction of structured data, such as entities, entity relationships, and attributes describing entities, from an unstructured source?
a) Information Extraction
b) Natural Language Processing
c) Data Mining
d) Information Retrieval
-
What does NLTK stand for?
a) Natural Language Toolkit
b) Non-natural Language Toolkit
c) Neutral Language Toolkit
d) New Language Toolkit
-
Which of the following is more accurate as it uses more informed analysis to create groups of words with similar meanings based on the context?
a) Removing Punctuations
b) Removal of Frequent Words
c) Stemming
d) Lemmatization