[Special Topic] Machine Learning Final Exam Review Materials

Machine Learning Final Exam Review Materials (Question Bank)
Link: https://blog.csdn.net/Pqf18064375973/article/details/148105494?sharetype=blogdetail&sharerId=148105494&sharerefer=PC&sharesource=Pqf18064375973&sharefrom=mp_from_link

[Test]

  • Artificial intelligence is a broad area of computer science in which —— have the ability to learn and think as a human would. ()
    A.machines
    B.humans
    C.mobiles
    D.television

  • A —— algorithm enables it to identify patterns in observed data, build models that explain the world, and predict things without having explicit pre-programmed rules and models. ()
    A.Artificial Intelligence
    B.Machine Learning
    C.Deep Learning
    D.None of the Above

  • Using past purchase behavior data, —— can help to discover data trends that can be used to develop more effective cross-selling strategies. ()
    A.Unsupervised learning
    B.Supervised learning
    C.Reinforcement Learning
    D.Classification Model

  • Find the correct statement :

    Statement A: Duplicate or missing values may give an incorrect view of the overall statistics of data

    Statement B: Outliers and inconsistent data points often tend to disturb the model's overall learning, leading to false predictions

    A.Only Statement A is true
    B.Only Statement B is true
    C.Both Statement A and Statement B are true
    D.Both Statement A and Statement B are false

  • Identify the correct code to import a dataset named 'nba.csv'. ()
    A.df = pd.read-csv("CSV_Data/nba.csv")
    B.df = pd.read.csv("CSV_Data/nba.csv")
    C.df = pd.read_csv("CSV_Data/nba.csv")
    D.df = pd.readcsv("CSV_Data/nba.csv")
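
As a minimal sketch of the pandas call this question is about: `pd.read_csv` accepts either a path or a file-like object. The in-memory text and its column names below are hypothetical stand-ins so the snippet runs without a real `nba.csv`.

```python
import io

import pandas as pd

# Hypothetical stand-in for the contents of a CSV file on disk.
csv_text = "Name,Team,Salary\nPlayer A,Team X,100\nPlayer B,Team Y,200\n"

# read_csv takes a path such as "CSV_Data/nba.csv" or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())
```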

  • Which of the following refers to the process of removing unwanted variables and values from your dataset and getting rid of any irregularities in it?
    A.Data Cleaning
    B.Univariate Analysis
    C.Bivariate Analysis
    D.None of the Above

  • Identify the code that replaces null values in a column with the mean of that same column. ()
    A.df['Salary'].replace(np.NaN, df['Salary'].mean()).head(10)
    B.df['Salary'].mean(np.NaN, df['Salary'].replace()).head(10)
    C.df['Salary'].replacenull(np.NaN, df['Salary'].mean()).head(10)
    D.df['Salary'].replace_null(np.NaN, df['Salary'].mean()).head(10)
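
A small sketch of mean imputation with pandas, using hypothetical toy data in place of the real dataset. It writes `np.nan` rather than `np.NaN`, because NumPy 2.0 removed the `np.NaN` alias.

```python
import numpy as np
import pandas as pd

# Hypothetical toy column; "Salary" follows the naming used in the question.
df = pd.DataFrame({"Salary": [100.0, np.nan, 300.0]})

# mean() skips missing values by default, so this is the mean of 100 and 300.
mean_salary = df["Salary"].mean()

# Replace every missing value in the column with that mean.
filled = df["Salary"].replace(np.nan, mean_salary)

print(filled.head(10))
```

`df["Salary"].fillna(mean_salary)` is an equivalent and arguably more common spelling of the same step.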

  • —— is the process of gathering, sorting, and transforming data from an original "raw" format, in order to prepare it for analysis and other downstream processes. ()
    A.Data Acquisition
    B.Exploratory Data Analysis
    C.Data Wrangling
    D.Data Manipulation

  • —— is defined as a process that enables users to organize data so that the insights in it are easier to read or interpret, and it comprises having a better design. ()
    A.Data Acquisition
    B.Exploratory Data Analysis
    C.Data Wrangling
    D.Data Manipulation

  • Find the correct statement :

    Statement A: The value of an independent variable does not change based on the effects of other variables.

    Statement B: The value of a dependent variable changes when there is any change in the values of the independent variables, as mentioned before
    A.Only Statement A is true
    B.Only Statement B is true
    C.Both Statement A and Statement B are true
    D.Both Statement A and Statement B are false

  • When a model has not learned the patterns in the training data well and is unable to generalize well on the new data, it is known as ——
    A.Best Fit
    B.Data Fitting
    C.Underfitting
    D.Overfitting

  • Dimensionality reduction refers to techniques that reduce the number of input —— in a dataset.
    A.variables
    B.columns
    C.rows
    D.dataset

  • Which of the following is considered as Basic Assumptions for Factor Analysis? ()
    A.There are no outliers in data.
    B.Sample size should be greater than the factor.
    C.There should not be perfect multicollinearity
    D.All of the Above

  • Which of the following is used to standardize features by removing the mean and scaling to unit variance? ()
    A.Bartlett's Test
    B.StandardScaler()
    C.Kaiser-Meyer-Olkin (KMO) Test
    D.Communalities
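
The operation named in this question, removing the mean and scaling to unit variance, can be sketched directly in NumPy. The array is hypothetical. To my understanding, scikit-learn's `StandardScaler` applies the same per-column formula with the population standard deviation, but this snippet does not call that library.

```python
import numpy as np

# Hypothetical feature matrix: 3 samples, 2 features on different scales.
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])

# Per-column mean and population standard deviation (ddof=0).
mean = X.mean(axis=0)
std = X.std(axis=0)

# z = (x - mean) / std gives each column zero mean and unit variance.
X_scaled = (X - mean) / std

print(X_scaled)
```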

  • Which of the following functions is used to find the amount of variance explained by each factor? ()
    A.standardscaler
    B.Communalities
    C.loading_
    D.get_factor_variance()

  • Which of the following is/are used to express the correlation between any two or more attributes in a multidimensional dataset? ()
    A.Standardization
    B.Covariance Matrix
    C.Eigen Vectors and Eigen Values
    D.Feature Vectors

  • Which of the following is a supervised machine learning technique used to find a linear combination of features that separates two or more classes of objects or events? ()
    A.Factor Analysis (FA)
    B.Principal Component Analysis (PCA)
    C.Linear Discriminant Analysis (LDA)
    D.All of the Above

  • Find the correct statement:

    Statement A: Linear discriminant analysis is a supervised dimensionality reduction technique that also achieves classification of the data simultaneously.

    Statement B: Principal component analysis is an unsupervised dimensionality reduction technique; it ignores the class label.
    A.Only Statement A is true
    B.Only Statement B is true
    C.Both Statement A and Statement B are true
    D.Both Statement A and Statement B are false

  • Which of the following is not a classification algorithm? ()
    A.Decision Tree
    B.Random Forest
    C.Naive Bayes
    D.Logistic regression

  • Which among these is not a disadvantage of using the decision tree algorithm for classification? ()
    A.Overfitting
    B.High Variance
    C.Low Biased Tree
    D.Little data preparation

  • Which of the following in a decision tree carries the final results and cannot be split any further?
    A.Root Node
    B.Decision Node
    C.Leaf Node
    D.Split Node

  • Find the correct statement :

    Statement A: Random Forest is a learning method that operates by constructing multiple decision trees.

    Statement B: The final decision is made based on the majority of the trees and is chosen by the random forest.
    A.Only Statement A is true
    B.Only Statement B is true
    C.Both Statement A and Statement B are true
    D.Both Statement A and Statement B are false

  • According to Bayes' theorem:
    $$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
    what does P(A|B) denote? ()
    A.Conditional Probability of A given B
    B.Conditional Probability of B given A
    C.Probability of Event A
    D.Probability of Event B
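
To make the terms of the theorem concrete, here is a small worked calculation. All probabilities are made-up illustrative numbers, and `bayes_posterior` is a hypothetical helper name.

```python
def bayes_posterior(p_b_given_a: float, p_a: float, p_b: float) -> float:
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b


# Illustrative numbers only.
p_a = 0.01              # P(A): prior probability of event A
p_b_given_a = 0.9       # P(B|A): probability of B when A holds
p_b_given_not_a = 0.05  # P(B|not A): probability of B when A does not hold

# Law of total probability gives the evidence term P(B).
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

posterior = bayes_posterior(p_b_given_a, p_a, p_b)
print(round(posterior, 4))
```

Even with P(B|A) at 0.9, the posterior P(A|B) stays small here because the prior P(A) is small.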

  • Identify the correct steps for the below statement: A Machine Learning system learns from historical data then it- ()

    【C】

    A.	1. receives new data 2. builds the prediction models 3. predicts the output for it
    B.	1. builds the prediction models 2. predicts the output for it 3. receives new data
    C.	1. builds the prediction models 2. receives new data 3. predicts the output for it
    D.	None of the Above
    
  • Identify the correct supervised learning algorithm for the statement below.
    In this you divide your customers based on common characteristics, such as demographics or behaviors, so you can market to those customers more effectively.
    A.Predicting housing prices
    B.Text categorization
    C.Face Detection
    D.Customer Segmentation

  • —— is a data analytics process to understand the data in depth and learn the different data characteristics, often with visual means.
    A.Data Acquisition
    B.Exploratory Data Analysis
    C.Data Wrangling
    D.Data Manipulation

  • You need to check the number of missing values in each column and the percentage of missing values they contribute to the dataset. Identify the correct code to achieve that. ()

    【B】

    A.	total=df.isnull().sum().sort_values(ascending=False); missing_data=pd.concat([total,percent],axis=1,keys=['Total','Percent'])
    B.	total=df.isnull().sum().sort_values(ascending=False); percent=(df.isnull().sum()/df.isnull().count()).sort_values(ascending=False); missing_data=pd.concat([total,percent],axis=1,keys=['Total','Percent'])
    C.	total=df.isnull().sum().sort_values(ascending=False); percent=(df.isnull().concat()/df.isnull().count()).sort_values(ascending=False); missing_data=pd.sum([total,percent],axis=1,keys=['Total','Percent'])
    D.	total=df.isnull().sort_values(ascending=False); percent=(df.isnull().concat()/df.isnull().count()).sort_values(ascending=False); missing_data=pd.sum([total,percent],axis=1,keys=['Total','Percent'])
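
The task described in this question can be sketched on a tiny hypothetical DataFrame: count the missing values per column, compute their share of the column, and place both side by side.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column "a" has 2 missing values, "b" has 1, "c" has none.
df = pd.DataFrame({
    "a": [1.0, np.nan, np.nan, 4.0],
    "b": [1.0, 2.0, np.nan, 4.0],
    "c": [1.0, 2.0, 3.0, 4.0],
})

# Number of missing values per column, largest first.
total = df.isnull().sum().sort_values(ascending=False)

# Fraction of missing values per column: count() on the boolean mask
# counts every row, so it acts as the denominator.
percent = (df.isnull().sum() / df.isnull().count()).sort_values(ascending=False)

# Align both Series on the column names and label the two result columns.
missing_data = pd.concat([total, percent], axis=1, keys=["Total", "Percent"])

print(missing_data)
```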
    
  • Identify the correct code where a column has missing value and you want to replace it with a new category.
    A.df['College'].fill(u).head(10)
    B.df['College'].fill_na(u).head(10)
    C.df['College'].fillna(u).head(10)
    D.df['College'].fillnull(u).head(10)
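
A minimal sketch of filling missing entries in a categorical column with a new category. The toy values and the replacement label "Unknown" are hypothetical choices for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy column with one missing entry.
df = pd.DataFrame({"College": ["Duke", np.nan, "Texas"]})

# fillna returns a new Series in which missing values carry the new label.
filled = df["College"].fillna("Unknown")

print(filled.head(10))
```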

  • How can we access the data values fitted in a particular row or column based on the index value passed to the function? ()
    A.Using loc() function
    B.Using filter() function
    C.Using groupby() function
    D.None of the Above
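
For reference, a short sketch of label-based access in pandas. Note that `loc` is an indexer used with square brackets rather than a callable function. The DataFrame and its labels are hypothetical.

```python
import pandas as pd

# Hypothetical data with string row labels.
df = pd.DataFrame(
    {"Team": ["X", "Y", "Z"], "Salary": [100, 200, 300]},
    index=["a", "b", "c"],
)

row = df.loc["b"]                    # one row, selected by its index label
value = df.loc["b", "Salary"]        # one cell, by row label and column name
subset = df.loc[["a", "c"], "Team"]  # several rows of a single column

print(row, value, subset, sep="\n")
```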

  • Find the correct statement :

    Statement A: Classification Supervised Learning is used when the output variable is a real or continuous value

    Statement B: Regression is used when the output variable is categorical.
    A.Only Statement A is true
    B.Only Statement B is true
    C.Both Statement A and Statement B are true
    D.Both Statement A and Statement B are false

  • In the equation y = m * x + c, which term denotes the slope of the line? ()

    A.y

    B.m

    C.x

    D.c

  • Which of the following refers to techniques that are used to calibrate machine learning models in order to minimize the adjusted loss function and prevent overfitting or underfitting? ()
    A.Regularization
    B.Logistic regression
    C.Confusion Matrix
    D.Root Mean Squared Error

  • Which of the following is not considered an advantage of dimensionality reduction?
    A.Fewer features mean less complexity
    B.You will need less storage space because you have fewer data
    C.Many features require less computation time
    D.Model accuracy improves due to less misleading data

  • A —— is a latent variable which describes the association among the number of observed variables.
    A.Factor
    B.Factor loading
    C.Eigenvalues
    D.Communalities

  • While using the loading function, the loading score will range between ()
    A.1, 0
    B.-1, 0
    C.-1, 1
    D.0, -1

  • Which straight lines capture most of the variance of the data and have a direction and magnitude?
    A.Principal Components
    B.Build Covariance Matrix
    C.Eigen Vectors and Eigen Values
    D.Feature Vectors

  • Which of the following is/are the mathematical values that are extracted from the covariance table and are responsible for generating a new set of variables from the old set of variables, which further leads to the construction of principal components?
    A.Standardization
    B.Covariance Matrix
    C.Eigen Vectors and Eigen Values
    D.Feature Vectors

  • Which of the following is a limitation of Logistic Regression? ()
    A.Two-Class Problems
    B.Unstable with Well Separated Classes
    C.Unstable with Few Examples
    D.All of the Above

  • Find the correct statement :

    Statement A: Principal component analysis focuses on finding a feature subspace that maximizes the separability between the groups

    Statement B: Linear discriminant analysis focuses on capturing the direction of maximum variation in the data set.
    A.Only Statement A is true
    B.Only Statement B is true
    C.Both Statement A and Statement B are true
    D.Both Statement A and Statement B are false

  • Classification algorithms are used to classify the data into
    A.group
    B.groups and categories
    C.categories
    D.dataset


[After-Class Exercises]

  • Which among these does not belong to artificial intelligence?

    a. natural language processing

    b. autonomous vehicles

    c. accounting

    d. image recognition

  • —— concept pertains to a machine being more intelligent than a human being.

    a. Artificial Narrow Intelligence

    b. Artificial General Intelligence

    c. Artificial Super Intelligence

    d. None of the Above

  • A _____________ algorithm enables it to identify patterns in observed data, build models that explain the world, and predict things without having explicit pre-programmed rules and models.

    a. Artificial Intelligence

    b. Machine Learning

    c. Deep Learning

    d. None of the Above

  • Which of the following algorithms are used for visual perception tasks, such as object recognition?

    a. Supervised Machine Learning algorithms.

    b. Unsupervised Machine Learning algorithms.

    c. Reinforcement Learning

    d. Classification Model

  • —— is a real-time machine learning application that determines the emotion or opinion of the speaker or the writer.

    a. Product Recommendations

    b. Image recognition

    c. Sentiment Analysis

    d. Language Translation

  • Which of the following is a stage in data preprocessing?

    a. Exploratory Data Analysis

    b. Data Wrangling

    c. Data Manipulation

    d. All of the Above

  • Which of the following libraries is a Python 2D plotting library that is used to plot any type of chart in Python?

    a. Numpy

    b. Pandas

    c. Matplotlib

    d. Seaborn

  • Identify the correct option for the df.shape command.

    a. It will return the number of columns

    b. It will return the number of rows

    c. It will return the number of rows and columns

    d. It will return the datatype of column
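
A quick sketch of what `df.shape` holds, using a hypothetical two-column DataFrame. It is an attribute, not a method, so it takes no parentheses.

```python
import pandas as pd

# Hypothetical data: 3 rows, 2 columns.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# shape is a (rows, columns) tuple.
rows, cols = df.shape

print(df.shape)
```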

  • —— is defined as a process that enables users to organize data so that the insights in it are easier to read or interpret, and it comprises having a better design.

    a. Exploratory Data Analysis

    b. Data Wrangling

    c. Data Acquisition

    d. Data Manipulation

  • Which of the following is not a supervised learning algorithm?

    a. k-means clustering

    b. Linear Regression

    c. Logistic Regression

    d. Support Vector Machine

  • —— is used when the output variable is a real or continuous value. In this case, there is a relationship between two or more variables i.e., a change in one variable is associated with a change in the other variable.

    a. Regression

    b. Classification

    c. k-means clustering

    d. Support Vector Machine

  • —— modifies the over-fitted or under-fitted models by adding a penalty equivalent to the sum of the squares of the magnitudes of the coefficients.

    a. R-squared

    b. Adjusted R-squared

    c. Ridge Regularization

    d. Lasso Regularization
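
The two penalty terms named in the options can be contrasted in a few lines. This is only a sketch of the penalty arithmetic, not a full regression fit. The coefficient values, the strength `lam`, and both helper names are hypothetical.

```python
def ridge_penalty(coefs, lam):
    """L2 penalty: lam times the sum of squared coefficients."""
    return lam * sum(w * w for w in coefs)


def lasso_penalty(coefs, lam):
    """L1 penalty: lam times the sum of absolute coefficients."""
    return lam * sum(abs(w) for w in coefs)


# Hypothetical fitted coefficients and regularization strength.
coefs = [0.5, -2.0, 3.0]
lam = 0.1

ridge = ridge_penalty(coefs, lam)  # 0.1 * (0.25 + 4.0 + 9.0)
lasso = lasso_penalty(coefs, lam)  # 0.1 * (0.5 + 2.0 + 3.0)

print(ridge, lasso)
```

In either case the penalty is added to the ordinary least-squares loss, so larger coefficients cost more.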

  • Find the correct statement:

    Statement A: The value of an independent variable does not change based on the effects of other variables.

    Statement B: The value of dependent variable changes when there is any change in the values of the independent variables, as mentioned before.

    a. Only Statement A is true

    b. Only Statement B is true

    c. Both Statement A and Statement B are true

    d. Both Statement A and Statement B are false

  • The process of plotting a series of data points and drawing the best fit line to understand the relationship between the variables is called ____________.

    a. Underfitting

    b. Overfitting

    c. Data Fitting

    d. None of the Above

  • Which of the following are dimensionality reduction techniques?

    a. Factor Analysis (FA)

    b. Principal Component Analysis (PCA)

    c. Linear Discriminant Analysis (LDA)

    d. All of the above

  • —— are the sum of the squared loadings for each variable and represent the common variance.

    a. Factor

    b. Factor loading

    c. Eigenvalues

    d. Communalities

  • —— use factor analysis to identify price-sensitive customers, identify brand features that influence consumer choice, and help in understanding channel selection criteria for the distribution channel.

    a. Advertisers

    b. Market researchers

    c. Psychologists

    d. None of the Above

  • Values close to —— indicate that the factor has influence on these variables.

    a. 0 or -1

    b. -1 or 1

    c. 0 or 1

    d. -1 or 0

  • Which of the following is/are used to express the correlation between any two or more attributes in a multidimensional dataset?

    a. Standardization

    b. Build Covariance Matrix

    c. Eigen Vectors and Eigen Values

    d. Feature Vectors

  • Which of the following is not a classification algorithm?

    a. Decision Tree

    b. Random Forest

    c. Naive Bayes

    d. Logistic regression

  • Which of the following is used to determine the correct variable for splitting nodes?

    a. Entropy

    b. Information Gain

    c. Gini Index

    d. Root Node

  • Which of the following is not an advantage of using random forest algorithm?

    a. No Overfitting

    b. High Accuracy

    c. Estimate Missing Data

    d. High Variance

  • Support Vectors are data points that are —— the hyperplane and influence the position and orientation of the hyperplane.

    a. closer to

    b. far from

    c. in between

    d. None of the Above

  • Which of the following is a kernel SVM classifier?

    a. Linear Kernel

    b. Polynomial Kernel

    c. RBF Kernel

    d. All of the Above

  • Which of the following is not a type of Clustering?

    a. Agglomerative

    b. Divisive

    c. K-Means

    d. Factor Analysis

  • Identify the formula of Squared Euclidean distance when we are deciding the closeness of two clusters (a, b).

    a. $\|a-b\|_2 = \sqrt{\sum_i (a_i-b_i)^2}$

    b. $\|a-b\|_2^2 = \sum_i (a_i-b_i)^2$

    c. $\|a-b\|_1 = \sum_i |a_i-b_i|$

    d. $\|a-b\|_\infty = \max_i |a_i-b_i|$
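
The four distance measures behind these options can be compared on one pair of points. The vectors and the function names are hypothetical. Each function uses the standard definition of the Euclidean, squared Euclidean, Manhattan (L1), and maximum (L-infinity) distance.

```python
import math


def euclidean(a, b):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def squared_euclidean(a, b):
    """Sum of squared coordinate differences, with no square root."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def manhattan(a, b):
    """Sum of absolute coordinate differences (L1 norm)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def maximum(a, b):
    """Largest absolute coordinate difference (L-infinity norm)."""
    return max(abs(x - y) for x, y in zip(a, b))


# Hypothetical points; the coordinate differences are 3, 4 and 0.
a = [1.0, 2.0, 3.0]
b = [4.0, 6.0, 3.0]

print(euclidean(a, b), squared_euclidean(a, b), manhattan(a, b), maximum(a, b))
```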

  • Which of the following is used to measure the distance in an ordinary straight line?

    a. Euclidean Distance Measure

    b. Squared Euclidean Distance Measure

    c. Manhattan Distance Measure

    d. Cosine Distance Measure

  • —— is a certain sequence of data observations that a system collects within specific periods of time.

    a. Time Series

    b. Regularization

    c. Logistic regression

    d. Confusion Matrix

  • In time series analysis the data has to be stationary, and the stationarity of a time series depends on ——, variance, and covariance.

    a. Mean

    b. Median

    c. Mode

    d. None of the Above

  • Which of the following models predict future behavior using past behavior where there is some correlation between past and future data?

    a. Auto Regressive (AR) Model

    b. Moving Average (MA)

    c. Auto Regressive Moving Average (ARMA)

    d. Auto Regressive Integrated Moving Average (ARIMA)

  • Which of the following includes the modeling of exogenous variables?

    a. ARMA

    b. ARIMA

    c. SARIMA

    d. SARIMAX

  • Which of the following is an example of Ensemble Learning?

    a. Regularization

    b. Logistic regression

    c. Confusion Matrix

    d. Random Forest

  • Which of the following is not a simple Ensemble Learning Method?

    a. Mode

    b. Mean/Average

    c. Weighted Average

    d. Boosting

  • Which of the following is not an advantage of Bagging in Machine Learning?

    a. Minimizes the overfitting of data.

    b. Improves the model’s accuracy.

    c. Deals with higher dimensional data efficiently

    d. Determines the worst, best, and expected values for several scenarios

  • Which of the following boosting techniques is also known as XGBoost?

    a. Adaptive boosting

    b. Gradient boosting

    c. Extreme Gradient Boosting

    d. None of the Above

  • Which of the following is not a Cross Validation model?

    a. K-fold

    b. K-means

    c. Leave One Out

    d. Stratified K-fold

  • Which of the following is not a type of recommendation system?

    a) Content-Based Recommendation

    b) Collaborative Filtering

    c) Hybrid

    d) Rating-Based Recommendation

  • Find the correct statement:

    Statement A: Content-Based Recommendation recommends items based on similarity measures between users and/or items.

    Statement B: Each method has its strength. It would be best if we can combine all those strengths and provide a better recommendation. This idea leads us to another improvement of the recommendation, which is the hybrid method.

    a) Only Statement A is true

    b) Only Statement B is true

    c) Both Statement A and Statement B are true

    d) Both Statement A and Statement B are false

  • —— finds interesting associations and relationships among large sets of data items. This technique shows how frequently an itemset occurs in a transaction.

    a) Association rule mining

    b) Apriori Algorithm

    c) User-based nearest neighbor

    d) Item-based nearest neighbor

  • Which of the following is an implication expression of the form X -> Y, where X and Y are any 2 item sets?

    a) Support Count

    b) Frequent Item set

    c) Association Rule

    d) None of the Above

  • Which of the following text mining tasks refers to the automatic extraction of structured data, such as entities, entity relationships, and attributes describing entities, from an unstructured source?

    a) Information Extraction

    b) Natural Language Processing

    c) Data Mining

    d) Information Retrieval

  • What does NLTK stand for?

    a) Natural Language Toolkit

    b) Non-natural Language Toolkit

    c) Neutral Language Toolkit

    d) New Language Toolkit

  • Which of the following is more accurate as it uses more informed analysis to create groups of words with similar meanings based on the context?

    a) Removing Punctuations

    b) Removal of Frequent Words

    c) Stemming

    d) Lemmatization
