
Usage Guide for the scikit-surprise Recommendation Module

Contents

1. Introduction

2. Algorithms

3. Datasets

3.1 Three built-in datasets are available:

3.2 Load a dataset from a pandas dataframe.

3.3 Load a dataset from a (custom) file.

3.4 Load a dataset where folds (for cross-validation) are predefined by some files.

4. Predict

4.1 SVD & load_builtin("ml-100k")

4.2 KNNBasic & load_builtin("ml-100k")

4.3 BaselineOnly & custom dataset

5. Accuracy evaluation


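1. Introduction

Surprise provides a set of built-in recommendation algorithms together with datasets to practice on.

Reference: "The model_selection package", Surprise 1 documentation

Installation: pip install scikit-surprise -i https://pypi.org/simple

2. Algorithms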

The available prediction algorithms are:

random_pred.NormalPredictor

Algorithm predicting a random rating based on the distribution of the training set, which is assumed to be normal.

baseline_only.BaselineOnly

Algorithm predicting the baseline estimate for given user and item.

knns.KNNBasic

A basic collaborative filtering algorithm.

knns.KNNWithMeans

A basic collaborative filtering algorithm, taking into account the mean ratings of each user.

knns.KNNWithZScore

A basic collaborative filtering algorithm, taking into account the z-score normalization of each user.

knns.KNNBaseline

A basic collaborative filtering algorithm taking into account a baseline rating.

matrix_factorization.SVD

The famous SVD algorithm, as popularized by Simon Funk during the Netflix Prize.

matrix_factorization.SVDpp

The SVD++ algorithm, an extension of SVD taking into account implicit ratings.

matrix_factorization.NMF

A collaborative filtering algorithm based on Non-negative Matrix Factorization.

slope_one.SlopeOne

A simple yet accurate collaborative filtering algorithm.

co_clustering.CoClustering

A collaborative filtering algorithm based on co-clustering.
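
To make one entry of the list above concrete, here is a minimal sketch of the idea behind NormalPredictor: estimate the mean and standard deviation of the training ratings, then draw a random rating from that normal distribution and clip it to the rating scale. The function name normal_predictor and the toy ratings are assumptions made for illustration; this is plain Python, not Surprise's own code.

```python
import random
import statistics


def normal_predictor(train_ratings, rating_scale=(1, 5), rng=None):
    """Return a function that draws random ratings from N(mean, std) of the training ratings."""
    rng = rng or random.Random()
    mu = statistics.fmean(train_ratings)
    sigma = statistics.pstdev(train_ratings)  # population standard deviation
    low, high = rating_scale

    def predict():
        # Draw from the fitted normal distribution, then clip to the rating scale.
        return min(high, max(low, rng.gauss(mu, sigma)))

    return predict


# Toy ratings (hypothetical), seeded so the draw is repeatable.
predict = normal_predictor([3, 2, 4, 3, 1], rng=random.Random(0))
samples = [predict() for _ in range(5)]
print(samples)
```

Every draw ignores the user and the item, which is why this predictor is mostly useful as a reference point when comparing the other algorithms.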

3. Datasets

3.1 Three built-in datasets are available:

  • The movielens-100k dataset.

  • The movielens-1m dataset.

  • The Jester dataset 2.

Built-in datasets can all be loaded (or downloaded if you haven’t already) using the Dataset.load_builtin() method. Summary:

  • Dataset.load_builtin

    Load a built-in dataset.

classmethod:

load_builtin(name='ml-100k', prompt=True)

Example:

from surprise import accuracy, Dataset, SVD
from surprise.model_selection import train_test_split
# Load the movielens-100k dataset (download it if needed).
data = Dataset.load_builtin("ml-100k")
# sample random trainset and testset
# test set is made of 25% of the ratings.
trainset, testset = train_test_split(data, test_size=0.25)
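
What a 25% random split amounts to can be sketched in plain Python. The helper simple_train_test_split and the toy ratings below are hypothetical; Surprise's train_test_split additionally returns a Trainset object rather than a plain list, so treat this only as an illustration of the shuffle-and-cut idea.

```python
import random


def simple_train_test_split(ratings, test_size=0.25, seed=None):
    """Shuffle (user, item, rating) tuples and hold out a fraction as the test set."""
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_size))
    return shuffled[n_test:], shuffled[:n_test]


# 20 toy ratings: 4 users x 5 items, values between 1 and 5.
ratings = [(f"u{u}", f"i{i}", float((u + i) % 5 + 1)) for u in range(4) for i in range(5)]
train, test = simple_train_test_split(ratings, test_size=0.25, seed=0)
print(len(train), len(test))  # 15 5
```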

3.2 Load a dataset from a pandas dataframe.

You can also use a custom dataset that is stored in a pandas dataframe.

classmethod:

load_from_df(df, reader)

Example:

import pandas as pd
from surprise import Dataset, NormalPredictor, Reader
from surprise.model_selection import cross_validate
# Creation of the dataframe. Column names are irrelevant.
ratings_dict = {
    "itemID": [1, 1, 1, 2, 2],
    "userID": [9, 32, 2, 45, "user_foo"],
    "rating": [3, 2, 4, 3, 1],
}
df = pd.DataFrame(ratings_dict)
# A reader is still needed but only the rating_scale param is required.
reader = Reader(rating_scale=(1, 5))
# The columns must correspond to user id, item id and ratings (in that order).
data = Dataset.load_from_df(df[["userID", "itemID", "rating"]], reader)
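
The point about column order can be checked without pandas. The sketch below rebuilds the rows in (user, item, rating) order from the same dictionary and runs a sanity check against the rating scale. The range check is an extra step added here for illustration, not something load_from_df is claimed to do.

```python
ratings_dict = {
    "itemID": [1, 1, 1, 2, 2],
    "userID": [9, 32, 2, 45, "user_foo"],
    "rating": [3, 2, 4, 3, 1],
}
rating_scale = (1, 5)

# The dictionary lists itemID first, so pick the columns explicitly:
# user id, item id, rating, in that order.
rows = list(zip(ratings_dict["userID"], ratings_dict["itemID"], ratings_dict["rating"]))

# Illustrative sanity check: flag ratings outside the declared scale.
low, high = rating_scale
out_of_scale = [row for row in rows if not low <= row[2] <= high]

print(rows[0])       # (9, 1, 3)
print(out_of_scale)  # []
```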

3.3 Load a dataset from a (custom) file.

classmethod:

load_from_file(file_path, reader)

Use this if you want to use a custom dataset and all of the ratings are stored in one file. You will then have to split the dataset yourself, for example with train_test_split from surprise.model_selection (as in section 4.3).

Parameters:

  • file_path (string) – The path to the file containing ratings.

  • reader (Reader) – A reader to read the file.

Example:

import os
from surprise import BaselineOnly, Dataset, Reader
from surprise.model_selection import cross_validate
# path to dataset file
file_path = os.path.expanduser("~/.surprise_data/ml-100k/ml-100k/u.data")
# As we're loading a custom dataset, we need to define a reader. In the
# movielens-100k dataset, each line has the following format:
# 'user item rating timestamp', separated by '\t' characters.
reader = Reader(line_format="user item rating timestamp", sep="\t")
data = Dataset.load_from_file(file_path, reader=reader)
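
To show what the line_format and sep arguments describe, here is a rough plain-Python sketch of parsing one line in that format. The function parse_line below is a stand-in written for this example, under the assumption that ids stay strings and the rating becomes a float; it is not the Reader implementation.

```python
def parse_line(line, line_format="user item rating timestamp", sep="\t"):
    """Split one ratings line according to a space-separated field description."""
    fields = line_format.split()
    values = line.rstrip("\n").split(sep)
    if len(values) != len(fields):
        raise ValueError(f"expected {len(fields)} fields, got {len(values)}: {line!r}")
    record = dict(zip(fields, values))
    # Ids are kept as strings; only the rating is converted to a number.
    return record["user"], record["item"], float(record["rating"]), record.get("timestamp")


# A sample line in the 'user item rating timestamp' format, tab-separated.
print(parse_line("196\t242\t3\t881250949\n"))
# ('196', '242', 3.0, '881250949')
```

Because the ids come back as strings, any later lookup by user or item id should also use strings.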

3.4 Load a dataset where folds (for cross-validation) are predefined by some files.

classmethod:

load_from_folds(folds_files, reader)

The purpose of this method is to cover a common use case where a dataset is already split into predefined folds, such as the movielens-100k dataset which defines files u1.base, u1.test, u2.base, u2.test, etc… It can also be used when you don’t want to perform cross-validation but still want to specify your training and testing data (which comes down to 1-fold cross-validation anyway). 

Parameters:

  • folds_files (iterable of tuples) – The list of the folds. A fold is a tuple of the form (path_to_train_file, path_to_test_file).

  • reader (Reader) – A reader to read the files.
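
As a sketch of what folds_files could look like for movielens-100k, the snippet below builds the list of (train, test) path tuples. It assumes the five predefined folds u1 to u5 and the default download directory used elsewhere in this article; the Surprise calls are left as comments because they need the package and the files on disk.

```python
import os

# Assumed location of the downloaded movielens-100k files.
files_dir = os.path.expanduser("~/.surprise_data/ml-100k/ml-100k/")

# One (path_to_train_file, path_to_test_file) tuple per predefined fold.
folds_files = [
    (os.path.join(files_dir, f"u{i}.base"), os.path.join(files_dir, f"u{i}.test"))
    for i in range(1, 6)
]

for train_file, test_file in folds_files:
    print(os.path.basename(train_file), os.path.basename(test_file))

# With Surprise installed and the files present, the list would be used like this:
# from surprise import Dataset, Reader
# from surprise.model_selection import PredefinedKFold
# reader = Reader(line_format="user item rating timestamp", sep="\t")
# data = Dataset.load_from_folds(folds_files, reader=reader)
# for trainset, testset in PredefinedKFold().split(data):
#     ...  # fit and test an algorithm on each fold
```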

class surprise.dataset.DatasetAutoFolds(ratings_file=None, reader=None, df=None)

A derived class from Dataset for which folds (for cross-validation) are not predefined. (Or for when there are no folds at all).

build_full_trainset()

Do not split the dataset into folds and just return a trainset as is, built from the whole dataset.

User can then query for predictions.

4. Predict

4.1 SVD & load_builtin("ml-100k")

from surprise import accuracy, Dataset, SVD
from surprise.model_selection import train_test_split
# Load the movielens-100k dataset (download it if needed).
data = Dataset.load_builtin("ml-100k")
# sample random trainset and testset
# test set is made of 25% of the ratings.
trainset, testset = train_test_split(data, test_size=0.25)
# We'll use the famous SVD algorithm.
algo = SVD()
# Train the algorithm on the trainset, and predict ratings for the testset
algo.fit(trainset)
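predictions = algo.test(testset)  # test() takes a whole test set

accuracy.rmse(predictions)  # accuracy evaluation

# predict() estimates a single (user, item) rating; raw ids in ml-100k are strings.
uid, iid, r_ui = "196", "302", 4  # illustrative ids; r_ui (the true rating) is optional
algo.predict(uid, iid, r_ui=r_ui, verbose=True)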
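
For intuition about what a single SVD prediction computes, the estimate has the form mu + b_u + b_i + q_i · p_u: a global mean, a user bias, an item bias, and the dot product of the item and user factor vectors. The sketch below evaluates that formula on made-up parameter values; the function name svd_estimate and all the numbers are assumptions for illustration, not values learned by the model above.

```python
def svd_estimate(mu, b_u, b_i, p_u, q_i, rating_scale=(1, 5)):
    """Evaluate mu + b_u + b_i + q_i . p_u and clip the result to the rating scale."""
    raw = mu + b_u + b_i + sum(q * p for q, p in zip(q_i, p_u))
    low, high = rating_scale
    return min(high, max(low, raw))


# Made-up parameters: global mean, user bias, item bias, and 2-dimensional factors.
est = svd_estimate(mu=3.5, b_u=0.2, b_i=-0.1, p_u=[0.1, 0.3], q_i=[0.5, -0.2])
print(round(est, 2))  # 3.59
```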

4.2 KNNBasic & load_builtin("ml-100k")

from surprise import Dataset, KNNBasic
# Load the movielens-100k dataset
data = Dataset.load_builtin("ml-100k")
# Retrieve the trainset.
trainset = data.build_full_trainset()
# Build an algorithm, and train it.
algo = KNNBasic()
algo.fit(trainset)

# algo.test(testset)      # testset: a list of (uid, iid, r_ui) tuples

# algo.predict(uid, iid)  # uid and iid are raw ids
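
The estimate behind a basic KNN prediction is a similarity-weighted average of the ratings given by the k most similar neighbours. Here is a minimal sketch under that assumption; the function knn_basic_estimate and the (similarity, rating) pairs are invented for the example, and computing the similarities themselves is left out.

```python
import heapq


def knn_basic_estimate(neighbors, k=40):
    """Weighted average over the k most similar neighbours.

    neighbors: list of (similarity, rating) pairs from users who rated the item.
    """
    top_k = heapq.nlargest(k, neighbors, key=lambda pair: pair[0])
    # Only neighbours with positive similarity contribute.
    positive = [(sim, rating) for sim, rating in top_k if sim > 0]
    sim_sum = sum(sim for sim, _ in positive)
    if sim_sum == 0:
        raise ValueError("no neighbour with positive similarity")
    return sum(sim * rating for sim, rating in positive) / sim_sum


# Three toy neighbours; with k=2 only the two most similar ones are used.
est = knn_basic_estimate([(0.9, 4), (0.5, 2), (0.1, 5)], k=2)
print(round(est, 3))  # 3.286
```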

4.3 BaselineOnly & custom dataset

import os
from surprise import BaselineOnly, Dataset, Reader
from surprise.model_selection import train_test_split
# path to dataset file
file_path = os.path.expanduser("~/.surprise_data/ml-100k/ml-100k/u.data")
# As we're loading a custom dataset, we need to define a reader. In the
# movielens-100k dataset, each line has the following format:
# 'user item rating timestamp', separated by '\t' characters.
reader = Reader(line_format="user item rating timestamp", sep="\t")
data = Dataset.load_from_file(file_path, reader=reader)

trainset, testset = train_test_split(data, test_size=0.25)

algo = BaselineOnly()

predictions = algo.fit(trainset).test(testset)

# algo.predict(uid, iid)
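
A baseline estimate has the form mu + b_u + b_i: the global mean plus a user bias and an item bias. The sketch below fits those biases with plain averages of deviations in a single pass, which is a deliberate simplification; Surprise fits them with a regularized optimization procedure, so the numbers would differ. The function names and toy ratings are hypothetical.

```python
from collections import defaultdict


def fit_naive_baseline(ratings):
    """Fit mu, b_u and b_i from (user, item, rating) tuples using simple averages."""
    mu = sum(r for _, _, r in ratings) / len(ratings)

    # Item bias: mean deviation of the item's ratings from the global mean.
    item_dev = defaultdict(list)
    for _, i, r in ratings:
        item_dev[i].append(r - mu)
    b_i = {i: sum(d) / len(d) for i, d in item_dev.items()}

    # User bias: mean deviation left over after removing mu and the item bias.
    user_dev = defaultdict(list)
    for u, i, r in ratings:
        user_dev[u].append(r - mu - b_i[i])
    b_u = {u: sum(d) / len(d) for u, d in user_dev.items()}
    return mu, b_u, b_i


def baseline_estimate(mu, b_u, b_i, uid, iid):
    """Unknown users or items fall back to a bias of zero."""
    return mu + b_u.get(uid, 0.0) + b_i.get(iid, 0.0)


ratings = [("u1", "i1", 4), ("u1", "i2", 2), ("u2", "i1", 5), ("u2", "i2", 3)]
mu, b_u, b_i = fit_naive_baseline(ratings)
print(baseline_estimate(mu, b_u, b_i, "u1", "i1"))            # 4.0
print(baseline_estimate(mu, b_u, b_i, "new_user", "new_item"))  # 3.5
```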

5. Accuracy evaluation

Available accuracy metrics:

rmse

Compute RMSE (Root Mean Squared Error).

mse

Compute MSE (Mean Squared Error).

mae

Compute MAE (Mean Absolute Error).

fcp

Compute FCP (Fraction of Concordant Pairs).

accuracy.rmse(predictions, verbose=True)  # accuracy evaluation (RMSE)

accuracy.mae(predictions, verbose=True)

accuracy.mse(predictions, verbose=True)
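
What the three error metrics measure can be reproduced by hand on a few (true rating, estimated rating) pairs. The helpers below are a plain-Python sketch with made-up pairs; they follow the standard definitions of MSE, RMSE and MAE and do not take Surprise prediction objects.

```python
import math


def mse(pairs):
    """Mean Squared Error over (true, estimated) pairs."""
    return sum((true - est) ** 2 for true, est in pairs) / len(pairs)


def rmse(pairs):
    """Root Mean Squared Error: the square root of the MSE."""
    return math.sqrt(mse(pairs))


def mae(pairs):
    """Mean Absolute Error over (true, estimated) pairs."""
    return sum(abs(true - est) for true, est in pairs) / len(pairs)


# Made-up (true, estimated) ratings with errors of 0.5, -0.5 and 1.0.
pairs = [(4, 3.5), (2, 2.5), (5, 4.0)]
print(round(mse(pairs), 4), round(rmse(pairs), 4), round(mae(pairs), 4))
# 0.5 0.7071 0.6667
```

RMSE penalizes the single large error more heavily than MAE does, which is why the two values differ on the same pairs.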


