
ML4T - Chapter 8, Section 0: Data Preparation (Data prep)

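Contents

1. Load the Quandl adjusted prices

2. Load the Lasso predictions

3. Unify the index names and determine the time window

4. Align prices and predictions

5. Full code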


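What this script does (preparation stage):

The script keeps the Lasso predictions for the best-performing hyperparameter (alpha), aligns them with the Quandl adjusted prices on a shared (ticker, date) index, and writes the combined table, ready for strategy backtesting, to 08_backtest.h5.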

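1. Load the Quandl adjusted prices

with pd.HDFStore(DATA_DIR / 'assets.h5') as store:
    # Keep only the adjusted columns, drop the 'adj_' prefix,
    # and swap the two index levels.
    prices = (store['quandl/wiki/prices']
              .filter(like='adj')
              .rename(columns=lambda x: x.replace('adj_', ''))
              .swaplevel(axis=0))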
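
To see what the filter / rename / swaplevel chain does without opening assets.h5, here is a minimal sketch on a toy frame. The tickers, dates, and values are made up, and the (date, ticker) index order of the stored data is an assumption inferred from the swaplevel call.

```python
import pandas as pd

# Toy stand-in for store['quandl/wiki/prices'] (values are made up).
# Assumption: the stored index is ordered (date, ticker).
index = pd.MultiIndex.from_product(
    [pd.to_datetime(['2017-01-03', '2017-01-04']), ['AAPL', 'MSFT']],
    names=['date', 'ticker'])
raw = pd.DataFrame({'close': [116.0, 62.0, 116.5, 62.3],
                    'adj_close': [114.7, 60.9, 114.6, 60.7],
                    'adj_volume': [1.0e6, 2.0e6, 1.1e6, 2.1e6]},
                   index=index)

prices = (raw
          .filter(like='adj')                                # keep columns containing 'adj'
          .rename(columns=lambda x: x.replace('adj_', ''))   # adj_close -> close
          .swaplevel(axis=0))                                # (date, ticker) -> (ticker, date)

print(prices.columns.tolist())
print(list(prices.index.names))
```

The unadjusted close column is dropped by the filter, so after the rename the adjusted series simply take over the plain names.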

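2. Load the Lasso predictions

# with pd.HDFStore(PROJECT_DIR / '07_linear_models/data.h5') as store:
with pd.HDFStore(PROJECT_DIR / 'data.h5') as store:
    print(store.info())
    # `predictions` is the key passed to get_backtest_data(), 'lasso/predictions' by default
    predictions = store[predictions]

# Use the Spearman rank correlation to measure how monotonically the predictions
# track the actual values, and pick the alpha with the best overall score.
best_alpha = predictions.groupby('alpha').apply(lambda x: spearmanr(x.actuals, x.predicted)[0]).idxmax()
predictions = predictions[predictions.alpha == best_alpha]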
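
The alpha selection can be tried on a small example. The sketch below uses made-up numbers for two hypothetical alpha values; it selects the two value columns after the groupby, which is a slight variation on the script's call but computes the same per-alpha rank correlation.

```python
import pandas as pd
from scipy.stats import spearmanr

# Made-up predictions for two hypothetical alpha values.
toy = pd.DataFrame({
    'alpha':     [0.1, 0.1, 0.1, 0.1, 1.0, 1.0, 1.0, 1.0],
    'actuals':   [0.01, 0.02, 0.03, 0.04, 0.01, 0.02, 0.03, 0.04],
    'predicted': [0.00, 0.03, 0.02, 0.05, 0.04, 0.03, 0.02, 0.01],
})

# One Spearman rank correlation per alpha, computed over all of its rows.
ic_by_alpha = (toy.groupby('alpha')[['actuals', 'predicted']]
               .apply(lambda x: spearmanr(x.actuals, x.predicted)[0]))

best_alpha = ic_by_alpha.idxmax()           # alpha with the highest correlation
best = toy[toy.alpha == best_alpha]         # keep only that alpha's rows

print(ic_by_alpha)
print(best_alpha)
```

Here alpha 0.1 ranks the observations almost correctly (correlation 0.8) while alpha 1.0 ranks them exactly backwards (correlation -1), so 0.1 is selected.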

3. Unify the index names and determine the time window

# Name the index levels so they match the price data
predictions.index.names = ['ticker', 'date']
tickers = predictions.index.get_level_values('ticker').unique()
# First prediction date, and the day after the last one
start = predictions.index.get_level_values('date').min().strftime('%Y-%m-%d')
stop = (predictions.index.get_level_values('date').max() + pd.DateOffset(1)).strftime('%Y-%m-%d')
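
As a quick check of how the window is derived, the same three expressions can be run on a toy prediction frame (tickers and dates below are made up). Note that pd.DateOffset(1) adds one calendar day, so stop lands one day after the last prediction date.

```python
import pandas as pd

# Toy predictions with an unnamed two-level index (values are made up).
index = pd.MultiIndex.from_tuples(
    [('AAPL', pd.Timestamp('2017-01-03')),
     ('AAPL', pd.Timestamp('2017-01-04')),
     ('MSFT', pd.Timestamp('2017-01-04'))])
toy = pd.DataFrame({'predicted': [0.01, -0.02, 0.03]}, index=index)

toy.index.names = ['ticker', 'date']   # name the levels so they can be looked up by name
tickers = toy.index.get_level_values('ticker').unique()
start = toy.index.get_level_values('date').min().strftime('%Y-%m-%d')
stop = (toy.index.get_level_values('date').max() + pd.DateOffset(1)).strftime('%Y-%m-%d')

print(list(tickers), start, stop)
```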

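4. Align prices and predictions

idx = pd.IndexSlice
# Restrict prices to the predicted tickers and the prediction window
prices = prices.sort_index().loc[idx[tickers, start:stop], :]
# Keep only the 'predicted' column for the best alpha
predictions = predictions.loc[predictions.alpha == best_alpha, ['predicted']]
# Last statement of get_backtest_data(): the right join keeps every price row
return predictions.join(prices, how='right')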
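
The effect of the slice and the right join is easiest to see on toy data. In the sketch below all tickers, dates, and values are made up; the point is that price rows without a matching prediction survive the join with NaN in the predicted column, which is why the final table can have far more rows than non-null predictions.

```python
import pandas as pd

dates = pd.to_datetime(['2017-01-02', '2017-01-03', '2017-01-04', '2017-01-05'])
prices = pd.DataFrame(
    {'close': [10.0, 10.1, 10.2, 10.3, 80.0, 80.5, 81.0, 81.5]},
    index=pd.MultiIndex.from_product([['AAPL', 'XOM'], dates],
                                     names=['ticker', 'date']))
predictions = pd.DataFrame(
    {'predicted': [0.01, -0.02]},
    index=pd.MultiIndex.from_tuples(
        [('AAPL', pd.Timestamp('2017-01-03')),
         ('AAPL', pd.Timestamp('2017-01-04'))],
        names=['ticker', 'date']))

idx = pd.IndexSlice
tickers = predictions.index.get_level_values('ticker').unique()

# Predicted tickers only, dates from 2017-01-03 through 2017-01-05 inclusive.
window = prices.sort_index().loc[idx[tickers, '2017-01-03':'2017-01-05'], :]

# Right join: every row of `window` is kept, predictions are attached where they exist.
combined = predictions.join(window, how='right')
print(combined)
```

XOM is dropped by the ticker filter, and the AAPL row for 2017-01-05 is kept with a missing prediction.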

5. Full code

#!/usr/bin/env python
# -*- coding: utf-8 -*-
__author__ = 'Stefan Jansen'  # https://github.com/stefan-jansen/machine-learning-for-trading/blob/main/08_ml4t_workflow/00_data/data_prep.py
__modified_author__ = 'MangoQuant'  # https://blog.csdn.net/2401_82851462

from pathlib import Path

import numpy as np
import pandas as pd
from scipy.stats import spearmanr

pd.set_option('display.expand_frame_repr', False)
np.random.seed(42)

# PROJECT_DIR = Path('..', '..')
PROJECT_DIR = Path('.')
DATA_DIR = PROJECT_DIR / 'data'


def get_backtest_data(predictions='lasso/predictions'):
    """Combine chapter 7 lr/lasso/ridge regression predictions
    with adjusted OHLCV Quandl Wiki data"""
    # Load the Quandl adjusted prices
    with pd.HDFStore(DATA_DIR / 'assets.h5') as store:
        prices = (store['quandl/wiki/prices']
                  .filter(like='adj')
                  .rename(columns=lambda x: x.replace('adj_', ''))
                  .swaplevel(axis=0))

    # Load the Lasso predictions
    # with pd.HDFStore(PROJECT_DIR / '07_linear_models/data.h5') as store:
    with pd.HDFStore(PROJECT_DIR / 'data.h5') as store:
        print(store.info())
        predictions = store[predictions]

    # Use the Spearman rank correlation to measure how monotonically the predictions
    # track the actual values, and pick the alpha with the best overall score.
    best_alpha = predictions.groupby('alpha').apply(lambda x: spearmanr(x.actuals, x.predicted)[0]).idxmax()
    predictions = predictions[predictions.alpha == best_alpha]

    # Unify the index names and determine the time window
    predictions.index.names = ['ticker', 'date']
    tickers = predictions.index.get_level_values('ticker').unique()
    start = predictions.index.get_level_values('date').min().strftime('%Y-%m-%d')
    stop = (predictions.index.get_level_values('date').max() + pd.DateOffset(1)).strftime('%Y-%m-%d')

    # Align prices and predictions
    idx = pd.IndexSlice
    prices = prices.sort_index().loc[idx[tickers, start:stop], :]
    predictions = predictions.loc[predictions.alpha == best_alpha, ['predicted']]
    return predictions.join(prices, how='right')


df = get_backtest_data('lasso/predictions')
print(df.info())
# df.to_hdf('backtest.h5', 'data')
df.to_hdf('08_backtest.h5', 'data')
print("08_backtest.h5 saved")

Output after running:

test1@budas-MacBook-Pro ML4T % python 08_00_data_prep.py 
<class 'pandas.io.pytables.HDFStore'>
File path: data.h5
/lasso/coeffs                    frame        (shape->[8,33])      
/lasso/predictions               frame        (shape->[590496,3])  
/lasso/scores                    frame        (shape->[6000,3])    
/logistic/coeffs                 frame        (shape->[11,33])     
/logistic/predictions            frame        (shape->[811932,4])  
/logistic/scores                 frame        (shape->[825,5])     
/lr/predictions                  frame        (shape->[73812,2])   
/lr/scores                       frame        (shape->[750,2])     
/model_data                      frame        (shape->[3566454,69])
/ridge/coeffs                    frame        (shape->[18,33])     
/ridge/predictions               frame        (shape->[1328616,3]) 
/ridge/scores                    frame        (shape->[13500,3])   
/Users/test1/Documents/code/my_develop/leader-follower-strategy/ML4T/08_00_data_prep.py:37: FutureWarning: DataFrameGroupBy.apply operated on the grouping columns. This behavior is deprecated, and in a future version of pandas the grouping columns will be excluded from the operation. Either pass `include_groups=False` to exclude the groupings or explicitly select the grouping columns after groupby to silence this warning.
  best_alpha = predictions.groupby('alpha').apply(lambda x: spearmanr(x.actuals, x.predicted)[0]).idxmax()
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 198266 entries, ('AAPL', Timestamp('2014-12-09 00:00:00')) to ('MPC', Timestamp('2017-11-30 00:00:00'))
Data columns (total 6 columns):
 #   Column     Non-Null Count   Dtype  
---  ------     --------------   -----  
 0   predicted  73812 non-null   float64
 1   open       198266 non-null  float64
 2   high       198266 non-null  float64
 3   low        198266 non-null  float64
 4   close      198266 non-null  float64
 5   volume     198266 non-null  float64
dtypes: float64(6)
memory usage: 14.6+ MB
None
/Users/test1/Documents/code/my_develop/leader-follower-strategy/ML4T/08_00_data_prep.py:56: FutureWarning: Starting with pandas version 3.0 all arguments of to_hdf except for the argument 'path_or_buf' will be keyword-only.
  df.to_hdf('08_backtest.h5', 'data')
08_backtest.h5 saved
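
The two FutureWarnings in the output above each name their own remedy. The sketch below is one possible way to apply them, shown on made-up data rather than the real predictions: the value columns are selected after the groupby so the grouping column is no longer passed to apply, and the HDF key would be passed by keyword. The rank_corr helper is a hypothetical name introduced only for this example.

```python
import warnings

import pandas as pd
from scipy.stats import spearmanr

# Made-up predictions for two hypothetical alpha values.
toy = pd.DataFrame({
    'alpha':     [0.1, 0.1, 0.1, 1.0, 1.0, 1.0],
    'actuals':   [0.01, 0.02, 0.03, 0.01, 0.02, 0.03],
    'predicted': [0.02, 0.01, 0.03, 0.03, 0.02, 0.01],
})


def rank_corr(group):
    """Spearman rank correlation between actual and predicted values."""
    return spearmanr(group.actuals, group.predicted)[0]


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    # Selecting the columns after groupby keeps 'alpha' out of each group.
    scores = toy.groupby('alpha')[['actuals', 'predicted']].apply(rank_corr)

future_warnings = [w for w in caught if issubclass(w.category, FutureWarning)]
best_alpha = scores.idxmax()
print(best_alpha, len(future_warnings))

# For the second warning, pass the key by keyword (requires PyTables to run):
# df.to_hdf('08_backtest.h5', key='data')
```

Both changes leave the result unchanged; they only make the call explicit about what later pandas versions will require.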
