
ML4T - Chapter 7, Section 6: Alphalens Analysis

Contents

I. Load Data

II. Linear Regression

1. Quantiles Statistics

2. Returns Analysis

3. Information Analysis (IC)

4. Turnover Analysis

5. Rank Autocorrelation (factor stability)

III. Ridge Regression

IV. Lasso Regression


Reference: https://github.com/stefan-jansen/machine-learning-for-trading/blob/main/07_linear_models/06_evaluating_signals_using_alphalens.ipynb

I. Load Data

First, install the required library:

! pip install alphalens-reloaded

# The original alphalens is no longer maintained; the community fork alphalens-reloaded is recommended

import warnings
warnings.filterwarnings('ignore')

from pathlib import Path
import pandas as pd
from alphalens.tears import create_summary_tear_sheet
from alphalens.utils import get_clean_factor_and_forward_returns

idx = pd.IndexSlice

# Predictions and scores saved by the earlier sections
with pd.HDFStore('data.h5') as store:
    lr_predictions = store['lr/predictions']
    lasso_predictions = store['lasso/predictions']
    lasso_scores = store['lasso/scores']
    ridge_predictions = store['ridge/predictions']
    ridge_scores = store['ridge/scores']

# DATA_STORE = Path('..', 'data', 'assets.h5')
DATA_STORE = Path('data', 'assets.h5')  # change to the real path


def get_trade_prices(tickers, start, stop):
    prices = (pd.read_hdf(DATA_STORE, 'quandl/wiki/prices').swaplevel().sort_index())
    prices.index.names = ['symbol', 'date']
    prices = prices.loc[idx[tickers, str(start):str(stop)], 'adj_open']
    return (prices
            .unstack('symbol')
            .sort_index()
            .shift(-1)
            .tz_localize('UTC'))


def get_best_alpha(scores):
    return scores.groupby('alpha').ic.mean().idxmax()


def get_factor(predictions):
    return (predictions.unstack('symbol')
            .dropna(how='all')
            .stack()
            .tz_localize('UTC', level='date')
            .sort_index())

Make sure the file paths are correct. The data comes from the earlier sections.

# DATA_STORE = Path('..', 'data', 'assets.h5')

DATA_STORE = Path('data', 'assets.h5')  # change to the real path

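get_trade_prices function

Purpose: reads historical prices from the HDF5 store and returns the adjusted open (adj_open) for the given tickers and date range as a date-by-symbol table. The prices are moved up one row with shift(-1), so each date holds the next day's open, and the index is localized to UTC.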
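To see what the shift(-1) step does, here is a minimal sketch on made-up prices. The symbols AAA and BBB and all values are hypothetical; they are not taken from assets.h5.

```python
import pandas as pd

# Hypothetical adjusted-open prices in long form, indexed by (symbol, date).
dates = pd.date_range("2017-01-02", periods=4, freq="D")
index = pd.MultiIndex.from_product([["AAA", "BBB"], dates],
                                   names=["symbol", "date"])
adj_open = pd.Series([10.0, 11.0, 12.0, 13.0, 20.0, 21.0, 22.0, 23.0],
                     index=index)

# Same reshaping chain as get_trade_prices, minus the HDF5 read.
wide = adj_open.unstack("symbol").sort_index()    # rows: dates, columns: symbols
trade_prices = wide.shift(-1).tz_localize("UTC")  # row t now holds the open of t+1

print(trade_prices)
```

The last row becomes NaN because there is no following day to take a price from.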

get_best_alpha function

Purpose: picks the best regularization strength alpha, defined as the alpha whose scores have the highest mean IC.
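As a small illustration, the function can be run on a toy scores table. The column names alpha and ic come from the function body; the alpha values and IC numbers below are made up.

```python
import pandas as pd


def get_best_alpha(scores):
    # Average the IC per alpha and return the alpha with the highest mean.
    return scores.groupby('alpha').ic.mean().idxmax()


# Hypothetical scores: two folds for each of three alphas.
scores = pd.DataFrame({
    "alpha": [0.01, 0.01, 1.0, 1.0, 100.0, 100.0],
    "ic":    [0.010, 0.020, 0.030, 0.020, 0.005, 0.015],
})

best = get_best_alpha(scores)
print(best)  # mean IC per alpha: 0.015, 0.025, 0.010
```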

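get_factor function

Purpose: converts the predictions into the factor format Alphalens expects: a Series indexed by (date, symbol), with dates that have no predictions at all dropped, the date level localized to UTC, and the index sorted.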
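The following sketch runs get_factor on a tiny, deliberately unsorted prediction Series. The dates, symbols, and values are hypothetical and only show the cleaning steps.

```python
import numpy as np
import pandas as pd


def get_factor(predictions):
    return (predictions.unstack('symbol')
            .dropna(how='all')                 # drop dates without any prediction
            .stack()
            .tz_localize('UTC', level='date')  # Alphalens works with tz-aware dates
            .sort_index())


# Hypothetical predictions, unsorted, with one date that is entirely NaN.
dates = pd.to_datetime(["2017-01-03", "2017-01-02", "2017-01-04"])
index = pd.MultiIndex.from_product([dates, ["BBB", "AAA"]],
                                   names=["date", "symbol"])
predicted = pd.Series([0.02, -0.01, 0.01, 0.03, np.nan, np.nan], index=index)

factor = get_factor(predicted)
print(factor)
```

The all-NaN date 2017-01-04 disappears, and the remaining four rows come back sorted by date and symbol.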

II. Linear Regression

lr_factor = get_factor(lr_predictions.predicted.swaplevel())
lr_factor.head()

tickers = lr_factor.index.get_level_values('symbol').unique()
trade_prices = get_trade_prices(tickers, 2014, 2017)
trade_prices.info()

lr_factor_data = get_clean_factor_and_forward_returns(factor=lr_factor,
                                                      prices=trade_prices,
                                                      quantiles=5,
                                                      periods=(1, 5, 10, 21))
lr_factor_data.info()

Output:

Dropped 0.0% entries from factor data: 0.0% in forward returns computation and 0.0% in binning phase (set max_loss=0 to see potentially suppressed Exceptions).
max_loss is 35.0%, not exceeded: OK!
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 73812 entries, (Timestamp('2014-12-09 00:00:00+0000', tz='UTC'), 'AAL') to (Timestamp('2017-11-29 00:00:00+0000', tz='UTC'), 'XOM')
Data columns (total 6 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   1D               73812 non-null  float64
 1   5D               73812 non-null  float64
 2   10D              73812 non-null  float64
 3   21D              73812 non-null  float64
 4   factor           73812 non-null  float64
 5   factor_quantile  73812 non-null  int64  
dtypes: float64(5), int64(1)
memory usage: 3.7+ MB

Plot and analyze

Alphalens's create_summary_tear_sheet splits a single factor into 5 groups (quantiles) and reports returns, IC, and turnover for them.

create_summary_tear_sheet(lr_factor_data);
# For an explanation of the output, see this introductory Alphalens post: https://blog.csdn.net/u011331731/article/details/88314459

Quantiles Statistics

                       min        max       mean        std  count    count %
factor_quantile
1                -0.043757   0.009896  -0.002980   0.004034  14982  20.297513
2                -0.014429   0.012193  -0.000901   0.003243  14666  19.869398
3                -0.012309   0.013843   0.000169   0.003228  14516  19.666179
4                -0.011109   0.016001   0.001188   0.003352  14666  19.869398
5                -0.009576   0.035734   0.003144   0.004032  14982  20.297513

Returns Analysis

                                                   1D      5D     10D     21D
Ann. alpha                                      0.048   0.017   0.019   0.020
beta                                           -0.017  -0.068  -0.058   0.041
Mean Period Wise Return Top Quantile (bps)      1.842   1.016   0.514   0.778
Mean Period Wise Return Bottom Quantile (bps)  -1.847  -0.613  -0.910  -1.259
Mean Period Wise Spread (bps)                   3.689   1.652   1.433   2.028

Information Analysis

                      1D      5D     10D     21D
IC Mean            0.019   0.017   0.019   0.023
IC Std.            0.178   0.165   0.172   0.156
Risk-Adjusted IC   0.107   0.100   0.111   0.148
t-stat(IC)         2.940   2.745   3.053   4.045
p-value(IC)        0.003   0.006   0.002   0.000
IC Skew           -0.093   0.031  -0.158  -0.142
IC Kurtosis       -0.212  -0.053  -0.107  -0.231

Turnover Analysis

                             1D     5D    10D    21D
Quantile 1 Mean Turnover  0.300  0.527  0.630  0.747
Quantile 2 Mean Turnover  0.516  0.705  0.758  0.797
Quantile 3 Mean Turnover  0.560  0.739  0.773  0.807
Quantile 4 Mean Turnover  0.515  0.704  0.756  0.789
Quantile 5 Mean Turnover  0.302  0.530  0.637  0.741

                                     1D     5D    10D    21D
Mean Factor Rank Autocorrelation  0.817  0.551  0.401  0.236

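Splitting a single factor into 5 groups can be pictured with a simple analogy:

The factor value is "line the students up today by their score in one subject";
the return is "check a few days later how much each student's total score went up";
splitting into 5 groups is "see whether the students at the front of the line gained the most".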
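The analogy can be sketched in a few lines of pandas. This is a single-date toy example with made-up numbers that uses pd.qcut for the grouping; it illustrates the idea and is not necessarily the exact procedure Alphalens follows, which bins the factor date by date.

```python
import pandas as pd

# Hypothetical factor values and next-period returns for 10 stocks on one day.
data = pd.DataFrame({
    "factor": [-0.04, -0.03, -0.02, -0.01, 0.00, 0.01, 0.02, 0.03, 0.04, 0.05],
    "forward_return": [-0.010, -0.006, -0.004, -0.002, 0.000,
                       0.001, 0.003, 0.004, 0.006, 0.012],
})

# Rank the stocks by factor and cut them into 5 equal-sized groups (1 = lowest).
data["factor_quantile"] = pd.qcut(data["factor"], q=5, labels=False) + 1

# Average forward return per group, and the top-minus-bottom spread.
mean_by_quantile = data.groupby("factor_quantile")["forward_return"].mean()
spread = mean_by_quantile.loc[5] - mean_by_quantile.loc[1]

print(mean_by_quantile)
print(f"top-minus-bottom spread: {spread:.4f}")
```

If the factor works, the group means rise from group 1 to group 5 and the spread is positive, which is what the Returns Analysis table summarizes in basis points.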

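Conclusion: there is a small signal at short horizons and a somewhat steadier one at longer horizons, but neither is strong. Turnover is high and costs eat into profits, so the factor is better used as a ranking signal than as a standalone long-short strategy.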

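1. Quantiles Statistics

The mean factor value rises from Q1 to Q5, and each group holds roughly 20% of the observations. Because the groups are formed by sorting on the factor itself, this ordering is expected. It confirms that the binning worked, while the direction of the factor has to be judged from the returns of each group in the Returns Analysis.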
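The figures in the Quantiles Statistics table can be cross-checked with plain arithmetic. The sketch below copies the count and mean columns from the linear regression table and verifies that the counts add up to the 73812 rows reported by info(), that each group holds about 20%, and that the means increase from Q1 to Q5.

```python
# Values copied from the linear regression Quantiles Statistics table.
counts = {1: 14982, 2: 14666, 3: 14516, 4: 14666, 5: 14982}
means = {1: -0.002980, 2: -0.000901, 3: 0.000169, 4: 0.001188, 5: 0.003144}

total = sum(counts.values())
shares = {q: 100 * n / total for q, n in counts.items()}  # the "count %" column

ordered = [means[q] for q in sorted(means)]
means_increasing = all(a < b for a, b in zip(ordered, ordered[1:]))

print(total)             # number of (date, symbol) rows
print(shares)            # percentage of rows per quantile
print(means_increasing)  # whether the mean factor value rises with the quantile
```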

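2. Returns Analysis

Alpha is low (annualized, under 5%) and beta is close to 0: the factor is roughly market-neutral, but the excess return is modest.

The long-short spread (top minus bottom quantile) is highest at 1D (3.689 bps), falls at 5D and 10D (1.652 and 1.433 bps), and recovers slightly at 21D (2.028 bps).
The returns are small: after trading costs (roughly 2 bps round trip, plus slippage), the 1D spread of about 3.7 bps leaves only a thin net profit.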
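The cost argument is simple subtraction, sketched below. The spreads come from the Returns Analysis table and the 2 bps round-trip cost is the article's rough figure; the 0.5 bps slippage is a purely hypothetical placeholder, since the article does not quantify it.

```python
# Mean period-wise spread in bps, copied from the Returns Analysis table.
spread_bps = {"1D": 3.689, "5D": 1.652, "10D": 1.433, "21D": 2.028}

ROUND_TRIP_COST_BPS = 2.0  # the article's rough two-sided trading cost
SLIPPAGE_BPS = 0.5         # hypothetical value, for illustration only

best_horizon = max(spread_bps, key=spread_bps.get)
net_1d_bps = spread_bps["1D"] - ROUND_TRIP_COST_BPS - SLIPPAGE_BPS

print(f"largest gross spread: {best_horizon}")
print(f"1D net spread after costs: {net_1d_bps:.3f} bps")
```

Under these assumptions a little over 1 bps is left at 1D, which is why the signal is called thin.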

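3. Information Analysis (IC)

The 21D IC is the highest (0.023) and has the most significant t-stat (above 4), which suggests the factor is more stable at a monthly frequency.
Even so, the mean IC stays below 0.03 at every horizon. That is a weak signal and cannot serve as the core of a strategy on its own.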
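For readers new to the IC, here is a minimal sketch of how such figures can be computed. It assumes the IC is the Spearman rank correlation between factor values and forward returns on a given day, and that the t-stat is a one-sample t-test of the daily ICs against zero. All numbers are made up.

```python
import math
import pandas as pd

# One day's cross-section (hypothetical values for five stocks).
factor = pd.Series([0.01, -0.02, 0.03, 0.00, -0.01])
forward_return = pd.Series([0.002, -0.001, 0.004, -0.003, 0.001])

# Spearman rank IC: the Pearson correlation of the two rank vectors.
ic_one_day = factor.rank().corr(forward_return.rank())

# A hypothetical series of daily ICs, summarized like the table rows.
daily_ic = pd.Series([0.05, -0.10, 0.20, 0.02, -0.03, 0.10])
ic_mean = daily_ic.mean()
ic_std = daily_ic.std()               # sample standard deviation (ddof=1)
risk_adjusted_ic = ic_mean / ic_std   # "Risk-Adjusted IC"
t_stat = ic_mean / (ic_std / math.sqrt(len(daily_ic)))

print(ic_one_day, ic_mean, risk_adjusted_ic, t_stat)
```

The same ratio can be read off the table: for the 1D column, 0.019 / 0.178 is about 0.107, the reported Risk-Adjusted IC. The t-stat is that ratio scaled by the square root of the number of days, so a weak but persistent IC can still be statistically significant.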

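4. Turnover Analysis

At 1D, turnover is too high: about 0.30 per day in each of the top and bottom quantiles, or about 60% for the two legs combined, so costs eat up nearly all of the profit.
At 21D, turnover per rebalance is higher (about 0.74), but positions change only once every 21 days, so total trading is far lower and better suited to live trading.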
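A rough sketch of both points follows. It takes quantile turnover to be the share of names in a quantile that were not in it one period earlier, and it assumes 252 trading days per year for the annualization. The ticker names are hypothetical; the two turnover figures are the Quantile 1 values from the table.

```python
def quantile_turnover(previous_names, current_names):
    """Share of the current quantile members that are new since last period."""
    new_names = current_names - previous_names
    return len(new_names) / len(current_names)


# Hypothetical members of one quantile on two consecutive rebalance dates.
previous = {"AAA", "BBB", "CCC", "DDD", "EEE", "FFF", "GGG", "HHH", "III", "JJJ"}
current = {"AAA", "BBB", "CCC", "DDD", "EEE", "FFF", "GGG", "KKK", "LLL", "MMM"}
turnover = quantile_turnover(previous, current)

# Yearly trading implied by the Quantile 1 turnover at two holding periods.
TRADING_DAYS = 252  # assumption
annual_1d = 0.300 * (TRADING_DAYS / 1)    # rebalance every day
annual_21d = 0.747 * (TRADING_DAYS / 21)  # rebalance every 21 days

print(turnover, annual_1d, annual_21d)
```

Although each 21D rebalance replaces more names, the portfolio is reshuffled only 12 times a year instead of 252, so the yearly trading volume is several times smaller.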

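5. Rank Autocorrelation (factor stability)

Factor ranks are stable over short horizons (mean rank autocorrelation 0.817 at 1D), which suits short-horizon prediction.
Over longer horizons the ranks change quickly (0.236 at 21D), which indicates fast factor decay and makes the factor a poor fit for long holding periods.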
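Rank autocorrelation can be sketched as the correlation between each day's cross-sectional factor ranks and the ranks one period earlier. The toy table below is hypothetical: the ranking is unchanged on the second day and fully reversed on the third.

```python
import pandas as pd

# Hypothetical factor values: rows are dates, columns are stocks.
factor = pd.DataFrame(
    [[0.1, 0.2, 0.3, 0.4],
     [0.1, 0.2, 0.3, 0.4],   # same ordering as the day before
     [0.4, 0.3, 0.2, 0.1]],  # ordering fully reversed
    index=pd.date_range("2017-01-02", periods=3, freq="D"),
    columns=["AAA", "BBB", "CCC", "DDD"],
)

period = 1
ranks = factor.rank(axis=1)  # rank the stocks within each date
autocorr = ranks.corrwith(ranks.shift(period), axis=1)

print(autocorr)         # NaN on the first date, then 1.0 and -1.0
print(autocorr.mean())  # the mean skips the NaN
```

A value near 1 means the ranking barely moves between rebalances, which keeps turnover low; values that fall as the period grows, as in the table, mean the ordering is being reshuffled.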

III. Ridge Regression

Similarly:

best_ridge_alpha = get_best_alpha(ridge_scores)
ridge_predictions = ridge_predictions[ridge_predictions.alpha == best_ridge_alpha].drop('alpha', axis=1)

ridge_factor = get_factor(ridge_predictions.predicted.swaplevel())
ridge_factor.head()

ridge_factor_data = get_clean_factor_and_forward_returns(factor=ridge_factor,
                                                         prices=trade_prices,
                                                         quantiles=5,
                                                         periods=(1, 5, 10, 21))
ridge_factor_data.info()

create_summary_tear_sheet(ridge_factor_data);

Output:

Dropped 0.0% entries from factor data: 0.0% in forward returns computation and 0.0% in binning phase (set max_loss=0 to see potentially suppressed Exceptions).
max_loss is 35.0%, not exceeded: OK!
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 73812 entries, (Timestamp('2014-12-09 00:00:00+0000', tz='UTC'), 'AAL') to (Timestamp('2017-11-29 00:00:00+0000', tz='UTC'), 'XOM')
Data columns (total 6 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   1D               73812 non-null  float64
 1   5D               73812 non-null  float64
 2   10D              73812 non-null  float64
 3   21D              73812 non-null  float64
 4   factor           73812 non-null  float64
 5   factor_quantile  73812 non-null  int64  
dtypes: float64(5), int64(1)
memory usage: 3.7+ MB

Quantiles Statistics

                       min        max       mean        std  count    count %
factor_quantile
1                -0.037486   0.010285  -0.003205   0.003644  14982  20.297513
2                -0.011773   0.012590  -0.001242   0.003003  14666  19.869398
3                -0.009860   0.014102  -0.000230   0.003023  14516  19.666179
4                -0.008802   0.016130   0.000739   0.003175  14666  19.869398
5                -0.007385   0.035124   0.002576   0.003867  14982  20.297513

Returns Analysis

                                                   1D      5D     10D     21D
Ann. alpha                                      0.048   0.020   0.022   0.020
beta                                           -0.018  -0.074  -0.065   0.038
Mean Period Wise Return Top Quantile (bps)      1.686   0.947   0.353   0.654
Mean Period Wise Return Bottom Quantile (bps)  -2.010  -0.639  -1.074  -1.285
Mean Period Wise Spread (bps)                   3.696   1.612   1.441   1.937

Information Analysis

                      1D      5D     10D     21D
IC Mean            0.019   0.017   0.020   0.021
IC Std.            0.179   0.167   0.174   0.156
Risk-Adjusted IC   0.108   0.103   0.114   0.137
t-stat(IC)         2.952   2.829   3.110   3.748
p-value(IC)        0.003   0.005   0.002   0.000
IC Skew           -0.105   0.011  -0.160  -0.149
IC Kurtosis       -0.175  -0.024  -0.091  -0.244

Turnover Analysis

                             1D     5D    10D    21D
Quantile 1 Mean Turnover  0.294  0.514  0.619  0.739
Quantile 2 Mean Turnover  0.507  0.697  0.752  0.795
Quantile 3 Mean Turnover  0.554  0.733  0.773  0.804
Quantile 4 Mean Turnover  0.509  0.698  0.757  0.786
Quantile 5 Mean Turnover  0.296  0.520  0.628  0.736

                                     1D     5D    10D    21D
Mean Factor Rank Autocorrelation  0.822  0.569  0.417  0.247

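Analysis:

Predictive power: almost level with plain linear regression. The IC is just as marginal (0.019 at 1D to 0.021 at 21D), and the t-stats show no real improvement.

Profitability: the 1D long-short spread is 3.7 bps, as thin as for LR; after round-trip costs only about 1 bps is left.

Stability: the rank autocorrelation is slightly higher (0.822 versus 0.817 at 1D) and turnover is slightly lower, which suggests that ridge smoothing makes the short-term factor ranking a little more durable. It still decays over longer horizons.

Summary: ridge regression only sands down the rough edges of LR and adds no new information. It saves perhaps 1 bps on the cost side and gives no gain on the return side, so it can be kept as a candidate.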

IV. Lasso Regression

The code is similar:

best_lasso_alpha = get_best_alpha(lasso_scores)
lasso_predictions = lasso_predictions[lasso_predictions.alpha == best_lasso_alpha].drop('alpha', axis=1)

lasso_factor = get_factor(lasso_predictions.predicted.swaplevel())
lasso_factor.head()

lasso_factor_data = get_clean_factor_and_forward_returns(factor=lasso_factor,
                                                         prices=trade_prices,
                                                         quantiles=5,
                                                         periods=(1, 5, 10, 21))
lasso_factor_data.info()

create_summary_tear_sheet(lasso_factor_data);

Output:

Dropped 0.0% entries from factor data: 0.0% in forward returns computation and 0.0% in binning phase (set max_loss=0 to see potentially suppressed Exceptions).
max_loss is 35.0%, not exceeded: OK!
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 73812 entries, (Timestamp('2014-12-09 00:00:00+0000', tz='UTC'), 'AAL') to (Timestamp('2017-11-29 00:00:00+0000', tz='UTC'), 'XOM')
Data columns (total 6 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   1D               73812 non-null  float64
 1   5D               73812 non-null  float64
 2   10D              73812 non-null  float64
 3   21D              73812 non-null  float64
 4   factor           73812 non-null  float64
 5   factor_quantile  73812 non-null  int64  
dtypes: float64(5), int64(1)
memory usage: 3.7+ MB

Quantiles Statistics

                       min        max       mean        std  count    count %
factor_quantile
1                -0.043925   0.010653  -0.003470   0.004004  14982  20.297513
2                -0.013494   0.012950  -0.001383   0.003265  14666  19.869398
3                -0.011373   0.014600  -0.000327   0.003284  14516  19.666179
4                -0.010174   0.016758   0.000706   0.003431  14666  19.869398
5                -0.008641   0.035891   0.002654   0.004164  14982  20.297513

Returns Analysis

                                                   1D      5D     10D     21D
Ann. alpha                                      0.048   0.017   0.019   0.020
beta                                           -0.017  -0.068  -0.058   0.041
Mean Period Wise Return Top Quantile (bps)      1.842   1.016   0.514   0.778
Mean Period Wise Return Bottom Quantile (bps)  -1.847  -0.613  -0.910  -1.259
Mean Period Wise Spread (bps)                   3.689   1.652   1.433   2.028

Information Analysis

                      1D      5D     10D     21D
IC Mean            0.019   0.017   0.019   0.023
IC Std.            0.178   0.165   0.172   0.156
Risk-Adjusted IC   0.107   0.100   0.111   0.148
t-stat(IC)         2.940   2.744   3.053   4.045
p-value(IC)        0.003   0.006   0.002   0.000
IC Skew           -0.093   0.031  -0.158  -0.142
IC Kurtosis       -0.212  -0.053  -0.107  -0.231

Turnover Analysis

                             1D     5D    10D    21D
Quantile 1 Mean Turnover  0.300  0.527  0.630  0.747
Quantile 2 Mean Turnover  0.516  0.705  0.758  0.797
Quantile 3 Mean Turnover  0.560  0.739  0.773  0.807
Quantile 4 Mean Turnover  0.515  0.704  0.756  0.789
Quantile 5 Mean Turnover  0.302  0.530  0.637  0.741

                                     1D     5D    10D    21D
Mean Factor Rank Autocorrelation  0.817  0.551  0.401  0.236

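My analysis:

The numbers almost coincide with those of plain linear regression.

Here Lasso is just a clone of LR: the returns, IC, turnover, and rank-autocorrelation tables are nearly identical, which suggests that it did not shrink any coefficient all the way to 0. Regularization had no visible effect, which indicates that the original features contain little obvious redundancy or collinearity.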
