
[MDM 2024]Spatial-Temporal Large Language Model for Traffic Prediction

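Paper: [2401.10134] Spatial-Temporal Large Language Model for Traffic Prediction

Code: GitHub - ChenxiLiu-HNU/ST-LLM: Official implementation of the paper "Spatial-Temporal Large Language Model for Traffic Prediction"

The English here was typed by hand! It summarizes and paraphrases the original paper. Some spelling and grammar mistakes are probably unavoidable; if you spot any, corrections in the comments are welcome. This post is closer to personal notes, so read with care.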

Contents

1. Thoughts

2. Section-by-Section Close Reading

2.1. Abstract

2.2. Introduction

2.3. Related Work

2.3.1. Large Language Models for Time Series Analysis

2.3.2. Traffic Prediction

2.4. Problem Definition

2.5. Methodology

2.5.1. Overview

2.5.2. Spatial-Temporal Embedding and Fusion

2.5.3. Partially Frozen Attention (PFA) LLM

2.6. Experiments

2.6.1. Datasets

2.6.2. Baselines

2.6.3. Implementations

2.6.4. Evaluation Metrics

2.6.5. Main Results

2.6.6. Performance of ST-LLM and Ablation Studies

2.6.7. Parameter Analysis

2.6.8. Inference Time Analysis

2.6.9. Few-Shot Prediction

2.6.10. Zero-Shot Prediction

2.7. Conclusion

3. Reference


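1. Thoughts

(1) I still haven't started writing the paper I have to submit in a few days, yet here I am munching on biscuits and writing reading notes. Sigh. Everyone moves so fast these days.

(2) Compared with math-heavy papers, LLM papers go well with a cup of milk tea: relaxed and pleasant all the way through. This one boils down to three separate convolutions → fuse them → LLM (with some modules partially unfrozen) → done.

2. Section-by-Section Close Reading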

2.1. Abstract

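        ①They propose the Spatial-Temporal Large Language Model (ST-LLM) for traffic prediction. (Nothing else seems special here, so I won't write more: the abstract just introduces the method and says that earlier approaches were not accurate enough. For the method itself, see the figures below.)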

2.2. Introduction

        ①Traditional CNNs and RNNs cannot capture complex or long-range spatial and temporal dependencies. GNNs are prone to overfitting, so researchers mainly use attention mechanisms.

        ②Existing traffic prediction methods mainly focus on temporal features rather than spatial ones

        ③For better long-term prediction, they propose partially frozen attention (PFA)

2.3. Related Work

2.3.1. Large Language Models for Time Series Analysis

        ①They list TEMPO-GPT, TIME-LLM, OFA, TEST, and LLM-TIME, all of which use only temporal features. GATGPT introduces spatial features but ignores temporal dependencies.

imputation  n. attribution; ascribing blame or a cause to something (in time series analysis: filling in missing values)

2.3.2. Traffic Prediction

        ①Filtering is a common and classic method for processing traffic data

        ②Irregular city road networks make it hard to apply CNNs or to extract spatial features with them

2.4. Problem Definition

        ①Input traffic data: \mathbf{X}\in\mathbb{R}^{T\times N\times C}, where T denotes the number of timesteps, N denotes the number of spatial stations, and C denotes the number of features

        ②Task: given only the historical traffic data \mathbf{X}_{P}=\{\mathbf{X}_{t-P+1},\mathbf{X}_{t-P+2},\ldots,\mathbf{X}_{t}\}\in\mathbb{R}^{P\times N\times C} of P timesteps, learn a function f\left ( \cdot \right ) with parameters \theta that predicts the next S timesteps \mathbf{Y}_{S}=\{\mathbf{Y}_{t+1},\mathbf{Y}_{t+2},\ldots,\mathbf{Y}_{t+S}\}\in\mathbb{R}^{S\times N\times C}:

[\mathbf{X}_{t-P+1},\mathbf{X}_{t-P+2},\ldots,\mathbf{X}_{t}]\xrightarrow{f(\cdot)}[\mathbf{Y}_{t+1},\mathbf{Y}_{t+2},\ldots,\mathbf{Y}_{t+S}]
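To make the indexing concrete, here is a minimal sketch of how a length-T sequence could be sliced into (X_P, Y_S) pairs. The helper name make_windows and the toy data are my own, not from the paper or its repository; only the time axis is sliced, so each element could just as well be an N×C array.

```python
def make_windows(series, P, S):
    """Slice a sequence of per-timestep observations into (history, future) pairs.

    series: list indexed by time; each item may itself be an N x C nested list.
    P: number of historical steps, S: number of future steps.
    """
    pairs = []
    # t is the 0-based index of the last observed step X_t.
    for t in range(P - 1, len(series) - S):
        history = series[t - P + 1 : t + 1]  # X_{t-P+1}, ..., X_t
        future = series[t + 1 : t + S + 1]   # Y_{t+1}, ..., Y_{t+S}
        pairs.append((history, future))
    return pairs


# Toy sequence with T = 10 timesteps, P = 3, S = 2.
toy_pairs = make_windows(list(range(10)), P=3, S=2)
print(len(toy_pairs))   # T - P - S + 1 = 6 windows
print(toy_pairs[0])     # ([0, 1, 2], [3, 4])
```

With T = 4,368 and P = S = 12 this slicing gives 4,368 − 12 − 12 + 1 = 4,345 pairs.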

2.5. Methodology

2.5.1. Overview

        ①Overall framework of ST-LLM:

where the Spatial-Temporal Embedding layer extracts the token embedding \mathbf{E}_{P}\in\mathbb{R}^{N\times D} of the historical P timesteps, the spatial embedding \mathbf{E}_{S}\in\mathbb{R}^{N\times D}, and the temporal embedding \mathbf{E}_{T}\in\mathbb{R}^{N\times D}. The three are then fused into \mathbf{H}_{F}\in\mathbb{R}^{N\times3D}. The PFA LLM freezes the first F layers and unfreezes the last U layers, producing the output \mathbf{H}^{L}\in\mathbb{R}^{N\times3D}. Lastly, a regression convolution converts it to \widehat{\mathbf{Y}}_{S}\in\mathbb{R}^{S\times N\times C}.

2.5.2. Spatial-Temporal Embedding and Fusion

        ①They get tokens by pointwise convolution:

\mathbf{E}_{P}=PConv(\mathbf{X}_{P};\theta_{p})

        ②A linear layer encodes the day-level and week-level time information of the input \mathbf{X}_P\in\mathbb{R}^{P\times N\times C}, represented as \mathbf{X}_{day}\in\mathbb{R}^{N\times T_{d}} and \mathbf{X}_{week}\in\mathbb{R}^{N\times T_{w}}, into temporal embeddings:

\mathbf{E}_T^d = \mathbf{W}_{day}(\mathbf{X}_{day}), \\ \mathbf{E}_T^w = \mathbf{W}_{week}(\mathbf{X}_{week}), \\ \mathbf{E}_T = \mathbf{E}_T^d + \mathbf{E}_T^w.

where \mathbf{W}_{day}\in\mathbb{R}^{T_{d}\times D} and \mathbf{W}_{week}\in\mathbb{R}^{T_{w}\times D} are learnable parameters and the output is \mathbf{E}_{T}\in\mathbb{R}^{N\times D}
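A small sketch of how I picture this temporal embedding for a single station. It rests on assumptions of mine rather than on the paper: X_day and X_week are treated as one-hot vectors, timestep 0 is taken to be the first half-hour slot of the first day of a week, D = 4 is a toy size, and the weights are random stand-ins for learned parameters. T_d = 48 and T_w = 7 are the values listed under Implementations.

```python
import random

T_D, T_W, D = 48, 7, 4  # T_D, T_W from the experiment settings; D is a toy size


def time_indices(step, steps_per_day=T_D, days_per_week=T_W):
    """Map a global timestep index to (slot of day, day of week).

    Assumes step 0 is the first slot of the first day of a week.
    """
    slot_of_day = step % steps_per_day
    day_of_week = (step // steps_per_day) % days_per_week
    return slot_of_day, day_of_week


def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]


def linear(x, W):
    """Row vector x (length T) times matrix W (T x D), giving a length-D vector."""
    return [sum(x[t] * W[t][d] for t in range(len(x))) for d in range(len(W[0]))]


random.seed(0)
# Random stand-ins for the learnable W_day (T_d x D) and W_week (T_w x D).
W_day = [[random.gauss(0, 1) for _ in range(D)] for _ in range(T_D)]
W_week = [[random.gauss(0, 1) for _ in range(D)] for _ in range(T_W)]


def temporal_embedding(step):
    slot, day = time_indices(step)
    e_day = linear(one_hot(slot, T_D), W_day)     # E_T^d
    e_week = linear(one_hot(day, T_W), W_week)    # E_T^w
    return [a + b for a, b in zip(e_day, e_week)]  # E_T = E_T^d + E_T^w


print(time_indices(50))  # (2, 1): third slot of the second day
```

Under the one-hot assumption the linear layer is just a row lookup: temporal_embedding(50) equals row 2 of W_day plus row 1 of W_week.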

        ③They extract spatial correlations by:

\mathbf{E}_S=\sigma(\mathbf{W}_s\cdot\mathbf{X}_{P}+\mathbf{b}_s)

        ④Fusion convolution:

\mathbf{H}_F=FConv(\mathbf{E}_P||\mathbf{E}_S||\mathbf{E}_T;\theta_f)

where \mathbf{H}_{F}\in\mathbb{R}^{N\times3D}

2.5.3. Partially Frozen Attention (PFA) LLM

        ①They freeze the first F layers (including the multi-head attention and feed-forward layers), which contain important information:

\mathbf{\bar{H}}^{i}=MHA\left(LN\left(\mathbf{H}^{i}\right)\right)+\mathbf{H}^{i},\\\mathbf{H}^{i+1}=FFN\left(LN\left(\mathbf{\bar{H}}^{i}\right)\right)+\mathbf{\bar{H}}^{i},

where i \in \{1,\ldots,F-1\}, \mathbf{H}^{1}=[\mathbf{H}_{F}+\mathbf{PE}], \mathbf{PE} denotes the learnable positional encoding, \mathbf{\bar{H}}^{i} represents the intermediate representation of the i-th layer after applying the unfrozen layer normalization (LN) and the frozen multi-head attention (MHA), \mathbf{H}^{i+1} symbolizes the final representation after applying the unfrozen LN and the frozen feed-forward network (FFN), and:

LN\left(\mathbf{H}^{i}\right)=\gamma\odot\frac{\mathbf{H}^{i}-\mu}{\sigma}+\beta,\\ MHA(\tilde{\mathbf{H}}^{i})=\mathbf{W}^{O}(\mathrm{head}_{1}^{i}\|\cdots\|\mathrm{head}_{h}^{i}),\\ \mathrm{head}_{k}^{i}=Attention(\mathbf{W}_{q}^{k}\tilde{\mathbf{H}}^{i},\mathbf{W}_{k}^{k}\tilde{\mathbf{H}}^{i},\mathbf{W}_{v}^{k}\tilde{\mathbf{H}}^{i}),\\ Attention(\tilde{\mathbf{H}}^{i})=\operatorname{softmax}\left(\frac{\tilde{\mathbf{H}}^{i}\tilde{\mathbf{H}}^{iT}}{\sqrt{d_{k}}}\right)\tilde{\mathbf{H}}^{i},\\ FFN(\tilde{\mathbf{H}}^{i})=\max\left(0,\mathbf{W}_{1}\tilde{\mathbf{H}}^{i+1}+\mathbf{b}_{1}\right)\mathbf{W}_{2}+\mathbf{b}_{2}
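To check my reading of the LN and Attention formulas, here is a tiny pure-Python version for one matrix given as a list of rows. It is only a sketch: the small eps inside the square root is my addition for numerical stability, γ and β default to ones and zeros, and the attention follows the simplified single-head form written above (no separate query, key, and value projections).

```python
import math


def layer_norm(row, gamma=None, beta=None, eps=1e-5):
    """gamma * (row - mu) / sigma + beta over one feature vector."""
    d = len(row)
    mu = sum(row) / d
    var = sum((v - mu) ** 2 for v in row) / d
    sigma = math.sqrt(var + eps)  # eps is an assumption, not in the formula
    gamma = gamma or [1.0] * d
    beta = beta or [0.0] * d
    return [g * (v - mu) / sigma + b for v, g, b in zip(row, gamma, beta)]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(H):
    """softmax(H H^T / sqrt(d_k)) H for an N x d_k matrix H."""
    d_k = len(H[0])
    out = []
    for q in H:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in H]
        weights = softmax(scores)  # one attention distribution per row
        out.append([sum(w * v[j] for w, v in zip(weights, H)) for j in range(d_k)])
    return out


normed = layer_norm([1.0, 2.0, 3.0])
attended = self_attention([[1.0, 0.0], [0.0, 1.0]])
```

Each output row of self_attention is a weighted average of the rows of H, with the weights in a row summing to 1.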

        ②Unfreezing the last U layers:

\mathbf{\bar{H}}^{F+U-1}=MHA\left(LN\left(\mathbf{H}^{F+U-1}\right)\right)+\mathbf{H}^{F+U-1},\\\mathbf{H}^{F+U}=FFN\left(LN\left(\mathbf{\bar{H}}^{F+U-1}\right)\right)+\mathbf{\bar{H}}^{F+U-1},
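The freezing scheme can be sketched as a plan that marks each parameter as trainable or frozen. This is my own illustration, not the official code: the parameter names of the form h.<layer>.<module>.<param> are hypothetical, and I am assuming that in the last U layers only the attention (plus LN) is trained while the FFN stays frozen, which is how I understand the "partially frozen attention" name. The flag unfreeze_ffn covers the other reading.

```python
def trainable_plan(param_names, num_layers, U, unfreeze_ffn=False):
    """Return {parameter name: True if trainable, False if frozen}.

    Hypothetical naming: 'h.<layer>.<module>.<param>' with module one of
    'ln_1', 'attn', 'ln_2', 'mlp'. Layers are 0-based.
    """
    F = num_layers - U  # number of fully frozen (except LN) layers
    plan = {}
    for name in param_names:
        _, layer, module = name.split(".")[:3]
        is_ln = module.startswith("ln")
        if int(layer) < F:
            # First F layers: attention and FFN frozen, LN stays trainable.
            plan[name] = is_ln
        else:
            # Last U layers: attention unfrozen (assumption: FFN still frozen).
            plan[name] = is_ln or module == "attn" or (unfreeze_ffn and module == "mlp")
    return plan


# Toy example: 6 layers with the last U = 1 layer unfrozen (U = 1 is arbitrary here).
names = [
    f"h.{i}.{m}.weight" for i in range(6) for m in ("ln_1", "attn", "ln_2", "mlp")
]
plan = trainable_plan(names, num_layers=6, U=1)
print(plan["h.0.attn.weight"], plan["h.5.attn.weight"])  # False True
```

In a real framework the same plan would be applied by switching gradient tracking on or off per parameter; I leave that out so the sketch stays dependency-free.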

        ③The final regression convolution (RConv):

\hat{\mathbf{Y}}_{S}=RConv(\mathbf{H}^{F+U};\theta_{r})

        ④Loss function:

\mathcal{L}=\left\|\widehat{\mathbf{Y}}_{S}-\mathbf{Y}_{S}\right\|+\lambda\cdot L_{\mathrm{reg}}

where \mathbf{Y}_{S} is ground truth

        ⑤Algorithm:

2.6. Experiments

2.6.1. Datasets

        ①Statistics of datasets:

        ②NYCTaxi: includes 266 virtual stations and 4,368 timesteps (each timestep is half-hour)

        ③CHBike: includes 250 sites and 4,368 timesteps (30 mins as well)

2.6.2. Baselines

        ①GNN-based baselines: DCRNN, STGCN, GWN, AGCRN, STGNCDE, DGCRN

        ②Attention-based models: ASTGCN, GMAN, ASTGNN

        ③LLMs: OFA, GATGPT, GCNGPT, LLAMA2

2.6.3. Implementations

        ①Data split: 6:2:2

        ②Historical and future timesteps: P=12,S=12

        ③T_w=7,T_d=48

        ④Learning rate and optimizer: 0.001 with Ranger21 for the LLM-based models, and 0.001 with Adam for the GCN-based and attention-based models

        ⑤LLM: GPT2 and LLAMA2 7B

        ⑥Layers: 6 for GPT2 and 8 for LLAMA2

        ⑦Epoch: 100

        ⑧Batch size: 64
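For the 6:2:2 split, a minimal sketch follows. I am assuming the split is chronological and applied to the windowed samples; the notes above only give the ratio, so treat both points as my guess. Integer arithmetic avoids floating-point rounding surprises.

```python
def split_622(num_samples, parts=(6, 2, 2)):
    """Split sample indices 0..num_samples-1 into train/val/test index ranges.

    Assumes a chronological (unshuffled) split; the remainder goes to test.
    """
    total = sum(parts)
    n_train = num_samples * parts[0] // total
    n_val = num_samples * parts[1] // total
    train = range(0, n_train)
    val = range(n_train, n_train + n_val)
    test = range(n_train + n_val, num_samples)
    return train, val, test


# 4,345 = 4,368 - 12 - 12 + 1 windows when P = S = 12.
train_idx, val_idx, test_idx = split_622(4345)
print(len(train_idx), len(val_idx), len(test_idx))  # 2607 869 869
```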

2.6.4. Evaluation Metrics

        ①Metrics: Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), Root Mean Squared Error (RMSE), and Weighted Absolute Percentage Error (WAPE)
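As a reminder of what the four metrics compute, here is a small sketch over flat lists of values. Skipping zero targets in MAPE is my assumption to avoid division by zero; the paper's evaluation code may mask values differently.

```python
import math


def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred):
    # Assumption: targets equal to zero are skipped to avoid dividing by zero.
    terms = [abs(t - p) / abs(t) for t, p in zip(y_true, y_pred) if t != 0]
    return sum(terms) / len(terms)


def wape(y_true, y_pred):
    # Total absolute error divided by total absolute target.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / sum(abs(t) for t in y_true)


y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [12.0, 18.0, 33.0, 36.0]
print(mae(y_true, y_pred))   # 2.75
print(mape(y_true, y_pred))  # 0.125, i.e. 12.5%
print(wape(y_true, y_pred))  # 0.11, i.e. 11%
```

Unlike MAPE, WAPE weights each error by the size of the target, so stations with small demand do not dominate the score.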

2.6.5. Main Results

        ①Performance table:

2.6.6. Performance of ST-LLM and Ablation Studies

        ①Module ablation:

        ②Frozen ablation:

2.6.7. Parameter Analysis

        ①Hyperparameter U ablation:

2.6.8. Inference Time Analysis

        ①Inference time table:

2.6.9. Few-Shot Prediction

        ①Few-shot learning with 10% of the samples:

2.6.10. Zero-Shot Prediction

        ①Performance:

2.7. Conclusion

        ~

3. Reference

@inproceedings{liu2024spatial,
  title={Spatial-Temporal Large Language Model for Traffic Prediction},
  author={Liu, Chenxi and Yang, Sun and Xu, Qianxiong and Li, Zhishuai and Long, Cheng and Li, Ziyue and Zhao, Rui},
  booktitle={MDM},
  year={2024}
}
