Audio/Video Study Notes (61): The VPS in H.265
H.265 Bitstream Structure
Layered Structure
+----------------------------------------------------------+
| Byte Stream (Annex-B format, with start codes 00 00 01)  |
+----------------------------------------------------------+
              │
              ▼
+-----------------------------------------+
| NAL Units (Network Abstraction Layer)   |
+-----------------------------------------+
   │
   ├── VCL NALUs (Video Coding Layer, carry picture data)
   └── Non-VCL NALUs (parameter sets / SEI / end markers, etc.)
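As a rough illustration of the byte-stream layer in the diagram, the sketch below scans an Annex-B buffer for 00 00 01 start codes and reports where each NAL unit begins and how long it is. `splitAnnexB` is a hypothetical helper name introduced here, and the sketch assumes the whole stream is already in memory.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Returns (offset, length) for each NAL unit in an Annex-B byte stream.
// The offset points at the first byte after the start code, i.e. the NAL header.
// Hypothetical helper for illustration only.
std::vector<std::pair<size_t, size_t>> splitAnnexB(const std::vector<uint8_t>& s) {
    std::vector<size_t> starts;  // offsets of the byte following each 00 00 01
    for (size_t i = 0; i + 2 < s.size(); ++i) {
        if (s[i] == 0 && s[i + 1] == 0 && s[i + 2] == 1) {
            starts.push_back(i + 3);
            i += 2;  // jump past the start code
        }
    }
    std::vector<std::pair<size_t, size_t>> units;
    for (size_t k = 0; k < starts.size(); ++k) {
        size_t begin = starts[k];
        size_t end = (k + 1 < starts.size()) ? starts[k + 1] - 3 : s.size();
        // Trailing zero bytes belong to the next 4-byte start code (00 00 00 01)
        // or to padding; a NAL unit itself never ends with 0x00.
        while (end > begin && s[end - 1] == 0) --end;
        units.emplace_back(begin, end - begin);
    }
    return units;
}
```

Each returned range still contains emulation prevention bytes; those are removed later, when the payload is read bit by bit.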
NALU Type Classification
VCL (Video Coding Layer)
- Slice NALU (carries the coded picture data)
Non-VCL
- VPS (Video Parameter Set)
- SPS (Sequence Parameter Set)
- PPS (Picture Parameter Set)
- SEI (Supplemental Enhancement Information)
- AUD (Access Unit Delimiter)
- EOS/EOB (End of Sequence / End of Bitstream)
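The classification above maps directly onto `nal_unit_type` values: types below 32 are VCL, and the non-VCL types listed here start at 32. The following is a minimal sketch of that mapping; the function names `isVcl` and `nalTypeName` are made up for this example.

```cpp
#include <cstdint>
#include <string>

// nal_unit_type values 0..31 are VCL types (slice segments plus reserved values).
inline bool isVcl(uint8_t nalUnitType) { return nalUnitType < 32; }

// Names for the non-VCL types mentioned in the list above.
inline std::string nalTypeName(uint8_t t) {
    if (isVcl(t)) return "VCL";
    switch (t) {
        case 32: return "VPS";
        case 33: return "SPS";
        case 34: return "PPS";
        case 35: return "AUD";
        case 36: return "EOS";
        case 37: return "EOB";
        case 38: return "FD";          // filler data
        case 39: return "PREFIX_SEI";
        case 40: return "SUFFIX_SEI";
        default: return "reserved/unspecified";
    }
}
```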
Structural Hierarchy
Bitstream
└── NAL unit (NALU)
    ├── VPS / SPS / PPS parameter sets (context needed for decoding)
    ├── SEI / AUD / EOS / EOB auxiliary information
    └── Slice (Video Coding Layer)
        └── CTU (Coding Tree Unit)
            ├── CU (Coding Unit)
            │   ├── PU (Prediction Unit)
            │   └── TU (Transform Unit)
            └── Residual coefficients, prediction modes, motion vectors, etc.
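The layers explained:
- Byte stream layer
  - Stored in a file or carried in a transport stream, usually in Annex-B format (start code 0x000001 + NALU).
- NAL layer (Network Abstraction Layer)
  - Encapsulates the video data units so they are easy to transmit and store.
  - Every NALU begins with a two-byte NAL header containing nal_unit_type, which tells whether the unit is a VPS, SPS, PPS, slice, and so on.
- VCL layer (Video Coding Layer)
  - The coded picture data itself.
  - A picture usually consists of one or more slice NALUs.
- Coding-unit layer (CTU → CU → PU/TU)
  - CTU (Coding Tree Unit): the basic coding block of H.265 (it replaces the H.264 macroblock; up to 64×64).
  - CU (Coding Unit): a CTU can be split into several CUs, and each CU selects its prediction mode (intra or inter).
  - PU (Prediction Unit): carries the prediction details, such as the intra mode or the motion information.
  - TU (Transform Unit): the unit on which the DCT-style transform and its inverse, plus quantization, are applied.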
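To make the NAL layer concrete, here is a small sketch that unpacks the two-byte HEVC NAL header (1-bit forbidden_zero_bit, 6-bit nal_unit_type, 6-bit nuh_layer_id, 3-bit nuh_temporal_id_plus1). The struct and function names are illustrative choices, not part of any library.

```cpp
#include <cstdint>

struct NalHeader {
    bool    forbiddenZeroBit;  // must be 0 in a valid stream
    uint8_t nalUnitType;       // 6 bits
    uint8_t nuhLayerId;        // 6 bits, split across both bytes
    uint8_t temporalId;        // nuh_temporal_id_plus1 - 1
};

// b0 and b1 are the first two bytes after the start code.
// Assumes nuh_temporal_id_plus1 is non-zero, as the standard requires.
inline NalHeader parseNalHeader(uint8_t b0, uint8_t b1) {
    NalHeader h;
    h.forbiddenZeroBit = ((b0 >> 7) & 0x1) != 0;
    h.nalUnitType = static_cast<uint8_t>((b0 >> 1) & 0x3F);
    h.nuhLayerId = static_cast<uint8_t>(((b0 & 0x1) << 5) | (b1 >> 3));
    h.temporalId = static_cast<uint8_t>((b1 & 0x7) - 1);
    return h;
}
```

For instance, the header bytes 0x40 0x01 decode to nal_unit_type 32, which is a VPS on layer 0 with temporal ID 0.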
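What Is the VPS?
Put simply, the VPS is the highest-level parameter set in an H.265 bitstream and gives a global overview of a video sequence. Its main job is to describe the structure of complex sequences, especially where scalability and multiview video are involved.
In H.264, high-level information of this kind had to live in the Sequence Parameter Set (SPS), which made multi-layer and multi-view video cumbersome to handle. To address this, the designers of H.265 moved that information out of the SPS into a dedicated VPS.
You can think of the VPS as the "master outline" of the video: it tells the decoder how many layers and sub-layers the stream contains and how those layers and sub-layers are organized and depend on one another.
Core Functions
The VPS lets H.265 handle complex video formats efficiently. Its core functions are:
- Scalability support: H.265 can encode several versions of a video, differing in quality, resolution, or frame rate, in one bitstream. The VPS records information about these versions (called "operation points"), such as which layers each operation point contains and how those layers depend on each other. A decoder can then pick a lower-quality sub-stream that suits the network bandwidth or device capability without decoding the whole stream.
- Multiview support: the VPS (through its extension) can describe the structure of multiview video, for example a video captured from several camera angles. It defines the ID of each view and the dependencies between views, which matters when several views must be decoded in sync.
- Simpler parameter management: the VPS gathers the global, structure-related parameters in one place, which makes the bitstream cleaner and more modular. A decoder can parse the VPS before decoding anything else and get an overall picture of the stream, which helps robustness and efficiency.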
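The simplest form of the sub-stream selection described above is temporal scalability: a receiver drops every NAL unit whose temporal ID exceeds a target. The sketch below shows only that filtering step under simplifying assumptions (it ignores nuh_layer_id and the operation-point details that the standard's sub-bitstream extraction process also covers); `Nal` and `extractTemporalSubStream` are hypothetical names.

```cpp
#include <cstdint>
#include <vector>

// One NAL unit: 2-byte NAL header followed by the payload, no start code.
struct Nal {
    std::vector<uint8_t> bytes;
};

// Keeps only NAL units with TemporalId <= targetTid.
// Simplified illustration: layer IDs are not considered.
std::vector<Nal> extractTemporalSubStream(const std::vector<Nal>& in, int targetTid) {
    std::vector<Nal> out;
    for (const Nal& n : in) {
        if (n.bytes.size() < 2) continue;       // too short to hold a NAL header
        int tid = (n.bytes[1] & 0x07) - 1;      // nuh_temporal_id_plus1 - 1
        if (tid <= targetTid) out.push_back(n);
    }
    return out;
}
```

Parameter sets such as the VPS carry temporal ID 0, so they survive any choice of target.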
VPS Structure
// H.265/HEVC Video Parameter Set (VPS)
typedef struct H265VPS {
    int vps_video_parameter_set_id;        // VPS ID, range 0..15
    int vps_base_layer_internal_flag;      // whether the base layer is carried in the bitstream
    int vps_base_layer_available_flag;     // whether the base layer is available
    int vps_max_layers_minus1;             // maximum number of layers - 1
    int vps_max_sub_layers_minus1;         // maximum number of temporal sub-layers - 1
    int vps_temporal_id_nesting_flag;      // temporal ID nesting flag
    int vps_reserved_0xffff_16bits;        // reserved, 16 bits, all ones

    // profile_tier_level() information (same syntax as in the SPS)
    int general_profile_idc;               // general profile
    int general_level_idc;                 // general level
    int general_tier_flag;                 // tier: 0 = Main tier, 1 = High tier

    // sub-layers
    int sub_layer_profile_present_flag[7]; // sub-layer profile information present
    int sub_layer_level_present_flag[7];   // sub-layer level information present

    // sub-layer ordering info
    int vps_sub_layer_ordering_info_present_flag;
    int vps_max_dec_pic_buffering_minus1[7]; // maximum decoded picture buffer size - 1
    int vps_max_num_reorder_pics[7];         // maximum number of reordered pictures
    int vps_max_latency_increase_plus1[7];   // maximum latency increase + 1

    // timing info (optional)
    int vps_timing_info_present_flag;      // timing information present
    unsigned int vps_num_units_in_tick;    // time units per clock tick
    unsigned int vps_time_scale;           // time units per second
    int vps_poc_proportional_to_timing_flag;
    unsigned int vps_num_ticks_poc_diff_one_minus1;
    // hrd_parameters() may follow here

    int vps_extension_flag;                // extension present
} H265VPS;
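Basic Identification
- vps_video_parameter_set_id (u(4))
  VPS ID, in the range 0 to 15. It distinguishes multiple VPSs; an SPS refers to its VPS through this ID.
- vps_base_layer_internal_flag (u(1))
  Indicates whether the base layer is carried inside the bitstream (1) or supplied by external means (0).
- vps_base_layer_available_flag (u(1))
  Indicates whether the base layer is available to the decoder. In the first version of the standard these two bits were a reserved field fixed at binary 11, so single-layer streams normally carry 1 for both flags.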
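Layer and Sub-layer Configuration
- vps_max_layers_minus1 (u(6))
  Maximum number of layers minus 1.
  - 0 → a single layer (plain single-layer HEVC).
  - > 0 → used for scalable HEVC (SHVC) or multiview HEVC (MV-HEVC).
- vps_max_sub_layers_minus1 (u(3))
  Maximum number of temporal sub-layers minus 1 (range 0 to 6).
  - 0 → only sub-layer 0 is present.
  - > 0 → a multi-level temporal structure (for example, a frame rate that increases with each sub-layer).
- vps_temporal_id_nesting_flag (u(1))
  Temporal ID nesting flag:
  - 1 → temporal nesting holds for all layers, so a decoder can switch up to a higher temporal sub-layer at any picture. It must be 1 when vps_max_sub_layers_minus1 is 0.
  - 0 → no such guarantee.
- vps_reserved_0xffff_16bits (u(16))
  Reserved field; must be 0xFFFF.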
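The fields described so far occupy exactly the first four bytes of the VPS payload (4 + 1 + 1 + 6 + 3 + 1 + 16 = 32 bits), so they can be unpacked with plain shifts and masks. This is a minimal sketch; `VpsPrefix` and `parseVpsPrefix` are names invented for the example, and the input is assumed to be the payload after the two-byte NAL header.

```cpp
#include <cstdint>

struct VpsPrefix {
    uint8_t  vpsId;               // vps_video_parameter_set_id
    bool     baseLayerInternal;   // vps_base_layer_internal_flag
    bool     baseLayerAvailable;  // vps_base_layer_available_flag
    uint8_t  maxLayersMinus1;     // vps_max_layers_minus1
    uint8_t  maxSubLayersMinus1;  // vps_max_sub_layers_minus1
    bool     temporalIdNesting;   // vps_temporal_id_nesting_flag
    uint16_t reserved;            // vps_reserved_0xffff_16bits
};

// p points at the first four payload bytes of a VPS NAL unit.
inline VpsPrefix parseVpsPrefix(const uint8_t p[4]) {
    const uint16_t w = static_cast<uint16_t>((p[0] << 8) | p[1]);
    VpsPrefix v;
    v.vpsId              = static_cast<uint8_t>((w >> 12) & 0xF);
    v.baseLayerInternal  = ((w >> 11) & 1) != 0;
    v.baseLayerAvailable = ((w >> 10) & 1) != 0;
    v.maxLayersMinus1    = static_cast<uint8_t>((w >> 4) & 0x3F);
    v.maxSubLayersMinus1 = static_cast<uint8_t>((w >> 1) & 0x7);
    v.temporalIdNesting  = (w & 1) != 0;
    v.reserved           = static_cast<uint16_t>((p[2] << 8) | p[3]);
    return v;
}
```

Because the reserved field is 0xFFFF, no emulation prevention byte can appear within these four bytes, which is why reading them directly from the raw payload is safe.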
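profile_tier_level() Information
Profile, tier, and level together define the capabilities a decoder needs.
- general_profile_space / general_profile_idc
  Profile identifier, e.g. Main (1), Main 10 (2), Main Still Picture (3).
- general_tier_flag
  Tier:
  - 0 = Main tier (ordinary bit-rate requirements);
  - 1 = High tier (higher bit rates and decoding capability).
- general_level_idc
  Level, which sets the required decoding performance (e.g. Level 5.1, Level 6.2). The value is coded as 30 times the level number, so Level 5.1 is 153.
- sub_layer_profile_present_flag[i]
  Indicates whether profile information is present for sub-layer i.
- sub_layer_level_present_flag[i]
  Indicates whether level information is present for sub-layer i.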
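Since general_level_idc is stored as 30 times the level number, turning it back into a readable level is simple arithmetic. The helpers below are an illustrative sketch with made-up names; only the three profile values mentioned above are named.

```cpp
#include <cstdint>
#include <string>

// general_level_idc = 30 * level, e.g. 153 -> "5.1", 120 -> "4".
inline std::string levelName(uint8_t generalLevelIdc) {
    const int major = generalLevelIdc / 30;
    const int minor = (generalLevelIdc % 30) / 3;
    if (minor == 0) return std::to_string(major);
    return std::to_string(major) + "." + std::to_string(minor);
}

// Names for the profile values listed in the text.
inline std::string profileName(uint8_t generalProfileIdc) {
    switch (generalProfileIdc) {
        case 1:  return "Main";
        case 2:  return "Main 10";
        case 3:  return "Main Still Picture";
        default: return "other";
    }
}
```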
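Decoding Order Control (Sub-layer Ordering)
- vps_sub_layer_ordering_info_present_flag
  Whether the parameters below are signalled for each sub-layer separately.
  - 0 → one set of values is signalled (for the highest sub-layer) and applies to all sub-layers.
  - 1 → each sub-layer has its own values.
- vps_max_dec_pic_buffering_minus1[i]
  Maximum decoded picture buffer size minus 1.
  For example, a value of 4 means 5 pictures must be buffered.
- vps_max_num_reorder_pics[i]
  Maximum number of reordered pictures (relevant for B-frames and output delay).
- vps_max_latency_increase_plus1[i]
  Maximum latency increase plus 1. It bounds how far output may lag behind decoding; 0 means no limit is signalled.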
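The arithmetic behind these three fields can be sketched as follows: the buffer size is the coded value plus one, and when vps_max_latency_increase_plus1 is non-zero the latency limit in pictures is vps_max_num_reorder_pics + vps_max_latency_increase_plus1 - 1. The struct and function names are hypothetical.

```cpp
#include <cstdint>

struct SubLayerOrdering {
    uint32_t maxDecPicBufferingMinus1;
    uint32_t maxNumReorderPics;
    uint32_t maxLatencyIncreasePlus1;
};

// Number of pictures the decoded picture buffer must be able to hold.
inline uint32_t dpbSize(const SubLayerOrdering& s) {
    return s.maxDecPicBufferingMinus1 + 1;
}

// Writes the latency limit (in pictures) to `out` and returns true,
// or returns false when no limit is signalled (coded value 0).
inline bool maxLatencyPictures(const SubLayerOrdering& s, uint32_t& out) {
    if (s.maxLatencyIncreasePlus1 == 0) return false;
    out = s.maxNumReorderPics + s.maxLatencyIncreasePlus1 - 1;
    return true;
}
```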
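Timing and HRD Parameters
- vps_timing_info_present_flag
  Whether timing information is present.
- vps_num_units_in_tick / vps_time_scale
  Define the clock.
  - Frame rate ≈ time_scale / num_units_in_tick.
- vps_poc_proportional_to_timing_flag
  Whether the POC (picture order count, i.e. output order) is proportional to output time.
- vps_num_ticks_poc_diff_one_minus1
  When the flag above is 1, this value plus 1 is the number of clock ticks that corresponds to a POC difference of 1.
- hrd_parameters() (optional)
  HRD (Hypothetical Reference Decoder) parameters, which describe bit-rate and buffering requirements.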
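The frame-rate estimate above can be written as a small helper. This sketch assumes each picture advances the POC by one step; `nominalFrameRate` is a name chosen for illustration, and the third argument only matters when vps_poc_proportional_to_timing_flag is 1.

```cpp
#include <cstdint>

// Approximate pictures per second implied by the VPS timing fields.
// One clock tick lasts num_units_in_tick / time_scale seconds, and a POC
// step of 1 spans (num_ticks_poc_diff_one_minus1 + 1) ticks.
// Returns 0.0 when the timing fields are unusable.
inline double nominalFrameRate(uint32_t numUnitsInTick, uint32_t timeScale,
                               uint32_t numTicksPocDiffOneMinus1 = 0) {
    if (numUnitsInTick == 0 || timeScale == 0) return 0.0;
    const double ticksPerPoc = static_cast<double>(numTicksPocDiffOneMinus1) + 1.0;
    return static_cast<double>(timeScale) /
           (static_cast<double>(numUnitsInTick) * ticksPerPoc);
}
```

For example, num_units_in_tick = 1001 with time_scale = 30000 gives roughly 29.97 pictures per second.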
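Extension and Compatibility
- vps_extension_flag
  Whether extension data is present.
  - When it is 1, extension data follows (vps_extension_data_flag[] and related fields, used by SHVC/MV-HEVC).
- vps_extension_data_flag[]
  Reserved extension bits for future versions of the standard. A decoder that does not understand them should skip them.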
Parsing Example
Example (C++)
// hevc_vps_parser.cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <iomanip>
#include <iostream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// ----------------- BitReader (removes emulation prevention bytes) -----------------
class BitReader {
public:
    // Takes the NAL payload (the bytes after the 2-byte NAL header) and strips
    // emulation prevention bytes so that all reads operate on the RBSP.
    explicit BitReader(const std::vector<uint8_t>& nal_rbsp)
        : buffer(removeEmulationPrevention(nal_rbsp)), bit_pos(0) {}

    // Read a single bit, most significant bit first.
    uint32_t readBit() {
        if (bit_pos >= buffer.size() * 8) throw std::out_of_range("readBit out of range");
        size_t byte_idx = bit_pos / 8;
        int bit_off = 7 - static_cast<int>(bit_pos % 8);
        uint8_t b = (buffer[byte_idx] >> bit_off) & 0x01u;
        ++bit_pos;
        return b;
    }

    // Read n bits (0 <= n <= 32).
    uint32_t readBits(int n) {
        if (n < 0 || n > 32) throw std::invalid_argument("readBits n invalid");
        uint32_t v = 0;
        for (int i = 0; i < n; ++i) {
            v = (v << 1) | readBit();
        }
        return v;
    }

    // Read an unsigned Exp-Golomb code, ue(v).
    uint32_t readUE() {
        int leadingZeroBits = 0;
        while (readBit() == 0) {  // readBit() throws when the data runs out
            // More than 31 leading zeros cannot be represented in 32 bits.
            if (++leadingZeroBits > 31) throw std::runtime_error("readUE code too long");
        }
        if (leadingZeroBits == 0) return 0;
        uint32_t suffix = readBits(leadingZeroBits);
        return (1u << leadingZeroBits) - 1 + suffix;
    }

    // Read a signed Exp-Golomb code, se(v): odd codeNum -> positive, even -> negative.
    int32_t readSE() {
        uint32_t codeNum = readUE();
        return (codeNum & 1) ? (int32_t)((codeNum + 1) / 2) : -(int32_t)(codeNum / 2);
    }

    // Number of unread bits.
    size_t bitsLeft() const {
        return buffer.size() * 8 - bit_pos;
    }

    // Skip n bits.
    void skipBits(size_t n) {
        if (bit_pos + n > buffer.size() * 8) throw std::out_of_range("skipBits out of range");
        bit_pos += n;
    }

private:
    std::vector<uint8_t> buffer;
    size_t bit_pos;

    // Drops the 0x03 in every 00 00 03 sequence.
    static std::vector<uint8_t> removeEmulationPrevention(const std::vector<uint8_t>& src) {
        std::vector<uint8_t> dst;
        dst.reserve(src.size());
        for (size_t i = 0; i < src.size(); ++i) {
            if (i + 2 < src.size() && src[i] == 0x00 && src[i + 1] == 0x00 && src[i + 2] == 0x03) {
                // Copy the two zero bytes and skip the 0x03.
                dst.push_back(src[i]);
                dst.push_back(src[i + 1]);
                i += 2;  // the loop's ++i then steps past the 0x03
                continue;
            }
            dst.push_back(src[i]);
        }
        return dst;
    }
};

// ----------------- VPS data structure -----------------
struct H265VPS {
    // identification
    uint8_t vps_video_parameter_set_id = 0;  // 4 bits
    bool vps_base_layer_internal_flag = false;
    bool vps_base_layer_available_flag = false;
    uint8_t vps_max_layers_minus1 = 0;       // 6 bits
    uint8_t vps_max_sub_layers_minus1 = 0;   // 3 bits
    bool vps_temporal_id_nesting_flag = false;
    uint16_t vps_reserved_0xffff_16bits = 0;

    // profile_tier_level (general fields only)
    uint8_t general_profile_space = 0;       // 2 bits
    bool general_tier_flag = false;
    uint8_t general_profile_idc = 0;         // 5 bits
    uint32_t general_profile_compatibility_flags = 0;
    uint64_t general_constraint_flags = 0;   // 48 constraint bits packed into 64 bits
    uint8_t general_level_idc = 0;

    // sub-layer presence flags (max 7)
    std::array<bool, 7> sub_layer_profile_present_flag{};
    std::array<bool, 7> sub_layer_level_present_flag{};

    // sub-layer ordering info
    bool vps_sub_layer_ordering_info_present_flag = false;
    // one entry per signalled sub-layer
    std::vector<uint32_t> vps_max_dec_pic_buffering_minus1;
    std::vector<uint32_t> vps_max_num_reorder_pics;
    std::vector<uint32_t> vps_max_latency_increase_plus1;

    // timing info
    bool vps_timing_info_present_flag = false;
    uint32_t vps_num_units_in_tick = 0;
    uint32_t vps_time_scale = 0;
    bool vps_poc_proportional_to_timing_flag = false;
    uint32_t vps_num_ticks_poc_diff_one_minus1 = 0;

    // extension
    bool vps_extension_flag = false;

    // raw payload for debugging/inspection
    std::vector<uint8_t> raw_rbsp;
};

// ----------------- parse profile_tier_level() (partial) -----------------
// Parses the general_* fields and the sub-layer present flags. Sub-layer
// profile/level fields are skipped rather than stored.
// vps.vps_max_sub_layers_minus1 must already be set by the caller.
static void parse_profile_tier_level(BitReader& br, H265VPS& vps) {
    const int maxSubLayersMinus1 = vps.vps_max_sub_layers_minus1;

    vps.general_profile_space = (uint8_t)br.readBits(2);
    vps.general_tier_flag = br.readBit() != 0;
    vps.general_profile_idc = (uint8_t)br.readBits(5);
    vps.general_profile_compatibility_flags = br.readBits(32);

    // 48 bits of source/constraint flags, read as 16 + 32 bits.
    uint64_t hi = br.readBits(16);
    uint64_t lo = br.readBits(32);
    vps.general_constraint_flags = (hi << 32) | lo;

    vps.general_level_idc = (uint8_t)br.readBits(8);

    // sub_layer_profile_present_flag[i] / sub_layer_level_present_flag[i]
    for (int i = 0; i < maxSubLayersMinus1; ++i) {
        vps.sub_layer_profile_present_flag[i] = br.readBit() != 0;
        vps.sub_layer_level_present_flag[i] = br.readBit() != 0;
    }
    // reserved_zero_2bits for i = maxSubLayersMinus1 .. 7
    if (maxSubLayersMinus1 > 0) {
        br.skipBits(static_cast<size_t>(2 * (8 - maxSubLayersMinus1)));
    }
    // Sub-layer profile/level data is skipped; implement a full parse if you need it.
    for (int i = 0; i < maxSubLayersMinus1; ++i) {
        if (vps.sub_layer_profile_present_flag[i]) {
            br.skipBits(2);   // sub_layer_profile_space
            br.skipBits(1);   // sub_layer_tier_flag
            br.skipBits(5);   // sub_layer_profile_idc
            br.skipBits(32);  // sub_layer_profile_compatibility_flag[32]
            br.skipBits(48);  // sub-layer source/constraint flags
        }
        if (vps.sub_layer_level_present_flag[i]) {
            br.skipBits(8);   // sub_layer_level_idc
        }
    }
}

// ----------------- Main parseVPS function -----------------
// Input: VPS NAL unit payload, i.e. the bytes after the 2-byte NAL header.
H265VPS parseVPS(const std::vector<uint8_t>& nal_rbsp_payload) {
    BitReader br(nal_rbsp_payload);
    H265VPS vps;
    vps.raw_rbsp = nal_rbsp_payload;

    vps.vps_video_parameter_set_id = (uint8_t)br.readBits(4);
    vps.vps_base_layer_internal_flag = br.readBit() != 0;
    vps.vps_base_layer_available_flag = br.readBit() != 0;
    vps.vps_max_layers_minus1 = (uint8_t)br.readBits(6);
    vps.vps_max_sub_layers_minus1 = (uint8_t)br.readBits(3);
    vps.vps_temporal_id_nesting_flag = br.readBit() != 0;
    vps.vps_reserved_0xffff_16bits = (uint16_t)br.readBits(16);  // should be 0xFFFF

    // profile_tier_level(1, vps_max_sub_layers_minus1)
    parse_profile_tier_level(br, vps);

    // Sub-layer ordering info: either every sub-layer or only the highest one.
    vps.vps_sub_layer_ordering_info_present_flag = br.readBit() != 0;
    int start = vps.vps_sub_layer_ordering_info_present_flag ? 0 : vps.vps_max_sub_layers_minus1;
    int end = vps.vps_max_sub_layers_minus1;
    vps.vps_max_dec_pic_buffering_minus1.resize(end - start + 1);
    vps.vps_max_num_reorder_pics.resize(end - start + 1);
    vps.vps_max_latency_increase_plus1.resize(end - start + 1);
    for (int i = start; i <= end; ++i) {
        uint32_t idx = i - start;
        vps.vps_max_dec_pic_buffering_minus1[idx] = br.readUE();
        vps.vps_max_num_reorder_pics[idx] = br.readUE();
        vps.vps_max_latency_increase_plus1[idx] = br.readUE();
    }

    // Layer sets (used for scalability): values are read but not stored.
    uint32_t vps_max_layer_id = br.readBits(6);
    uint32_t vps_num_layer_sets_minus1 = br.readUE();
    if (vps_num_layer_sets_minus1 > 1023) throw std::runtime_error("vps_num_layer_sets_minus1 out of range");
    // layer_id_included_flag[i][j]: one bit per layer set i >= 1 and layer id j.
    for (uint32_t i = 1; i <= vps_num_layer_sets_minus1; ++i) {
        br.skipBits(vps_max_layer_id + 1);
    }

    // timing info
    vps.vps_timing_info_present_flag = br.readBit() != 0;
    if (vps.vps_timing_info_present_flag) {
        vps.vps_num_units_in_tick = br.readBits(32);
        vps.vps_time_scale = br.readBits(32);
        vps.vps_poc_proportional_to_timing_flag = br.readBit() != 0;
        if (vps.vps_poc_proportional_to_timing_flag) {
            vps.vps_num_ticks_poc_diff_one_minus1 = br.readUE();
        }
        uint32_t vps_num_hrd_parameters = br.readUE();
        if (vps_num_hrd_parameters > 0) {
            // hrd_parameters() parsing is not implemented here, so the position
            // of vps_extension_flag is unknown. Return what has been parsed.
            return vps;
        }
    }

    // Extension data, if any, is left unparsed; rbsp_trailing_bits is not checked.
    vps.vps_extension_flag = br.readBit() != 0;
    return vps;
}

// ----------------- Example DecoderContext and apply function -----------------
struct DecoderContext {
    uint8_t maxSubLayers = 0;
    bool temporalIdNesting = false;
    uint32_t numUnitsInTick = 0;
    uint32_t timeScale = 0;
    // more fields as needed by your decoder...
};

void applyVPSToDecoder(const H265VPS& vps, DecoderContext& ctx) {
    ctx.maxSubLayers = vps.vps_max_sub_layers_minus1 + 1;
    ctx.temporalIdNesting = vps.vps_temporal_id_nesting_flag;
    if (vps.vps_timing_info_present_flag) {
        ctx.numUnitsInTick = vps.vps_num_units_in_tick;
        ctx.timeScale = vps.vps_time_scale;
    }
    // Additional mapping: concurrency, HRD limits -> buffer sizes, etc.
}

// ----------------- Utility: print VPS -----------------
std::string bools(bool v) { return v ? "1" : "0"; }

void printVPS(const H265VPS& vps) {
    std::ostringstream os;
    os << "VPS parsed:\n";
    os << " vps_video_parameter_set_id: " << (int)vps.vps_video_parameter_set_id << "\n";
    os << " vps_base_layer_internal_flag: " << bools(vps.vps_base_layer_internal_flag) << "\n";
    os << " vps_base_layer_available_flag: " << bools(vps.vps_base_layer_available_flag) << "\n";
    os << " vps_max_layers_minus1: " << (int)vps.vps_max_layers_minus1 << "\n";
    os << " vps_max_sub_layers_minus1: " << (int)vps.vps_max_sub_layers_minus1 << "\n";
    os << " vps_temporal_id_nesting_flag: " << bools(vps.vps_temporal_id_nesting_flag) << "\n";
    os << " vps_reserved_0xffff_16bits: 0x" << std::hex << vps.vps_reserved_0xffff_16bits << std::dec << "\n";
    os << " general_profile_idc: " << (int)vps.general_profile_idc
       << " general_level_idc: " << (int)vps.general_level_idc
       << " general_tier_flag: " << bools(vps.general_tier_flag) << "\n";
    os << " vps_sub_layer_ordering_info_present_flag: " << bools(vps.vps_sub_layer_ordering_info_present_flag) << "\n";
    // When the flag is 0, the single stored entry belongs to the highest sub-layer.
    size_t first = vps.vps_sub_layer_ordering_info_present_flag ? 0 : vps.vps_max_sub_layers_minus1;
    for (size_t i = 0; i < vps.vps_max_dec_pic_buffering_minus1.size(); ++i) {
        os << " sub_layer[" << (first + i) << "] max_dec_pic_buffering_minus1=" << vps.vps_max_dec_pic_buffering_minus1[i]
           << " max_num_reorder_pics=" << vps.vps_max_num_reorder_pics[i]
           << " max_latency_increase_plus1=" << vps.vps_max_latency_increase_plus1[i] << "\n";
    }
    os << " vps_timing_info_present_flag: " << bools(vps.vps_timing_info_present_flag) << "\n";
    if (vps.vps_timing_info_present_flag) {
        os << " num_units_in_tick: " << vps.vps_num_units_in_tick << " time_scale: " << vps.vps_time_scale << "\n";
        os << " poc_proportional_to_timing_flag: " << bools(vps.vps_poc_proportional_to_timing_flag)
           << " num_ticks_poc_diff_one_minus1: " << vps.vps_num_ticks_poc_diff_one_minus1 << "\n";
    }
    os << " vps_extension_flag: " << bools(vps.vps_extension_flag) << "\n";
    std::cout << os.str();
}

// ----------------- Example usage -----------------
int main() {
    // Sample VPS NAL unit: 2-byte NAL header (0x40 0x01) followed by the payload,
    // of the kind seen in single-layer Main-profile streams. It is only an example;
    // for real testing, extract the VPS NAL unit from an actual HEVC stream.
    std::vector<uint8_t> exampleVPS = {
        0x40, 0x01, 0x0c, 0x01, 0xff, 0xff, 0x01, 0x60, 0x00, 0x00, 0x03, 0x00,
        0x90, 0x00, 0x00, 0x03, 0x00, 0x00, 0x03, 0x00, 0x5d, 0x95, 0x98, 0x09};
    try {
        // parseVPS() expects the payload only, so drop the 2-byte NAL header.
        std::vector<uint8_t> payload(exampleVPS.begin() + 2, exampleVPS.end());
        H265VPS vps = parseVPS(payload);
        printVPS(vps);
        DecoderContext ctx;
        applyVPSToDecoder(vps, ctx);
        std::cout << "DecoderContext: maxSubLayers=" << (int)ctx.maxSubLayers
                  << " temporalNesting=" << bools(ctx.temporalIdNesting)
                  << " numUnitsInTick=" << ctx.numUnitsInTick
                  << " timeScale=" << ctx.timeScale << "\n";
    } catch (const std::exception& e) {
        std::cerr << "parseVPS failed: " << e.what() << "\n";
    }
    return 0;
}