
Python: Building an LSTM Model with AI + music21 to Generate Jazz-Style Music

news 2025/10/13 22:56:58

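This is a complete script that uses Python's music21 and TensorFlow/Keras to build an LSTM model that generates jazz-style music. It covers the full pipeline: MIDI data processing, model training, and music generation: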

# -*- coding: utf-8 -*-
"""Build an LSTM model with music21 and TensorFlow/Keras to generate jazz-style music."""
import numpy as np
from music21 import converter, instrument, note, chord, stream
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout
from tensorflow.keras.utils import to_categorical

# Configuration
MIDI_FILE = "jazz_samples.mid"  # jazz-style MIDI training data
SEQ_LENGTH = 50                 # input sequence length
BATCH_SIZE = 64                 # training batch size
EPOCHS = 50                     # number of training epochs
GENERATE_LENGTH = 200           # number of notes to generate

# 1. MIDI data preprocessing
def load_midi_data(file_path):
    """Parse a MIDI file and extract its notes and chords as string tokens."""
    notes = []
    print(f"Parsing {file_path}...")
    midi = converter.parse(file_path)
    parts = instrument.partitionByInstrument(midi)

    # partitionByInstrument may return None; fall back to the flattened score
    if parts:
        elements = parts.recurse().notes
    else:
        elements = midi.flatten().notes  # music21 v7+; older versions use midi.flat

    for element in elements:
        if isinstance(element, note.Note):
            notes.append(str(element.pitch))
        elif isinstance(element, chord.Chord):
            # Encode a chord as its pitch classes in normal order, e.g. "0.4.7"
            notes.append('.'.join(str(n) for n in element.normalOrder))

    return notes

# Load the training data
notes = load_midi_data(MIDI_FILE)
unique_notes = sorted(set(notes))
note_to_int = {name: number for number, name in enumerate(unique_notes)}

# Build the training sequences
sequence_in = []
sequence_out = []
for i in range(len(notes) - SEQ_LENGTH):
    seq_in = notes[i:i + SEQ_LENGTH]
    seq_out = notes[i + SEQ_LENGTH]
    sequence_in.append([note_to_int[name] for name in seq_in])
    sequence_out.append(note_to_int[seq_out])

# Reshape to (samples, time steps, features)
X = np.reshape(sequence_in, (len(sequence_in), SEQ_LENGTH, 1))
X = X / float(len(unique_notes))  # scale the integer codes to [0, 1)
# Fix the class count so the output layer always covers the whole vocabulary
y = to_categorical(sequence_out, num_classes=len(unique_notes))

# 2. Build the LSTM model
model = Sequential([
    LSTM(256, input_shape=(X.shape[1], X.shape[2])),
    Dropout(0.3),
    Dense(128, activation='relu'),
    Dense(y.shape[1], activation='softmax')
])

model.compile(loss='categorical_crossentropy', optimizer='adam')

# 3. Train the model
print("Training model...")
model.fit(X, y, epochs=EPOCHS, batch_size=BATCH_SIZE)

# 4. Generate music
def generate_music(model, seed_sequence, length, temperature=0.7):
    """Generate a new note sequence with the trained model."""
    int_to_note = dict(enumerate(unique_notes))

    generated = []
    pattern = list(seed_sequence)

    for _ in range(length):
        # Predict the distribution over the next note
        prediction_input = np.reshape(pattern, (1, len(pattern), 1))
        prediction_input = prediction_input / float(len(unique_notes))

        prediction = model.predict(prediction_input, verbose=0)[0]

        # Temperature sampling: values below 1 sharpen the distribution, values above 1 flatten it
        # The small constant avoids log(0); float64 keeps the probabilities summing to 1
        prediction = np.log(prediction.astype('float64') + 1e-9) / temperature
        exp_preds = np.exp(prediction)
        probabilities = exp_preds / np.sum(exp_preds)
        index = int(np.random.choice(len(probabilities), p=probabilities))

        result = int_to_note[index]
        generated.append(result)
        pattern.append(index)
        pattern = pattern[1:]  # slide the window forward

    return generated

# Start generation from a randomly chosen training sequence
start = np.random.randint(0, len(sequence_in))  # the upper bound is exclusive
seed = sequence_in[start]
generated_notes = generate_music(model, seed, GENERATE_LENGTH)

# 5. Convert to a MIDI file
def create_midi(output_notes, filename="jazz_generated.mid"):
    """Convert the generated note sequence into a MIDI file."""
    stream_obj = stream.Stream()
    
    for pattern in output_notes:
        # Chord token: pitch classes joined by dots
        if '.' in pattern:
            notes_in_chord = pattern.split('.')
            chord_notes = []
            for current_note in notes_in_chord:
                new_note = note.Note(int(current_note))
                new_note.storedInstrument = instrument.Piano()
                chord_notes.append(new_note)
            new_chord = chord.Chord(chord_notes)
            stream_obj.append(new_chord)
        # Single-note token
        else:
            new_note = note.Note(pattern)
            new_note.storedInstrument = instrument.Saxophone()  # an instrument common in jazz
            stream_obj.append(new_note)
    
    stream_obj.write('midi', fp=filename)
    print(f"Generated MIDI saved as {filename}")

create_midi(generated_notes)
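
The sequence-building step of the script can be checked by hand on a toy melody. The sketch below is a minimal, standalone version of that sliding window; `make_windows` and `toy_melody` are hypothetical names used only for illustration, and a window length of 3 stands in for `SEQ_LENGTH = 50`.

```python
def make_windows(tokens, seq_length):
    """Return (inputs, targets, token_to_int).

    Each input is seq_length consecutive tokens encoded as integers,
    and its target is the integer code of the token that follows.
    """
    vocab = sorted(set(tokens))
    token_to_int = {tok: i for i, tok in enumerate(vocab)}
    inputs, targets = [], []
    for i in range(len(tokens) - seq_length):
        window = tokens[i:i + seq_length]
        inputs.append([token_to_int[t] for t in window])
        targets.append(token_to_int[tokens[i + seq_length]])
    return inputs, targets, token_to_int


toy_melody = ["C4", "E4", "G4", "C5", "G4", "E4"]  # hypothetical data
inputs, targets, token_to_int = make_windows(toy_melody, seq_length=3)

for window, target in zip(inputs, targets):
    print(window, "->", target)
```

A melody of 6 tokens with a window of 3 yields 3 training pairs, which is why the script needs a MIDI file with many more than `SEQ_LENGTH` notes.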

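Usage

Prepare the training data:
- At least 1-2 jazz-style MIDI files are needed (solo piano or saxophone recommended)
- Recommended dataset: Jazz-Ml-Dataset (Jazz-Ml-Dataset-master.zip)
  Extract it to D:/Music/Jazz-Ml-Dataset-master/Jazz_Midi/
- The script parses a single file, so point MIDI_FILE at the MIDI file you want to train on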
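Dependencies:
pip install tensorflow
pip install music21
(TensorFlow 2.x already ships Keras as tensorflow.keras, which is what the script imports, so a separate keras install is not needed.)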
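Key features:
- Parses notes and chords from MIDI automatically
- Learns musical patterns with a single-layer LSTM network (a two-layer variant is listed under the improvements)
- Controls the diversity of the output with temperature sampling
- Tags single notes as saxophone and chord tones as piano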
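
To make the effect of temperature concrete, here is a minimal sketch that rescales a small, made-up probability vector the same way `generate_music` does. The function name `apply_temperature` and the example probabilities are assumptions for illustration, not part of the script.

```python
import numpy as np


def apply_temperature(probs, temperature=0.7, eps=1e-9):
    """Rescale a probability vector: T < 1 sharpens it, T > 1 flattens it."""
    logits = np.log(np.asarray(probs, dtype=np.float64) + eps) / temperature
    logits -= logits.max()  # subtract the max for numerical stability
    exp = np.exp(logits)
    return exp / exp.sum()


probs = [0.6, 0.3, 0.1]  # hypothetical model output
sharp = apply_temperature(probs, temperature=0.5)  # more conservative
flat = apply_temperature(probs, temperature=2.0)   # more adventurous

rng = np.random.default_rng(0)
next_index = rng.choice(len(sharp), p=sharp)  # sample the next note index
print(sharp.round(3), flat.round(3), next_index)
```

With a low temperature the most likely note dominates even more, so the output tends to repeat familiar patterns; a high temperature spreads probability onto unlikely notes, which adds variety at the cost of more wrong-sounding choices.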

Possible improvements:

# A more complex model architecture
model = Sequential([
    LSTM(512, return_sequences=True, input_shape=(X.shape[1], X.shape[2])),
    Dropout(0.3),
    LSTM(512),
    Dense(256, activation='relu'),
    Dense(y.shape[1], activation='softmax')
])

# Add rhythm information
def load_data_with_duration(file_path):
    notes = []
    durations = []
    # Also extract each note's duration here...
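
One possible way to carry rhythm through the same pipeline is to fold the duration into each token, so the vocabulary becomes pitch-plus-duration pairs. The sketch below is an assumption about how that could look, not the author's implementation: `encode_events` and `decode_token` are hypothetical helpers, and the durations are hand-written quarter-note lengths of the kind music21 exposes as `element.duration.quarterLength`.

```python
def encode_events(events):
    """Turn (pitch, quarter_length) pairs into 'pitch|duration' tokens."""
    return [f"{pitch}|{float(length)}" for pitch, length in events]


def decode_token(token):
    """Split a 'pitch|duration' token back into (pitch, quarter_length)."""
    pitch, length = token.split("|")
    return pitch, float(length)


# Hypothetical events: two eighth notes followed by a quarter note
events = [("C4", 0.5), ("E4", 0.5), ("G4", 1.0)]
tokens = encode_events(events)
print(tokens)
print(decode_token(tokens[-1]))
```

The `|` separator is chosen because chord tokens in the script already use `.` between pitch classes; a token would need to be decoded before the `'.' in pattern` chord check in `create_midi`, since the duration itself contains a dot. The trade-off is a larger vocabulary, which needs more training data.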

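# Use an attention mechanism
# Note: keras.layers.Attention expects a list [query, value] as input, so it cannot be
# appended to a Sequential model with model.add(); it needs the functional API.
from tensorflow.keras.layers import Attention
# e.g. context = Attention()([lstm_out, lstm_out])  # self-attention over the LSTM outputs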
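
As a rough illustration of what an attention layer computes, the following NumPy sketch implements plain dot-product attention, using the value tensor as the key, over a made-up sequence of hidden states. It is a simplified stand-in for the idea rather than the Keras layer itself; `dot_product_attention` and the array sizes are assumptions.

```python
import numpy as np


def dot_product_attention(query, value):
    """Dot-product attention with keys equal to values.

    query: (Tq, d), value: (Tv, d). Returns (context, weights).
    """
    scores = query @ value.T                          # similarity, shape (Tq, Tv)
    scores -= scores.max(axis=-1, keepdims=True)      # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over the Tv axis
    return weights @ value, weights                   # weighted sum of values


rng = np.random.default_rng(0)
hidden = rng.normal(size=(5, 8))  # hypothetical: 5 time steps, 8 LSTM units
context, weights = dot_product_attention(hidden, hidden)  # self-attention
print(context.shape, weights.shape)
```

Each row of `weights` says how much one time step draws on every other time step, which is what lets a model look back at a specific earlier note instead of relying only on the LSTM's final state.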

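Depending on the training data, the music generated by this script may show the following jazz traits:

Extended chords (7th and 9th chords)
Improvisation-like melodic motion
A tendency toward swing rhythm (the basic script encodes pitch only, so this requires the rhythm extension above)
A combination of saxophone and piano timbres

Training on a GPU in Google Colab is recommended; larger models and larger datasets generally produce better results.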

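1. What a Sequential model is

Sequential is Keras's linear stack of layers: it lets you build a model by simply stacking neural network layers one after another. Think of it as building blocks:

Each block is a network layer (an LSTM layer, a Dense layer, and so on)
The blocks are stacked in order; data enters at the first layer and passes through each layer in turn to the output of the last
It is the simplest way to build a network in which every layer has exactly one input and one output.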

2. Walking through the example code

The model from the script is:

model = Sequential([
    LSTM(256, input_shape=(X.shape[1], X.shape[2])),
    Dropout(0.3),
    Dense(128, activation='relu'),
    Dense(y.shape[1], activation='softmax')
])

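Each layer does the following:

Layer type	Parameter	Purpose
LSTM	`units=256`	The core layer, which processes the time-series data. 256 is the number of memory units in the layer (enough to learn complex patterns)
	`input_shape=(seq_len, 1)`	The shape of the input: `(time steps, features)` (for example, a sequence of 50 notes × 1 feature)
Dropout	`rate=0.3`	Randomly drops 30% of the units to prevent overfitting (it keeps the model from memorizing the training data)
Dense	`units=128` + ReLU	A fully connected layer that compresses the LSTM output into 128 dimensions; the ReLU activation adds non-linearity
Dense	`units=output dim` + Softmax	The final layer, which outputs a probability distribution over every possible note (for example, with 100 distinct notes it outputs a 100-dimensional probability vector)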
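
To get a feel for the size of this model, the trainable parameters of each layer can be counted by hand. The sketch below uses the standard formulas for an LSTM layer with biases and for a Dense layer; the vocabulary size of 100 is the assumed example value from the table, and the helper names are hypothetical.

```python
def lstm_params(units, input_dim):
    # Four weight sets (three gates plus the candidate state), each with
    # input weights, recurrent weights, and a bias vector
    return 4 * (units * (input_dim + units) + units)


def dense_params(units, input_dim):
    # One weight per input-output pair plus one bias per output unit
    return input_dim * units + units


vocab_size = 100  # assumed, matching the example in the table
counts = {
    "lstm": lstm_params(256, 1),
    "dense_128": dense_params(128, 256),
    "dense_out": dense_params(vocab_size, 128),
}
total = sum(counts.values())  # Dropout has no trainable parameters
print(counts, total)
```

Under these assumptions the LSTM layer holds 264,192 of the 309,988 parameters, so the recurrent layer dominates both memory use and training time; `model.summary()` reports the actual figures for a real vocabulary.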

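3. Why use an LSTM?

Music is time-series data: the order of the notes carries important information (melodic direction, chord progressions).
The advantage of an LSTM: its gating mechanism (input, forget, and output gates) lets it remember long-range dependencies, which suits generating coherent musical phrases.

For a more complex model, several LSTM layers can be stacked:

model = Sequential([
    LSTM(256, return_sequences=True, input_shape=(X.shape[1], X.shape[2])),
    LSTM(512),  # second LSTM layer
    Dropout(0.3),
    Dense(128, activation='relu'),
    Dense(y.shape[1], activation='softmax')
])

return_sequences=True makes the first LSTM layer return the full sequence (not just its last output) so that the next LSTM layer can consume it.

4. Key parameters in detail

(1) LSTM layer
units=256: the number of memory units; larger values give the model more capacity (but may overfit)
input_shape: the shape of the input data; it is specified on the first layer of the model only
- For example, (50, 1) means 50 time steps (notes) with 1 feature per time step (the pitch encoding)

(2) Dropout layer
rate=0.3: randomly disables 30% of the units during training, forcing the model to learn redundant features (which improves generalization)

(3) Dense layer
activation='softmax': turns the output into a probability distribution (summing to 1), which suits multi-class tasks such as predicting the next note

5. Model visualization

Assuming an input sequence length of 50, the model structure is:

Input (50×1) → LSTM(256) → Dropout → Dense(128) → Dense(output dim) → note probabilities

Input: sequence data of shape (batch_size, 50, 1)
Output: a probability vector of shape (batch_size, output dim)

6. How does the model generate music?
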
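Training stage: the model learns the mapping from an input sequence (e.g. [C4, E4, G4]) to a target note (e.g. C5).
Generation stage:
- Start from an initial sequence (e.g. [C4, E4])
- The model predicts a probability distribution over the next note
- Sample a new note from that distribution (e.g. G4)
- Append the new note to the sequence and predict again (much like autocomplete)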
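
The generation steps above can be sketched without a trained network by swapping the model for a stub. In the code below, `toy_predict` is a hypothetical stand-in for `model.predict` that simply favors the token after the last one; only the loop structure (predict, sample, append, slide) mirrors the script.

```python
import random


def generate(predict_fn, seed, length, rng=None):
    """Autoregressive generation with a fixed-size sliding window."""
    rng = rng or random.Random()
    window = list(seed)  # copy so the caller's seed is not modified
    generated = []
    for _ in range(length):
        probs = predict_fn(window)                     # distribution over next token
        index = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        generated.append(index)
        window = window[1:] + [index]                  # drop oldest, append newest
    return generated


def toy_predict(window):
    """Hypothetical stand-in for the model: favors the token after the last one."""
    vocab_size = 4
    probs = [0.1] * vocab_size
    probs[(window[-1] + 1) % vocab_size] = 0.7
    return probs


seed = [0, 1, 2]
melody = generate(toy_predict, seed, length=8, rng=random.Random(0))
print(melody)
```

Because every prediction is fed back in as input, an early unlucky sample shifts everything that follows, which is one reason the choice of temperature matters so much in practice.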