
Python Planet Diary, Day 81: Revisiting Image Generation and Style Transfer

news 2025/8/25 16:35:22

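Quote of the day: "The road ahead is long and far; I shall search high and low." (Qu Yuan, Li Sao)
Author: Code_流苏 (CSDN), a coder who loves classical poetry and programming 😊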

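Contents

I. Overview of Image Generation
1. GANs in Image Generation
Main GAN Variants

2. Generating Images with a Pretrained GAN
Practical Applications of GANs

II. Neural Style Transfer
1. How Neural Style Transfer Works
A Brief Look at the Math

2. Style Transfer with TensorFlow Hub
3. Advanced Applications of Style Transfer

III. Comparing Style Transfer and Image Generation
IV. Hands-On Project: A Personalized Artistic Image Generator
V. Summary and Outlook
Future Trends

VI. Further Learning Resources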

👋 About this column: Introduction to the Python Planet Diary column (continuously updated)
✅ Previous post: Python Planet Diary, Day 80: Object Detection (YOLO, Mask R-CNN)

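Welcome back to the Python Planet 🪐 Diary! Today is Day 81 of our journey.

Today we revisit two of the most creative areas of computer vision that we have met before: image generation and style transfer. These techniques let us create entirely new images, or apply the artistic style of one picture to another, and they show how much AI can contribute to creative work.

Let's explore this meeting point of art and technology!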

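I. Overview of Image Generation

Image generation is one of the most exciting applications of deep learning: it lets a machine produce visual content that did not exist before. Unlike traditional image processing, which edits or transforms an existing image, image generation creates a new image from scratch or from a given condition.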

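1. GANs in Image Generation

Generative Adversarial Networks (GANs) are among the best-known approaches to image generation. Proposed by Ian Goodfellow and colleagues in 2014, a GAN works like a contest between a "forger" and an "appraiser".

A GAN has two core components:

Generator: produces images that look realistic
Discriminator: tells real images apart from generated ones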

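Training a GAN is a zero-sum game: the generator keeps improving so that its images look more realistic, while the discriminator keeps getting better at spotting fakes. Through this adversarial training both networks improve, and ideally the generator ends up producing high-quality, realistic images.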
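To make the contest concrete, here is a minimal NumPy sketch of the two losses. It assumes the discriminator outputs a probability that an image is real and uses the commonly seen non-saturating generator loss; the function names and the sample numbers are hypothetical and chosen only for illustration.

```python
import numpy as np

def binary_cross_entropy(predictions, targets, eps=1e-7):
    """Mean binary cross-entropy between probabilities and 0/1 targets."""
    p = np.clip(predictions, eps, 1.0 - eps)
    return float(-np.mean(targets * np.log(p) + (1.0 - targets) * np.log(1.0 - p)))

def discriminator_loss(d_real, d_fake):
    """The discriminator wants D(real) -> 1 and D(fake) -> 0."""
    return (binary_cross_entropy(d_real, np.ones_like(d_real))
            + binary_cross_entropy(d_fake, np.zeros_like(d_fake)))

def generator_loss(d_fake):
    """The generator wants the discriminator to output 1 on its fakes."""
    return binary_cross_entropy(d_fake, np.ones_like(d_fake))

# Hypothetical discriminator outputs (probability that an image is real).
d_real = np.array([0.9, 0.8, 0.95])
fooled = np.array([0.7, 0.8, 0.9])    # the discriminator believes the fakes
caught = np.array([0.1, 0.2, 0.05])   # the discriminator spots the fakes

print("generator loss, fakes believed:", generator_loss(fooled))
print("generator loss, fakes caught:  ", generator_loss(caught))
print("discriminator loss, fooled:    ", discriminator_loss(d_real, fooled))
print("discriminator loss, not fooled:", discriminator_loss(d_real, caught))
```

When the fakes are believed, the generator's loss is low and the discriminator's loss is high, and the reverse holds when they are caught. That is the tug-of-war that training alternates between.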

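Main GAN Variants

Researchers have built many variants on top of the basic GAN to address different problems:

DCGAN (Deep Convolutional GAN): uses convolutional networks for both the generator and the discriminator
WGAN (Wasserstein GAN): replaces the original loss with a Wasserstein-distance objective, which improves training stability and mitigates mode collapse
StyleGAN: gives control over style features of the generated image at different levels of detail
CycleGAN: translates images between domains without paired training data
pix2pix: supervised image-to-image translation trained on paired examples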
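As one example of how a variant changes the objective, the sketch below shows the WGAN losses in NumPy. It assumes a critic that outputs unbounded scores rather than probabilities, plus the weight clipping used in the original WGAN formulation; the helper names and scores are hypothetical.

```python
import numpy as np

def critic_loss(critic_real, critic_fake):
    """Wasserstein critic loss: push scores up on real data, down on fakes."""
    return float(np.mean(critic_fake) - np.mean(critic_real))

def wgan_generator_loss(critic_fake):
    """The generator tries to raise the critic's score on its samples."""
    return float(-np.mean(critic_fake))

def clip_weights(weights, clip_value=0.01):
    """Clip every weight array to [-clip_value, clip_value]."""
    return [np.clip(w, -clip_value, clip_value) for w in weights]

# Hypothetical critic scores (not probabilities).
critic_real = np.array([2.0, 1.5, 2.5])
critic_fake = np.array([-1.0, -0.5, -1.5])

print("critic loss:", critic_loss(critic_real, critic_fake))
print("generator loss:", wgan_generator_loss(critic_fake))

clipped = clip_weights([np.array([0.5, -0.002, -0.3])])
print("clipped weights:", clipped[0])
```

Because the losses are plain differences of means rather than logarithms of probabilities, the generator still receives a useful gradient when the critic separates real and fake samples well.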

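2. Generating Images with a Pretrained GAN

Now let's try generating images with a pretrained GAN. The code below loads the Progressive GAN (ProGAN) module published on TensorFlow Hub as progan-128, which generates 128x128 face images. ProGAN comes from NVIDIA researchers and is the predecessor of StyleGAN and StyleGAN2.

# Import the required libraries
import tensorflow as tf
import matplotlib.pyplot as plt
import tensorflow_hub as hub

# Download the pretrained ProGAN module. It is published in the TF1 Hub
# format, so we call it through its "default" signature.
model_url = "https://tfhub.dev/google/progan-128/1"
generator = hub.load(model_url).signatures["default"]

# Sample random latent vectors
def generate_latent_vectors(batch_size, latent_dim=512):
    return tf.random.normal([batch_size, latent_dim])

# Create images with the generator
def generate_images(batch_size=4):
    latent_vectors = generate_latent_vectors(batch_size)
    # The signature returns a dict; the images are under the "default" key
    generated_images = generator(latent_vectors)["default"]
    # The module is expected to return floats in [0, 1]; clip defensively
    return tf.clip_by_value(generated_images, 0.0, 1.0)

# Generate 4 images
images = generate_images(4)

# Display the generated images
plt.figure(figsize=(10, 10))
for i in range(4):
    plt.subplot(2, 2, i + 1)
    plt.imshow(images[i])
    plt.axis('off')
plt.tight_layout()
plt.show()

With this code we can generate realistic-looking face images from a pretrained GAN. Note that these faces are entirely synthetic and do not correspond to any real person.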
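Each generated face is determined by its latent vector, so moving smoothly between two latent vectors usually gives a smooth morph between two faces. The following NumPy sketch only builds such a path; it assumes plain linear interpolation, and `interpolate_latents` is a hypothetical helper, not part of any library.

```python
import numpy as np

def interpolate_latents(z_start, z_end, num_steps):
    """Return num_steps latent vectors on the straight line from z_start to z_end."""
    alphas = np.linspace(0.0, 1.0, num_steps)[:, np.newaxis]
    return (1.0 - alphas) * z_start + alphas * z_end

rng = np.random.default_rng(seed=0)
latent_dim = 512                       # same latent size as in the code above
z_a = rng.standard_normal(latent_dim)
z_b = rng.standard_normal(latent_dim)

path = interpolate_latents(z_a, z_b, num_steps=5)
print(path.shape)  # (5, 512)
```

Feeding each row of `path` to the generator as a batch would produce the intermediate images.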

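Practical Applications of GANs

GANs are used for far more than faces. Typical applications include:

Art: creating new artistic styles or imitating a particular artist
Game development: automatically generating scenes, characters, and textures
Medical imaging: generating synthetic medical images for training and research
Data augmentation: producing extra samples for training other models
Fashion design: creating new garment designs or fabric textures
Animation and visual effects: generating realistic animated characters or special effects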

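II. Neural Style Transfer

1. How Neural Style Transfer Works

Neural Style Transfer (NST) applies the artistic style of one image to another. It was proposed by Gatys et al. in 2015 and is a representative application of deep learning in creative work.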

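The core idea of style transfer is:

Extract content features from the content image
Extract style features from the style image
Generate a new image that matches both the content features of the content image and the style features of the style image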

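A Brief Look at the Math

Neural style transfer uses a pretrained convolutional network (such as VGG19) to extract feature representations of the images. Two loss functions are at its core:

Content loss: makes the generated image match the content image in high-level features

# Content loss
def content_loss(content_features, generated_features):
    return tf.reduce_mean(tf.square(content_features - generated_features))

Style loss: makes the generated image match the style image in texture and color statistics

# Gram matrix of a feature map
def gram_matrix(input_tensor):
    # Flatten all spatial positions: (..., H, W, C) -> (N, C)
    channels = int(input_tensor.shape[-1])
    a = tf.reshape(input_tensor, [-1, channels])
    # Channel-by-channel correlations: (C, N) x (N, C) -> (C, C)
    n = tf.shape(a)[0]
    gram = tf.matmul(a, a, transpose_a=True)
    # Normalize by the number of positions
    return gram / tf.cast(n, tf.float32)

# Style loss
def style_loss(style_features, generated_features):
    gram_style = gram_matrix(style_features)
    gram_generated = gram_matrix(generated_features)
    return tf.reduce_mean(tf.square(gram_style - gram_generated))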
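Why does a Gram matrix describe style rather than layout? A small NumPy sketch, using a random array as a stand-in for a real feature map, illustrates it: shuffling the spatial positions changes where things are, but the Gram matrix stays the same. `gram_matrix_np` is a hypothetical NumPy counterpart of the function above.

```python
import numpy as np

def gram_matrix_np(features):
    """Gram matrix of an (H, W, C) feature map, normalized by H*W."""
    channels = features.shape[-1]
    flat = features.reshape(-1, channels)      # (H*W, C)
    return flat.T @ flat / flat.shape[0]       # (C, C)

rng = np.random.default_rng(seed=0)
features = rng.random((4, 4, 3))               # toy 4x4 feature map, 3 channels

# Shuffle the spatial positions while keeping each position's channel vector.
flat = features.reshape(-1, 3)
shuffled = flat[rng.permutation(flat.shape[0])].reshape(4, 4, 3)

same_gram = np.allclose(gram_matrix_np(features), gram_matrix_np(shuffled))
print("Gram matrices equal after shuffling:", same_gram)  # True
```

The Gram matrix only records how strongly channels fire together across the whole image, which is why it captures texture and color statistics while ignoring the arrangement of objects.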

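The total loss of neural style transfer is a weighted sum of the content loss and the style loss:

total_loss = content_weight * content_loss + style_weight * style_loss

By minimizing this loss with respect to the pixels of the generated image, we obtain a new image that keeps the main content of the content image while taking on the artistic style of the style image.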
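The optimization itself can be sketched without a neural network. The toy example below assumes the "features" are just the pixel values (an identity feature extractor instead of VGG19) and runs plain gradient descent with a hand-derived gradient; the data, weights, and step size are hypothetical.

```python
import numpy as np

def gram(features):
    """Gram matrix of an (N, C) feature matrix, normalized by N."""
    return features.T @ features / features.shape[0]

def total_loss_and_grad(generated, content, style_gram,
                        content_weight=1.0, style_weight=1.0):
    """Weighted content + style loss and its gradient w.r.t. `generated`."""
    n, c = generated.shape
    content_diff = generated - content
    gram_diff = gram(generated) - style_gram
    loss = (content_weight * np.mean(content_diff ** 2)
            + style_weight * np.mean(gram_diff ** 2))
    grad = (content_weight * 2.0 * content_diff / (n * c)
            + style_weight * 4.0 * (generated @ gram_diff) / (n * c * c))
    return loss, grad

rng = np.random.default_rng(seed=0)
content = rng.random((16, 3))     # toy content features: 16 positions, 3 channels
style = rng.random((16, 3))       # toy style features
style_gram = gram(style)

generated = content.copy()        # a common choice: start from the content image
losses = []
for step in range(200):
    loss, grad = total_loss_and_grad(generated, content, style_gram)
    losses.append(loss)
    generated -= 1.0 * grad       # plain gradient descent with step size 1.0

print(f"initial loss: {losses[0]:.6f}, final loss: {losses[-1]:.6f}")
```

In a real implementation the features come from several layers of a pretrained network, and an automatic-differentiation framework computes the gradient. The loop has the same shape.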

2. Style Transfer with TensorFlow Hub

Now let's build a simple style transfer application. TensorFlow Hub provides a pretrained style transfer model, which makes this easy.

import time

import numpy as np
import PIL.Image
import matplotlib.pyplot as plt
import tensorflow as tf
import tensorflow_hub as hub

# Load an image
def load_image(img_path, max_size=512):
    # Read the file and decode it to float32 in [0, 1]
    img = tf.io.read_file(img_path)
    img = tf.image.decode_image(img, channels=3, expand_animations=False)
    img = tf.image.convert_image_dtype(img, tf.float32)
    # Resize so the longer side equals max_size, keeping the aspect ratio
    shape = tf.cast(tf.shape(img)[:-1], tf.float32)
    long_dim = tf.reduce_max(shape)
    scale = max_size / long_dim
    new_shape = tf.cast(shape * scale, tf.int32)
    img = tf.image.resize(img, new_shape)
    # Add the batch dimension
    img = img[tf.newaxis, :]
    return img

# Display an image
def show_image(image, title=None):
    if len(image.shape) > 3:
        image = tf.squeeze(image, axis=0)
    plt.figure(figsize=(10, 10))
    plt.imshow(image)
    if title:
        plt.title(title)
    plt.axis('off')
    plt.show()

# Load the content image and the style image
content_path = 'content_image.jpg'  # replace with the path to your content image
style_path = 'style_image.jpg'      # replace with the path to your style image

content_image = load_image(content_path)
style_image = load_image(style_path)

# Load the TensorFlow Hub model
model = hub.load('https://tfhub.dev/google/magenta/arbitrary-image-stylization-v1-256/2')

# Run style transfer
start_time = time.time()
stylized_image = model(tf.constant(content_image), tf.constant(style_image))[0]
end_time = time.time()
print(f"Style transfer took {end_time - start_time:.2f} s")

# Show the results
show_image(content_image, 'Content image')
show_image(style_image, 'Style image')
show_image(stylized_image, 'Stylized result')

# Save the result (clip first so the uint8 conversion cannot wrap around)
output_image = tf.squeeze(stylized_image).numpy()
output_image = (np.clip(output_image, 0.0, 1.0) * 255).astype(np.uint8)
PIL.Image.fromarray(output_image).save('stylized_image.jpg')

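3. Advanced Applications of Style Transfer

Style transfer is not limited to still images. It has many interesting applications and variants:

Video style transfer: applying style transfer to every frame of a video
Real-time style transfer: using lightweight models to stylize in real time
Multi-style transfer: combining several artistic styles into a new blended style
Content-aware style transfer: applying different styles to different parts of the image depending on its content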
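As a very simple take on multi-style blending, the sketch below averages two already stylized versions of the same content image pixel by pixel. This is a simplification for illustration: models can also blend styles inside the network. `blend_stylized` is a hypothetical helper, and the random arrays stand in for real stylized outputs.

```python
import numpy as np

def blend_stylized(stylized_a, stylized_b, weight_a=0.5):
    """Pixel-wise weighted average of two stylized images with values in [0, 1]."""
    if stylized_a.shape != stylized_b.shape:
        raise ValueError("both images must have the same shape")
    blended = weight_a * stylized_a + (1.0 - weight_a) * stylized_b
    return np.clip(blended, 0.0, 1.0)

# Hypothetical stand-ins for two stylized results of the same content image.
rng = np.random.default_rng(seed=0)
result_style_a = rng.random((64, 64, 3))
result_style_b = rng.random((64, 64, 3))

mixed = blend_stylized(result_style_a, result_style_b, weight_a=0.7)
print(mixed.shape)  # (64, 64, 3)
```

Setting `weight_a` to 1.0 or 0.0 returns one of the two inputs unchanged, and values in between move gradually from one style to the other.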

The following example shows the key steps of video style transfer with TensorFlow:

import cv2
import tensorflow as tf
import tensorflow_hub as hub

# Load the pretrained style transfer model
model = hub.load('https://tfhub.dev/google/magenta/arbitrary-image-stylization-v1-256/2')

# Load the style image
style_image_path = 'style_image.jpg'
style_image = tf.io.read_file(style_image_path)
style_image = tf.image.decode_image(style_image, channels=3, expand_animations=False)
style_image = tf.image.convert_image_dtype(style_image, tf.float32)
style_image = tf.image.resize(style_image, [256, 256])
style_image = style_image[tf.newaxis, :]

# Open the video
cap = cv2.VideoCapture('input_video.mp4')
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
# Keep fractional frame rates such as 29.97; 30.0 is an assumed fallback
fps = cap.get(cv2.CAP_PROP_FPS) or 30.0

# Create the output video
fourcc = cv2.VideoWriter_fourcc(*'mp4v')
out = cv2.VideoWriter('stylized_video.mp4', fourcc, fps, (width, height))

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    # Preprocess the frame: BGR -> RGB, float32 in [0, 1], add batch dimension
    frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    frame = tf.convert_to_tensor(frame, dtype=tf.float32) / 255.0
    frame = frame[tf.newaxis, :]
    # Apply style transfer
    stylized_frame = model(frame, style_image)[0]
    # Convert back to the OpenCV format
    stylized_frame = tf.squeeze(stylized_frame)
    stylized_frame = tf.clip_by_value(stylized_frame, 0, 1) * 255
    stylized_frame = tf.cast(stylized_frame, tf.uint8)
    stylized_frame = stylized_frame.numpy()
    stylized_frame = cv2.cvtColor(stylized_frame, cv2.COLOR_RGB2BGR)
    # The writer expects exactly (width, height); resize in case the model
    # output differs slightly from the input size
    stylized_frame = cv2.resize(stylized_frame, (width, height))
    # Write to the output video
    out.write(stylized_frame)
    # Optional: show the frame being processed
    cv2.imshow('Stylized Video', stylized_frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Release resources
cap.release()
out.release()
cv2.destroyAllWindows()

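III. Comparing Style Transfer and Image Generation

To see how the two techniques relate, we can compare their key characteristics:

Property	Image generation (GAN)	Style transfer (NST)
Goal	Create entirely new images from random noise or a condition	Apply the style of one image to another
Input	Random noise vector or conditional input	A content image and a style image
Training	Adversarial training of generator and discriminator	Optimize the generated image to match content and style features
Use cases	Creating images that do not exist, data augmentation	Artistic creation, restyling existing content
Computational cost	Training is complex, inference is fast	The classic optimization-based method is slow; pretrained feed-forward models are fast at inference
Controllability	Precise control in latent space is difficult	Content and style sources can be chosen explicitly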

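IV. Hands-On Project: A Personalized Artistic Image Generator

Now let's combine what we have learned into a complete project: a personalized artistic image generator. We use the pretrained style transfer model from TensorFlow Hub and add a few extra features, including a simple Tkinter GUI and preset styles.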

import time
import tkinter as tk
from pathlib import Path
from tkinter import filedialog, messagebox, ttk

import numpy as np
import tensorflow as tf
import tensorflow_hub as hub
from matplotlib.backends.backend_tkagg import FigureCanvasTkAgg
from matplotlib.figure import Figure
from PIL import Image


class ArtisticImageGenerator:
    def __init__(self):
        # Load the model
        print("Loading the model, please wait...")
        self.model = hub.load('https://tfhub.dev/google/magenta/arbitrary-image-stylization-v1-256/2')
        print("Model loaded!")
        # Prepare the collection of preset styles
        self.style_images = {}
        self.load_default_styles()

    def load_default_styles(self):
        """Load the default style images."""
        styles_dir = Path("styles")
        if not styles_dir.exists():
            styles_dir.mkdir()
            print(f"Created the {styles_dir} directory; add style images to it")
        else:
            # Use every JPEG in the directory as a preset style
            for img_path in styles_dir.glob("*.jpg"):
                style_name = img_path.stem
                style_image = self.load_image(str(img_path))
                self.style_images[style_name] = style_image
                print(f"Loaded style: {style_name}")

    def load_image(self, img_path, max_size=512):
        """Load and preprocess an image."""
        img = tf.io.read_file(img_path)
        img = tf.image.decode_image(img, channels=3, expand_animations=False)
        img = tf.image.convert_image_dtype(img, tf.float32)
        # Resize so the longer side equals max_size, keeping the aspect ratio
        shape = tf.cast(tf.shape(img)[:-1], tf.float32)
        long_dim = tf.reduce_max(shape)
        scale = max_size / long_dim
        new_shape = tf.cast(shape * scale, tf.int32)
        img = tf.image.resize(img, new_shape)
        img = img[tf.newaxis, :]
        return img

    def generate_stylized_image(self, content_image, style_image):
        """Generate the stylized image."""
        start_time = time.time()
        stylized_image = self.model(content_image, style_image)[0]
        end_time = time.time()
        print(f"Style transfer took {end_time - start_time:.2f} s")
        return stylized_image

    def create_gui(self):
        """Build the graphical user interface."""
        self.root = tk.Tk()
        self.root.title("Python Planet Diary - Artistic Image Generator")
        self.root.geometry("1200x800")

        # Content area
        content_frame = ttk.LabelFrame(self.root, text="Content image")
        content_frame.grid(row=0, column=0, padx=10, pady=10, sticky="nsew")
        self.content_btn = ttk.Button(content_frame, text="Choose content image",
                                      command=self.select_content_image)
        self.content_btn.pack(pady=10)

        # Style area
        style_frame = ttk.LabelFrame(self.root, text="Style image")
        style_frame.grid(row=0, column=1, padx=10, pady=10, sticky="nsew")
        self.style_btn = ttk.Button(style_frame, text="Choose style image",
                                    command=self.select_style_image)
        self.style_btn.pack(pady=10)

        # Preset style selection
        if self.style_images:
            style_select_frame = ttk.Frame(style_frame)
            style_select_frame.pack(pady=10, fill="x")
            ttk.Label(style_select_frame, text="Preset style:").pack(side="left")
            self.style_var = tk.StringVar()
            style_combo = ttk.Combobox(style_select_frame, textvariable=self.style_var,
                                       values=list(self.style_images.keys()))
            style_combo.pack(side="left", padx=5)
            style_combo.bind("<<ComboboxSelected>>", self.select_preset_style)

        # Generate button
        generate_btn = ttk.Button(self.root, text="Generate stylized image",
                                  command=self.generate_and_display)
        generate_btn.grid(row=1, column=0, columnspan=2, pady=10)

        # Result area
        result_frame = ttk.LabelFrame(self.root, text="Result")
        result_frame.grid(row=2, column=0, columnspan=2, padx=10, pady=10, sticky="nsew")

        # Adjust grid weights
        self.root.grid_rowconfigure(2, weight=1)
        self.root.grid_columnconfigure(0, weight=1)
        self.root.grid_columnconfigure(1, weight=1)

        # Initialize state; a plain Figure avoids pyplot's own window management
        self.content_image = None
        self.style_image = None
        self.fig = Figure(figsize=(10, 6))
        self.canvas = FigureCanvasTkAgg(self.fig, master=result_frame)
        self.canvas.get_tk_widget().pack(fill=tk.BOTH, expand=True)

        # Show the welcome message
        self.display_welcome()
        self.root.mainloop()

    def display_welcome(self):
        """Show the welcome message."""
        self.fig.clear()
        ax = self.fig.add_subplot(111)
        ax.text(0.5, 0.5,
                "Welcome to the Artistic Image Generator!\n\n"
                "Choose a content image and a style image to get started.",
                ha='center', va='center', fontsize=16)
        ax.axis('off')
        self.canvas.draw()

    def select_content_image(self):
        """Choose the content image."""
        file_path = filedialog.askopenfilename(
            title="Choose content image",
            filetypes=[("Image files", "*.jpg *.jpeg *.png")])
        if file_path:
            self.content_image = self.load_image(file_path)
            self.content_btn.config(text=f"Content image: {Path(file_path).name}")
            self.display_preview()

    def select_style_image(self):
        """Choose the style image."""
        file_path = filedialog.askopenfilename(
            title="Choose style image",
            filetypes=[("Image files", "*.jpg *.jpeg *.png")])
        if file_path:
            self.style_image = self.load_image(file_path)
            self.style_btn.config(text=f"Style image: {Path(file_path).name}")
            if hasattr(self, 'style_var'):
                self.style_var.set("")  # clear the preset selection
            self.display_preview()

    def select_preset_style(self, event=None):
        """Choose a preset style."""
        style_name = self.style_var.get()
        if style_name in self.style_images:
            self.style_image = self.style_images[style_name]
            self.style_btn.config(text=f"Style image: {style_name}")
            self.display_preview()

    def display_preview(self):
        """Show a preview of the selected images."""
        self.fig.clear()
        if self.content_image is not None:
            ax1 = self.fig.add_subplot(1, 2, 1)
            content_np = tf.squeeze(self.content_image).numpy()
            ax1.imshow(content_np)
            ax1.set_title("Content image")
            ax1.axis('off')
        if self.style_image is not None:
            ax2 = self.fig.add_subplot(1, 2, 2)
            style_np = tf.squeeze(self.style_image).numpy()
            ax2.imshow(style_np)
            ax2.set_title("Style image")
            ax2.axis('off')
        self.canvas.draw()

    def generate_and_display(self):
        """Generate and display the stylized image."""
        if self.content_image is None or self.style_image is None:
            messagebox.showwarning("Warning", "Please choose a content image and a style image first!")
            return
        # Generate the stylized image
        stylized_image = self.generate_stylized_image(self.content_image, self.style_image)

        # Show the results in a 1x3 layout
        self.fig.clear()
        ax1 = self.fig.add_subplot(1, 3, 1)
        content_np = tf.squeeze(self.content_image).numpy()
        ax1.imshow(content_np)
        ax1.set_title("Content image")
        ax1.axis('off')

        ax2 = self.fig.add_subplot(1, 3, 2)
        style_np = tf.squeeze(self.style_image).numpy()
        ax2.imshow(style_np)
        ax2.set_title("Style image")
        ax2.axis('off')

        ax3 = self.fig.add_subplot(1, 3, 3)
        stylized_np = np.clip(tf.squeeze(stylized_image).numpy(), 0.0, 1.0)
        ax3.imshow(stylized_np)
        ax3.set_title("Stylized result")
        ax3.axis('off')

        self.fig.tight_layout()
        self.canvas.draw()

        # Ask whether to save the result
        if messagebox.askyesno("Save", "Save the generated image?"):
            self.save_result(stylized_image)

    def save_result(self, image):
        """Save the generated image."""
        file_path = filedialog.asksaveasfilename(
            defaultextension=".jpg",
            filetypes=[("JPEG files", "*.jpg"), ("PNG files", "*.png")])
        if file_path:
            # Convert to a PIL image and save (clip so uint8 cannot wrap around)
            image = tf.squeeze(image).numpy()
            image = (np.clip(image, 0.0, 1.0) * 255).astype(np.uint8)
            Image.fromarray(image).save(file_path)
            messagebox.showinfo("Success", f"Image saved to: {file_path}")


# Run the application
if __name__ == "__main__":
    app = ArtisticImageGenerator()
    app.create_gui()