
Affine and Perspective Transformations

news 2025/11/3 8:28:53

Relationships among the transformations

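1. Scaling (Rescale)

1) Transformation matrix

The scaling matrix has the form $S = \begin{bmatrix} s_x & 0 \\ 0 & s_y \\ \end{bmatrix}$, where $s_x$ and $s_y$ are the scale factors along the x-axis and y-axis, i.e. the scale factors for width and height.

The matrix maps each pixel (x, y) of the image to the new position $(s_x \cdot x, \;\; s_y \cdot y)$: $\begin{bmatrix} s_x & 0 \\ 0 & s_y \\ \end{bmatrix} \begin{bmatrix} x \\ y \\ \end{bmatrix} = \begin{bmatrix} s_x \cdot x \\ s_y \cdot y \\ \end{bmatrix}$

2) Example

Suppose an image of size $x_0 \times y_0$ is transformed by $S = \begin{bmatrix} 0.8 & 0 \\ 0 & 0.6 \\ \end{bmatrix}$. Then:

Its bottom-right corner ($x_0$, $y_0$) lands at the new position ($0.8x_0$, $0.6y_0$)
Every other point in the image undergoes the same transformation
The image origin (0, 0) is still the origin (0, 0) after scaling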
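
The example above can be checked numerically without loading an image. The sketch below applies the same matrix to the four corners of an image; the size 640 x 480 is a hypothetical value chosen only for illustration.

```python
import numpy as np

# Hypothetical image size (width x0, height y0), for illustration only
x0, y0 = 640.0, 480.0

S = np.array([[0.8, 0.0],
              [0.0, 0.6]])

# Each column is one corner: top-left, top-right, bottom-left, bottom-right
corners = np.array([[0.0, x0, 0.0, x0],
                    [0.0, 0.0, y0, y0]])

# One matrix product transforms all four corners at once
scaled = S @ corners
print(scaled.T)
```

The last column of `scaled` is the bottom-right corner, which should land at (0.8 x0, 0.6 y0), while the first column (the origin) stays put.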

3) Code demo

Note: by default, the image origin is the top-left corner of the image.

import cv2
import numpy as np

src_img = cv2.imread("img.jpg")
height, width = src_img.shape[:2]

# 2x3 affine matrix: scale x by 0.8 and y by 0.6, no translation
M = np.eye(2, 3)
M[0, 0] = 0.8
M[1, 1] = 0.6
dst_img = cv2.warpAffine(src_img, M, (width, height))

cv2.imshow("src_img", src_img)
cv2.imshow("dst_img", dst_img)
cv2.waitKey()
cv2.destroyAllWindows()

2. Rotation

1) Transformation matrix

The rotation matrix has the form $R = \begin{bmatrix} \cos \theta & -\sin \theta \\ \sin \theta & \cos \theta \\ \end{bmatrix}$, where $\theta$ is the clockwise rotation angle (clockwise as displayed, because the image y-axis points downward).

The matrix maps each pixel (x, y) of the image to the new position ( $\cos \theta \cdot x - \sin \theta \cdot y, \;\; \sin \theta \cdot x + \cos \theta \cdot y$ ):

$\begin{bmatrix} \cos \theta & -\sin \theta \\ \sin \theta & \cos \theta \\ \end{bmatrix} \begin{bmatrix} x \\ y \\ \end{bmatrix} = \begin{bmatrix} \cos \theta \cdot x - \sin \theta \cdot y\\ \sin \theta \cdot x + \cos \theta \cdot y\\ \end{bmatrix}$

2) Example

Suppose an image of size $x_0 \times y_0$ is rotated by $15^{\circ}$. The transformation matrix is $R = \begin{bmatrix} \cos(\frac{\pi}{12}) & -\sin(\frac{\pi}{12}) \\ \sin(\frac{\pi}{12}) & \cos(\frac{\pi}{12}) \\ \end{bmatrix}$, and after the transformation:

Its top-right corner ( $x_0, 0$ ) lands at the new position $(\cos\frac{\pi}{12} \cdot x_0, \;\; \sin\frac{\pi}{12} \cdot x_0)$
Every other point in the image undergoes the same transformation
The image origin (0, 0) is still the origin (0, 0) after rotation
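
As a quick numerical check of this example, the sketch below rotates the top-right corner by 15 degrees. The width 640 is a hypothetical value; the point of the example is that the corner moves to a positive y (downward on screen) while keeping its distance from the origin.

```python
import numpy as np

theta = np.pi / 12  # 15 degrees
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

x0 = 640.0  # hypothetical image width
top_right = np.array([x0, 0.0])

# New position of the top-right corner
rotated = R @ top_right
print(rotated)

# A rotation never changes the distance from the origin
distance_after = np.linalg.norm(rotated)
```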

3) Code demo

import cv2
import numpy as np

src_img = cv2.imread("img.jpg")
height, width = src_img.shape[:2]

# 2x3 affine matrix: rotate by 15 degrees (pi / 12) about the origin
M = np.eye(2, 3)
M[0, 0] = np.cos(np.pi / 12)
M[0, 1] = -np.sin(np.pi / 12)
M[1, 0] = np.sin(np.pi / 12)
M[1, 1] = np.cos(np.pi / 12)
dst_img = cv2.warpAffine(src_img, M, (width, height))

cv2.imshow("src_img", src_img)
cv2.imshow("dst_img", dst_img)
cv2.waitKey()
cv2.destroyAllWindows()

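3. Shear

1) Transformation matrix

Shear: a shear stretches an object obliquely within the coordinate system and so changes its shape, but it does not change the object's area, does not rotate it, and leaves the origin in place.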

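For a shear along the x-axis by an angle $\varphi$, the shear matrix is $\begin{bmatrix} 1 & \tan\varphi \\ 0 & 1 \\ \end{bmatrix}$; for a shear along the y-axis, the $\tan$ term moves to the lower-left entry instead.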

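2) Example

Suppose an image of size $x_0 \times y_0$ is sheared along the x-axis by $15^{\circ}$ and not sheared along the y-axis. The transformation matrix is $S = \begin{bmatrix} 1 & \tan{\frac{\pi}{12}} \\ 0 & 1 \\ \end{bmatrix}$, and after the transformation:

Its bottom-right corner $(x_0, \ y_0)$ lands at the new position $(x_0 + y_0\cdot \tan{\frac{\pi}{12}}, \;\; y_0)$
Every other point in the image undergoes the same transformation
The image origin (0, 0) is still the origin (0, 0) after the shear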
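
The shear example can be verified the same way. In the sketch below the image size is hypothetical and the variable name `shear` is chosen here only to avoid confusion with the scaling matrix. A determinant of 1 confirms that this shear leaves areas unchanged.

```python
import numpy as np

x0, y0 = 640.0, 480.0  # hypothetical image size
phi = np.pi / 12       # 15 degrees

# Shear along the x-axis only
shear = np.array([[1.0, np.tan(phi)],
                  [0.0, 1.0]])

bottom_right = shear @ np.array([x0, y0])
print(bottom_right)

# Determinant 1 means the transformation preserves area
area_factor = np.linalg.det(shear)
```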

3) Code demo

import cv2
import numpy as np

src_img = cv2.imread("img.jpg")
height, width = src_img.shape[:2]
# 2x3 affine matrix: shear along the x-axis by 15 degrees
M = np.eye(2, 3)
M[0, 1] = np.tan(np.pi / 12)
dst_img = cv2.warpAffine(src_img, M, (width, height))

cv2.imshow("src_img", src_img)
cv2.imshow("dst_img", dst_img)
cv2.waitKey()
cv2.destroyAllWindows()

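Summary 1: Linear transformations

1. Scaling, rotation, and shear are all linear transformations. They share the following properties:

The origin (0, 0) before the transformation is still the origin (0, 0) afterwards
Straightness: lines before the transformation are still lines afterwards
Parallelism: parallel lines before the transformation are still parallel afterwards

2. In two dimensions, every linear transformation (scaling, rotation, shear) can be computed with a 2x2 transformation matrix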
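
These three properties can be checked numerically for any 2x2 matrix. The sketch below uses an arbitrary matrix `A` and arbitrary points, all hypothetical values; `cross2` is a small helper defined here, not a library function.

```python
import numpy as np

# Arbitrary 2x2 linear map (hypothetical values)
A = np.array([[0.8, 0.3],
              [-0.2, 0.6]])

def cross2(u, v):
    """2D cross product; zero exactly when u and v are parallel."""
    return u[0] * v[1] - u[1] * v[0]

# Property 1: the origin stays at the origin
origin_image = A @ np.zeros(2)

# Two parallel segments: same direction d, different start points p and q
d = np.array([3.0, 1.0])
p = np.array([10.0, 20.0])
q = np.array([-5.0, 7.0])
dir1 = A @ (p + d) - A @ p
dir2 = A @ (q + d) - A @ q

# Property 2: the midpoint of a segment maps to the midpoint of its image,
# so points on a line stay on a line
mid_image = A @ (p + 0.5 * d)
expected_mid = 0.5 * (A @ p + A @ (p + d))

# Property 3: the images of parallel segments are still parallel
parallel_residual = cross2(dir1, dir2)
```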

4. Translation

1) Transformation matrix

The scaling, rotation, and shear introduced above can all be computed directly with a 2x2 transformation matrix, but translation cannot.

Written as a formula, translation is:

$\begin{bmatrix} x \\ y \\ \end{bmatrix} + \begin{bmatrix} t_x \\ t_y \\ \end{bmatrix} = \begin{bmatrix} x + t_x\\ y + t_y \\ \end{bmatrix}$ or $\begin{bmatrix} 1 & 0 \\ 0 & 1 \\ \end{bmatrix} \begin{bmatrix} x \\ y \\ \end{bmatrix} + \begin{bmatrix} t_x \\ t_y \\ \end{bmatrix} = \begin{bmatrix} x + t_x\\ y + t_y \\ \end{bmatrix}$

Homogeneous Coordinates

As shown above, a translation cannot be obtained by a matrix multiplication with a 2x2 matrix alone, so we introduce homogeneous coordinates. In two dimensions, homogeneous coordinates are usually written with three values, (x, y, w), where:

x and y are the ordinary Cartesian coordinates
w is an extra parameter, usually set to 1. Equivalently, one can think of adding a dimension z in which the object (image) always has the value 1, i.e. it lies in the plane z = 1.

With this, the translation can be written as:

$\begin{bmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \\ \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \\ \end{bmatrix} = \begin{bmatrix} x + t_x\\ y + t_y \\ \end{bmatrix}$

The scaling, rotation, and shear matrices can likewise be extended to 2x3 matrices. For example, rotation can be written as:

$\begin{bmatrix} \cos \theta & -\sin \theta & 0\\ \sin \theta & \cos \theta & 0\\ \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \\ \end{bmatrix} = \begin{bmatrix} \cos \theta \cdot x - \sin \theta \cdot y\\ \sin \theta \cdot x + \cos \theta \cdot y\\ \end{bmatrix}$
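
The two formulas above can be tried on a single point. This is a minimal sketch with illustrative values for the point and the offsets; it shows that the 2x3 translation matrix adds the offsets, and that padding a rotation with a zero third column gives the same result as the plain 2x2 rotation.

```python
import numpy as np

tx, ty = 20.0, 40.0               # illustrative offsets
P = np.array([100.0, 50.0, 1.0])  # the point (100, 50) in homogeneous form

# Translation as a single 2x3 matrix product
T = np.array([[1.0, 0.0, tx],
              [0.0, 1.0, ty]])
translated = T @ P
print(translated)

# Rotation extended to 2x3 by appending a zero column
theta = np.pi / 12
R2 = np.array([[np.cos(theta), -np.sin(theta)],
               [np.sin(theta),  np.cos(theta)]])
R23 = np.hstack([R2, np.zeros((2, 1))])

rotated_2x3 = R23 @ P      # homogeneous input, 2x3 matrix
rotated_2x2 = R2 @ P[:2]   # Cartesian input, 2x2 matrix
```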

2) Code demo

import cv2
import numpy as np

src_img = cv2.imread("img.jpg")
height, width = src_img.shape[:2]

# 2x3 affine matrix: shift 20 px along x and 40 px along y
M = np.eye(2, 3)
M[0, 2] = 20
M[1, 2] = 40
dst_img = cv2.warpAffine(src_img, M, (width, height))

cv2.imshow("src_img", src_img)
cv2.imshow("dst_img", dst_img)
cv2.waitKey()
cv2.destroyAllWindows()

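5. Affine transformation

1) Affine transformation

An affine transformation is any combination of the four transformations scaling, rotation, shear, and translation.

For example, consider the combination "scaling + rotation + translation". Scaling and rotation both operate relative to the origin, so to avoid unexpected results in image augmentation the translation is usually applied last: linear part first, translation afterwards.

Affine transformations have the following properties:

Lines before the transformation are still lines afterwards
Parallel lines before the transformation are still parallel afterwards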
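
One consequence of these properties is that a rectangle always maps to a parallelogram. The sketch below checks this on the four corners of an image, using a hypothetical 2x3 affine matrix `M` and a hypothetical image size; unlike a purely linear map, the origin moves because of the translation column.

```python
import numpy as np

# Hypothetical 2x3 affine matrix: linear part plus translation (40, 20)
M = np.array([[0.8, 0.3, 40.0],
              [-0.2, 0.6, 20.0]])

w, h = 640.0, 480.0  # hypothetical image size

# Corners in homogeneous form, one per column:
# top-left, top-right, bottom-right, bottom-left
corners = np.array([[0.0, w, w, 0.0],
                    [0.0, 0.0, h, h],
                    [1.0, 1.0, 1.0, 1.0]])

out = M @ corners  # 2x4 array of transformed corners

# Edge vectors of the transformed quadrilateral
top = out[:, 1] - out[:, 0]
bottom = out[:, 2] - out[:, 3]
left = out[:, 3] - out[:, 0]
right = out[:, 2] - out[:, 1]
```

Opposite edges come out as equal vectors, which is exactly the definition of a parallelogram.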

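2) Example

Taking the combination "scaling + rotation + translation" as an example, we apply successive transformations to a point $P = (x, y, 1)^T$ in the image:

First scale, with scaling matrix S
Then rotate, with rotation matrix R
Finally translate, with translation matrix T

Let P' be the new position of P after the transformations. The overall transformation is: $P' = T \cdot (R \cdot (S\cdot P))$

1) For $P' = T \cdot (R \cdot (S\cdot P))$ to be a valid chain of matrix products, S, R, and T must all be extended to 3x3 transformation matrices:

S has the form: $\begin{bmatrix} a_{11} & 0 & 0\\ 0 & a_{22} & 0\\ 0 & 0 & 1\\ \end{bmatrix}$
R has the form: $\begin{bmatrix} a_{11} & a_{12} & 0\\ a_{21} & a_{22} & 0\\ 0 & 0 & 1\\ \end{bmatrix}$
T has the form: $\begin{bmatrix} 1 & 0 & t_x\\ 0 & 1 & t_y\\ 0 & 0 & 1\\ \end{bmatrix}$

2) Because matrix multiplication is associative, $(AB)C=A(BC)$, the formula above can be written as: $P' = T \cdot R \cdot S\cdot P$

Let $M= T \cdot R \cdot S$, so that $M$ is the overall transformation matrix and $P' = M \cdot P$

3) The positions of the transformation matrices cannot be swapped, because matrix multiplication is not commutative (in general $AB \neq BA$). The order of the matrices determines the order in which the operations are applied, reading from right to left.

This matters most for translation: because the translation is always applied last, its matrix is always the leftmost factor in the product.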
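
Both points, associativity and the importance of order, can be seen on a single point. In this sketch the helper functions `scale`, `rotate`, and `translate` are defined here for illustration (they are not OpenCV or NumPy functions), and the point and parameter values are arbitrary.

```python
import numpy as np

def scale(sx, sy):
    """3x3 homogeneous scaling matrix."""
    return np.diag([sx, sy, 1.0])

def rotate(theta):
    """3x3 homogeneous rotation matrix."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s, c, 0.0],
                     [0.0, 0.0, 1.0]])

def translate(tx, ty):
    """3x3 homogeneous translation matrix."""
    m = np.eye(3)
    m[0, 2] = tx
    m[1, 2] = ty
    return m

S = scale(0.8, 0.6)
R = rotate(np.pi / 12)
T = translate(40.0, 20.0)
P = np.array([100.0, 50.0, 1.0])

# Associativity: applying the steps one by one equals applying M once
step_by_step = T @ (R @ (S @ P))
M = T @ R @ S
combined = M @ P

# Not commutative: reversing the order gives a different point
other_order = (S @ R @ T) @ P
```

With the translation as the leftmost factor, the last column of `M` holds exactly the offsets (40, 20), which is one reason this order is convenient.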

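A related code example in yolov5 (where C denotes moving the coordinate origin from the top-left corner of the image to the image center):

3) Code demo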

import cv2
import numpy as np

src_img = cv2.imread("img.jpg")
height, width = src_img.shape[:2]

# Scaling matrix
S = np.eye(3, 3)
S[0, 0] = 0.8
S[1, 1] = 0.6

# Rotation matrix (15 degrees)
R = np.eye(3, 3)
R[0, 0] = np.cos(np.pi / 12)
R[0, 1] = -np.sin(np.pi / 12)
R[1, 0] = np.sin(np.pi / 12)
R[1, 1] = np.cos(np.pi / 12)

# Translation matrix
T = np.eye(3, 3)
T[0, 2] = 40
T[1, 2] = 20

# Combined affine matrix: scale first, then rotate, then translate
M = T @ R @ S

# Apply the transformation; warpAffine takes only the top two rows (2x3)
dst_img = cv2.warpAffine(src_img, M[:2], (width, height))

cv2.imshow("src_img", src_img)
cv2.imshow("dst_img", dst_img)
cv2.waitKey()
cv2.destroyAllWindows()

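6. Perspective transformation

An affine transformation maps a rectangular image to a parallelogram, whereas a perspective transformation can map a rectangular image to a general quadrilateral.

1) Transformation matrix

The perspective transformation matrix has the form $P= \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ a_{31} & a_{32} & 1 \\ \end{bmatrix}$

A perspective transformation consists of 2 steps, which can be understood as follows:

With homogeneous coordinates (x, y, z) we added a dimension z (here z = 1), so the original image lies in the plane z = 1.

Step 1: the transformation matrix lifts the image into three-dimensional space. Each pixel $(x, \;y,\; 1)$ of the image moves to $(x, \;y,\; a_{31}x + a_{32}y + 1)$
Step 2: the points in three-dimensional space are mapped back onto the plane $z = 1$. This mapping follows the principle of visual perspective: with the line of sight along the z-axis, the result of step 1 is projected onto the plane $z = 1$, which amounts to dividing every coordinate by the third one and gives $\left(\frac{x}{a_{31}x + a_{32}y + 1}, \;\frac{y}{a_{31}x + a_{32}y + 1}, \;1\right)$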
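
The two steps can be traced on the corners of an image. This is a minimal sketch: the image size is hypothetical, the coefficients are small illustrative values, and the matrix is named `Pm` here only to keep it distinct from a point P.

```python
import numpy as np

a31, a32 = 0.0002, 0.0002  # illustrative perspective coefficients
Pm = np.array([[1.0, 0.0, 0.0],
               [0.0, 1.0, 0.0],
               [a31, a32, 1.0]])

w, h = 640.0, 480.0  # hypothetical image size

# Corners in homogeneous form, one per column:
# top-left, top-right, bottom-right, bottom-left
corners = np.array([[0.0, w, w, 0.0],
                    [0.0, 0.0, h, h],
                    [1.0, 1.0, 1.0, 1.0]])

# Step 1: lift into 3D; the third row becomes a31*x + a32*y + 1
lifted = Pm @ corners

# Step 2: project back onto z = 1 by dividing by the third coordinate
projected = lifted / lifted[2]
print(projected[:2].T)

# Opposite edges are no longer equal vectors, so the result
# is a quadrilateral that is not a parallelogram
top = projected[:2, 1] - projected[:2, 0]
bottom = projected[:2, 2] - projected[:2, 3]
```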

2) Code demo

import cv2
import numpy as np

src_img = cv2.imread("img.jpg")
height, width = src_img.shape[:2]

# 3x3 perspective matrix: identity plus small a31 and a32 terms
M = np.eye(3, 3)
M[2, 0] = 0.0002
M[2, 1] = 0.0002
dst_img = cv2.warpPerspective(src_img, M, (width, height))

cv2.imshow("src_img", src_img)
cv2.imshow("dst_img", dst_img)
cv2.waitKey()
cv2.destroyAllWindows()