Linear Algebra · SVD | Derivatives
Singular value decomposition derivatives
Published Nov 10, 2023
The singular value decomposition (SVD) is a matrix decomposition that is used in many applications. It is defined as:
$$J = U \Sigma V^T$$
where $U$ and $V$ are orthogonal matrices and $\Sigma$ is a diagonal matrix with non-negative entries. The diagonal entries of $\Sigma$ are called the singular values of $J$ and are denoted $\sigma_1,\dots,\sigma_n$. The singular values are the square roots of the eigenvalues of $J^TJ$. In this post we’re going to go over how to differentiate the elements of the SVD under the assumption that the singular values are all distinct and non-zero.
Einstein notation
Before proceeding, we need to understand Einstein notation. Einstein notation is an alternative way of writing matrix equations where we undo the matrix notation and remove the summation symbol. For example, let’s write the equation $Ax = b$ in Einstein notation.
- Step 1: Undo the matrix notation
  $$Ax = b \to \sum_j A_{ij}x_j = b_i$$
- Step 2: Remove the summation symbol
  $$\sum_j A_{ij}x_j = b_i \to A_{ij}x_j = b_i$$
And that’s it! All we did was remove the summation symbol. When we see Einstein notation in practice, we implicitly assume that there is a summation over indices that only appear on one side of the equality. Also in Einstein notation, we will make use of the Kronecker delta function $\delta_{ij}$, which is $1$ when $i=j$ and $0$ otherwise.
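In code, numpy’s `einsum` takes exactly this convention as input: repeated indices are summed implicitly. A minimal sketch (the array names, sizes, and seed are our own choices):

```python
import numpy as np

# np.einsum takes the index string directly: "ij,j->i" reads as
# "sum A_ij x_j over the repeated index j, leaving index i".
rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))
x = rng.normal(size=3)

b_matrix = A @ x                         # matrix notation: Ax = b
b_einstein = np.einsum("ij,j->i", A, x)  # Einstein notation: A_ij x_j = b_i

assert np.allclose(b_matrix, b_einstein)
```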
Orthogonal matrices
Next, we need to know how to differentiate orthogonal matrices. Let $Q$ be an orthogonal matrix; then by definition $Q_{ki}Q_{kj} = \delta_{ij}$. Taking the derivative yields:
$$\begin{align*} \partial (Q_{ki}Q_{kj}) &= \partial \delta_{ij} \\ \implies \partial Q_{ki}\, Q_{kj} + Q_{ki}\, \partial Q_{kj} &= 0 \\ \implies \partial Q_{ki}\, Q_{kj} &= -Q_{ki}\, \partial Q_{kj} \end{align*}$$
To make these equations clearer, we can undo some of the Einstein notation by letting $q_i := Q_{:,i}$ be the $i$th column of $Q$. Then we have:
$$\partial q_i \cdot q_j = -\partial q_j \cdot q_i$$
Note that when $i=j$, $\partial q_i \cdot q_i = 0$. This will be useful when we differentiate the SVD.
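This identity is easy to verify numerically on a parametrized orthogonal matrix. A small sketch (our own example, not from the original post) using a 2×2 rotation, whose derivative in $\theta$ is known in closed form:

```python
import numpy as np

# Verify  ∂q_i · q_j = -∂q_j · q_i  on a 2x2 rotation Q(θ), whose
# θ-derivative is known in closed form.
theta = 0.3
c, s = np.cos(theta), np.sin(theta)
Q = np.array([[c, -s],
              [s,  c]])
dQ = np.array([[-s, -c],
               [ c, -s]])  # dQ/dθ

G = dQ.T @ Q  # G[i, j] = ∂q_i · q_j
assert np.allclose(G, -G.T)          # antisymmetric: ∂q_i·q_j = -∂q_j·q_i
assert np.allclose(np.diag(G), 0.0)  # ∂q_i · q_i = 0
```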
Singular value decomposition derivatives
Let’s start by writing the SVD in Einstein notation:
$$J_{ij} = U_{iu} \sigma_u V_{ju}$$
By multiplying through by $V_{ju}$ or $U_{iu}$ and using orthogonality (note that $u$ is then a fixed index, not summed), we can write two equations:
$$\begin{align*} J_{ij}U_{iu} &= \sigma_u V_{ju} \\ J_{ij}V_{ju} &= \sigma_u U_{iu} \end{align*}$$
Applying a derivative, we get:
$$\begin{align*} \partial J_{ij}U_{iu} + J_{ij}\partial U_{iu} &= \partial \sigma_u V_{ju} + \sigma_u \partial V_{ju} \\ \partial J_{ij}V_{ju} + J_{ij}\partial V_{ju} &= \partial \sigma_u U_{iu} + \sigma_u \partial U_{iu} \end{align*}$$
We’ll call the first of these equation 1 and the second equation 2.
Singular value derivatives
To get the derivatives of the singular values, we can multiply both sides of equation 2 by $U_{iu}$ and sum over $i$:
$$\begin{align*} \partial J_{ij}V_{ju} U_{iu} + \underbrace{J_{ij}\partial V_{ju} U_{iu}}_{\sigma_u V_{ju} \partial V_{ju}=0} &= \partial \sigma_u \underbrace{U_{iu} U_{iu}}_{1} + \sigma_u \underbrace{\partial U_{iu} U_{iu}}_{0} \\ \implies \partial \sigma_u &= \partial J_{ij}U_{iu} V_{ju} \end{align*}$$
Note: the derivation uses two facts:
- From equation 1, $J_{ij}U_{iu} = \sigma_u V_{ju}$, so $J_{ij}\partial V_{ju} U_{iu} = \sigma_u V_{ju} \partial V_{ju}$; combined with the orthogonal-matrix result $\partial V_{ju} V_{ju} = 0$, this term vanishes.
- Orthonormality of the columns of $U$: $U_{iu} U_{iu} = 1$ and $\partial U_{iu} U_{iu} = 0$.
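The result $\partial \sigma_u = \partial J_{ij} U_{iu} V_{ju}$ says that the differential of $\sigma_u$ is $u_u^T (\partial J)\, v_u$. A finite-difference sketch of this claim (the seed and matrix size are arbitrary choices of ours):

```python
import numpy as np

# Finite-difference check of  ∂σ_u = ∂J_ij U_iu V_ju,  i.e. dσ_u = u_uᵀ (dJ) v_u.
rng = np.random.default_rng(1)
n = 4
J = rng.normal(size=(n, n))
dJ = rng.normal(size=(n, n))  # arbitrary perturbation direction
eps = 1e-6

U, s, Vt = np.linalg.svd(J)
s_plus = np.linalg.svd(J + eps * dJ, compute_uv=False)
s_minus = np.linalg.svd(J - eps * dJ, compute_uv=False)

fd = (s_plus - s_minus) / (2 * eps)               # numerical dσ/dε
analytic = np.einsum("iu,ij,ju->u", U, dJ, Vt.T)  # U_iu dJ_ij V_ju

assert np.allclose(fd, analytic, atol=1e-5)
```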
Singular vector derivatives
Next, to isolate the derivatives of the singular vectors, we’ll first multiply both sides of equation 1 by $V_{jv}$, where $v \neq u$, and sum over $j$:
$$\begin{align*} \partial J_{ij}U_{iu} V_{jv} + \underbrace{J_{ij}\partial U_{iu} V_{jv}}_{\sigma_v \partial U_{iu} U_{iv}} &= \partial \sigma_u \underbrace{V_{ju} V_{jv}}_{0} + \sigma_u \partial V_{ju} V_{jv} \\ \partial J_{ij}U_{iu} V_{jv} &= -\sigma_v \partial U_{iu} U_{iv} + \sigma_u \partial V_{ju} V_{jv} \end{align*}$$
Note:
- From $J_{ij}V_{jv} = \sigma_v U_{iv}$, we get $J_{ij}\partial U_{iu} V_{jv} = \sigma_v \partial U_{iu} U_{iv}$;
- by orthogonality of the columns of $V$: $V_{ju} V_{jv} = 0$ (since $v \neq u$).
Similarly, we can do the same with equation 2, but multiply by $U_{iv}$ where $v \neq u$ and sum over $i$:
$$\begin{align*} \partial J_{ij}V_{ju} U_{iv} + \underbrace{J_{ij}\partial V_{ju} U_{iv}}_{\sigma_v \partial V_{ju} V_{jv}} &= \partial \sigma_u \underbrace{U_{iu} U_{iv}}_{0} + \sigma_u \partial U_{iu} U_{iv} \\ \partial J_{ij}U_{iv} V_{ju} &= \sigma_u \partial U_{iu} U_{iv} - \sigma_v \partial V_{ju} V_{jv} \end{align*}$$
Note:
- From $J_{ij}U_{iv} = \sigma_v V_{jv}$, we get $J_{ij}\partial V_{ju} U_{iv} = \sigma_v \partial V_{ju} V_{jv}$;
- by orthogonality of the columns of $U$: $U_{iu} U_{iv} = 0$ (since $v \neq u$).
So we’re left with the pair of equations:
$$\begin{align*} \partial J_{ij}U_{iu} V_{jv} &= -\sigma_v \partial U_{iu} U_{iv} + \sigma_u \partial V_{ju} V_{jv} \\ \partial J_{ij}U_{iv} V_{ju} &= \sigma_u \partial U_{iu} U_{iv} - \sigma_v \partial V_{ju} V_{jv} \end{align*}$$
Left singular vectors
Let’s multiply the first equation by $\sigma_v$ and the second by $\sigma_u$:
$$\begin{align*} \sigma_v \partial J_{ij}U_{iu} V_{jv} &= -\sigma_v^2 \partial U_{iu} U_{iv} + \sigma_v \sigma_u \partial V_{ju} V_{jv} \\ \sigma_u \partial J_{ij}U_{iv} V_{ju} &= \sigma_u^2 \partial U_{iu} U_{iv} - \sigma_v \sigma_u \partial V_{ju} V_{jv} \end{align*}$$
If we sum the equations, the last terms cancel and we’re left with
$$\begin{align*} \partial J_{ij}\left(\sigma_u U_{iv} V_{ju} + \sigma_v U_{iu} V_{jv}\right) &= (\sigma_u^2 - \sigma_v^2)\, \partial U_{iu} U_{iv} \\ \implies \partial U_{iu} U_{iv} &= \frac{1}{\sigma_u^2 - \sigma_v^2} \partial J_{ij}\left(\sigma_u U_{iv} V_{ju} + \sigma_v U_{iu} V_{jv}\right) \end{align*}$$
Right singular vectors
Similarly, if we instead multiply the first equation by $\sigma_u$ and the second by $\sigma_v$, we get:
$$\begin{align*} \sigma_u \partial J_{ij}U_{iu} V_{jv} &= -\sigma_v \sigma_u \partial U_{iu} U_{iv} + \sigma_u^2 \partial V_{ju} V_{jv} \\ \sigma_v \partial J_{ij}U_{iv} V_{ju} &= \sigma_v \sigma_u \partial U_{iu} U_{iv} - \sigma_v^2 \partial V_{ju} V_{jv} \end{align*}$$
If we sum the equations, the first terms on the RHS cancel and we’re left with
$$\begin{align*} \partial J_{ij}\left(\sigma_v U_{iv} V_{ju} + \sigma_u U_{iu} V_{jv}\right) &= (\sigma_u^2 - \sigma_v^2)\, \partial V_{ju} V_{jv} \\ \implies \partial V_{ju} V_{jv} &= \frac{1}{\sigma_u^2 - \sigma_v^2} \partial J_{ij}\left(\sigma_v U_{iv} V_{ju} + \sigma_u U_{iu} V_{jv}\right) \end{align*}$$
Summary
To simplify the expressions, we’ll use the notation $U_i := U_{:,i}$ and $V_i := V_{:,i}$ to denote the $i$th columns of $U$ and $V$ respectively. Then returning to matrix notation yields:
$$\begin{align*} \partial \sigma_u &= \partial J_{ij}U_{iu} V_{ju} \\ \partial U_u \cdot U_{v\neq u} &= \frac{1}{\sigma_u^2 - \sigma_v^2} \partial J_{ij}\left(\sigma_u U_{iv} V_{ju} + \sigma_v U_{iu} V_{jv}\right) \\ \partial V_u \cdot V_{v\neq u} &= \frac{1}{\sigma_u^2 - \sigma_v^2} \partial J_{ij}\left(\sigma_v U_{iv} V_{ju} + \sigma_u U_{iu} V_{jv}\right) \end{align*}$$
Note that to isolate the derivatives of $U$ and $V$, we can write them as a linear combination of the singular vectors:
$$\begin{align*} \partial U_u &= \sum_{v \neq u} (\partial U_u \cdot U_v)\, U_v \\ \partial V_u &= \sum_{v \neq u} (\partial V_u \cdot V_v)\, V_v \end{align*}$$
Because $U$ and $V$ are orthogonal, $\partial U_u \cdot U_u = \partial V_u \cdot V_u = 0$, so the $v = u$ term drops out.
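The three summary formulas can be checked against finite differences. One subtlety: the columns of an SVD are only defined up to sign, so the perturbed factors are first sign-aligned with the base factors before differencing (the helper `svd_aligned`, the seed, and the sizes are our own choices):

```python
import numpy as np

# Finite-difference check of the three summary formulas. SVD columns are
# only defined up to sign, so the perturbed factors are sign-aligned
# with the base factors before differencing.
def svd_aligned(J, U0):
    U, s, Vt = np.linalg.svd(J)
    signs = np.sign(np.sum(U0 * U, axis=0))  # flip columns to match U0
    return U * signs, s, Vt * signs[:, None]

rng = np.random.default_rng(2)
n = 3
J = rng.normal(size=(n, n))
dJ = rng.normal(size=(n, n))  # arbitrary perturbation direction
eps = 1e-6

U, s, Vt = np.linalg.svd(J)
V = Vt.T
Up, sp, Vtp = svd_aligned(J + eps * dJ, U)
Um, sm, Vtm = svd_aligned(J - eps * dJ, U)
dU = (Up - Um) / (2 * eps)
dV = (Vtp - Vtm).T / (2 * eps)
ds = (sp - sm) / (2 * eps)

u, v = 0, 1  # any pair with u != v

# ∂σ_u = U_v^T are not needed here; dσ_u = u_u^T dJ v_u
assert abs(ds[u] - U[:, u] @ dJ @ V[:, u]) < 1e-4

lhs_U = dU[:, u] @ U[:, v]
rhs_U = (s[u] * U[:, v] @ dJ @ V[:, u] + s[v] * U[:, u] @ dJ @ V[:, v]) / (s[u]**2 - s[v]**2)
assert abs(lhs_U - rhs_U) < 1e-4

lhs_V = dV[:, u] @ V[:, v]
rhs_V = (s[v] * U[:, v] @ dJ @ V[:, u] + s[u] * U[:, u] @ dJ @ V[:, v]) / (s[u]**2 - s[v]**2)
assert abs(lhs_V - rhs_V) < 1e-4
```

The sign alignment matters: without it, a column flip between the two perturbed decompositions would corrupt the difference quotient.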
Time derivative
We can also see how the singular vectors and singular values evolve when we flow on the vector field:
$$\frac{dx_t}{dt} = X_t(x_t)$$
To do this, recall that when $J$ is the Jacobian of the flow map, its time derivative can be written as:
$$\frac{dJ}{dt} = \nabla X_t\, J$$
Then we can look at the time derivatives of the elements of the SVD.
Singular value derivatives
$$\begin{align*} \frac{d\sigma_u}{dt} &= \frac{dJ_{ij}}{dt}U_{iu} V_{ju} \\ &= (\nabla X_t)_{ik} J_{kj} U_{iu} V_{ju} \\ &= (\nabla X_t)_{ik} U_{iu} \sigma_u U_{ku} \end{align*}$$
This is more simply expressed using the log of the singular values:
$$\frac{d\log \sigma_u}{dt} = (\nabla X_t)_{ik} U_{iu} U_{ku}$$
Note: dividing both sides of $\frac{d\sigma_u}{dt} = (\nabla X_t)_{ik} U_{iu} \sigma_u U_{ku}$ by $\sigma_u$ and using $\frac{1}{\sigma_u}\frac{d\sigma_u}{dt} = \frac{d\log \sigma_u}{dt}$ gives the logarithmic form above, which is more convenient when analyzing the relative rate of change of the singular values.
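To sanity-check the logarithmic form, we can take $\nabla X_t$ to be a fixed matrix $A$ (i.e. a hypothetical linear vector field $X_t(x) = Ax$, our own choice for the sketch) and compare against finite differences along $dJ/dt = AJ$:

```python
import numpy as np

# Check  d log σ_u / dt = (∇X_t)_ik U_iu U_ku  when dJ/dt = (∇X_t) J.
# A fixed matrix A plays the role of ∇X_t (a linear field X_t(x) = A x).
rng = np.random.default_rng(3)
n = 4
A = rng.normal(size=(n, n))
J = rng.normal(size=(n, n))
dt = 1e-6

U, s, _ = np.linalg.svd(J)
s_plus = np.linalg.svd(J + dt * (A @ J), compute_uv=False)
s_minus = np.linalg.svd(J - dt * (A @ J), compute_uv=False)
fd = (np.log(s_plus) - np.log(s_minus)) / (2 * dt)

analytic = np.einsum("iu,ik,ku->u", U, A, U)  # (∇X_t)_ik U_iu U_ku, per u
assert np.allclose(fd, analytic, atol=1e-5)
```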
Left singular vector derivatives
$$\begin{align*} \frac{dU_u}{dt} \cdot U_{v\neq u} &= \frac{1}{\sigma_u^2 - \sigma_v^2} \frac{dJ_{ij}}{dt}\left(\sigma_u U_{iv} V_{ju} + \sigma_v U_{iu} V_{jv}\right) \\ &= \frac{1}{\sigma_u^2 - \sigma_v^2} (\nabla X_t)_{ik} J_{kj}\left(\sigma_u U_{iv} V_{ju} + \sigma_v U_{iu} V_{jv}\right) \\ &= \frac{1}{\sigma_u^2 - \sigma_v^2} (\nabla X_t)_{ik}\left(\sigma_u^2 U_{iv} U_{ku} + \sigma_v^2 U_{iu} U_{kv}\right) \\ &= \frac{\sigma_u^2}{\sigma_u^2 - \sigma_v^2} U_v^T(\nabla X_t)U_u + \frac{\sigma_v^2}{\sigma_u^2 - \sigma_v^2} U_u^T(\nabla X_t)U_v \\ &= U_v^T\left(\frac{\sigma_u^2}{\sigma_u^2 - \sigma_v^2}\nabla X_t + \frac{\sigma_v^2}{\sigma_u^2 - \sigma_v^2} \nabla X_t^T\right)U_u \end{align*}$$
Note:
- Replace $\partial$ in the singular-vector derivative formula with $\frac{d}{dt}$ and substitute $\frac{dJ_{ij}}{dt} = (\nabla X_t)_{ik} J_{kj}$;
- use $J_{kj} V_{ju} = \sigma_u U_{ku}$ and $J_{kj} V_{jv} = \sigma_v U_{kv}$ (from the SVD);
- the last two steps convert the Einstein notation to matrix products: $U_v^T(\nabla X_t)U_u$ corresponds to $(\nabla X_t)_{ik} U_{iv} U_{ku}$ and $U_u^T(\nabla X_t)U_v$ to $(\nabla X_t)_{ik} U_{iu} U_{kv}$; since each is a scalar, $U_u^T(\nabla X_t)U_v = U_v^T(\nabla X_t^T)U_u$, which gives the final combined form.
Right singular vector derivatives
$$\begin{align*} \frac{dV_u}{dt} \cdot V_{v\neq u} &= \frac{1}{\sigma_u^2 - \sigma_v^2} \frac{dJ_{ij}}{dt}\left(\sigma_v U_{iv} V_{ju} + \sigma_u U_{iu} V_{jv}\right) \\ &= \frac{1}{\sigma_u^2 - \sigma_v^2} (\nabla X_t)_{ik} J_{kj}\left(\sigma_v U_{iv} V_{ju} + \sigma_u U_{iu} V_{jv}\right) \\ &= \frac{1}{\sigma_u^2 - \sigma_v^2} (\nabla X_t)_{ik}\left(\sigma_u\sigma_v U_{iv} U_{ku} + \sigma_u\sigma_v U_{iu} U_{kv}\right) \\ &= \frac{\sigma_u \sigma_v}{\sigma_u^2 - \sigma_v^2}\left( U_v^T(\nabla X_t)U_u + U_u^T(\nabla X_t)U_v \right) \\ &= \frac{\sigma_u \sigma_v}{\sigma_u^2 - \sigma_v^2} U_v^T(\nabla X_t + \nabla X_t^T)U_u \end{align*}$$
Note:
- As with the left singular vectors, replace the derivative and substitute the expression for $\frac{dJ}{dt}$;
- use $J_{kj} V_{ju} = \sigma_u U_{ku}$ and $J_{kj} V_{jv} = \sigma_v U_{kv}$, which yields $\sigma_u\sigma_v U_{iv} U_{ku} + \sigma_u\sigma_v U_{iu} U_{kv}$ inside the brackets;
- the final step combines the two scalar terms using $U_u^T(\nabla X_t)U_v = \left(U_v^T(\nabla X_t^T)U_u\right)^T$; since the transpose of a scalar is itself, the two terms merge into $U_v^T(\nabla X_t + \nabla X_t^T)U_u$.
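The same kind of finite-difference check works for the matrix forms of both singular-vector time derivatives; again a fixed matrix $A$ stands in for $\nabla X_t$, and the sign-alignment helper is our own:

```python
import numpy as np

# Finite-difference check of dU_u/dt · U_v and dV_u/dt · V_v under dJ/dt = A J,
# where the fixed matrix A plays the role of ∇X_t. SVD column signs are
# aligned with the base factors before differencing.
def svd_aligned(J, U0):
    U, s, Vt = np.linalg.svd(J)
    signs = np.sign(np.sum(U0 * U, axis=0))  # flip columns to match U0
    return U * signs, s, Vt * signs[:, None]

rng = np.random.default_rng(4)
n = 3
A = rng.normal(size=(n, n))
J = rng.normal(size=(n, n))
dt = 1e-6

U, s, Vt = np.linalg.svd(J)
V = Vt.T
Up, _, Vtp = svd_aligned(J + dt * (A @ J), U)
Um, _, Vtm = svd_aligned(J - dt * (A @ J), U)
dU = (Up - Um) / (2 * dt)
dV = (Vtp - Vtm).T / (2 * dt)

u, v = 0, 1  # any pair with u != v
cu = s[u]**2 / (s[u]**2 - s[v]**2)
cv = s[v]**2 / (s[u]**2 - s[v]**2)
pred_U = U[:, v] @ (cu * A + cv * A.T) @ U[:, u]
assert abs(dU[:, u] @ U[:, v] - pred_U) < 1e-4

coef = s[u] * s[v] / (s[u]**2 - s[v]**2)
pred_V = coef * (U[:, v] @ (A + A.T) @ U[:, u])
assert abs(dV[:, u] @ V[:, v] - pred_V) < 1e-4
```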
via:
- Singular value decomposition derivatives · https://eddiecunningham.github.io/singular-value-decomposition-derivatives.html