
Linear Algebra · SVD | Derivatives

Note: the following is the English source text for "Linear Algebra · SVD | Derivatives"; the accompanying machine translation was unproofread.


Singular value decomposition derivatives


Published Nov 10, 2023

The singular value decomposition (SVD) is a matrix decomposition that is used in many applications. It is defined as:

\begin{align*} J = U \Sigma V^T \end{align*}

where $U$ and $V$ are orthogonal matrices and $\Sigma$ is a diagonal matrix with non-negative entries. The diagonal entries of $\Sigma$ are called the singular values of $J$ and are denoted $\sigma_1, \dots, \sigma_n$. The singular values are the square roots of the eigenvalues of $J^T J$. In this post we’re going to go over how to differentiate the elements of the SVD under the assumption that the singular values are all distinct and non-zero.
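As a quick numerical sanity check (my own NumPy sketch, not part of the original derivation), we can confirm both the decomposition itself and the relationship between the singular values and the eigenvalues of $J^T J$:

```python
import numpy as np

rng = np.random.default_rng(0)
J = rng.normal(size=(4, 4))

# Full SVD: J = U @ diag(s) @ V^T with U, V orthogonal.
U, s, Vt = np.linalg.svd(J)
V = Vt.T

assert np.allclose(U @ np.diag(s) @ V.T, J)
assert np.allclose(U.T @ U, np.eye(4))   # columns of U are orthonormal
assert np.allclose(V.T @ V, np.eye(4))   # columns of V are orthonormal

# The singular values are the square roots of the eigenvalues of J^T J.
evals = np.linalg.eigvalsh(J.T @ J)      # eigenvalues in ascending order
assert np.allclose(np.sort(s), np.sqrt(evals))
```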

Einstein notation


Before proceeding, we need to understand Einstein notation. Einstein notation is an alternative way of writing matrix equations where we undo the matrix notation and remove the summation symbol. For example, let’s write the equation $Ax = b$ in Einstein notation.

  • Step 1: Undo the matrix notation
    \begin{align*} Ax = b \to \sum_j A_{ij} x_j = b_i \end{align*}

  • Step 2: Remove the summation symbol
    \begin{align*} \sum_j A_{ij} x_j = b_i \to A_{ij} x_j = b_i \end{align*}

And that’s it! All we did was remove the summation symbol. When we see Einstein notation in practice, we implicitly assume that there is a summation over indices that only appear on one side of the equality. We will also make use of the Kronecker delta $\delta_{ij}$, which is $1$ when $i = j$ and $0$ otherwise.
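This convention is exactly what `np.einsum` implements: indices missing from the output are summed over. A small sketch (the shapes here are chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 3))
x = rng.normal(size=3)

# A_ij x_j = b_i: j is absent from the output index, so it is summed over.
b = np.einsum("ij,j->i", A, x)
assert np.allclose(b, A @ x)

# The Kronecker delta is the identity matrix: A_ij delta_jk = A_ik.
delta = np.eye(3)
assert np.allclose(np.einsum("ij,jk->ik", A, delta), A)
```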

Orthogonal matrices


Next, we need to know how to differentiate orthogonal matrices. Let $Q$ be an orthogonal matrix; then by definition $Q_{ki} Q_{kj} = \delta_{ij}$. Taking the derivative yields:

\begin{align*}
\partial (Q_{ki} Q_{kj}) &= \partial \delta_{ij} \\
\implies \partial Q_{ki}\, Q_{kj} + Q_{ki}\, \partial Q_{kj} &= 0 \\
\implies \partial Q_{ki}\, Q_{kj} &= -Q_{ki}\, \partial Q_{kj}
\end{align*}

To make these equations clearer, we can undo some of the Einstein notation by letting $q_i := Q_{:,i}$ be the $i$th column of $Q$. Then we have:

\begin{align*} \partial q_i \cdot q_j = -\partial q_j \cdot q_i \end{align*}

Note that when $i = j$, $\partial q_i \cdot q_i = 0$. This will be useful when we differentiate the SVD.
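We can verify this antisymmetry numerically with an explicitly parameterized orthogonal matrix; a 2×2 rotation is the simplest choice (the parameterization below is just for illustration):

```python
import numpy as np

def Q(theta):
    # A 2x2 rotation matrix: orthogonal for every theta.
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def dQ(theta):
    # Analytic derivative dQ/dtheta.
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[-s, -c], [c, -s]])

theta = 0.7
q, dq = Q(theta), dQ(theta)

# dQ^T Q is antisymmetric: dq_i . q_j = -dq_j . q_i ...
M = dq.T @ q
assert np.allclose(M, -M.T)
# ... and in particular dq_i . q_i = 0 on the diagonal.
assert np.allclose(np.diag(M), 0.0)
```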

Singular value decomposition derivatives


Let’s start by writing the SVD in Einstein notation:

\begin{align*} J_{ij} = U_{iu} \sigma_u V_{ju} \end{align*}

With some rearrangement (contracting both sides with $U_{iu}$ or $V_{ju}$ and using orthogonality), we can write two equations:

\begin{align*}
J_{ij} U_{iu} &= \sigma_u V_{ju} \\
J_{ij} V_{ju} &= \sigma_u U_{iu}
\end{align*}

Applying a derivative, we get:

\begin{align*}
\partial J_{ij} U_{iu} + J_{ij} \partial U_{iu} &= \partial \sigma_u V_{ju} + \sigma_u \partial V_{ju} \\
\partial J_{ij} V_{ju} + J_{ij} \partial V_{ju} &= \partial \sigma_u U_{iu} + \sigma_u \partial U_{iu}
\end{align*}

We’ll call the first of these equation 1 and the second equation 2.

Singular value derivatives

To get the derivatives of the singular values, we can multiply both sides of equation 2 by $U_{iu}$ and sum over $i$:

\begin{align*}
\partial J_{ij} V_{ju} U_{iu} + \underbrace{J_{ij} \partial V_{ju} U_{iu}}_{\sigma_u V_{ju} \partial V_{ju} = 0} &= \partial \sigma_u \underbrace{U_{iu} U_{iu}}_{1} + \sigma_u \underbrace{\partial U_{iu} U_{iu}}_{0} \\
\implies \partial \sigma_u &= \partial J_{ij} U_{iu} V_{ju}
\end{align*}

Note: this derivation uses two facts:

  1. By equation 1, $J_{ij} U_{iu} = \sigma_u V_{ju}$, so $J_{ij} \partial V_{ju} U_{iu} = \sigma_u V_{ju} \partial V_{ju}$, which vanishes by the orthogonal-matrix result $\partial V_{ju} V_{ju} = 0$;
  2. The columns of an orthogonal matrix are unit vectors, so $U_{iu} U_{iu} = 1$ and $\partial U_{iu} U_{iu} = 0$.
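The formula $\partial \sigma_u = \partial J_{ij} U_{iu} V_{ju}$ is easy to test against central differences; here is a NumPy sketch of my own (the perturbation direction `dJ` and step `eps` are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
J = rng.normal(size=(4, 4))
dJ = rng.normal(size=(4, 4))            # arbitrary perturbation direction

U, s, Vt = np.linalg.svd(J)

# Predicted: dsigma_u = dJ_ij U_iu V_ju  (i.e. u_u^T dJ v_u for each u).
pred = np.einsum("ij,iu,ju->u", dJ, U, Vt.T)

# Central-difference check along the direction dJ.
eps = 1e-6
s_plus = np.linalg.svd(J + eps * dJ, compute_uv=False)
s_minus = np.linalg.svd(J - eps * dJ, compute_uv=False)
fd = (s_plus - s_minus) / (2 * eps)

assert np.allclose(pred, fd, atol=1e-5)
```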

Singular vector derivatives


Next, to isolate the derivatives of the singular vectors, we’ll first multiply both sides of equation 1 by $V_{jv}$, where $v \neq u$, and sum over $j$:

\begin{align*}
\partial J_{ij} U_{iu} V_{jv} + \underbrace{J_{ij} \partial U_{iu} V_{jv}}_{\sigma_v \partial U_{iu} U_{iv}} &= \partial \sigma_u \underbrace{V_{ju} V_{jv}}_{0} + \sigma_u \partial V_{ju} V_{jv} \\
\implies \partial J_{ij} U_{iu} V_{jv} &= -\sigma_v \partial U_{iu} U_{iv} + \sigma_u \partial V_{ju} V_{jv}
\end{align*}

Note:

  1. Since $J_{ij} V_{jv} = \sigma_v U_{iv}$, we have $J_{ij} \partial U_{iu} V_{jv} = \sigma_v \partial U_{iu} U_{iv}$;
  2. The columns of $V$ are orthogonal, so $V_{ju} V_{jv} = 0$ (because $v \neq u$).

Similarly, we can do the same with equation 2, but multiply by $U_{iv}$ where $v \neq u$ and sum over $i$:

\begin{align*}
\partial J_{ij} V_{ju} U_{iv} + \underbrace{J_{ij} \partial V_{ju} U_{iv}}_{\sigma_v \partial V_{ju} V_{jv}} &= \partial \sigma_u \underbrace{U_{iu} U_{iv}}_{0} + \sigma_u \partial U_{iu} U_{iv} \\
\implies \partial J_{ij} U_{iv} V_{ju} &= \sigma_u \partial U_{iu} U_{iv} - \sigma_v \partial V_{ju} V_{jv}
\end{align*}

Note:

  1. Since $J_{ij} U_{iv} = \sigma_v V_{jv}$, we have $J_{ij} \partial V_{ju} U_{iv} = \sigma_v \partial V_{ju} V_{jv}$;
  2. The columns of $U$ are orthogonal, so $U_{iu} U_{iv} = 0$ (because $v \neq u$).

So we’re left with the pair of equations:

\begin{align*}
\partial J_{ij} U_{iu} V_{jv} &= -\sigma_v \partial U_{iu} U_{iv} + \sigma_u \partial V_{ju} V_{jv} \\
\partial J_{ij} U_{iv} V_{ju} &= \sigma_u \partial U_{iu} U_{iv} - \sigma_v \partial V_{ju} V_{jv}
\end{align*}

Left singular vectors

Let’s multiply the above equations by $\sigma_v$ and $\sigma_u$ respectively:

\begin{align*}
\sigma_v \partial J_{ij} U_{iu} V_{jv} &= -\sigma_v^2 \partial U_{iu} U_{iv} + \sigma_v \sigma_u \partial V_{ju} V_{jv} \\
\sigma_u \partial J_{ij} U_{iv} V_{ju} &= \sigma_u^2 \partial U_{iu} U_{iv} - \sigma_v \sigma_u \partial V_{ju} V_{jv}
\end{align*}

If we sum the equations, the cross terms $\pm \sigma_v \sigma_u \partial V_{ju} V_{jv}$ cancel and we’re left with:

\begin{align*}
\partial J_{ij} \left( \sigma_u U_{iv} V_{ju} + \sigma_v U_{iu} V_{jv} \right) &= (\sigma_u^2 - \sigma_v^2)\, \partial U_{iu} U_{iv} \\
\implies \partial U_{iu} U_{iv} &= \frac{1}{\sigma_u^2 - \sigma_v^2}\, \partial J_{ij} \left( \sigma_u U_{iv} V_{ju} + \sigma_v U_{iu} V_{jv} \right)
\end{align*}

Right singular vectors

Similarly, if we instead multiply by $\sigma_u$ and $\sigma_v$ respectively, we get:

\begin{align*}
\sigma_u \partial J_{ij} U_{iu} V_{jv} &= -\sigma_v \sigma_u \partial U_{iu} U_{iv} + \sigma_u^2 \partial V_{ju} V_{jv} \\
\sigma_v \partial J_{ij} U_{iv} V_{ju} &= \sigma_v \sigma_u \partial U_{iu} U_{iv} - \sigma_v^2 \partial V_{ju} V_{jv}
\end{align*}

If we sum the equations, the cross terms $\mp \sigma_v \sigma_u \partial U_{iu} U_{iv}$ cancel and we’re left with:

\begin{align*}
\partial J_{ij} \left( \sigma_v U_{iv} V_{ju} + \sigma_u U_{iu} V_{jv} \right) &= (\sigma_u^2 - \sigma_v^2)\, \partial V_{ju} V_{jv} \\
\implies \partial V_{ju} V_{jv} &= \frac{1}{\sigma_u^2 - \sigma_v^2}\, \partial J_{ij} \left( \sigma_v U_{iv} V_{ju} + \sigma_u U_{iu} V_{jv} \right)
\end{align*}

Summary


To simplify the expressions, we’ll use the notation $U_i := U_{:,i}$ and $V_i := V_{:,i}$ to denote the $i$th columns of $U$ and $V$ respectively. Returning to matrix notation then yields:

\begin{align*}
\partial \sigma_u &= \partial J_{ij} U_{iu} V_{ju} \\
\partial U_u \cdot U_{v \neq u} &= \frac{1}{\sigma_u^2 - \sigma_v^2}\, \partial J_{ij} \left( \sigma_u U_{iv} V_{ju} + \sigma_v U_{iu} V_{jv} \right) \\
\partial V_u \cdot V_{v \neq u} &= \frac{1}{\sigma_u^2 - \sigma_v^2}\, \partial J_{ij} \left( \sigma_v U_{iv} V_{ju} + \sigma_u U_{iu} V_{jv} \right)
\end{align*}

Note that to isolate the derivatives of $U$ and $V$, we can write them as linear combinations of the singular vectors:

\begin{align*}
\partial U_u &= \sum_{v \neq u} (\partial U_u \cdot U_v)\, U_v \\
\partial V_u &= \sum_{v \neq u} (\partial V_u \cdot V_v)\, V_v
\end{align*}

The $v = u$ terms are absent because $U$ and $V$ are orthogonal: $\partial U_u \cdot U_u = \partial V_u \cdot V_u = 0$, i.e. the derivative of each singular vector is orthogonal to the vector itself.
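The whole summary can be checked numerically. The sketch below (my own code; the coefficient arrays `A` and `B` are just scratch names) assembles $\partial U$ and $\partial V$ from the formulas and compares them with central differences, first aligning the arbitrary $\pm$ sign of each singular-vector pair:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
J = rng.normal(size=(n, n))
dJ = rng.normal(size=(n, n))

U, s, Vt = np.linalg.svd(J)
V = Vt.T

# Coefficients A[v, u] = dU_u . U_v and B[v, u] = dV_u . V_v from the summary
# (the v = u diagonal is zero by orthogonality).
A = np.zeros((n, n))
B = np.zeros((n, n))
for u in range(n):
    for v in range(n):
        if u == v:
            continue
        c = 1.0 / (s[u] ** 2 - s[v] ** 2)
        A[v, u] = c * (s[u] * U[:, v] @ dJ @ V[:, u] + s[v] * U[:, u] @ dJ @ V[:, v])
        B[v, u] = c * (s[v] * U[:, v] @ dJ @ V[:, u] + s[u] * U[:, u] @ dJ @ V[:, v])

# dU_u = sum_{v != u} (dU_u . U_v) U_v, and similarly for V.
dU = U @ A
dV = V @ B

# Central-difference check, aligning each singular vector's arbitrary sign.
eps = 1e-6
U2, _, Vt2 = np.linalg.svd(J + eps * dJ)
U1, _, Vt1 = np.linalg.svd(J - eps * dJ)
for Ue, Vte in ((U2, Vt2), (U1, Vt1)):
    sign = np.sign(np.sum(Ue * U, axis=0))
    Ue *= sign                # flip u_k ...
    Vte *= sign[:, None]      # ... together with v_k, leaving J unchanged

assert np.allclose((U2 - U1) / (2 * eps), dU, atol=1e-4)
assert np.allclose((Vt2.T - Vt1.T) / (2 * eps), dV, atol=1e-4)
```

Note the formulas blow up when two singular values coincide, which is exactly why the post assumes distinct, non-zero singular values.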

Time derivative


We can also see how the singular values and singular vectors evolve when we flow along the vector field:

\begin{align*} \frac{dx_t}{dt} = X_t(x_t) \end{align*}

To do this, recall that we can write the time derivative of $J$ as:

\begin{align*} \frac{dJ}{dt} = \nabla X_t \, J \end{align*}

Then we can substitute this into the SVD derivatives above.

Singular value derivatives


\begin{align*}
\frac{d\sigma_u}{dt} &= \frac{dJ_{ij}}{dt} U_{iu} V_{ju} \\
&= (\nabla X_t)_{ik} J_{kj} U_{iu} V_{ju} \\
&= (\nabla X_t)_{ik} U_{iu} \sigma_u U_{ku}
\end{align*}

This is expressed more simply using the log of the singular values:

\begin{align*} \frac{d \log \sigma_u}{dt} = (\nabla X_t)_{ik} U_{iu} U_{ku} \end{align*}

(Note: dividing both sides of $\frac{d\sigma_u}{dt} = (\nabla X_t)_{ik} U_{iu} \sigma_u U_{ku}$ by $\sigma_u$ and using $\frac{1}{\sigma_u} \frac{d\sigma_u}{dt} = \frac{d \log \sigma_u}{dt}$ gives the logarithmic form, which is more convenient when analyzing the relative growth rates of the singular values.)
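As a concrete check, take a linear vector field $X_t(x) = A x$, so that $\nabla X_t = A$ and $\frac{dJ}{dt} = A J$ (the matrix `A` below is an arbitrary stand-in for $\nabla X_t$):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(3, 3))    # stands in for the velocity gradient, grad X_t
J = rng.normal(size=(3, 3))

U, s, _ = np.linalg.svd(J)

# Predicted: dsigma_u/dt = sigma_u * (u_u^T A u_u),
# equivalently dlog(sigma_u)/dt = u_u^T A u_u.
pred = s * np.einsum("ik,iu,ku->u", A, U, U)

# Central difference along dJ/dt = A J.
eps = 1e-6
sp = np.linalg.svd(J + eps * (A @ J), compute_uv=False)
sm = np.linalg.svd(J - eps * (A @ J), compute_uv=False)
assert np.allclose((sp - sm) / (2 * eps), pred, atol=1e-5)
```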

Left singular vector derivatives


\begin{align*}
\frac{dU_u}{dt} \cdot U_{v \neq u}
&= \frac{1}{\sigma_u^2 - \sigma_v^2} \frac{dJ_{ij}}{dt} \left( \sigma_u U_{iv} V_{ju} + \sigma_v U_{iu} V_{jv} \right) \\
&= \frac{1}{\sigma_u^2 - \sigma_v^2} (\nabla X_t)_{ik} J_{kj} \left( \sigma_u U_{iv} V_{ju} + \sigma_v U_{iu} V_{jv} \right) \\
&= \frac{1}{\sigma_u^2 - \sigma_v^2} (\nabla X_t)_{ik} \left( \sigma_u^2 U_{iv} U_{ku} + \sigma_v^2 U_{iu} U_{kv} \right) \\
&= \frac{\sigma_u^2}{\sigma_u^2 - \sigma_v^2} U_v^T (\nabla X_t) U_u + \frac{\sigma_v^2}{\sigma_u^2 - \sigma_v^2} U_u^T (\nabla X_t) U_v \\
&= U_v^T \left( \frac{\sigma_u^2}{\sigma_u^2 - \sigma_v^2} \nabla X_t + \frac{\sigma_v^2}{\sigma_u^2 - \sigma_v^2} \nabla X_t^T \right) U_u
\end{align*}

Note:

  1. Replace $\partial$ in the singular-vector derivative formula with the time derivative $\frac{d}{dt}$ and substitute $\frac{dJ_{ij}}{dt} = (\nabla X_t)_{ik} J_{kj}$;
  2. By the SVD, $J_{kj} V_{ju} = \sigma_u U_{ku}$ and $J_{kj} V_{jv} = \sigma_v U_{kv}$, which produce the $\sigma_u^2 U_{iv} U_{ku}$ and $\sigma_v^2 U_{iu} U_{kv}$ terms;
  3. The last two steps convert the Einstein notation into matrix products: $(\nabla X_t)_{ik} U_{iv} U_{ku} = U_v^T (\nabla X_t) U_u$ and $(\nabla X_t)_{ik} U_{iu} U_{kv} = U_u^T (\nabla X_t) U_v = U_v^T (\nabla X_t^T) U_u$, after which the two terms are collected.

Right singular vector derivatives


\begin{align*}
\frac{dV_u}{dt} \cdot V_{v \neq u}
&= \frac{1}{\sigma_u^2 - \sigma_v^2} \frac{dJ_{ij}}{dt} \left( \sigma_v U_{iv} V_{ju} + \sigma_u U_{iu} V_{jv} \right) \\
&= \frac{1}{\sigma_u^2 - \sigma_v^2} (\nabla X_t)_{ik} J_{kj} \left( \sigma_v U_{iv} V_{ju} + \sigma_u U_{iu} V_{jv} \right) \\
&= \frac{1}{\sigma_u^2 - \sigma_v^2} (\nabla X_t)_{ik} \left( \sigma_u \sigma_v U_{iv} U_{ku} + \sigma_u \sigma_v U_{iu} U_{kv} \right) \\
&= \frac{\sigma_u \sigma_v}{\sigma_u^2 - \sigma_v^2} \left( U_v^T (\nabla X_t) U_u + U_u^T (\nabla X_t) U_v \right) \\
&= \frac{\sigma_u \sigma_v}{\sigma_u^2 - \sigma_v^2} U_v^T \left( \nabla X_t + \nabla X_t^T \right) U_u
\end{align*}

Note:

  1. As with the left singular vectors, first replace $\partial$ with $\frac{d}{dt}$ and substitute the expression for $\frac{dJ}{dt}$;

  2. By the SVD, $J_{kj} V_{ju} = \sigma_u U_{ku}$ and $J_{kj} V_{jv} = \sigma_v U_{kv}$, so both terms acquire the common factor $\sigma_u \sigma_v$;

  3. The final step combines the two terms into $U_v^T (\nabla X_t + \nabla X_t^T) U_u$, using the transpose identity $U_u^T (\nabla X_t) U_v = \left( U_v^T (\nabla X_t^T) U_u \right)^T$; since each side is a scalar, it equals its own transpose, so the terms can be merged into a sum.
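The same setup checks the right-singular-vector formula, and makes visible that only the symmetric part $\nabla X_t + \nabla X_t^T$ enters. Again `A` is an arbitrary stand-in for $\nabla X_t$:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 3
A = rng.normal(size=(n, n))    # stands in for grad X_t
J = rng.normal(size=(n, n))

U, s, Vt = np.linalg.svd(J)
V = Vt.T
S = A + A.T                    # only the symmetric part of grad X_t enters

# Predicted: dV_u/dt . V_v = sigma_u sigma_v / (sigma_u^2 - sigma_v^2) * u_v^T S u_u
u, v = 0, 2
pred = s[u] * s[v] / (s[u] ** 2 - s[v] ** 2) * (U[:, v] @ S @ U[:, u])

# Central difference along dJ/dt = A J, aligning each sign pair via the U factors.
eps = 1e-6
U2, _, Vt2 = np.linalg.svd(J + eps * (A @ J))
U1, _, Vt1 = np.linalg.svd(J - eps * (A @ J))
Vt2 *= np.sign(np.sum(U2 * U, axis=0))[:, None]
Vt1 *= np.sign(np.sum(U1 * U, axis=0))[:, None]
dV_u = (Vt2.T[:, u] - Vt1.T[:, u]) / (2 * eps)

assert np.allclose(dV_u @ V[:, v], pred, atol=1e-4)
```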


via:

  • Singular value decomposition derivatives:
    https://eddiecunningham.github.io/singular-value-decomposition-derivatives.html
