推土机距离(Earth Mover’s Distance)
对于离散概率分布,推土机距离又称为
W
a
s
s
e
r
s
t
e
i
n
\mathrm{Wasserstein}
Wasserstein距离。如果将不同的概率分布看成不同的沙土堆,则推土机距离就是将一个沙土堆转换成另一个沙土堆所需的最小总工作量。假定有两个离散的概率分布
x
∼
P
r
x\sim P_r
x∼Pr和
y
∼
P
θ
y\sim P_\theta
y∼Pθ,其中每个概率分布都有
l
l
l种可能结果,如下图所示给出的两个概率分布特例
计算推土机距离是一个优化问题,从一个沙土堆到另一个沙土堆无数种方案进沙土传输迁移,所以需要找到一个最佳的传输方案
γ
(
x
,
y
)
\gamma(x,y)
γ(x,y)。根据上图实例,则需要如下约束条件
{
∑
x
γ
(
x
,
y
)
=
P
θ
(
y
)
∑
y
γ
(
x
,
y
)
=
P
r
(
x
)
\left\{\begin{aligned}\sum\limits_{x}\gamma(x,y)=P_\theta (y)\\\sum\limits_{y}\gamma(x,y)=P_r(x)\end{aligned}\right.
⎩⎪⎪⎪⎨⎪⎪⎪⎧x∑γ(x,y)=Pθ(y)y∑γ(x,y)=Pr(x)
γ
(
x
,
y
)
∈
Π
(
P
r
,
P
θ
)
\gamma(x,y)\in \Pi(P_r,P_\theta)
γ(x,y)∈Π(Pr,Pθ)为联合概率分布,并且
Π
(
P
r
,
P
θ
)
\Pi(P_r,P_\theta)
Π(Pr,Pθ)的边缘分布为
P
r
,
P
θ
P_r,P_\theta
Pr,Pθ。此时推土机距离为每一个
γ
(
x
,
y
)
\gamma(x,y)
γ(x,y)乘以
x
x
x到
y
y
y的欧式距离之和的最小值,具体公式为
E
M
D
(
P
r
,
P
θ
)
=
inf
γ
∈
Π
∑
x
,
y
∥
x
−
y
∥
γ
(
x
,
y
)
=
inf
γ
∈
Π
E
(
x
,
y
)
∼
γ
∥
x
−
y
∥
\mathrm{EMD}(P_r,P_\theta)=\inf\limits_{\gamma\in \Pi}\sum\limits_{x,y}\|x-y\|\gamma(x,y)=\inf\limits_{\gamma\in \Pi}\mathbb{E}_{(x,y)\sim\gamma}\|x-y\|
EMD(Pr,Pθ)=γ∈Πinfx,y∑∥x−y∥γ(x,y)=γ∈ΠinfE(x,y)∼γ∥x−y∥进一步令
Γ
=
γ
(
x
,
y
)
\Gamma=\gamma(x,y)
Γ=γ(x,y),
D
=
∥
x
,
y
∥
D=\|x,y\|
D=∥x,y∥,其中
Γ
,
D
∈
R
l
×
l
\Gamma,D\in \mathbb{R}^{l\times l}
Γ,D∈Rl×l,则上式可以化简为
E
M
D
(
P
r
,
P
θ
)
=
inf
γ
∈
Π
⟨
D
,
γ
⟩
F
\mathrm{EMD}(P_r,P_\theta)=\inf\limits_{\gamma\in \Pi}\langle D,\gamma \rangle_{F}
EMD(Pr,Pθ)=γ∈Πinf⟨D,γ⟩F其中
⟨
,
⟩
\langle , \rangle
⟨,⟩表示斐波那契内积,即对应元素相乘再相加,矩阵
Γ
\Gamma
Γ和
D
D
D的热力示意图如下所示
线性规划求解
求解推土机距离中的最优传输方案可以利用线性规划标准型来求解。给定一个向量
x
∈
R
n
x\in \mathbb{R}^n
x∈Rn,线性目标标准型的优化形式如下所示
min
x
z
=
c
⊤
x
s
.
t
.
{
A
x
=
b
x
≥
0
\begin{array}{rl}\min\limits_{x}&z=c^{\top}x\\\mathrm{s.t.}&\left\{\begin{aligned}Ax&=b\\x&\ge 0\end{aligned}\right.\end{array}
xmins.t.z=c⊤x{Axx=b≥0其中
c
∈
R
n
c\in\mathbb{R}^n
c∈Rn,
A
=
∈
R
m
×
n
A=\in \mathbb{R}^{m\times n}
A=∈Rm×n,
b
∈
R
m
b\in \mathbb{R}^m
b∈Rm。根据以上实例将推土机距离转化为线性规划标准型,首先将矩阵
Γ
\Gamma
Γ和
D
D
D进行向量化,则有
x
=
v
e
c
(
Γ
)
∈
R
n
c
=
v
e
c
(
D
)
∈
R
n
\begin{aligned}x&=\mathrm{vec}(\Gamma)\in\mathbb{R}^{n}\\c&=\mathrm{vec}(D)\in\mathbb{R}^{n}\end{aligned}
xc=vec(Γ)∈Rn=vec(D)∈Rn并且有
n
=
l
2
n=l^2
n=l2,将目标分布进行拼接则有
b
=
[
P
r
P
θ
]
b=\left[\begin{array}{c}P_r\\P_\theta\end{array}\right]
b=[PrPθ]其中
m
=
2
l
m=2l
m=2l,方程组
A
x
=
b
Ax=b
Ax=b的具体形式如下所示
{
γ
(
x
1
,
y
1
)
+
γ
(
x
1
,
y
2
)
+
γ
(
x
1
,
y
3
)
+
γ
(
x
1
,
y
4
)
+
γ
(
x
1
,
y
5
)
+
γ
(
x
1
,
y
6
)
+
γ
(
x
1
,
y
7
)
+
γ
(
x
1
,
y
8
)
+
γ
(
x
1
,
y
9
)
+
γ
(
x
1
,
y
10
)
=
P
r
(
x
1
)
γ
(
x
2
,
y
1
)
+
γ
(
x
2
,
y
2
)
+
γ
(
x
2
,
y
3
)
+
γ
(
x
2
,
y
4
)
+
γ
(
x
2
,
y
5
)
+
γ
(
x
2
,
y
6
)
+
γ
(
x
2
,
y
7
)
+
γ
(
x
2
,
y
8
)
+
γ
(
x
2
,
y
9
)
+
γ
(
x
2
,
y
10
)
=
P
r
(
x
2
)
γ
(
x
3
,
y
1
)
+
γ
(
x
3
,
y
2
)
+
γ
(
x
3
,
y
3
)
+
γ
(
x
3
,
y
4
)
+
γ
(
x
3
,
y
5
)
+
γ
(
x
3
,
y
6
)
+
γ
(
x
3
,
y
7
)
+
γ
(
x
3
,
y
8
)
+
γ
(
x
3
,
y
9
)
+
γ
(
x
3
,
y
10
)
=
P
r
(
x
3
)
γ
(
x
4
,
y
1
)
+
γ
(
x
4
,
y
2
)
+
γ
(
x
4
,
y
3
)
+
γ
(
x
4
,
y
4
)
+
γ
(
x
4
,
y
5
)
+
γ
(
x
4
,
y
6
)
+
γ
(
x
4
,
y
7
)
+
γ
(
x
4
,
y
8
)
+
γ
(
x
4
,
y
9
)
+
γ
(
x
4
,
y
10
)
=
P
r
(
x
4
)
γ
(
x
5
,
y
1
)
+
γ
(
x
5
,
y
2
)
+
γ
(
x
5
,
y
3
)
+
γ
(
x
5
,
y
4
)
+
γ
(
x
5
,
y
5
)
+
γ
(
x
5
,
y
6
)
+
γ
(
x
5
,
y
7
)
+
γ
(
x
5
,
y
8
)
+
γ
(
x
5
,
y
9
)
+
γ
(
x
5
,
y
10
)
=
P
r
(
x
5
)
γ
(
x
6
,
y
1
)
+
γ
(
x
6
,
y
2
)
+
γ
(
x
6
,
y
3
)
+
γ
(
x
6
,
y
4
)
+
γ
(
x
6
,
y
5
)
+
γ
(
x
6
,
y
6
)
+
γ
(
x
6
,
y
7
)
+
γ
(
x
6
,
y
8
)
+
γ
(
x
6
,
y
9
)
+
γ
(
x
6
,
y
10
)
=
P
r
(
x
6
)
γ
(
x
7
,
y
1
)
+
γ
(
x
7
,
y
2
)
+
γ
(
x
7
,
y
3
)
+
γ
(
x
7
,
y
4
)
+
γ
(
x
7
,
y
5
)
+
γ
(
x
7
,
y
6
)
+
γ
(
x
7
,
y
7
)
+
γ
(
x
7
,
y
8
)
+
γ
(
x
7
,
y
9
)
+
γ
(
x
7
,
y
10
)
=
P
r
(
x
7
)
γ
(
x
8
,
y
1
)
+
γ
(
x
8
,
y
2
)
+
γ
(
x
8
,
y
3
)
+
γ
(
x
8
,
y
4
)
+
γ
(
x
8
,
y
5
)
+
γ
(
x
8
,
y
6
)
+
γ
(
x
8
,
y
7
)
+
γ
(
x
8
,
y
8
)
+
γ
(
x
8
,
y
9
)
+
γ
(
x
8
,
y
10
)
=
P
r
(
x
8
)
γ
(
x
9
,
y
1
)
+
γ
(
x
9
,
y
2
)
+
γ
(
x
9
,
y
3
)
+
γ
(
x
9
,
y
4
)
+
γ
(
x
9
,
y
5
)
+
γ
(
x
9
,
y
6
)
+
γ
(
x
9
,
y
7
)
+
γ
(
x
9
,
y
8
)
+
γ
(
x
9
,
y
9
)
+
γ
(
x
9
,
y
10
)
=
P
r
(
x
9
)
γ
(
x
10
,
y
1
)
+
γ
(
x
10
,
y
2
)
+
γ
(
x
10
,
y
3
)
+
γ
(
x
10
,
y
4
)
+
γ
(
x
10
,
y
5
)
+
γ
(
x
10
,
y
6
)
+
γ
(
x
10
,
y
7
)
+
γ
(
x
10
,
y
8
)
+
⋯
+
γ
(
x
10
,
y
10
)
=
P
r
(
x
10
)
γ
(
x
1
,
y
1
)
+
γ
(
x
2
,
y
1
)
+
γ
(
x
3
,
y
1
)
+
γ
(
x
4
,
y
1
)
+
γ
(
x
5
,
y
1
)
+
γ
(
x
6
,
y
1
)
+
γ
(
x
7
,
y
1
)
+
γ
(
x
8
,
y
1
)
+
γ
(
x
9
,
y
1
)
+
γ
(
x
10
,
y
1
)
=
P
θ
(
y
1
)
γ
(
x
1
,
y
2
)
+
γ
(
x
2
,
y
2
)
+
γ
(
x
3
,
y
2
)
+
γ
(
x
4
,
y
2
)
+
γ
(
x
5
,
y
2
)
+
γ
(
x
6
,
y
2
)
+
γ
(
x
7
,
y
2
)
+
γ
(
x
8
,
y
2
)
+
γ
(
x
9
,
y
2
)
+
γ
(
x
10
,
y
2
)
=
P
θ
(
y
2
)
γ
(
x
1
,
y
3
)
+
γ
(
x
2
,
y
3
)
+
γ
(
x
3
,
y
3
)
+
γ
(
x
4
,
y
3
)
+
γ
(
x
5
,
y
3
)
+
γ
(
x
6
,
y
3
)
+
γ
(
x
7
,
y
3
)
+
γ
(
x
8
,
y
3
)
+
γ
(
x
9
,
y
3
)
+
γ
(
x
10
,
y
3
)
=
P
θ
(
y
3
)
γ
(
x
1
,
y
4
)
+
γ
(
x
2
,
y
4
)
+
γ
(
x
3
,
y
4
)
+
γ
(
x
4
,
y
4
)
+
γ
(
x
5
,
y
4
)
+
γ
(
x
6
,
y
4
)
+
γ
(
x
7
,
y
4
)
+
γ
(
x
8
,
y
4
)
+
γ
(
x
9
,
y
4
)
+
γ
(
x
10
,
y
4
)
=
P
θ
(
y
4
)
γ
(
x
1
,
y
5
)
+
γ
(
x
2
,
y
5
)
+
γ
(
x
3
,
y
5
)
+
γ
(
x
4
,
y
5
)
+
γ
(
x
5
,
y
5
)
+
γ
(
x
6
,
y
5
)
+
γ
(
x
7
,
y
5
)
+
γ
(
x
8
,
y
5
)
+
γ
(
x
9
,
y
5
)
+
γ
(
x
10
,
y
5
)
=
P
θ
(
y
5
)
γ
(
x
1
,
y
6
)
+
γ
(
x
2
,
y
6
)
+
γ
(
x
3
,
y
6
)
+
γ
(
x
4
,
y
6
)
+
γ
(
x
5
,
y
6
)
+
γ
(
x
6
,
y
6
)
+
γ
(
x
7
,
y
6
)
+
γ
(
x
8
,
y
6
)
+
γ
(
x
9
,
y
6
)
+
γ
(
x
10
,
y
6
)
=
P
θ
(
y
6
)
γ
(
x
1
,
y
7
)
+
γ
(
x
2
,
y
7
)
+
γ
(
x
3
,
y
7
)
+
γ
(
x
4
,
y
7
)
+
γ
(
x
5
,
y
7
)
+
γ
(
x
6
,
y
7
)
+
γ
(
x
7
,
y
7
)
+
γ
(
x
8
,
y
7
)
+
γ
(
x
9
,
y
7
)
+
γ
(
x
10
,
y
7
)
=
P
θ
(
y
7
)
γ
(
x
1
,
y
8
)
+
γ
(
x
2
,
y
8
)
+
γ
(
x
3
,
y
8
)
+
γ
(
x
4
,
y
8
)
+
γ
(
x
5
,
y
8
)
+
γ
(
x
6
,
y
8
)
+
γ
(
x
7
,
y
8
)
+
γ
(
x
8
,
y
8
)
+
γ
(
x
9
,
y
8
)
+
γ
(
x
10
,
y
8
)
=
P
θ
(
y
8
)
γ
(
x
1
,
y
9
)
+
γ
(
x
2
,
y
9
)
+
γ
(
x
3
,
y
9
)
+
γ
(
x
4
,
y
9
)
+
γ
(
x
5
,
y
9
)
+
γ
(
x
6
,
y
9
)
+
γ
(
x
7
,
y
9
)
+
γ
(
x
8
,
y
9
)
+
γ
(
x
9
,
y
9
)
+
γ
(
x
10
,
y
9
)
=
P
θ
(
y
9
)
γ
(
x
1
,
y
10
)
+
γ
(
x
2
,
y
10
)
+
γ
(
x
3
,
y
10
)
+
γ
(
x
4
,
y
10
)
+
γ
(
x
5
,
y
10
)
+
γ
(
x
6
,
y
10
)
+
γ
(
x
7
,
y
10
)
+
γ
(
x
8
,
y
10
)
+
⋯
+
γ
(
x
10
,
y
10
)
=
P
θ
(
y
10
)
\left\{\begin{aligned}\gamma(x_1,y_1)+\gamma(x_1,y_2)+\gamma(x_1,y_3)+\gamma(x_1,y_4)+\gamma(x_1,y_5)+\gamma(x_1,y_6)+\gamma(x_1,y_7)+\gamma(x_1,y_8)+\gamma(x_1,y_9)+\gamma(x_1,y_{10})&=P_r(x_1)\\\gamma(x_2,y_1)+\gamma(x_2,y_2)+\gamma(x_2,y_3)+\gamma(x_2,y_4)+\gamma(x_2,y_5)+\gamma(x_2,y_6)+\gamma(x_2,y_7)+\gamma(x_2,y_8)+\gamma(x_2,y_9)+\gamma(x_2,y_{10})&=P_r(x_2)\\ \gamma(x_3,y_1)+\gamma(x_3,y_2)+\gamma(x_3,y_3)+\gamma(x_3,y_4)+\gamma(x_3,y_5)+\gamma(x_3,y_6)+\gamma(x_3,y_7)+\gamma(x_3,y_8)+\gamma(x_3,y_9)+\gamma(x_3,y_{10})&=P_r(x_3)\\\gamma(x_4,y_1)+\gamma(x_4,y_2)+\gamma(x_4,y_3)+\gamma(x_4,y_4)+\gamma(x_4,y_5)+\gamma(x_4,y_6)+\gamma(x_4,y_7)+\gamma(x_4,y_8)+\gamma(x_4,y_9)+\gamma(x_4,y_{10})&=P_r(x_4)\\\gamma(x_5,y_1)+\gamma(x_5,y_2)+\gamma(x_5,y_3)+\gamma(x_5,y_4)+\gamma(x_5,y_5)+\gamma(x_5,y_6)+\gamma(x_5,y_7)+\gamma(x_5,y_8)+\gamma(x_5,y_9)+\gamma(x_5,y_{10})&=P_r(x_5)\\\gamma(x_6,y_1)+\gamma(x_6,y_2)+\gamma(x_6,y_3)+\gamma(x_6,y_4)+\gamma(x_6,y_5)+\gamma(x_6,y_6)+\gamma(x_6,y_7)+\gamma(x_6,y_8)+\gamma(x_6,y_9)+\gamma(x_6,y_{10})&=P_r(x_6)\\ \gamma(x_7,y_1)+\gamma(x_7,y_2)+\gamma(x_7,y_3)+\gamma(x_7,y_4)+\gamma(x_7,y_5)+\gamma(x_7,y_6)+\gamma(x_7,y_7)+\gamma(x_7,y_8)+\gamma(x_7,y_9)+\gamma(x_7,y_{10})&=P_r(x_7)\\\gamma(x_8,y_1)+\gamma(x_8,y_2)+\gamma(x_8,y_3)+\gamma(x_8,y_4)+\gamma(x_8,y_5)+\gamma(x_8,y_6)+\gamma(x_8,y_7)+\gamma(x_8,y_8)+\gamma(x_8,y_9)+\gamma(x_8,y_{10})&=P_r(x_8)\\\gamma(x_{9},y_1)+\gamma(x_{9},y_2)+\gamma(x_{9},y_3)+\gamma(x_{9},y_4)+\gamma(x_{9},y_5)+\gamma(x_{9},y_6)+\gamma(x_{9},y_7)+\gamma(x_{9},y_8)+\gamma(x_{9},y_9)+\gamma(x_{9},y_{10})&=P_r(x_{9})\\\gamma(x_{10},y_1)+\gamma(x_{10},y_2)+\gamma(x_{10},y_3)+\gamma(x_{10},y_4)+\gamma(x_{10},y_5)+\gamma(x_{10},y_6)+\gamma(x_{10},y_7)+\gamma(x_{10},y_8)+\cdots+\gamma(x_{10},y_{10})&=P_r(x_{10})\\\gamma(x_1,y_1)+\gamma(x_2,y_1)+\gamma(x_3,y_1)+\gamma(x_4,y_1)+\gamma(x_5,y_1)+\gamma(x_6,y_1)+\gamma(x_7,y_1)+\gamma(x_8,y_1)+\gamma(x_9,y_1)+\gamma(x_{10},y_{1})&=P_\theta(y_1)\\\gamma(x_1,y_2)+\gamma(x_2,y_2)+\gamma(x_3,y_2)+\gamma(x_4,y_2)+\gamma(x_5,y_2)+\gamma(x_6,y_2)+\gamma(x_7,y_2)+\gamma(x_8,y_2)+\gamma(x_9,y_2)+\gamma(x_{10},y_{2})&=P_\theta(y_2)\\\gamma(x_1,y_3)+\gamma(x_2,y_3)+\gamma(x_3,y_3)+\gamma(x_4,y_3)+\gamma(x_5,y_3)+\gamma(x_6,y_3)+\gamma(x_7,y_3)+\gamma(x_8,y_3)+\gamma(x_9,y_3)+\gamma(x_{10},y_{3})&=P_\theta(y_3)\\\gamma(x_1,y_4)+\gamma(x_2,y_4)+\gamma(x_3,y_4)+\gamma(x_4,y_4)+\gamma(x_5,y_4)+\gamma(x_6,y_4)+\gamma(x_7,y_4)+\gamma(x_8,y_4)+\gamma(x_9,y_4)+\gamma(x_{10},y_{4})&=P_\theta(y_4)\\\gamma(x_1,y_5)+\gamma(x_2,y_5)+\gamma(x_3,y_5)+\gamma(x_4,y_5)+\gamma(x_5,y_5)+\gamma(x_6,y_5)+\gamma(x_7,y_5)+\gamma(x_8,y_5)+\gamma(x_9,y_5)+\gamma(x_{10},y_{5})&=P_\theta(y_5)\\\gamma(x_1,y_6)+\gamma(x_2,y_6)+\gamma(x_3,y_6)+\gamma(x_4,y_6)+\gamma(x_5,y_6)+\gamma(x_6,y_6)+\gamma(x_7,y_6)+\gamma(x_8,y_6)+\gamma(x_9,y_6)+\gamma(x_{10},y_{6})&=P_\theta(y_6)\\\gamma(x_1,y_7)+\gamma(x_2,y_7)+\gamma(x_3,y_7)+\gamma(x_4,y_7)+\gamma(x_5,y_7)+\gamma(x_6,y_7)+\gamma(x_7,y_7)+\gamma(x_8,y_7)+\gamma(x_9,y_7)+\gamma(x_{10},y_{7})&=P_\theta(y_7)\\\gamma(x_1,y_8)+\gamma(x_2,y_8)+\gamma(x_3,y_8)+\gamma(x_4,y_8)+\gamma(x_5,y_8)+\gamma(x_6,y_8)+\gamma(x_7,y_8)+\gamma(x_8,y_8)+\gamma(x_9,y_8)+\gamma(x_{10},y_{8})&=P_\theta(y_8)\\\gamma(x_1,y_9)+\gamma(x_2,y_9)+\gamma(x_3,y_9)+\gamma(x_4,y_9)+\gamma(x_5,y_9)+\gamma(x_6,y_9)+\gamma(x_7,y_9)+\gamma(x_8,y_9)+\gamma(x_9,y_9)+\gamma(x_{10},y_{9})&=P_\theta(y_9)\\\gamma(x_1,y_{10})+\gamma(x_2,y_{10})+\gamma(x_3,y_{10})+\gamma(x_4,y_{10})+\gamma(x_5,y_{10})+\gamma(x_6,y_{10})+\gamma(x_7,y_{10})+\gamma(x_8,y_{10})+\cdots+\gamma(x_{10},y_{10})&=P_\theta(y_{10})\end{aligned}\right.
⎩⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎨⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎧γ(x1,y1)+γ(x1,y2)+γ(x1,y3)+γ(x1,y4)+γ(x1,y5)+γ(x1,y6)+γ(x1,y7)+γ(x1,y8)+γ(x1,y9)+γ(x1,y10)γ(x2,y1)+γ(x2,y2)+γ(x2,y3)+γ(x2,y4)+γ(x2,y5)+γ(x2,y6)+γ(x2,y7)+γ(x2,y8)+γ(x2,y9)+γ(x2,y10)γ(x3,y1)+γ(x3,y2)+γ(x3,y3)+γ(x3,y4)+γ(x3,y5)+γ(x3,y6)+γ(x3,y7)+γ(x3,y8)+γ(x3,y9)+γ(x3,y10)γ(x4,y1)+γ(x4,y2)+γ(x4,y3)+γ(x4,y4)+γ(x4,y5)+γ(x4,y6)+γ(x4,y7)+γ(x4,y8)+γ(x4,y9)+γ(x4,y10)γ(x5,y1)+γ(x5,y2)+γ(x5,y3)+γ(x5,y4)+γ(x5,y5)+γ(x5,y6)+γ(x5,y7)+γ(x5,y8)+γ(x5,y9)+γ(x5,y10)γ(x6,y1)+γ(x6,y2)+γ(x6,y3)+γ(x6,y4)+γ(x6,y5)+γ(x6,y6)+γ(x6,y7)+γ(x6,y8)+γ(x6,y9)+γ(x6,y10)γ(x7,y1)+γ(x7,y2)+γ(x7,y3)+γ(x7,y4)+γ(x7,y5)+γ(x7,y6)+γ(x7,y7)+γ(x7,y8)+γ(x7,y9)+γ(x7,y10)γ(x8,y1)+γ(x8,y2)+γ(x8,y3)+γ(x8,y4)+γ(x8,y5)+γ(x8,y6)+γ(x8,y7)+γ(x8,y8)+γ(x8,y9)+γ(x8,y10)γ(x9,y1)+γ(x9,y2)+γ(x9,y3)+γ(x9,y4)+γ(x9,y5)+γ(x9,y6)+γ(x9,y7)+γ(x9,y8)+γ(x9,y9)+γ(x9,y10)γ(x10,y1)+γ(x10,y2)+γ(x10,y3)+γ(x10,y4)+γ(x10,y5)+γ(x10,y6)+γ(x10,y7)+γ(x10,y8)+⋯+γ(x10,y10)γ(x1,y1)+γ(x2,y1)+γ(x3,y1)+γ(x4,y1)+γ(x5,y1)+γ(x6,y1)+γ(x7,y1)+γ(x8,y1)+γ(x9,y1)+γ(x10,y1)γ(x1,y2)+γ(x2,y2)+γ(x3,y2)+γ(x4,y2)+γ(x5,y2)+γ(x6,y2)+γ(x7,y2)+γ(x8,y2)+γ(x9,y2)+γ(x10,y2)γ(x1,y3)+γ(x2,y3)+γ(x3,y3)+γ(x4,y3)+γ(x5,y3)+γ(x6,y3)+γ(x7,y3)+γ(x8,y3)+γ(x9,y3)+γ(x10,y3)γ(x1,y4)+γ(x2,y4)+γ(x3,y4)+γ(x4,y4)+γ(x5,y4)+γ(x6,y4)+γ(x7,y4)+γ(x8,y4)+γ(x9,y4)+γ(x10,y4)γ(x1,y5)+γ(x2,y5)+γ(x3,y5)+γ(x4,y5)+γ(x5,y5)+γ(x6,y5)+γ(x7,y5)+γ(x8,y5)+γ(x9,y5)+γ(x10,y5)γ(x1,y6)+γ(x2,y6)+γ(x3,y6)+γ(x4,y6)+γ(x5,y6)+γ(x6,y6)+γ(x7,y6)+γ(x8,y6)+γ(x9,y6)+γ(x10,y6)γ(x1,y7)+γ(x2,y7)+γ(x3,y7)+γ(x4,y7)+γ(x5,y7)+γ(x6,y7)+γ(x7,y7)+γ(x8,y7)+γ(x9,y7)+γ(x10,y7)γ(x1,y8)+γ(x2,y8)+γ(x3,y8)+γ(x4,y8)+γ(x5,y8)+γ(x6,y8)+γ(x7,y8)+γ(x8,y8)+γ(x9,y8)+γ(x10,y8)γ(x1,y9)+γ(x2,y9)+γ(x3,y9)+γ(x4,y9)+γ(x5,y9)+γ(x6,y9)+γ(x7,y9)+γ(x8,y9)+γ(x9,y9)+γ(x10,y9)γ(x1,y10)+γ(x2,y10)+γ(x3,y10)+γ(x4,y10)+γ(x5,y10)+γ(x6,y10)+γ(x7,y10)+γ(x8,y10)+⋯+γ(x10,y10)=Pr(x1)=Pr(x2)=Pr(x3)=Pr(x4)=Pr(x5)=Pr(x6)=Pr(x7)=Pr(x8)=Pr(x9)=Pr(x10)=Pθ(y1)=Pθ(y2)=Pθ(y3)=Pθ(y4)=Pθ(y5)=Pθ(y6)=Pθ(y7)=Pθ(y8)=Pθ(y9)=Pθ(y10)其中矩阵
A
A
A是一个大的
0
0
0和
1
1
1的二值稀疏矩阵为
A
=
[
1
0
0
0
0
0
0
0
0
0
0
1
0
0
0
0
0
0
0
0
0
0
1
0
0
0
0
0
0
0
0
0
0
1
0
0
0
0
0
0
0
0
0
0
1
0
0
0
0
0
0
0
0
0
0
1
0
0
0
0
0
0
0
0
0
0
1
0
0
0
0
0
0
0
0
0
0
1
0
0
0
0
0
0
0
0
0
0
1
0
0
0
0
0
0
0
0
0
0
1
e
1
e
1
e
1
e
1
e
1
e
1
e
1
e
1
e
1
e
1
e
2
e
2
e
2
e
2
e
2
e
2
e
2
e
2
e
2
e
2
e
3
e
3
e
3
e
3
e
3
e
3
e
3
e
3
e
3
e
3
e
4
e
4
e
4
e
4
e
4
e
4
e
4
e
4
e
4
e
4
e
5
e
5
e
5
e
5
e
5
e
5
e
5
e
5
e
5
e
5
e
6
e
6
e
6
e
6
e
6
e
6
e
6
e
6
e
6
e
6
e
7
e
7
e
7
e
7
e
7
e
7
e
7
e
7
e
7
e
7
e
8
e
8
e
8
e
8
e
8
e
8
e
8
e
8
e
8
e
8
e
9
e
9
e
9
e
9
e
9
e
9
e
9
e
9
e
9
e
9
e
10
e
10
e
10
e
10
e
10
e
10
e
10
e
10
e
10
e
10
]
A=\left[\begin{array}{cccccccccc}\bf{1}&\bf{0}&\bf{0}&\bf{0}&\bf{0}&\bf{0}&\bf{0}&\bf{0}&\bf{0}&\bf{0}\\\bf{0}&\bf{1}&\bf{0}&\bf{0}&\bf{0}&\bf{0}&\bf{0}&\bf{0}&\bf{0}&\bf{0}\\\bf{0}&\bf{0}&\bf{1}&\bf{0}&\bf{0}&\bf{0}&\bf{0}&\bf{0}&\bf{0}&\bf{0}\\\bf{0}&\bf{0}&\bf{0}&\bf{1}&\bf{0}&\bf{0}&\bf{0}&\bf{0}&\bf{0}&\bf{0}\\\bf{0}&\bf{0}&\bf{0}&\bf{0}&\bf{1}&\bf{0}&\bf{0}&\bf{0}&\bf{0}&\bf{0}\\\bf{0}&\bf{0}&\bf{0}&\bf{0}&\bf{0}&\bf{1}&\bf{0}&\bf{0}&\bf{0}&\bf{0}\\\bf{0}&\bf{0}&\bf{0}&\bf{0}&\bf{0}&\bf{0}&\bf{1}&\bf{0}&\bf{0}&\bf{0}\\\bf{0}&\bf{0}&\bf{0}&\bf{0}&\bf{0}&\bf{0}&\bf{0}&\bf{1}&\bf{0}&\bf{0}\\\bf{0}&\bf{0}&\bf{0}&\bf{0}&\bf{0}&\bf{0}&\bf{0}&\bf{0}&\bf{1}&\bf{0}\\\bf{0}&\bf{0}&\bf{0}&\bf{0}&\bf{0}&\bf{0}&\bf{0}&\bf{0}&\bf{0}&\bf{1}\\{\bf{e}}_1&{\bf{e}}_1&{\bf{e}}_1&{\bf{e}}_1&{\bf{e}}_1&{\bf{e}}_1&{\bf{e}}_1&{\bf{e}}_1&{\bf{e}}_1&{\bf{e}}_1\\{\bf{e}}_2&{\bf{e}}_2&{\bf{e}}_2&{\bf{e}}_2&{\bf{e}}_2&{\bf{e}}_2&{\bf{e}}_2&{\bf{e}}_2&{\bf{e}}_2&{\bf{e}}_2\\{\bf{e}}_3&{\bf{e}}_3&{\bf{e}}_3&{\bf{e}}_3&{\bf{e}}_3&{\bf{e}}_3&{\bf{e}}_3&{\bf{e}}_3&{\bf{e}}_3&{\bf{e}}_3\\{\bf{e}}_4&{\bf{e}}_4&{\bf{e}}_4&{\bf{e}}_4&{\bf{e}}_4&{\bf{e}}_4&{\bf{e}}_4&{\bf{e}}_4&{\bf{e}}_4&{\bf{e}}_4\\{\bf{e}}_5&{\bf{e}}_5&{\bf{e}}_5&{\bf{e}}_5&{\bf{e}}_5&{\bf{e}}_5&{\bf{e}}_5&{\bf{e}}_5&{\bf{e}}_5&{\bf{e}}_5\\{\bf{e}}_6&{\bf{e}}_6&{\bf{e}}_6&{\bf{e}}_6&{\bf{e}}_6&{\bf{e}}_6&{\bf{e}}_6&{\bf{e}}_6&{\bf{e}}_6&{\bf{e}}_6\\{\bf{e}}_7&{\bf{e}}_7&{\bf{e}}_7&{\bf{e}}_7&{\bf{e}}_7&{\bf{e}}_7&{\bf{e}}_7&{\bf{e}}_7&{\bf{e}}_7&{\bf{e}}_7\\{\bf{e}}_8&{\bf{e}}_8&{\bf{e}}_8&{\bf{e}}_8&{\bf{e}}_8&{\bf{e}}_8&{\bf{e}}_8&{\bf{e}}_8&{\bf{e}}_8&{\bf{e}}_8\\{\bf{e}}_9&{\bf{e}}_9&{\bf{e}}_9&{\bf{e}}_9&{\bf{e}}_9&{\bf{e}}_9&{\bf{e}}_9&{\bf{e}}_9&{\bf{e}}_9&{\bf{e}}_9\\{\bf{e}}_{10}&{\bf{e}}_{10}&{\bf{e}}_{10}&{\bf{e}}_{10}&{\bf{e}}_{10}&{\bf{e}}_{10}&{\bf{e}}_{10}&{\bf{e}}_{10}&{\bf{e}}_{10}&{\bf{e}}_{10}\end{array}\right]
A=⎣⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎡1000000000e1e2e3e4e5e6e7e8e9e100100000000e1e2e3e4e5e6e7e8e9e100010000000e1e2e3e4e5e6e7e8e9e100001000000e1e2e3e4e5e6e7e8e9e100000100000e1e2e3e4e5e6e7e8e9e100000010000e1e2e3e4e5e6e7e8e9e100000001000e1e2e3e4e5e6e7e8e9e100000000100e1e2e3e4e5e6e7e8e9e100000000010e1e2e3e4e5e6e7e8e9e100000000001e1e2e3e4e5e6e7e8e9e10⎦⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎤其中向量
1
\bf{1}
1,
0
\bf{0}
0和
e
i
{\bf{e}}_i
ei具体取值如下所示
1
=
[
1
,
1
,
1
,
1
,
1
,
1
,
1
,
1
,
1
,
1
]
∈
R
1
×
10
{\bf{1}}=[1,1,1,1,1,1,1,1,1,1]\in \mathbb{R}^{1\times 10}
1=[1,1,1,1,1,1,1,1,1,1]∈R1×10
0
=
[
0
,
0
,
0
,
0
,
0
,
0
,
0
,
0
,
0
,
0
]
∈
R
1
×
10
{\bf{0}}=[0,0,0,0,0,0,0,0,0,0]\in \mathbb{R}^{1\times 10}
0=[0,0,0,0,0,0,0,0,0,0]∈R1×10
e
i
[
l
]
=
{
1
,
l
=
i
0
,
l
≠
i
,
e
i
∈
R
1
×
10
,
i
∈
{
1
,
⋯
,
10
}
{\bf{e}}_i[l]=\left\{\begin{array}{ll}1,&l=i\\0,&l\ne i\end{array},\quad {\bf{e}}_i\in \mathbb{R}^{1\times 10}, \quad i\in\{1,\cdots,10\}\right.
ei[l]={1,0,l=il=i,ei∈R1×10,i∈{1,⋯,10}通过求解线性规划问题得到的最优传输方案的示意图如下所示
对偶线性规划求解
当随机变量的离散状态数量与输入变量的维数呈指数关系时,线性规划标准型求解在这些情况下就很不实用,例如在深度学习图像领域中,用
G
A
N
\mathrm{GAN}
GAN生成图片其输入的维度可以很容易达到数千维,此时就不会用线性规划标准型进行求解矩阵
Γ
\Gamma
Γ。
G
A
N
\mathrm{GAN}
GAN的核心目的利用分布
P
r
P_r
Pr生成一个与真实分布尽可能相同的分布
P
θ
P_\theta
Pθ,所以并不需要关注联合概率分布
γ
\gamma
γ,而是关注生成分布
P
r
P_r
Pr和真实分布
P
θ
P_\theta
Pθ的
E
M
D
\mathrm{EMD}
EMD数值(推土机距离),然后将该距离看做成损失函数进而求解梯度
∇
P
θ
E
M
D
\nabla_{P_\theta}\mathrm{EMD}
∇PθEMD并对网络进行训练。任何线性规划的初始形式都有其对偶形式如下所示
p
r
i
m
a
l
f
o
r
m
:
m
i
n
i
m
i
z
e
z
=
c
⊤
x
s
.
t
.
{
A
x
=
b
x
≥
0
∣
d
u
a
l
f
o
r
m
:
m
a
x
i
m
i
z
e
z
^
=
b
⊤
y
s
.
t
.
A
⊤
.
y
≤
c
\left. {\bf{primal\text{ } form:\quad}}\begin{array}{rc}\mathrm{minimize}&z=c^{\top}x\\\mathrm{s.t.}&\left\{\begin{aligned}Ax&=b\\x &\ge 0\end{aligned}\right.\end{array}\right|{\bf{dual \text{ }form:}}\quad\begin{array}{rc}\mathrm{maximize}&\hat{z}=b^{\top}y\\\mathrm{s.t.}&A^{\top}.y \le c\\&\end{array}
primal form:minimizes.t.z=c⊤x{Axx=b≥0∣∣∣∣∣∣∣dual form:maximizes.t.z^=b⊤yA⊤.y≤c将求解初始问题的最小值转化为求解其对偶问题的最大值,目标
z
^
\hat{z}
z^直接依赖于
b
b
b,其中
b
b
b包含概率分布
P
r
P_r
Pr和
P
θ
P_\theta
Pθ。初始线性规划
z
z
z的最小值是对偶线性规划
z
^
\hat{z}
z^最大值的上界,具体则有
z
=
c
⊤
x
≥
y
⊤
A
x
=
y
⊤
b
=
z
^
z=c^{\top}x\ge y^{\top}Ax=y^{\top}b=\hat{z}
z=c⊤x≥y⊤Ax=y⊤b=z^以上不等式称为弱对偶定理。强对偶定理则是初始线性规划
z
z
z的最小值等于对偶线性规划
z
^
\hat{z}
z^最大值即
z
=
z
^
z=\hat{z}
z=z^。利用对偶形式计算推土机距离
E
M
D
\mathrm{EMD}
EMD,假定
y
∗
=
[
f
g
]
y^{*}=\left[\begin{array}{c}{\bf{f}}\\{\bf{g}}\end{array}\right]
y∗=[fg]其中
f
,
g
∈
R
l
×
1
{\bf{f,g}}\in\mathbb{R}^{l\times 1}
f,g∈Rl×1,进一步根据上公式则有
E
M
D
(
P
r
,
P
θ
)
=
f
⊤
P
r
+
g
⊤
P
θ
\mathrm{EMD}(P_r,P_\theta)={\bf{f}}^{\top}P_r+{\bf{g}}^{\top}P_\theta
EMD(Pr,Pθ)=f⊤Pr+g⊤Pθ其中向量
f
{\bf{f}}
f和
g
{\bf{g}}
g的值分别由函数
f
(
⋅
)
f(\cdot)
f(⋅)和
g
(
⋅
)
g(\cdot)
g(⋅)可得
{
f
i
=
f
(
x
i
)
g
i
=
g
(
x
i
)
,
i
∈
{
1
,
⋯
,
n
}
\left\{\begin{aligned}{\bf{f}}_i&=f(x_i)\\{\bf{g}}_i&=g(x_i)\end{aligned}\right.,\quad i\in\{1,\cdots,n\}
{figi=f(xi)=g(xi),i∈{1,⋯,n}对偶形式的约束为
A
⊤
y
≤
c
A^{\top}y\le c
A⊤y≤c,以上约束条件展开则有
f
(
x
i
)
+
g
(
x
j
)
≤
D
i
j
=
∥
x
i
−
x
j
∥
,
i
,
j
∈
{
1
,
⋯
,
n
}
f(x_i)+g(x_j)\le D_{ij}=\|x_i-x_j\|,\quad i,j\in\{1,\cdots,n\}
f(xi)+g(xj)≤Dij=∥xi−xj∥,i,j∈{1,⋯,n}此时对偶形式可以整理为如下形式
E
M
D
1
(
P
r
,
P
θ
)
=
max
f
,
g
{
∑
i
=
1
n
[
f
(
x
i
)
P
r
(
x
i
)
+
g
(
x
i
)
P
θ
(
x
i
)
]
∣
∀
i
,
j
,
f
(
x
i
)
+
g
(
x
j
)
≤
D
i
j
}
\mathrm{EMD}^{1}(P_r,P_\theta)=\max\limits_{f,g}\left\{\left.\sum\limits_{i=1}^n[f(x_i)P_r(x_i)+g(x_i)P_{\theta}(x_i)]\right|\forall i,j, \quad f(x_i)+g(x_j)\le D_{ij} \right\}
EMD1(Pr,Pθ)=f,gmax{i=1∑n[f(xi)Pr(xi)+g(xi)Pθ(xi)]∣∣∣∣∣∀i,j,f(xi)+g(xj)≤Dij}又因为当
i
=
j
i=j
i=j时,则有
f
(
x
i
)
+
g
(
x
i
)
≤
D
i
i
=
0
f(x_i)+g(x_i)\le D_{ii}=0
f(xi)+g(xi)≤Dii=0进而可知
∑
i
=
1
n
[
f
(
x
i
)
P
r
(
x
i
)
+
g
(
x
i
)
P
θ
(
x
i
)
]
≤
∑
i
=
1
n
[
f
(
x
i
)
P
r
(
x
i
)
−
f
(
x
i
)
P
θ
(
x
i
)
]
\begin{aligned}\sum\limits_{i=1}^n[f(x_i)P_r(x_i)+g(x_i)P_{\theta}(x_i)]& \le \sum\limits_{i=1}^n[f(x_i)P_r(x_i)-f(x_i)P_{\theta}(x_i)]\end{aligned}
i=1∑n[f(xi)Pr(xi)+g(xi)Pθ(xi)]≤i=1∑n[f(xi)Pr(xi)−f(xi)Pθ(xi)]所以当函数
g
=
−
f
g=-f
g=−f时,则有
E
M
D
1
(
P
r
,
P
θ
)
≤
E
M
D
2
(
P
r
,
P
θ
)
=
max
f
,
−
f
{
∑
i
=
1
n
[
f
(
x
i
)
P
r
(
x
i
)
−
f
(
x
i
)
P
θ
(
x
i
)
]
∣
∀
i
,
j
,
f
(
x
i
)
−
f
(
x
j
)
≤
D
i
j
}
\mathrm{EMD}^1(P_r,P_{\theta})\le\mathrm{EMD}^{2}(P_r,P_\theta)=\max\limits_{f,-f}\left.\left\{\sum\limits_{i=1}^n[f(x_i)P_r(x_i)-f(x_i)P_{\theta}(x_i)]\right|\forall i,j,\quad f(x_i)-f(x_j)\le D_{ij}\right\}
EMD1(Pr,Pθ)≤EMD2(Pr,Pθ)=f,−fmax{i=1∑n[f(xi)Pr(xi)−f(xi)Pθ(xi)]∣∣∣∣∣∀i,j,f(xi)−f(xj)≤Dij}又因为
E
M
D
2
(
P
r
,
P
θ
)
\mathrm{EMD}^2(P_r,P_\theta)
EMD2(Pr,Pθ)中的函数取值范围
f
,
−
f
f,-f
f,−f是
E
M
D
1
(
P
r
,
P
θ
)
\mathrm{EMD}^1(P_r,P_\theta)
EMD1(Pr,Pθ)中函数取值范围
f
,
g
f,g
f,g的一个特例,则有
E
M
D
1
(
P
r
,
P
θ
)
≥
E
M
D
2
(
P
r
,
P
θ
)
\mathrm{EMD}^1(P_r,P_\theta)\ge \mathrm{EMD}^2(P_r,P_\theta)
EMD1(Pr,Pθ)≥EMD2(Pr,Pθ),综上所述则有
E
M
D
(
P
r
,
P
θ
)
=
E
M
D
1
(
P
r
,
P
θ
)
=
E
M
D
2
(
P
r
,
P
θ
)
\mathrm{EMD}(P_r,P_\theta)=\mathrm{EMD}^1(P_r,P_\theta)= \mathrm{EMD}^2(P_r,P_\theta)
EMD(Pr,Pθ)=EMD1(Pr,Pθ)=EMD2(Pr,Pθ)又因为
∣
f
(
x
i
)
−
f
(
x
j
)
≤
∣
∣
x
i
−
x
j
∣
∣
|f(x_i)-f(x_j)\le ||x_i-x_j||
∣f(xi)−f(xj)≤∣∣xi−xj∣∣所以
f
f
f是
1
1
1-
L
i
p
s
c
h
i
z
\mathrm{Lipschiz}
Lipschiz连续,所以可以将推土机距离
E
M
D
(
P
r
,
P
θ
)
\mathrm{EMD}(P_r,P_\theta)
EMD(Pr,Pθ)对偶形式表示为
E
M
D
(
P
r
,
P
θ
)
=
max
∥
f
∥
≤
1
∑
i
=
1
n
[
f
(
x
i
)
P
r
(
x
i
)
−
f
(
x
i
)
P
θ
(
x
i
)
]
=
max
∥
f
∥
≤
1
E
x
∼
P
r
(
x
)
[
f
(
x
)
]
−
E
x
∼
P
θ
(
x
)
[
f
(
x
)
]
\begin{aligned}\mathrm{EMD}(P_r,P_\theta)&=\max\limits_{\|f\|\le 1}\sum\limits_{i=1}^n[f(x_i)P_r(x_i)-f(x_i)P_{\theta}(x_i)]\\&=\max\limits_{\|f\|\le 1}\mathbb{E}_{x\sim P_{r}(x)}[f(x)]-\mathbb{E}_{x\sim P_{\theta}(x)}[f(x)]\end{aligned}
EMD(Pr,Pθ)=∥f∥≤1maxi=1∑n[f(xi)Pr(xi)−f(xi)Pθ(xi)]=∥f∥≤1maxEx∼Pr(x)[f(x)]−Ex∼Pθ(x)[f(x)]
W
a
s
s
e
r
s
t
e
i
n
\mathrm{Wasserstein}
Wasserstein距离
当
W
a
s
s
e
r
s
t
e
i
n
\mathrm{Wasserstein}
Wasserstein距离取
1
1
1范数的时候则为推土机距离,以上考虑的概率分布是离散的情况。考虑随机变量是连续的情况,给定连续随机变量的分布
p
r
p_r
pr和
p
θ
p_\theta
pθ,并且它们是联合分布
π
(
p
r
,
p
θ
)
\pi(p_r,p_\theta)
π(pr,pθ)的边缘分布,则
W
a
s
s
e
r
s
t
e
i
n
\mathrm{Wasserstein}
Wasserstein距离表示为
W
(
p
r
,
p
θ
)
=
inf
γ
∈
π
∫
x
∫
y
∥
x
−
y
∥
γ
(
x
,
y
)
d
x
d
y
=
inf
γ
∈
π
E
x
,
y
∼
γ
[
∥
x
−
y
∥
]
W(p_r,p_\theta)=\inf_{\gamma \in \pi}\int\limits_x\int\limits_{y}\|x-y\|\gamma(x,y)dxdy=\inf\limits_{\gamma \in \pi}\mathbb{E}_{x,y\sim\gamma}[\|x-y\|]
W(pr,pθ)=γ∈πinfx∫y∫∥x−y∥γ(x,y)dxdy=γ∈πinfEx,y∼γ[∥x−y∥]通过引入一个额外的函数
f
:
x
⟼
k
∈
R
f:x\longmapsto k\in \mathbb{R}
f:x⟼k∈R,可以消除掉关于联合分布
γ
\gamma
γ所有的约束条件
W
(
p
r
,
p
θ
)
=
inf
γ
∈
π
E
x
,
y
∼
γ
[
∥
x
−
y
∥
]
=
inf
γ
∈
π
E
x
,
y
∼
γ
[
∥
x
−
y
∥
+
sup
f
E
s
∼
p
r
[
f
(
s
)
]
−
E
t
∼
p
θ
[
f
(
t
)
]
−
(
f
(
x
)
−
f
(
y
)
)
]
=
inf
γ
sup
f
E
x
,
y
∼
γ
[
∥
x
−
y
∥
+
E
s
∼
p
r
[
f
(
s
)
]
−
E
t
∼
p
θ
[
f
(
t
)
]
−
(
f
(
x
)
−
f
(
y
)
)
]
\begin{aligned}W(p_r,p_\theta)&=\inf\limits_{\gamma \in \pi}\mathbb{E}_{x,y\sim \gamma}[\|x-y\|]\\&=\inf\limits_{\gamma \in \pi}\mathbb{E}_{x,y\sim \gamma}[\|x-y\|+\sup\limits_{f} \mathbb{E}_{s \sim p_r}[f(s)]-\mathbb{E}_{t\sim p_\theta}[f(t)]-(f(x)-f(y))]\\&=\inf\limits_{\gamma}\sup\limits_{f}\mathbb{E}_{x,y\sim \gamma}[\|x-y\|+\mathbb{E}_{s\sim p_r}[f(s)]-\mathbb{E}_{t\sim p_{\theta}}[f(t)]-(f(x)-f(y))]\end{aligned}
W(pr,pθ)=γ∈πinfEx,y∼γ[∥x−y∥]=γ∈πinfEx,y∼γ[∥x−y∥+fsupEs∼pr[f(s)]−Et∼pθ[f(t)]−(f(x)−f(y))]=γinffsupEx,y∼γ[∥x−y∥+Es∼pr[f(s)]−Et∼pθ[f(t)]−(f(x)−f(y))]其中有
sup
f
E
s
∼
p
r
[
f
(
s
)
]
−
E
t
∼
p
θ
[
f
(
t
)
]
−
(
f
(
x
)
−
f
(
y
)
)
]
=
{
0
,
γ
∈
π
+
∞
,
o
t
h
e
r
w
i
s
e
\sup\limits_{f} \mathbb{E}_{s \sim p_r}[f(s)]-\mathbb{E}_{t\sim p_\theta}[f(t)]-(f(x)-f(y))]=\left\{\begin{array}{rl}0,& \gamma \in \pi \\ +\infty,& \mathrm{otherwise}\end{array}\right.
fsupEs∼pr[f(s)]−Et∼pθ[f(t)]−(f(x)−f(y))]={0,+∞,γ∈πotherwise以上问题是一个极大极小值双层优化问题,求解以上问题需要用到极小极大原理,在不改变解的前提下颠倒求解顺序,则有对偶形式如下所示
W
(
p
r
,
p
θ
)
=
sup
f
inf
γ
E
x
,
y
∼
γ
[
∥
x
−
y
∥
+
E
s
∼
p
r
[
f
(
s
)
]
−
E
t
∼
p
θ
[
f
(
t
)
]
−
(
f
(
x
)
−
f
(
y
)
)
]
=
sup
f
E
s
∼
p
r
[
f
(
s
)
]
−
E
t
∼
p
θ
[
f
(
t
)
]
+
inf
γ
E
x
,
y
∼
γ
[
∥
x
−
y
∥
−
(
f
(
x
)
−
f
(
y
)
)
]
\begin{aligned}W(p_r,p_\theta)=&\sup\limits_{f}\inf\limits_{\gamma} \mathbb{E}_{x,y\sim \gamma}[\|x-y\|+\mathbb{E}_{s\sim p_r}[f(s)]-\mathbb{E}_{t \sim p_{\theta}}[f(t)]-(f(x)-f(y))]\\=&\sup\limits_{f}\mathbb{E}_{s\sim p_r}[f(s)]-\mathbb{E}_{t\sim p_{\theta}}[f(t)]+\inf\limits_{\gamma}\mathbb{E}_{x,y \sim \gamma}[\|x-y\|-(f(x)-f(y))]\end{aligned}
W(pr,pθ)==fsupγinfEx,y∼γ[∥x−y∥+Es∼pr[f(s)]−Et∼pθ[f(t)]−(f(x)−f(y))]fsupEs∼pr[f(s)]−Et∼pθ[f(t)]+γinfEx,y∼γ[∥x−y∥−(f(x)−f(y))]又因为
inf
γ
E
x
,
y
∼
γ
[
∥
x
−
y
∥
−
(
f
(
x
)
−
f
(
y
)
)
]
=
{
0
,
∥
f
∥
L
≤
1
−
∞
,
o
t
h
e
r
w
i
s
e
\inf\limits_{\gamma}\mathbb{E}_{x,y \sim \gamma}[\|x-y\|-(f(x)-f(y))]=\left\{\begin{array}{rl}0,& \|f\|_L \le 1\\-\infty,& \mathrm{otherwise}\end{array}\right.
γinfEx,y∼γ[∥x−y∥−(f(x)−f(y))]={0,−∞,∥f∥L≤1otherwise则可得最后对偶形式如下所示
W
(
p
r
,
p
θ
)
=
sup
∥
f
∥
L
≤
1
E
s
∼
p
r
[
f
(
s
)
]
−
E
t
∼
p
θ
[
f
(
t
)
]
W(p_r,p_\theta)=\sup\limits_{\|f\|_L \le 1}\mathbb{E}_{s\sim p_r}[f(s)]-\mathbb{E}_{t\sim p_\theta}[f(t)]
W(pr,pθ)=∥f∥L≤1supEs∼pr[f(s)]−Et∼pθ[f(t)]该函数
f
f
f适合用神经网络来逼近,而且这种方法的优点是,只需夹紧权值即可实现
L
i
p
s
c
h
i
t
z
\mathrm{Lipschitz}
Lipschitz连续性。
相关代码
本文中所涉及线性规划求解,以及相关的示意图程如如下所示
import numpy as np
from matplotlib import pyplot as plt
from scipy.optimize import linprog
from matplotlib import cm
import matplotlib.colors as colors
l =10
P_r = np.array([12,7,4,1,19,14,9,6,3,2])
P_t = np.array([1,5,11,17,13,9,6,4,3,2])
P_r = P_r / np.sum(P_r)
P_t = P_t / np.sum(P_t)
plt.subplot(1,2,1)
plt.bar(range(l), P_r,1, color='limegreen', edgecolor='black',linewidth=5, alpha=1)
plt.axis('off')
plt.ylim(0,0.5)
plt.subplot(1,2,2)
plt.bar(range(l), P_t,1, color='fuchsia', edgecolor='black',linewidth=5, alpha=1)
plt.axis('off')
plt.ylim(0,0.5)
plt.tight_layout()
plt.show()
D = np.ndarray(shape=(l, l))for i inrange(l):for j inrange(l):
D[i,j]=abs(range(l)[i]-range(l)[j])
A_r = np.zeros((l, l, l))
A_t = np.zeros((l, l, l))for i inrange(l):for j inrange(l):
A_r[i, i, j]=1
A_t[i, j, i]=1# print(A_r)
A = np.concatenate((A_r.reshape((l, l**2)), A_t.reshape((l, l**2))), axis=0)print("A: \n", A.shape,"\n")
b = np.concatenate((P_r, P_t), axis=0)
c = D.reshape((l**2))
opt_res = linprog(c, A_eq = A, b_eq = b, bounds=[0,None])
emd = opt_res.fun
gamma = opt_res.x.reshape((l,l))print('EMD:', emd,"\n")
plt.imshow(gamma, cmap=cm.rainbow, interpolation='nearest')
plt.axis('off')
plt.show()
plt.imshow(D, cmap=cm.rainbow, interpolation='nearest')
plt.axis('off')
plt.show()
opt_res = linprog(-b, A.T, c, bounds=(None,None))
emd =-opt_res.fun
f = opt_res.x[0:l]
g = opt_res.x[l:]print('dual EMD:', emd)print('f: \n', f)
plt.plot(range(l), f)
plt.show()print('g: \n', g)
plt.plot(range(l), g)
plt.show()
plt.bar(range(l), np.multiply(P_r, f),1, color='blue', alpha=1)
plt.ylim(-0.5,0.5)
plt.axis('off')
plt.show()
plt.bar(range(l), np.multiply(P_t, g),1, color='green', alpha=1)
plt.axis('off')
plt.ylim(-0.5,0.5)
plt.show()#check sum
emd = np.sum(np.multiply(P_r, f))+ np.sum(np.multiply(P_t, g))print("emd: ", emd)
cNorm = colors.Normalize(vmin=0, vmax=l)
colorMap = cm.ScalarMappable(norm=cNorm, cmap=cm.terrain)
current_bottom = np.zeros(l)
r =range(l)for i in r.__reversed__():
plt.bar(r, gamma[r, i],1, color=colorMap.to_rgba(r), bottom=current_bottom)
current_bottom = current_bottom + gamma[r, i]
plt.axis('off')
plt.ylim(0,0.5)
plt.show()
current_bottom = np.zeros(l)for i in r:
plt.bar(r, gamma[i, r],1, color=colorMap.to_rgba(i), bottom=current_bottom)
current_bottom = current_bottom + gamma[i, r]
plt.axis('off')
plt.ylim(0,0.5)
plt.show()
版权归原作者 鬼道2022 所有, 如有侵权,请联系我们删除。