Resources

DeepXDE: A deep learning library for solving differential equations 2019 LINK (leans more toward PDEs)
Neural ODE 2018 LINK
Augmented Neural ODEs 2019 LINK Github
Dynamically Constrained Motion Planning Networks for Non-Holonomic Robots 2020 LINK
Normalizing Flows for Probabilistic Modeling and Inference 2019 LINK ZHIHU_LINK
Deep learning theory review: An optimal control and dynamical systems perspective 2019

Resource:

Neural ODE Code https://nbviewer.jupyter.org/github/urtrial/neural_ode/tree/master/
Neural ODE Code https://github.com/Rachnog/Neural-ODE-Experiments/blob/master/Neural_ODE_Basic.ipynb
https://en.wikipedia.org/wiki/Numerical_methods_for_ordinary_differential_equations
Paper List https://zhuanlan.zhihu.com/p/87999707

Neural ODE

This effectively introduces a new network architecture. Instead of specifying a discrete sequence of hidden layers, we parameterize the derivative of the hidden state using a neural network. The output of the network is then computed by a black-box ODE solver.

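In fact, combining differential equations with neural networks has already been explored by many people, and many networks can be read as discretizations of a differential equation. ResNet is the forward Euler method applied to an ODE; similarly, PolyNet corresponds to backward Euler and FractalNet to Runge-Kutta. More reading here?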
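The ResNet / forward Euler correspondence can be checked numerically. The sketch below is illustrative only: `f` is a hypothetical stand-in for a residual branch (a single tanh unit, not a trained layer), and `residual_block` and `euler_step` are names made up for this note.

```python
import math

def f(h, t, theta):
    # Hypothetical residual branch: a single tanh unit with weight theta.
    return math.tanh(theta * h)

def residual_block(h, theta):
    # ResNet update: h_{k+1} = h_k + f(h_k, theta)
    return h + f(h, 0.0, theta)

def euler_step(h, t, dt, theta):
    # Forward Euler update for dh/dt = f(h, t, theta)
    return h + dt * f(h, t, theta)

h0, theta = 0.5, 0.8
resnet_out = residual_block(h0, theta)
euler_out = euler_step(h0, 0.0, 1.0, theta)  # step size dt = 1
```

With a step size of 1, the Euler update and the residual update are the same expression, so a stack of residual blocks can be read as a fixed-step Euler integration.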

Personal impression: is this just another way to train a network?

Introduction

Residual networks, RNNs, and normalizing flows all build complicated transformations by composing a sequence of transformations to a hidden state.

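If the step t is taken to the infinitesimal limit, i.e. continuous time, the change between hidden states can be described by an ODE. That is:

$$\frac{dh(t)}{dt} = f(h(t), t, \theta)$$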

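Given a value at the initial time, h(0), the value at the final time, h(T), is the solution of this ODE with initial state h(0). Computing h(T) can be delegated to an ODE solver that is treated as a black box.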
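As a minimal sketch of what "black-box solver" means here, the snippet below integrates a toy ODE from h(0) to h(T) with a fixed-step RK4 scheme. The function name `odesolve`, the toy dynamics `decay`, and the choice of RK4 are assumptions made for illustration; a real implementation would more likely call an adaptive library solver.

```python
import math

def odesolve(f, h0, t0, t1, theta, steps=100):
    """Fixed-step RK4 integrator for dh/dt = f(h, t, theta); returns h(t1)."""
    h, t = h0, t0
    dt = (t1 - t0) / steps
    for _ in range(steps):
        k1 = f(h, t, theta)
        k2 = f(h + 0.5 * dt * k1, t + 0.5 * dt, theta)
        k3 = f(h + 0.5 * dt * k2, t + 0.5 * dt, theta)
        k4 = f(h + dt * k3, t + dt, theta)
        h += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
        t += dt
    return h

def decay(h, t, theta):
    # Toy dynamics dh/dt = -theta * h, with exact solution h0 * exp(-theta * t).
    return -theta * h

h_T = odesolve(decay, h0=1.0, t0=0.0, t1=2.0, theta=0.5)
exact = math.exp(-1.0)  # closed-form h(T), used only to check the solver
```

The caller only supplies the dynamics, the initial state, and the time interval; how the solver steps internally is hidden, which is what lets the dynamics function be swapped for a neural network.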

Benefits of using Neural ODEs:

Memory efficiency: Not storing any intermediate quantities of the forward pass allows us to train our models with constant memory cost as a function of depth, a major bottleneck of training deep models.
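Adaptive computation: Modern ODE solvers provide guarantees about the growth of approximation error, monitor the level of error, and adapt their evaluation strategy on the fly to reach the requested accuracy. This lets the cost of evaluating a model scale with problem complexity. After training, accuracy can be reduced for real-time or low-power applications.
Scalable and invertible normalizing flows: Neural ODEs can be used to construct a new class of invertible density models that avoids the single-unit bottleneck of normalizing flows and can be trained directly by maximum likelihood.
Continuous time-series models: In contrast to RNNs, a Neural ODE can give the value of the dynamics at any time.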
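The adaptive-computation point in the list above can be illustrated with a small step-size controller. This is a sketch under stated assumptions, not the solver used in the paper: `adaptive_solve` is a hypothetical helper built on an embedded Euler/Heun pair, and `dynamics` is a toy linear ODE.

```python
import math

def adaptive_solve(f, h0, t0, t1, tol):
    """Integrate dh/dt = f(h, t) from t0 to t1 with an embedded Euler/Heun pair.

    Returns (h(t1), number of f evaluations).
    """
    h, t = h0, t0
    dt = (t1 - t0) / 10.0
    n_evals = 0
    while t1 - t > 1e-12:
        dt = min(dt, t1 - t)
        k1 = f(h, t)
        k2 = f(h + dt * k1, t + dt)
        n_evals += 2
        low = h + dt * k1                 # first-order (Euler) estimate
        high = h + 0.5 * dt * (k1 + k2)   # second-order (Heun) estimate
        err = abs(high - low)             # local error estimate
        if err <= tol:
            h, t = high, t + dt           # accept the step
        # Step-size controller with a safety factor and clipped growth.
        scale = 0.9 * math.sqrt(tol / err) if err > 0 else 2.0
        dt *= min(2.0, max(0.2, scale))
    return h, n_evals

def dynamics(h, t):
    # Toy dynamics dh/dt = -2h, exact solution exp(-2t) for h(0) = 1.
    return -2.0 * h

loose, cost_loose = adaptive_solve(dynamics, 1.0, 0.0, 1.0, tol=1e-2)
tight, cost_tight = adaptive_solve(dynamics, 1.0, 0.0, 1.0, tol=1e-6)
exact = math.exp(-2.0)
```

Loosening `tol` reduces the number of function evaluations at the price of a larger error, which is the accuracy/cost trade-off the bullet describes.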

Reverse-mode automatic differentiation of ODE solutions

This section mainly explains how to compute gradients through a black-box ODE solver. (Note: reverse-mode differentiation is backpropagation.)

The main tool is the adjoint sensitivity method.

Assume a scalar-valued loss that is a function of the ODE solver's output, i.e. the earlier h(T), written here as z(t1).
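Written out, and assuming the notation of the Neural ODE paper (ODESolve stands for the black-box solver, t_0 and t_1 for the start and end times, f for the network that defines the dynamics), the loss would read:

$$L(z(t_1)) = L\left(z(t_0) + \int_{t_0}^{t_1} f(z(t), t, \theta)\,dt\right) = L\left(\mathrm{ODESolve}(z(t_0), f, t_0, t_1, \theta)\right)$$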

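This loss can be optimized with gradient-based methods, which requires the gradient with respect to θ through the ODE. Standard backpropagation first needs the gradient of the loss with respect to each layer's hidden state. One could discretize the ODE with the simple Euler method into a ResNet-like form, but then every hidden state has to be stored, and the more advanced, non-differentiable ODE solvers cannot be used. The most serious problem is that solving a complicated ODE to a given accuracy may require a very large number of steps.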

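Hence the adjoint sensitivity method, proposed by Pontryagin, is used for this kind of dynamical-system optimization problem. The adjoint method turns the computation of the gradient of the loss with respect to the network parameters into the solution of another ODE, which avoids backpropagation's need to store intermediate hidden states.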
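A small worked example may make the adjoint idea concrete. The sketch assumes a scalar toy problem chosen for this note (dynamics dz/dt = θz and loss L = 0.5 z(t1)^2), a hypothetical fixed-step `rk4` helper, and the standard adjoint equations da/dt = -a ∂f/∂z and dL/dθ = ∫ a ∂f/∂θ dt over [t0, t1]; it is not the paper's implementation.

```python
import math

def rk4(func, y0, t0, t1, steps=200):
    """Fixed-step RK4 for a list-valued state; also works backward (t1 < t0)."""
    y, t = list(y0), t0
    dt = (t1 - t0) / steps
    for _ in range(steps):
        k1 = func(y, t)
        k2 = func([yi + 0.5 * dt * ki for yi, ki in zip(y, k1)], t + 0.5 * dt)
        k3 = func([yi + 0.5 * dt * ki for yi, ki in zip(y, k2)], t + 0.5 * dt)
        k4 = func([yi + dt * ki for yi, ki in zip(y, k3)], t + dt)
        y = [yi + dt * (a + 2 * b + 2 * c + d) / 6.0
             for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]
        t += dt
    return y

theta, z0, t0, t1 = 0.3, 1.5, 0.0, 2.0

# Forward pass: only the final state z(t1) is kept, no intermediate states.
(z1,) = rk4(lambda y, t: [theta * y[0]], [z0], t0, t1)

# Toy loss L = 0.5 * z(t1)^2, so the adjoint starts at a(t1) = dL/dz(t1) = z(t1).
def augmented(y, t):
    z, a, g = y
    return [theta * z,    # dz/dt = f(z, t, theta), re-solved backward
            -a * theta,   # da/dt = -a * df/dz
            -a * z]       # dg/dt = -a * df/dtheta

# Backward pass: integrate [z, a, g] from t1 back to t0; g(t0) is dL/dtheta.
_, _, grad_adjoint = rk4(augmented, [z1, z1, 0.0], t1, t0)

# Closed-form gradient for this toy problem, used only as a check.
grad_exact = (t1 - t0) * z0 ** 2 * math.exp(2 * theta * (t1 - t0))
```

The backward pass recomputes z(t) alongside the adjoint instead of reading stored activations, which is where the constant-memory property comes from.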