Time Complexity Analysis of the DQN Algorithm

全栈程序员-站长 • March 19, 2026, 3:23 PM

Initialize replay memory $\mathcal{D}$ to capacity $N$ (takes $t_0$ time)
Initialize action-value function $Q$ with random weights (takes $t_1$ time)
for $\text{episode} = 1, M$ do (the loop statement itself takes an average of $t_{2}$ per iteration; repeated $M$ times)
$\quad$ Initialise sequence $s_1=\{x_1\}$ and preprocessed sequence $\phi_1=\phi(s_1)$ (takes an average of $t_{2.1}$ per execution)
$\quad$ for $t = 1, T$ do (the loop statement itself takes an average of $t_{2.2}$ per iteration; repeated $T$ times)
$\quad\quad$ With probability $\epsilon$ select a random action $a_t$ (takes an average of $t_{2.2.1}$ per execution)
$\quad\quad$ otherwise select $a_t=\max_aQ^*(\phi(s_t),a;\theta)$ (also counted under $t_{2.2.1}$, since only one of the two branches runs in any given step)
$\quad\quad$ Execute action $a_t$ in emulator and observe reward $r_t$ and image $x_{t+1}$ (takes an average of $t_{2.2.2}$ per execution)
$\quad\quad$ Set $s_{t+1}=s_t,a_t,x_{t+1}$ and preprocess $\phi_{t+1}=\phi(s_{t+1})$ (takes an average of $t_{2.2.3}$ per execution)
$\quad\quad$ Store transition $(\phi_t,a_t,r_t,\phi_{t+1})$ in $\mathcal{D}$ (takes an average of $t_{2.2.4}$ per execution)
$\quad\quad$ Sample random minibatch of transitions $(\phi_j,a_j,r_j,\phi_{j+1})$ from $\mathcal{D}$ (takes an average of $t_{2.2.5}$ per execution)
$\quad\quad \text{Set}\quad y_j=\begin{cases}r_j, & \text{for terminal } \phi_{j+1} \\ r_j+\gamma\max_{a'}Q(\phi_{j+1},a';\theta), & \text{for non-terminal } \phi_{j+1}\end{cases}$ (takes an average of $t_{2.2.6}$ per execution)
$\quad\quad$ Perform a gradient descent step on $(y_j-Q(\phi_j,a_j;\theta))^2$ according to equation 3 (takes an average of $t_{2.2.7}$ per execution)
$\quad$ end for
end for
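
The loop structure of the listing can be checked mechanically. The following is a minimal sketch, not a DQN implementation: every annotated statement is replaced by a counter increment so that only the nesting remains. The function name `count_statement_runs` and the label strings are hypothetical names chosen for this illustration.

```python
from collections import Counter

# Labels of the seven statements inside the inner loop, named after the
# time annotations in the listing (t_{2.2.1} ... t_{2.2.7}).
INNER_LABELS = tuple(f"t_2.2.{i}" for i in range(1, 8))


def count_statement_runs(num_episodes: int, steps_per_episode: int) -> Counter:
    """Count how many times each annotated statement would execute."""
    runs = Counter()
    runs["t_0"] += 1  # initialize replay memory (runs once)
    runs["t_1"] += 1  # initialize Q (runs once)
    for _episode in range(num_episodes):
        runs["t_2"] += 1    # outer loop overhead, once per episode
        runs["t_2.1"] += 1  # initialise and preprocess the sequence
        for _step in range(steps_per_episode):
            runs["t_2.2"] += 1  # inner loop overhead, once per step
            for label in INNER_LABELS:
                runs[label] += 1  # each inner statement, once per step
    return runs


if __name__ == "__main__":
    runs = count_statement_runs(num_episodes=3, steps_per_episode=5)
    for label, n in runs.items():
        print(label, n)
```

With $M$ episodes of $T$ steps each, the statements outside both loops run once, those in the outer loop run $M$ times, and those in the inner loop run $T \cdot M$ times.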

Using the average execution times assumed for each line above, the total running time of the DQN algorithm is:

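$\begin{aligned} T(\text{episode},t) &=t_0 + t_1 \\ & + (t_2+t_{2.1}+(t_{2.2}+t_{2.2.1}+t_{2.2.2}+t_{2.2.3}+t_{2.2.4}+t_{2.2.5}+t_{2.2.6}+t_{2.2.7})\cdot T)\cdot M \\ \text{(merging constants)} & = t_{c_1}+(t_{c_2}+t_{c_3}\cdot T)\cdot M \\ & = t_{c_1}+(t_{c_2}\cdot M+t_{c_3}\cdot T\cdot M) \\ & = t_{c_1}+t_{c_2}\cdot M+t_{c_3}\cdot T\cdot M \\ & \approx t_{c_3}\cdot T\cdot M \quad \text{(for large $T$ and $M$)} \\ & = O(T\cdot M) \\ \end{aligned}$

Here $t_{c_1}=t_0+t_1$, $t_{c_2}=t_2+t_{2.1}$, and $t_{c_3}=t_{2.2}+t_{2.2.1}+t_{2.2.2}+t_{2.2.3}+t_{2.2.4}+t_{2.2.5}+t_{2.2.6}+t_{2.2.7}$. The last two steps are asymptotic statements rather than exact equalities.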
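
The step that drops the lower-order terms can be made precise with a short bound. This is a sketch under the assumption that $T \ge 1$, $M \ge 1$, and that $t_{c_1}$, $t_{c_2}$, $t_{c_3}$ are nonnegative constants independent of $T$ and $M$:

$\begin{aligned} t_{c_1}+t_{c_2}\cdot M+t_{c_3}\cdot T\cdot M & \le t_{c_1}\cdot T\cdot M+t_{c_2}\cdot T\cdot M+t_{c_3}\cdot T\cdot M \\ & = (t_{c_1}+t_{c_2}+t_{c_3})\cdot T\cdot M \\ \end{aligned}$

The factor in parentheses does not depend on $T$ or $M$, so the running time is at most a constant multiple of $T \cdot M$, which is exactly what $O(T \cdot M)$ means.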

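When the number of episodes $M$ and the number of steps per episode $T$ are very large, the constant term $t_{c_1}$ and the coefficient $t_{c_3}$ of $T \cdot M$ have a negligible effect on how the running time $T(\text{episode}, t)$ grows. Note also that the dominant term is $T \cdot M$ rather than $M$, because $T \cdot M$ grows much faster than $M$ as $T$ increases. The time complexity can therefore be written as $T(\text{episode},t)=O(n_t n_m)$, where $n_t$ is the number of time steps per episode and $n_m$ is the number of episodes. This result treats the cost of every line as a constant, so it holds only when quantities such as the minibatch size and the network size are fixed.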
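
To see numerically how quickly the $T \cdot M$ term takes over, here is a small sketch. The constants `t_c1=10.0`, `t_c2=2.0` and `t_c3=5.0` are made-up values chosen only for illustration, not measurements, and the function names are hypothetical.

```python
def total_time(M: int, T: int, t_c1: float = 10.0,
               t_c2: float = 2.0, t_c3: float = 5.0) -> float:
    """Evaluate t_c1 + t_c2*M + t_c3*T*M for illustrative constants."""
    return t_c1 + t_c2 * M + t_c3 * T * M


def leading_term_share(M: int, T: int, t_c1: float = 10.0,
                       t_c2: float = 2.0, t_c3: float = 5.0) -> float:
    """Fraction of the total time contributed by the t_c3*T*M term."""
    return (t_c3 * T * M) / total_time(M, T, t_c1, t_c2, t_c3)


if __name__ == "__main__":
    # The share of the leading term approaches 1 as M and T grow.
    for size in (1, 10, 100, 1000):
        print(size, size, leading_term_share(M=size, T=size))

    # Doubling the number of episodes roughly doubles the total time.
    print(total_time(M=2000, T=1000) / total_time(M=1000, T=1000))
```

For small inputs the constant and linear terms still matter, but their share shrinks toward zero as $M$ and $T$ grow, which is why only $T \cdot M$ appears in the final bound.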