Commit 584261a9 authored by Szymon Sidor, committed by GitHub

Merge pull request #14 from quanvuong/master

Consistent initial type (float) for episode_rewards
parents 9c10c2fc 86054f7a
@@ -222,7 +222,7 @@ def learn(env,
     episode_rewards[-1] += rew
     if done:
         obs = env.reset()
-        episode_rewards.append(0)
+        episode_rewards.append(0.0)
     if t > learning_starts and t % train_freq == 0:
         # Minimize the error in Bellman's equation on a batch sampled from replay buffer.
......
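A minimal sketch of the per-episode reward bookkeeping this diff touches. It assumes, as the commit title suggests, that the list starts out as `[0.0]`, and it replaces the environment loop in `learn()` with a hypothetical `run_episodes` helper fed hard-coded `(reward, done)` pairs. Integer rewards are used deliberately to show the difference: with the old `0` placeholder, later episodes can end up as `int` while the first is a `float`; with `0.0` every entry stays a `float`.

```python
from typing import Iterable, List, Tuple, Union


def run_episodes(
    steps: Iterable[Tuple[int, bool]],
    placeholder: Union[int, float] = 0.0,
) -> List[Union[int, float]]:
    """Accumulate per-episode returns from (reward, done) pairs.

    Hypothetical stand-in for the loop in learn(); `placeholder` is the
    value appended when an episode ends (0 before this commit, 0.0 after).
    """
    episode_rewards: List[Union[int, float]] = [0.0]  # assumed float initial value
    for rew, done in steps:
        episode_rewards[-1] += rew  # add this step's reward to the current episode
        if done:
            # Open a new episode slot; its type decides the episode's final type.
            episode_rewards.append(placeholder)
    return episode_rewards


if __name__ == "__main__":
    steps = [(1, False), (1, True), (0, True), (2, False)]
    after = run_episodes(steps)                  # behaviour after the commit
    before = run_episodes(steps, placeholder=0)  # behaviour before the commit
    print(after, sorted({type(r).__name__ for r in after}))
    print(before, sorted({type(r).__name__ for r in before}))
```

With these inputs the first line prints `[2.0, 0.0, 2.0] ['float']` and the second prints `[2.0, 0, 2] ['float', 'int']`, which is the type inconsistency the one-character change removes.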