Speaker: Tamas Endrei. Abstract: Although deep reinforcement learning (DRL) addresses sequential decision-making problems, state-of-the-art actor-critic algorithms lack an explicit representation of temporal information. Relying only on the current timestep's observation causes instability across consecutive actions. Furthermore, over-reliance on the reward as the sole performance metric obscures how stable the learned policies actually are. This talk proposes methods to improve the action smoothness of reinforcement learning agents without an explicit reward component for it, and details the training of a recurrent actor-critic agent built on TD7.
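The stability gap the abstract alludes to can be quantified directly from a rollout rather than from the reward. A minimal sketch of such a metric (the function name and the mean-absolute-difference formulation are illustrative assumptions, not necessarily the speaker's definition):

```python
import numpy as np

def action_smoothness(actions: np.ndarray) -> float:
    """Mean absolute change between consecutive actions.

    `actions` has shape (timesteps, action_dim); lower values
    indicate a smoother (more stable) policy.
    """
    diffs = np.abs(np.diff(actions, axis=0))  # per-step action change
    return float(diffs.mean())

# A jittery policy alternating between extremes scores far worse than
# a constant one, even if both collect the same reward.
jittery = np.array([[1.0], [-1.0], [1.0], [-1.0]])
steady = np.array([[0.5], [0.5], [0.5], [0.5]])
print(action_smoothness(jittery))  # 2.0
print(action_smoothness(steady))   # 0.0
```

Tracking a statistic like this alongside the return makes visible the stability differences that a reward curve alone can hide.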