Reinforcement Learning - Lesson 17-3 - Soft Actor-Critic - Bellman Equality, TD Error and HJB in RL