The alternative you propose is perfectly fine. Whether you initialize to 0 or to theta(1), though, the first few values of the moving average will be biased towards that initial value (this may be acceptable for your use case). The formula from the later video tries to correct this initial bias.
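
To make the difference concrete, here is a minimal sketch comparing the two initializations with a bias-corrected average. It assumes the formula from the later video is the usual correction, v_t / (1 - beta^t), applied to an average that starts at 0. The function name `ewma`, the value of `beta`, and the sample data are made up for illustration.

```python
def ewma(values, beta=0.9, init=0.0, bias_correction=False):
    """Moving average v_t = beta * v_{t-1} + (1 - beta) * theta_t.

    `init` is the starting value v_0. `bias_correction` divides each v_t by
    (1 - beta**t); this is assumed to be the correction from the later video
    and only makes sense when init is 0.
    """
    v = init
    out = []
    for t, theta in enumerate(values, start=1):
        v = beta * v + (1 - beta) * theta
        # With init = 0 the weights on the data sum to (1 - beta**t), not 1,
        # so dividing by that factor rescales them to sum to 1.
        out.append(v / (1 - beta ** t) if bias_correction else v)
    return out


# Hypothetical observations theta(1), theta(2), ...
thetas = [10.0, 12.0, 11.0, 13.0, 12.0]

zero_init = ewma(thetas, init=0.0)                        # biased towards 0
first_init = ewma(thetas, init=thetas[0])                 # biased towards theta(1)
corrected = ewma(thetas, init=0.0, bias_correction=True)  # bias-corrected

for name, series in [("init=0", zero_init),
                     ("init=theta(1)", first_init),
                     ("corrected", corrected)]:
    print(name, [round(x, 3) for x in series])
```

With these numbers the zero-initialized average starts at about 1.0, far below the data, while the theta(1)-initialized one starts at theta(1) and moves away from it slowly. The corrected version also starts at theta(1), but it then behaves like a plain weighted mean of the points seen so far, so it does not keep over-weighting the first observation.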