My understanding is that the Adam optimizer is designed to adapt the learning rate automatically. However, Keras also lets you pass an explicit decay value in Adam's parameters, and I want to clarify what effect decay has on Adam in Keras. If we compile the model with decay = 0.01 and lr = 0.001, and then fit the model for 50 epochs, does the learning rate get reduced by a factor of 0.01 after each epoch?
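To make the question concrete, here is a plain-Python sketch of what I believe the older Keras `decay` argument does. It assumes the time-based rule lr / (1 + decay * iterations), where iterations counts batch updates rather than epochs. That is my reading of the legacy optimizer code, not something I have confirmed for newer Keras versions. The `decayed_lr` helper and the `steps_per_epoch` value are hypothetical, chosen only for illustration.

```python
def decayed_lr(initial_lr, decay, iterations):
    """Time-based decay: lr_t = lr_0 / (1 + decay * t).

    `iterations` is assumed to be the number of batch updates so far,
    not the number of epochs.
    """
    return initial_lr / (1.0 + decay * iterations)


initial_lr = 0.001
decay = 0.01
steps_per_epoch = 100  # hypothetical: depends on dataset size and batch size

# Print the effective learning rate at the start of a few epochs.
for epoch in (0, 1, 10, 50):
    iterations = epoch * steps_per_epoch
    print(epoch, decayed_lr(initial_lr, decay, iterations))
```

If this reading is right, the rate shrinks a little after every batch instead of dropping by a fixed factor once per epoch, so the number of batches per epoch matters.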
Is there a way to specify that the learning rate should decay only after a certain number of epochs?
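The kind of behaviour I have in mind looks roughly like the sketch below: keep the rate fixed for the first few epochs, then cut it at regular intervals. The function name `step_schedule` and the values of `start_epoch`, `factor`, and `every` are hypothetical. I assume a function of this shape, taking (epoch, lr), could be passed to the `keras.callbacks.LearningRateScheduler` callback, but the loop here only simulates it in plain Python.

```python
def step_schedule(epoch, lr, start_epoch=20, factor=0.5, every=10):
    """Leave lr unchanged before start_epoch, then multiply it by
    `factor` once every `every` epochs (at start_epoch, start_epoch + every, ...)."""
    if epoch >= start_epoch and (epoch - start_epoch) % every == 0:
        return lr * factor
    return lr


# Simulate 50 epochs, feeding each epoch's result back in as the current lr.
lr = 0.001
history = []
for epoch in range(50):
    lr = step_schedule(epoch, lr)
    history.append(lr)

print(history[19], history[20], history[49])
```

With these illustrative settings the rate stays at its initial value through epoch 19 and is halved at epochs 20, 30, and 40.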
PyTorch has a different implementation called AdamW, which is not present in the standard Keras library. Is this the same as varying the decay after every epoch, as mentioned above?
Thanks in advance for the reply.