I was familiar with most of this content before taking the course, so, combined with the time constraints of a summer course, the coverage here is very limited.
The input query is tokenized, and each token is converted to an integer that is used to look up an embedding in the learnt latent space. This representation is then used to predict the embedding of the next token, which is subsequently decoded back into human-readable text.
Not a horrible simplification.
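A minimal sketch of that pipeline with a toy vocabulary and random weights (all names and numbers here are hypothetical, purely for illustration):

```python
import numpy as np

# Toy vocabulary: token string -> integer ID (hypothetical, for illustration only)
vocab = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4}
inv_vocab = {i: t for t, i in vocab.items()}

rng = np.random.default_rng(0)
d_model = 8
embeddings = rng.normal(size=(len(vocab), d_model))   # learnt lookup table (random here)
output_proj = rng.normal(size=(d_model, len(vocab)))  # maps hidden state back to vocab scores

def next_token(prompt: str) -> str:
    token_ids = [vocab[t] for t in prompt.split()]     # tokenize -> integer IDs
    hidden = embeddings[token_ids].mean(axis=0)        # stand-in for the model's latent representation
    logits = hidden @ output_proj                      # score every vocabulary entry
    return inv_vocab[int(np.argmax(logits))]           # decode back to human-readable text

print(next_token("the cat sat on"))
```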
Instead of conditioning on all tokens equally, condition more on relevant tokens!
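That is the intuition behind attention. A minimal scaled dot-product attention sketch in NumPy with toy dimensions (not taken from the course material):

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: weight each value by how relevant its key is to the query."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                     # relevance of every token to every query
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # softmax: weights over tokens sum to 1
    return weights @ V                                   # mix values, leaning on the relevant tokens

rng = np.random.default_rng(0)
seq_len, d_k = 5, 8
Q = rng.normal(size=(seq_len, d_k))
K = rng.normal(size=(seq_len, d_k))
V = rng.normal(size=(seq_len, d_k))
print(attention(Q, K, V).shape)   # (5, 8)
```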
Autoregressive generation of next token $x_6$.
Interesting way to view it.
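A greedy decoding loop in that spirit, where each generated token is appended to the context before predicting the next one. The model here is a toy stand-in, and every name is hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, d_model = 10, 8
embeddings = rng.normal(size=(vocab_size, d_model))
output_proj = rng.normal(size=(d_model, vocab_size))

def model_step(token_ids):
    """Stand-in for the model: map the context so far to scores over the next token."""
    hidden = embeddings[token_ids].mean(axis=0)
    return hidden @ output_proj

def generate(prompt_ids, n_new):
    ids = list(prompt_ids)
    for _ in range(n_new):
        logits = model_step(ids)
        ids.append(int(np.argmax(logits)))   # condition the next step on everything generated so far
    return ids

# x_1..x_5 are given; the loop produces x_6 onwards, one token at a time
print(generate([1, 4, 2, 7, 3], n_new=3))
```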
A technique that models sentence generation as an MDP, where rewards are derived from human feedback.
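A rough sketch of that framing, assuming it refers to RLHF-style training: the state is the tokens generated so far, an action appends a token, and a reward model trained on human preferences scores the finished sentence. The hypothetical reward function and the plain REINFORCE update below are just one simple way to use such a reward; methods used in practice (e.g. PPO) are more involved.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, max_len = 5, 4
logits_table = np.zeros((max_len, vocab_size))   # toy "policy": independent logits per position

def sample_sentence():
    """Roll out the MDP: state = tokens so far, action = next token, episode ends at max_len."""
    tokens, trajectory = [], []
    for t in range(max_len):
        probs = np.exp(logits_table[t]); probs /= probs.sum()
        a = int(rng.choice(vocab_size, p=probs))
        tokens.append(a)
        trajectory.append((t, a, probs))
    return tokens, trajectory

def human_feedback_reward(tokens):
    """Hypothetical stand-in for a reward model trained on human preferences."""
    return float(sum(tokens))   # pretend humans prefer higher-ID tokens

def reinforce_step(lr=0.1):
    tokens, trajectory = sample_sentence()
    R = human_feedback_reward(tokens)          # reward arrives only once the sentence is finished
    for t, a, probs in trajectory:
        grad = -probs
        grad[a] += 1.0                         # d log pi(a|s) / d logits
        logits_table[t] += lr * R * grad       # push probability toward rewarded actions

for _ in range(200):
    reinforce_step()
print(sample_sentence()[0])   # tokens drift toward higher-reward choices
```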