What does self-attention allow a language model to do?