Bob Carpenter (CCM) will finish his introduction to transformers and LLMs (large language models, the technology behind ChatGPT), presented in linear algebra notation using 40 lines of code and the blackboard. This expands on his 30-minute ASA talk on the topic.
This second part explains attention layers.
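For a preview of the material, here is a minimal sketch of a single attention head (scaled dot-product attention) in NumPy. The variable names and shapes are illustrative assumptions, not Carpenter's actual pseudocode:

import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product attention.

    X: (n, d) matrix of n token embeddings of dimension d.
    W_q, W_k, W_v: (d, d_k) projections for queries, keys, values.
    Returns an (n, d_k) matrix of attention outputs.
    """
    Q = X @ W_q              # queries: (n, d_k)
    K = X @ W_k              # keys:    (n, d_k)
    V = X @ W_v              # values:  (n, d_k)
    d_k = Q.shape[-1]
    # Each row of A is a distribution over all n positions.
    A = softmax(Q @ K.T / np.sqrt(d_k), axis=-1)   # (n, n)
    return A @ V             # weighted average of values: (n, d_k)

# Tiny usage example with random inputs.
rng = np.random.default_rng(0)
n, d, d_k = 4, 8, 8
X = rng.normal(size=(n, d))
W_q, W_k, W_v = (rng.normal(size=(d, d_k)) for _ in range(3))
print(attention(X, W_q, W_k, W_v).shape)   # (4, 8)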
See slides:
Transformers Pseudocode
https://drive.google.com/file/d/1pQC89WuBYMX4VPL65XpVsfE3TaQZ9ofh/view?usp=share_link
Transformers Talk Slides
https://drive.google.com/file/d/12EcW98NspJZ-c_sP4Hn35EY7UhA8_UXZ/view?usp=sharing