Lectures
Readings
- Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding." arXiv preprint arXiv:1810.04805 (2018).
- Rogers, Anna, Olga Kovaleva, and Anna Rumshisky. "A Primer in BERTology: What We Know About How BERT Works." Transactions of the Association for Computational Linguistics 8 (2020): 842-866.
- Jay Alammar, The Illustrated BERT.
- Liu, Yinhan, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. "RoBERTa: A Robustly Optimized BERT Pretraining Approach." arXiv preprint arXiv:1907.11692 (2019).