These are notes from an introductory workshop on generating text with large language models.
Approaches to Text Generation
- Markov chains (stochastic/probabilistic; see the sketch after this list)
- RNNs (Recurrent Neural Networks)
- LSTMs (Long Short-Term Memory [Type of RNN])
- Transformers (see “Attention Is All You Need”)
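As a quick illustration of the Markov-chain approach, here is a minimal sketch: it records which word follows each pair of words in a corpus, then samples from those counts. The tiny corpus and the order-2 model are placeholders, not part of the workshop material.

```python
import random
from collections import defaultdict

def build_markov_model(text, order=2):
    """Map each sequence of `order` words to the words that follow it in the corpus."""
    words = text.split()
    model = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        model[state].append(words[i + order])
    return model

def generate(model, length=30):
    """Start from a random state and repeatedly sample one of the observed next words."""
    state = random.choice(list(model.keys()))
    output = list(state)
    for _ in range(length):
        followers = model.get(state)
        if not followers:
            break
        output.append(random.choice(followers))
        state = tuple(output[-len(state):])
    return " ".join(output)

# Placeholder corpus for demonstration only.
corpus = "the quick brown fox jumps over the lazy dog and the quick grey cat naps by the lazy dog"
model = build_markov_model(corpus, order=2)
print(generate(model))
```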
Large Language Models
- Transformer Architecture
- Trained on huge datasets (16 GB to 745+ GB)
- Long training time (~355 GPU-years on Tesla V100 GPUs)
- Expensive (~$4,600,000 in computing costs)
- Re-usable (the pre-trained model can be reused for many tasks; see the sketch below)
Further Reading: https://dl.acm.org/doi/pdf/10.1145/3442188.3445922
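To illustrate the re-usable point: the expensive pre-training has already been done, so a model can simply be downloaded and prompted. A minimal sketch, assuming the Hugging Face transformers library and the small GPT-2 checkpoint (the workshop notebook may use a different model).

```python
# Assumes the Hugging Face `transformers` library (pip install transformers torch).
# GPT-2 stands in here for whichever model the workshop notebook actually uses.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

# The expensive pre-training is already done; we only download and run the weights.
result = generator("Large language models are", max_new_tokens=25, num_return_sequences=1)
print(result[0]["generated_text"])
```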
Data (and Biases)
- Trained on the internet
- Common Crawl (https://commoncrawl.org/)
- WebText2
- Digitized Books
- Wikipedia
- The Pile (https://pile.eleuther.ai/)
- Garbage in, garbage out
- Known Racial, Gendered, and Religious Biases (see the probe sketch below)
Source: https://arxiv.org/pdf/2101.00027.pdf
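One rough way to see these learned biases is to compare a model's predictions when only an occupational or demographic term changes. A sketch, assuming the transformers library and the bert-base-uncased fill-mask model rather than anything from the workshop notebook.

```python
# A rough probe for learned associations; assumes the `transformers` library
# and the bert-base-uncased checkpoint (not necessarily the workshop's model).
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

for prompt in ["The doctor said [MASK] would be late.",
               "The nurse said [MASK] would be late."]:
    print(prompt)
    for p in fill(prompt, top_k=3):
        print(f"  {p['token_str']!r}  score={p['score']:.3f}")
```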
Surprising Performance
- Illusion of meaning
- Translation (via prompting; see the sketch after this list)
- Text transformation
- Tuning
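Translation and other text transformations can be coaxed out of a generative model by prompting alone. A sketch of the few-shot prompt pattern, assuming the transformers library; GPT-2 is only a stand-in and is far too small to translate reliably, but larger models follow the same pattern.

```python
# Few-shot prompting pattern for translation; assumes the `transformers` library.
# GPT-2 is used for illustration only -- larger models handle this far better.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = (
    "English: Good morning.\nFrench: Bonjour.\n"
    "English: Thank you very much.\nFrench: Merci beaucoup.\n"
    "English: Where is the library?\nFrench:"
)
output = generator(prompt, max_new_tokens=10, do_sample=False)

# Print only the model's continuation, i.e. its attempted translation.
print(output[0]["generated_text"][len(prompt):])
```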
Getting Started with Colab
- colab.research.google.com
- Virtual environment
- Interactive
- Shareable
- Free GPU (see the check below)
Workshop Notebook: https://colab.research.google.com/drive/1Y8thkankYotdrUs3_K1R96UDxSJ2e7p0?usp=sharing
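Once a notebook is open, a quick check confirms that the free GPU is actually attached (Runtime > Change runtime type > GPU). This assumes PyTorch, which Colab ships by default.

```python
# Run inside a Colab notebook to confirm a GPU is attached.
# Assumes PyTorch, which Colab provides by default.
import torch

if torch.cuda.is_available():
    print("GPU available:", torch.cuda.get_device_name(0))
else:
    print("No GPU attached - check the runtime type.")
```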
Black Boxed
- Inner workings are not well understood, even by its developers
- Predictive
- Not explanatory (see the sketch below)
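What the model actually returns is a probability distribution over possible next tokens, with no accompanying explanation. A sketch, assuming the transformers library, PyTorch, and the GPT-2 checkpoint.

```python
# The model outputs probabilities for the next token, nothing more -- no
# explanation of why. Assumes `transformers`, `torch`, and the gpt2 checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Probability distribution over the vocabulary for the next token.
probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(probs, k=5)
for score, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(token_id))!r}: {score.item():.3f}")
```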
Environmental Cost
- Energy intensive
- Massive data
- Long Training time
- Training GPT-3 produced an estimated 552 metric tons of carbon dioxide
- Roughly equivalent to the annual emissions of 120 passenger cars (see the arithmetic below)
Further Reading: https://arxiv.org/pdf/2104.10350.pdf
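A back-of-envelope check of the two figures above: 552 metric tons spread over roughly 120 car-years works out to about 4.6 metric tons of CO2 per car per year.

```python
# Back-of-envelope check on the figures above.
gpt3_training_tons = 552      # estimated CO2 from training GPT-3
car_equivalents = 120         # rough number of car-years quoted above
print(gpt3_training_tons / car_equivalents)  # ~4.6 metric tons of CO2 per car per year
```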
Privacy Risk
- Based on data scraped without permission
- Can be reverse-engineered to disclose sensitive information
- Can be queried to uncover training data (see the probe sketch below)
Further Reading: https://ieeexplore.ieee.org/document/9152761
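A toy version of the training-data extraction idea, in the spirit of the work linked above: prompt the model with a prefix that might occur in its training data and inspect the completions. Assumes the transformers library and GPT-2; real attacks sample far more completions and rank them for signs of memorization.

```python
# Toy training-data extraction probe; assumes the `transformers` library and gpt2.
# Real attacks generate many samples and filter/rank them for memorized content.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prefix = "My email address is"
for sample in generator(prefix, max_new_tokens=15, num_return_sequences=3, do_sample=True):
    print(sample["generated_text"])
```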
Other Resources
- https://www.decontextualize.com/
- https://ml4a.github.io/guides/
- https://towardsdatascience.com/