These are notes from an introductory workshop on generating text with large language models.
Approaches to Text Generation
- Markov chains (stochastic/probabilistic; see the sketch after this list)
- RNNs (Recurrent Neural Networks)
- LSTMs (Long Short-Term Memory [Type of RNN])
- Transformers (see “Attention Is All You Need”)
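As a quick illustration of the Markov-chain approach, here is a minimal sketch: it records which word follows each pair of words in a corpus, then samples from those counts. The tiny corpus and the order-2 model are placeholders, not part of the workshop material.

```python
import random
from collections import defaultdict

def build_markov_model(text, order=2):
    """Map each sequence of `order` words to the words that follow it in the corpus."""
    words = text.split()
    model = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        model[state].append(words[i + order])
    return model

def generate(model, length=30):
    """Start from a random state and repeatedly sample one of the observed next words."""
    state = random.choice(list(model.keys()))
    output = list(state)
    for _ in range(length):
        followers = model.get(state)
        if not followers:
            break
        output.append(random.choice(followers))
        state = tuple(output[-len(state):])
    return " ".join(output)

# Placeholder corpus for demonstration only.
corpus = "the quick brown fox jumps over the lazy dog and the quick grey cat naps by the lazy dog"
model = build_markov_model(corpus, order=2)
print(generate(model))
```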
Large Language Models
- Transformer Architecture
- Trained on huge datasets (16 GB to 745+ GB)
- Long training time (~355 GPU-years on Tesla V100 GPUs)
- Expensive (~$4,600,000 in computing costs)
- Re-usable (the pre-trained model can be reused for many tasks; see the sketch below)
Further Reading: https://dl.acm.org/doi/pdf/10.1145/3442188.3445922
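To illustrate the re-usable point: the expensive pre-training has already been done, so a model can simply be downloaded and prompted. A minimal sketch, assuming the Hugging Face transformers library and the small GPT-2 checkpoint (the workshop notebook may use a different model).

```python
# Assumes the Hugging Face `transformers` library (pip install transformers torch).
# GPT-2 stands in here for whichever model the workshop notebook actually uses.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

# The expensive pre-training is already done; we only download and run the weights.
result = generator("Large language models are", max_new_tokens=25, num_return_sequences=1)
print(result[0]["generated_text"])
```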
Data (and Biases)
- Trained on the internet
- Common Crawl (https://commoncrawl.org/)
- WebText2
- Digitized Books
- Wikipedia
- The Pile (https://pile.eleuther.ai/)
- Garbage in, garbage out
- Known Racial, Gendered, and Religious Biases (see the probe sketch below)
Source: https://arxiv.org/pdf/2101.00027.pdf
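One rough way to see these learned biases is to compare a model's predictions when only an occupational or demographic term changes. A sketch, assuming the transformers library and the bert-base-uncased fill-mask model rather than anything from the workshop notebook.

```python
# A rough probe for learned associations; assumes the `transformers` library
# and the bert-base-uncased checkpoint (not necessarily the workshop's model).
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

for prompt in ["The doctor said [MASK] would be late.",
               "The nurse said [MASK] would be late."]:
    print(prompt)
    for p in fill(prompt, top_k=3):
        print(f"  {p['token_str']!r}  score={p['score']:.3f}")
```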
Surprising Performance
- Illusion of meaning
- Translation (via prompting; see the sketch after this list)
- Text transformation
- Tuning
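Translation and other text transformations can be coaxed out of a generative model by prompting alone. A sketch of the few-shot prompt pattern, assuming the transformers library; GPT-2 is only a stand-in and is far too small to translate reliably, but larger models follow the same pattern.

```python
# Few-shot prompting pattern for translation; assumes the `transformers` library.
# GPT-2 is used for illustration only -- larger models handle this far better.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = (
    "English: Good morning.\nFrench: Bonjour.\n"
    "English: Thank you very much.\nFrench: Merci beaucoup.\n"
    "English: Where is the library?\nFrench:"
)
output = generator(prompt, max_new_tokens=10, do_sample=False)

# Print only the model's continuation, i.e. its attempted translation.
print(output[0]["generated_text"][len(prompt):])
```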
Getting Started with Colab
- colab.research.google.com
- Virtual environment
- Interactive
- Shareable
- Free GPU (see the check below)
Workshop Notebook: https://colab.research.google.com/drive/1Y8thkankYotdrUs3_K1R96UDxSJ2e7p0?usp=sharing
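Once a notebook is open, a quick check confirms that the free GPU is actually attached (Runtime > Change runtime type > GPU). This assumes PyTorch, which Colab ships by default.

```python
# Run inside a Colab notebook to confirm a GPU is attached.
# Assumes PyTorch, which Colab provides by default.
import torch

if torch.cuda.is_available():
    print("GPU available:", torch.cuda.get_device_name(0))
else:
    print("No GPU attached - check the runtime type.")
```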
Black Boxed
- Inner workings are not well understood, even by its developers
- Predictive
- Not explanatory (see the sketch below)
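What the model actually returns is a probability distribution over possible next tokens, with no accompanying explanation. A sketch, assuming the transformers library, PyTorch, and the GPT-2 checkpoint.

```python
# The model outputs probabilities for the next token, nothing more -- no
# explanation of why. Assumes `transformers`, `torch`, and the gpt2 checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Probability distribution over the vocabulary for the next token.
probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(probs, k=5)
for score, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(token_id))!r}: {score.item():.3f}")
```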
Environmental Cost
- Energy intensive
- Massive data
- Long Training time
- Training GPT-3 produced an estimated 552 metric tons of carbon dioxide
- Roughly equivalent to the annual emissions of 120 passenger cars (see the arithmetic below)
Further Reading: https://arxiv.org/pdf/2104.10350.pdf
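A back-of-envelope check of the two figures above: 552 metric tons spread over roughly 120 car-years works out to about 4.6 metric tons of CO2 per car per year.

```python
# Back-of-envelope check on the figures above.
gpt3_training_tons = 552      # estimated CO2 from training GPT-3
car_equivalents = 120         # rough number of car-years quoted above
print(gpt3_training_tons / car_equivalents)  # ~4.6 metric tons of CO2 per car per year
```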
Privacy Risk
- Based on data scraped without permission
- Can be reverse-engineered to disclose sensitive information
- Can be queried to uncover training data (see the probe sketch below)
Further Reading: https://ieeexplore.ieee.org/document/9152761
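A toy version of the training-data extraction idea, in the spirit of the work linked above: prompt the model with a prefix that might occur in its training data and inspect the completions. Assumes the transformers library and GPT-2; real attacks sample far more completions and rank them for signs of memorization.

```python
# Toy training-data extraction probe; assumes the `transformers` library and gpt2.
# Real attacks generate many samples and filter/rank them for memorized content.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prefix = "My email address is"
for sample in generator(prefix, max_new_tokens=15, num_return_sequences=3, do_sample=True):
    print(sample["generated_text"])
```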
Other Resources
- https://www.decontextualize.com/
- https://ml4a.github.io/guides/
- https://towardsdatascience.com/