G(houlish) P(retrained) T(errifier), or training my own GPT (pretraining + fine-tuning + RLHF) to generate scary stories for Halloween
In this post, I build and train a 1.5-billion-parameter GPT-inspired model from scratch to generate scary stories, matching the accuracy of OpenAI's GPT-2 on the HellaSwag benchmark. I pretrain on FineWeb for about eight hours on 8xH100 GPUs, then fine-tune the model on a dataset of CreepyPasta