aitextgen has a command-line interface to quickly automate common tasks, and make it less necessary to use a script; helpful if running on a remote server.
Encodes given text
text.txt into a cache and compressed
TokenDataset, good for prepping a dataset for transit to a remote server.
aitextgen encode text.txt
If you are encoding a CSV, you should pass in the
line_by_line parameter as well.
aitextgen encode reddit.csv --line_by_line True
To train/finetune on the default 124M GPT-2, given text
text.txt and all default parameters:
aitextgen train text.txt
If you are using a cached/compressed dataset that ends with
tar.gz (e.g one created by the Encoding CLI command above), you can pass that to this function as well.
aitextgen train dataset_cache.tar.gz
Other parameters to the TokenDataset constructor can be used.
Loads a model and generates to a file.
By default, it will generate 20 texts to the file, 1 at a time at temperature of 0.7.
You can print to console instead by passing
aitextgen generate --prompt "I believe in unicorns because" --to_file False