minimaxir / Download Tweets Ai Text Gen

Licence: mit
Python script to download public Tweets from a given Twitter account into a format suitable for AI text generation.

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Download Tweets Ai Text Gen

download-tweets-ai-text-gen-plus
Python script to download public Tweets from a given Twitter account into a format suitable for AI text generation
Stars: ✭ 26 (-85.71%)
Mutual labels:  twitter, text-generation
Tock
Tock - the open source conversational AI toolkit
Stars: ✭ 175 (-3.85%)
Mutual labels:  twitter
Productive Twitter
Chrome extension: Minimal and friendly theme for productive twitter use
Stars: ✭ 148 (-18.68%)
Mutual labels:  twitter
Scrape Twitter
🐦 Access Twitter data without an API key. [DEPRECATED]
Stars: ✭ 166 (-8.79%)
Mutual labels:  twitter
Laravel Twitter Streaming Api
Easily work with the Twitter Streaming API in a Laravel app
Stars: ✭ 153 (-15.93%)
Mutual labels:  twitter
Awesome Twitter Bots
🌟Resource repo for Twitter Bots 🐦
Stars: ✭ 170 (-6.59%)
Mutual labels:  twitter
Simplesharingbuttons
Share to Facebook, Twitter, Google+ and other social networks using simple HTML buttons.
Stars: ✭ 147 (-19.23%)
Mutual labels:  twitter
Sharemeow
😻 text shots service
Stars: ✭ 180 (-1.1%)
Mutual labels:  twitter
Twitter Video Downloader
Download Twitter video streams.
Stars: ✭ 172 (-5.49%)
Mutual labels:  twitter
Mastodon Bot
a bot for mirroring Twitter/Tumblr accounts and RSS feeds on Mastodon
Stars: ✭ 158 (-13.19%)
Mutual labels:  twitter
Spark Pac4j
Security library for Sparkjava: OAuth, CAS, SAML, OpenID Connect, LDAP, JWT...
Stars: ✭ 154 (-15.38%)
Mutual labels:  twitter
Nlp pytorch project
Embedding, NMT, Text_Classification, Text_Generation, NER etc.
Stars: ✭ 153 (-15.93%)
Mutual labels:  text-generation
React Twitter Embed
Simplest way to add twitter widgets to your react project.
Stars: ✭ 171 (-6.04%)
Mutual labels:  twitter
Kobe
Source code and dataset for KDD 2019 paper "Towards Knowledge-Based Personalized Product Description Generation in E-commerce"
Stars: ✭ 148 (-18.68%)
Mutual labels:  text-generation
Twitter Macos Swiftui Sample
Twitter macOS Big Sur SwiftUI example app
Stars: ✭ 176 (-3.3%)
Mutual labels:  twitter
Twitter Api Php
The simplest PHP Wrapper for Twitter API v1.1 calls
Stars: ✭ 1,808 (+893.41%)
Mutual labels:  twitter
Vae Lagging Encoder
PyTorch implementation of "Lagging Inference Networks and Posterior Collapse in Variational Autoencoders" (ICLR 2019)
Stars: ✭ 153 (-15.93%)
Mutual labels:  text-generation
Reading List Mover
A Python utility for moving bookmarks/reading lists between services
Stars: ✭ 166 (-8.79%)
Mutual labels:  twitter
Postwill
Posting to the most popular social media from Ruby
Stars: ✭ 181 (-0.55%)
Mutual labels:  twitter
Tweet.sh
Twitter client written in simple Bash script
Stars: ✭ 178 (-2.2%)
Mutual labels:  twitter

download-tweets-ai-text-gen

A small Python 3 script to download public Tweets from a given Twitter account into a format suitable for AI text generation tools (such as gpt-2-simple for finetuning GPT-2).

  • Retrieves all tweets as a simple CSV with a single CLI command.
  • Preprocesses tweets to remove URLs, extra spaces, and optionally usertags/hashtags.
  • Saves tweets in batches (e.g. in case there is an error or you want to end collection early).
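
The preprocessing step described above can be sketched roughly as follows. This is a minimal illustration, not the script's actual implementation: the regular expressions and the clean_tweet helper are assumptions made for this example, and only the strip_usertags and strip_hashtags option names come from the parameter list in this README.

```python
import re

# Assumed patterns for illustration; the real script may match differently.
URL_RE = re.compile(r"https?://\S+|pic\.twitter\.com/\S+")
USERTAG_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
SPACES_RE = re.compile(r"\s+")


def clean_tweet(text, strip_usertags=False, strip_hashtags=False):
    """Remove URLs, optionally @ tags and # tags, then collapse whitespace."""
    text = URL_RE.sub("", text)
    if strip_usertags:
        text = USERTAG_RE.sub("", text)
    if strip_hashtags:
        text = HASHTAG_RE.sub("", text)
    # Collapse runs of whitespace left behind by the removals above.
    return SPACES_RE.sub(" ", text).strip()


raw = "check this   out https://t.co/abc123 @friend #wow"
print(clean_tweet(raw))
print(clean_tweet(raw, strip_usertags=True, strip_hashtags=True))
```

Removing the URLs and tags first and collapsing whitespace last means the gaps left by removed tokens never survive into the cleaned text.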

You can view examples of AI-generated tweets from datasets retrieved with this tool in the /examples folder.

Inspired by popular demand due to the success of @dril_gpt2.

Usage

First, install the Python script dependencies:

pip3 install twint==2.1.4 fire tqdm

Then download the download_tweets.py script from this repo.

The script is used through a command-line interface. After cd-ing into the directory where the script is stored, run the following in a terminal:

python3 download_tweets.py <twitter_username>

For example, to download all tweets (sans retweets/replies/quote tweets) from Twitter user @dril, run:

python3 download_tweets.py dril

The script can also download tweets from multiple usernames at one time. To do so, first create a text file (.txt) with the list of usernames. Then run the script, referencing the file name:

python3 download_tweets.py <twitter_usernames_file_name>

The tweets will be downloaded to a single-column CSV titled <usernames>_tweets.csv.

The parameters you can pass to the command line interface (positionally or explicitly) are:

  • username: Username of the account whose tweets you want to download, or the name of a .txt file containing multiple usernames [required]
  • limit: Number of tweets to download [default: all tweets possible]
  • include_replies: Include replies from the user in the dataset [default: False]
  • strip_usertags: Strips out @ user tags in the tweet text [default: False]
  • strip_hashtags: Strips out # hashtags in the tweet text [default: False]
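
As a rough illustration of passing these parameters explicitly, the sketch below assembles a command line from keyword options. The build_command helper is hypothetical and exists only for this example. It assumes the --name=value flag form that the fire library accepts, and the values shown (a limit of 500, usertags stripped) are arbitrary.

```python
import shlex


def build_command(username, **options):
    """Build the argument list for download_tweets.py (hypothetical helper)."""
    args = ["python3", "download_tweets.py", username]
    for name, value in options.items():
        # Assumes fire's --name=value form for explicitly named parameters.
        args.append(f"--{name}={value}")
    return args


cmd = build_command("dril", limit=500, strip_usertags=True)
# Print the command as it would be typed in a terminal.
print(shlex.join(cmd))
```

The resulting list can be pasted into a terminal as printed, or handed to subprocess.run if you want to drive the script from Python.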

How to Train an AI on the downloaded tweets

gpt-2-simple has a special case for single-column CSVs: it automatically processes the text for best training and generation, i.e. by adding <|startoftext|> and <|endoftext|> to each tweet, which allows tweets to be generated independently.
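
To make the token wrapping concrete, here is a minimal sketch of the idea using only the standard library. It is not gpt-2-simple's own code: the wrap_tweets helper is hypothetical, and the sketch assumes the CSV begins with a header row that should be skipped.

```python
import csv
import io

START_TOKEN = "<|startoftext|>"
END_TOKEN = "<|endoftext|>"


def wrap_tweets(csv_file):
    """Wrap each row of a single-column CSV in start/end tokens."""
    reader = csv.reader(csv_file)
    next(reader, None)  # Assumption: the first row is a header, so skip it.
    return "\n".join(f"{START_TOKEN}{row[0]}{END_TOKEN}" for row in reader if row)


# In-memory stand-in for a downloaded <usernames>_tweets.csv file.
sample = io.StringIO("tweets\nhello world\nsecond tweet\n")
print(wrap_tweets(sample))
```

Because every tweet is delimited on both sides, a generated sample can be cut at the first end token, which is what the truncate parameter relies on when generating.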

You can use this Colaboratory notebook (optimized from the original notebook for this use case) to train the model on your downloaded tweets, and generate massive amounts of Tweets from it. Note that without a lot of data, the model might easily overfit; you may want to train for fewer steps (e.g. 500).

When generating, you'll always need to include certain parameters to decode the tweets, e.g.:

gpt2.generate(sess,
              length=200,
              temperature=0.7,
              prefix='<|startoftext|>',
              truncate='<|endoftext|>',
              include_prefix=False
              )

Helpful Notes

  • Retweets are not included in the downloaded dataset (which is generally a good thing).
  • You'll need thousands of tweets at minimum to feed to the model for good generation results (ideally 1 MB of input text data, although with tweets that's hard to achieve).
  • To help you reach the 1 MB of input text data, you can load data from multiple similar Twitter usernames.
  • The download will likely end much earlier than the theoretical limit (inferred from the user profile), as the limit includes retweets/replies/whatever cache shenanigans Twitter is employing.
  • The legalities of distributing downloaded tweets are ambiguous; it's therefore recommended that you avoid committing raw Twitter data to GitHub, which is also why examples of such data are not included in this repo. (AI-generated tweets themselves likely fall under derivative work/parody protected by Fair Use.)
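
To check how close a dataset is to the 1 MB guideline mentioned above, a small sketch like the following may help. Both helper functions are hypothetical, and treating 1 MB as 1,000,000 bytes of UTF-8 text with one newline per tweet is an assumption made for this example.

```python
TARGET_BYTES = 1_000_000  # Assumption: 1 MB counted as 1,000,000 bytes.


def dataset_size_bytes(tweets):
    """Approximate UTF-8 size of the tweets, counting one newline per tweet."""
    return sum(len(tweet.encode("utf-8")) + 1 for tweet in tweets)


def has_enough_data(tweets, target=TARGET_BYTES):
    """Return True when the combined tweet text reaches the target size."""
    return dataset_size_bytes(tweets) >= target


# Tweets gathered from several similar accounts can simply be concatenated.
combined = ["first account tweet", "second account tweet"] * 1000
print(dataset_size_bytes(combined), has_enough_data(combined))
```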

Maintainer/Creator

Max Woolf (@minimaxir)

Max's open-source projects are supported by his Patreon and GitHub Sponsors. If you found this project helpful, any monetary contributions to the Patreon are appreciated and will be put to good creative use.

License

MIT

Disclaimer

This repo has no affiliation with Twitter Inc.
