
lots-of-things / Gpt2 Bert Reddit Bot

Licence: cc-by-sa-4.0
a bot that generates realistic replies using a combination of pretrained GPT-2 and BERT models

Projects that are alternatives to or similar to Gpt2 Bert Reddit Bot

Courseraml
I took Andrew Ng's Machine Learning course on Coursera and did the homework assignments... but on my own in Python, because I love Jupyter notebooks!
Stars: ✭ 1,911 (+1109.49%)
Mutual labels:  jupyter-notebook
Machine Learning Workflow With Python
A comprehensive walkthrough of ML techniques with Python: define the problem, specify inputs and outputs, data collection, exploratory data analysis, data preprocessing, model design, training, evaluation.
Stars: ✭ 157 (-0.63%)
Mutual labels:  jupyter-notebook
Pythonrobotics
Python sample codes for robotics algorithms.
Stars: ✭ 13,934 (+8718.99%)
Mutual labels:  jupyter-notebook
Pastas
🍝 Pastas is an open-source Python framework for the analysis of hydrological time series.
Stars: ✭ 155 (-1.9%)
Mutual labels:  jupyter-notebook
Python Textualheatmap
Create interactive textual heat maps for Jupyter notebooks
Stars: ✭ 156 (-1.27%)
Mutual labels:  jupyter-notebook
Yolo2
This project is a Keras (backend: TensorFlow) implementation of the paper YOLO9000: Better, Faster, Stronger
Stars: ✭ 157 (-0.63%)
Mutual labels:  jupyter-notebook
Fairseq Zh En
NMT for Chinese-English using fairseq
Stars: ✭ 155 (-1.9%)
Mutual labels:  jupyter-notebook
Cartoonify
Deploy and scale serverless machine learning app - in 4 steps.
Stars: ✭ 157 (-0.63%)
Mutual labels:  jupyter-notebook
Openedu
📚 The Open Source Education Initiative – a repository with resources for 60+ engineering subjects. Let's make education more open and accessible! 🚀✨
Stars: ✭ 156 (-1.27%)
Mutual labels:  jupyter-notebook
Gasyori100knock
image processing codes to understand algorithm
Stars: ✭ 1,988 (+1158.23%)
Mutual labels:  jupyter-notebook
Nbpresent
next generation slides for Jupyter Notebooks
Stars: ✭ 156 (-1.27%)
Mutual labels:  jupyter-notebook
Yolov4 Cloud Tutorial
This repository walks you through how to Build and Run YOLOv4 Object Detections with Darknet in the Cloud with Google Colab.
Stars: ✭ 153 (-3.16%)
Mutual labels:  jupyter-notebook
Tensorflow On Android For Human Activity Recognition With Lstms
IPython notebook and Android app that show how to build an LSTM model in TensorFlow and deploy it on Android
Stars: ✭ 157 (-0.63%)
Mutual labels:  jupyter-notebook
Programming With Data
🐍 Learn Python and Pandas from the ground up
Stars: ✭ 156 (-1.27%)
Mutual labels:  jupyter-notebook
Fastbook
The fastai book, published as Jupyter Notebooks
Stars: ✭ 13,998 (+8759.49%)
Mutual labels:  jupyter-notebook
Deep q learning
This is the Code for "Deep Q Learning - The Math of Intelligence #9" By Siraj Raval on Youtube
Stars: ✭ 156 (-1.27%)
Mutual labels:  jupyter-notebook
Pikachu Detection
Detecting Pikachu on Android using Tensorflow Object Detection API
Stars: ✭ 157 (-0.63%)
Mutual labels:  jupyter-notebook
Covid19 mobility
COVID-19 Mobility Data Aggregator. Scraper of Google, Apple, Waze and TomTom COVID-19 Mobility Reports🚶🚘🚉
Stars: ✭ 156 (-1.27%)
Mutual labels:  jupyter-notebook
Notebooks
Stars: ✭ 157 (-0.63%)
Mutual labels:  jupyter-notebook
Machine Learning
Machine learning and deep learning notes, basic algorithm implementations, and curated resources (ML / CV / NLP / DM...)
Stars: ✭ 159 (+0.63%)
Mutual labels:  jupyter-notebook

gpt2-bert-reddit-bot

a series of scripts to fine-tune GPT-2 and BERT models on reddit data to generate realistic replies.

jupyter notebooks also available on Google Colab here

see my blog post for a walkthrough on running the scripts

processing training data

I use pandas read_gbq to read the reddit data from Google BigQuery. get_reddit_from_gbq.py automates the download. prep_data.py cleans and transforms the data into a format usable by the GPT-2 and BERT fine-tuning scripts. I manually upload the output of prep_data.py to Google Drive so the Google Colab notebooks can use it.
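
As a rough illustration of the download step, here is a minimal sketch of calling pandas read_gbq. It is not the contents of get_reddit_from_gbq.py: the helper names build_query and download_comments, the table name, and the columns id, parent_id, body and subreddit are all assumptions made for this example. read_gbq needs the pandas-gbq package and Google Cloud credentials, and newer pandas releases point to pandas_gbq.read_gbq instead.

```python
def build_query(table, subreddit, limit=1000):
    """Build a standard-SQL query for comments from one subreddit.

    The column names (id, parent_id, body, subreddit) are assumed for this
    sketch; adjust them to the table you actually query.
    """
    return (
        "SELECT id, parent_id, body "
        f"FROM `{table}` "
        f"WHERE subreddit = '{subreddit}' "
        f"LIMIT {int(limit)}"
    )


def download_comments(table, subreddit, project_id, limit=1000):
    """Run the query on BigQuery and return the rows as a DataFrame."""
    # Imported here so build_query can be used without pandas installed.
    import pandas as pd

    query = build_query(table, subreddit, limit)
    # project_id is the Google Cloud project billed for the query.
    return pd.read_gbq(query, project_id=project_id, dialect="standard")


if __name__ == "__main__":
    # Hypothetical table name; replace with a real BigQuery table.
    print(build_query("my-project.my_dataset.comments", "books", limit=5))
```

The query is assembled with plain string formatting, which is fine for a one-off script with trusted inputs but should not be fed untrusted values.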

Here is a sample of the data format output by prep_data.py:

"Is there any way this could be posted as a document so it can be saved permanently, outwith reddit? [SEP] Could you not just copy and paste it yourself into a word processor document?"
"Seems like alt-history is a format that would almost *require* a detailed outline before writing [SEP] Are you aware of any good outliners or character sheets for writing novels? I like to organize and plan on the macro level and then, knowing what I want to accomplish and with which character, I can then discovery write at the micro level. "
"This is depressing [SEP] There are the books and they are excellent. There are also audiobooks which are also outstanding. Including side story novellas!

Also there is no apparent sign of James S. A. Corey (which is actually two authors: Daniel Abraham and Ty Franck) going all George R. R. Martin / Robert Jordan."
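
The sample above pairs a parent comment with a reply, separated by [SEP]. One way to build such pairs is sketched below. This is not the actual prep_data.py: the function name build_pairs and the input layout (dicts with id, parent_id and body keys) are assumptions for illustration. It relies on reddit's convention that a parent_id starting with t1_ refers to a comment and t3_ to a submission.

```python
def build_pairs(comments):
    """Return 'parent [SEP] reply' strings for replies whose parent is known.

    comments: iterable of dicts with 'id', 'parent_id' and 'body' keys
    (an assumed layout for this sketch).
    """
    comments = list(comments)
    by_id = {c["id"]: c["body"] for c in comments}
    unusable = {"[deleted]", "[removed]"}
    pairs = []
    for c in comments:
        # parent_id looks like 't1_abc' (comment) or 't3_xyz' (submission).
        kind, _, parent = c["parent_id"].partition("_")
        if kind != "t1" or parent not in by_id:
            continue  # top-level comment, or parent not in this batch
        parent_body, reply_body = by_id[parent], c["body"]
        if parent_body in unusable or reply_body in unusable:
            continue  # skip comments whose text is gone
        pairs.append(f"{parent_body} [SEP] {reply_body}")
    return pairs


if __name__ == "__main__":
    sample = [
        {"id": "a", "parent_id": "t3_x", "body": "This is depressing"},
        {"id": "b", "parent_id": "t1_a", "body": "There are the books and they are excellent."},
    ]
    for line in build_pairs(sample):
        print(line)
```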

pulling reddit comments with praw

I use praw to download comments.

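import praw

# log in with the credentials of a reddit "script" app (placeholders shown)
reddit = praw.Reddit(client_id='client_id',
                     client_secret='client_secret',
                     password='reddit_password',
                     username='reddit_username',
                     user_agent='reddit user agent name')

...
subreddit = reddit.subreddit(subreddit_name)
# walk the top-level comments of five rising submissions
for h in subreddit.rising(limit=5):
  for c in h.comments:
    pass  # do stuff with each comment c here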

See the code for more details.
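
One detail worth knowing when iterating comments with praw: a submission's comment forest can contain MoreComments placeholders, which have no body. The sketch below shows one way to handle that, using praw's replace_more and list methods on the comment forest. The helper name comment_bodies and the choice to skip deleted or removed comments are assumptions for this example, not something taken from the project's code.

```python
def comment_bodies(submission, skip=("[deleted]", "[removed]")):
    """Return the text of every comment on a praw submission.

    replace_more(limit=0) drops the MoreComments placeholders instead of
    fetching them, and list() flattens the forest so nested replies are
    included as well as top-level comments.
    """
    submission.comments.replace_more(limit=0)
    return [c.body for c in submission.comments.list() if c.body not in skip]


# Usage with an authenticated praw.Reddit instance named reddit
# (subreddit_name is whatever subreddit you are reading):
#
#   for h in reddit.subreddit(subreddit_name).rising(limit=5):
#       for body in comment_bodies(h):
#           print(body)
```

Passing limit=0 keeps the number of API requests low at the cost of missing comments hidden behind "load more comments" links.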

training, generating, classifying

more documentation to come soon...
