thecsw / MemePolice_bot

License: GPL-2.0
This is a bot for r/PewdiepieSubmissions. It moderates harmful submissions by applying OCR to graphical content.

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to MemePolice bot

Image2text
📋 Python wrapper to grab text from images and save as text files using Tesseract Engine
Stars: ✭ 243 (+834.62%)
Mutual labels:  python-wrapper, tesseract-ocr
receipt-manager-app
Receipt parser application written in dart.
Stars: ✭ 140 (+438.46%)
Mutual labels:  tesseract-ocr
Snoo
A Reddit command line client written in Node.js, using modern ES-features
Stars: ✭ 39 (+50%)
Mutual labels:  reddit-api
flask-ocr
use flask and tesseract to have a basic ocr, also you need opencv2, this code use opencv2 to have a basic image process
Stars: ✭ 27 (+3.85%)
Mutual labels:  tesseract-ocr
Reddsaver
CLI tool to download saved and upvoted media from Reddit
Stars: ✭ 76 (+192.31%)
Mutual labels:  reddit-api
crypto-subreddits-cli
👽 Track Cryptocurrency Subreddits On The Command Line 👽
Stars: ✭ 24 (-7.69%)
Mutual labels:  reddit-api
Redditbot
Discord bot for reddit.com
Stars: ✭ 17 (-34.62%)
Mutual labels:  reddit-api
spdlog-python
python wrapper around C++ spdlog (git@github.com:gabime/spdlog.git)
Stars: ✭ 46 (+76.92%)
Mutual labels:  python-wrapper
reddit-image-fetcher
A JavaScript package for fetching reddit images, memes, wallpapers and more.
Stars: ✭ 40 (+53.85%)
Mutual labels:  reddit-api
Praw
PRAW, an acronym for "Python Reddit API Wrapper", is a python package that allows for simple access to Reddit's API.
Stars: ✭ 2,675 (+10188.46%)
Mutual labels:  reddit-api
Unim.press
A Reddit front-page reader in the style of The New York Times.
Stars: ✭ 199 (+665.38%)
Mutual labels:  reddit-api
Reddit Bot
🤖 Making a Reddit Bot using Python, Heroku and Heroku Postgres.
Stars: ✭ 99 (+280.77%)
Mutual labels:  reddit-api
python-sutime
Python wrapper for Stanford CoreNLP's SUTime
Stars: ✭ 143 (+450%)
Mutual labels:  python-wrapper
Psraw
PowerShell Reddit API Wrapper
Stars: ✭ 42 (+61.54%)
Mutual labels:  reddit-api
Euro2016 TerminalApp
⚽ Instantly find 🏆EURO 2016 live-streams & highlights, now a Web App!
Stars: ✭ 54 (+107.69%)
Mutual labels:  reddit-api
Redmin
A super lightweight reddit client for iOS
Stars: ✭ 12 (-53.85%)
Mutual labels:  reddit-api
Redditkit.rb
[Deprecated] A Ruby wrapper for the reddit API
Stars: ✭ 156 (+500%)
Mutual labels:  reddit-api
HighlightTranslator
Highlight Translator can help you to translate the words quickly and accurately. By only highlighting, copying, or screenshoting the content you want to translate anywhere on your computer (ex. PDF, PPT, WORD etc.), the translated results will then be automatically displayed before you.
Stars: ✭ 54 (+107.69%)
Mutual labels:  tesseract-ocr
ocreval
Update of the ISRI Analytic Tools for OCR Evaluation with UTF-8 support
Stars: ✭ 48 (+84.62%)
Mutual labels:  tesseract-ocr
tesseract-ocr-re
Tesseract 4 OCR Runtime Environment - Docker Container
Stars: ✭ 94 (+261.54%)
Mutual labels:  tesseract-ocr

MemePolice_bot

This is a Reddit bot that finds old or unfunny posts and images on the r/PewdiepieSubmissions subreddit. When the bot finds an illegal meme, the OP receives a message from the bot. The bot started as a serious (joke-ish) project and was later improved a little.

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes. See deployment for notes on how to deploy the project on a live system.

Prerequisites

Install the following Python packages before running the bot:

sudo pip install praw
sudo pip install opencv-python
sudo pip install pytesseract
sudo pip install tesseract-ocr
sudo pip install Pillow
sudo pip install tqdm

OR

pip install --upgrade -r requirements.txt

praw is the Python Reddit API Wrapper. It is the only package the bot uses to connect to Reddit's API and extract the desired data.

OpenCV's Python bindings are used for image transformations and computer vision tasks.

pytesseract is a Python wrapper for Google's Tesseract-OCR.

Pillow is the Python Imaging Library by Fredrik Lundh and Contributors.

tqdm is used for fancy progress bars.
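
To show how these packages fit together, here is a minimal sketch of reading text from an image with Pillow and pytesseract. It is not the bot's actual text_recognition.py: the function names clean_ocr_text and read_text are made up for illustration, and the sketch assumes the Tesseract engine and its English language data are already installed.

```python
import re


def clean_ocr_text(raw):
    """Lowercase OCR output and collapse runs of whitespace into single spaces."""
    return re.sub(r"\s+", " ", raw).strip().lower()


def read_text(image_path):
    """Return cleaned text recognized in the image at image_path.

    Hypothetical helper: the third-party imports live inside the function so
    this module still loads on machines where they are not installed yet.
    """
    from PIL import Image  # provided by the Pillow package
    import pytesseract

    with Image.open(image_path) as image:
        raw = pytesseract.image_to_string(image, lang="eng")
    return clean_ocr_text(raw)
```

Cleaning the text before any further checks helps because OCR output tends to contain stray line breaks and inconsistent capitalization.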

Other dependencies

The Tesseract engine must be installed on the local machine for text recognition to work. We also install Tesseract's trained language data for better accuracy, here only the English package. For more information about other languages, please refer to Tesseract's official repository on GitHub.

Linux

Debian, Ubuntu (apt)
sudo apt-get install tesseract-ocr
sudo apt-get install tesseract-ocr-eng
Arch Linux (pacman)
sudo pacman -S tesseract
sudo pacman -S tesseract-data-eng

Mac OS

Homebrew
brew install tesseract

I am not sure about other distros, but I think Tesseract is available in most package managers.

If you want to compile the Tesseract engine yourself, please refer to the official guides.
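
Once the engine is installed, a quick way to confirm that Python can see it is to look for the tesseract binary on PATH. The following is a small sketch that uses only the standard library; the helper names find_engine and engine_version are hypothetical and not part of the bot.

```python
import shutil
import subprocess


def find_engine(binary="tesseract"):
    """Return the full path of the binary if it is on PATH, else None."""
    return shutil.which(binary)


def engine_version(binary="tesseract"):
    """Return the first line of `<binary> --version`, or None if it is missing."""
    path = find_engine(binary)
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    # Some Tesseract releases print the version to stderr instead of stdout.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


if __name__ == "__main__":
    version = engine_version()
    print(version or "tesseract was not found on PATH")
```

If the script reports that tesseract was not found, revisit the install commands above for your platform.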

Installing

The only thing you need to set up before running the bot is the config file, where you fill in your Reddit API details.

To do that, follow these steps:

git clone https://github.com/thecsw/MemePolice_bot
cd MemePolice_bot
mv example.config.py config.py
nano config.py

After filling out the details, save and exit. You're done with installation.
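
For orientation, a filled-in config file might look roughly like the sketch below. The variable names are assumptions, chosen to match the keyword arguments that PRAW's praw.Reddit constructor accepts for a script-type application; the real example config may name them differently. The unfilled_fields helper is hypothetical and only shows one way to catch placeholders that were left in.

```python
# config.py - hypothetical layout; the real example config may use other names.
client_id = "YOUR_CLIENT_ID"
client_secret = "YOUR_CLIENT_SECRET"
username = "YOUR_BOT_USERNAME"
password = "YOUR_BOT_PASSWORD"
user_agent = "MemePolice_bot (by u/YOUR_USERNAME)"

SETTINGS = {
    "client_id": client_id,
    "client_secret": client_secret,
    "username": username,
    "password": password,
    "user_agent": user_agent,
}


def unfilled_fields(settings):
    """Return the sorted names of settings that still hold a YOUR_ placeholder."""
    return sorted(name for name, value in settings.items() if "YOUR_" in value)
```

With real values in place, unfilled_fields(SETTINGS) returns an empty list, and a dictionary like SETTINGS could then be passed on as praw.Reddit(**SETTINGS).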

Deployment

Remove the word 'example' from the names of all files that contain it.

Then run this:

python __main__.py

That is everything. All illegal memes shall be found and OPs should be punished.

Source code

The code is heavily commented and all the important modules are separated into different files. Looks pretty, dunno.

Here is a short description of all the source files:

  • analyze.py - Does the heavy lifting of analyzing comments and writing the results to the appropriate files.
  • blacklist.py - Contains all the banned keywords that will trigger the bot to respond.
  • config.example.py - Just the PRAW (OAuth) credentials for bot initialization.
  • main.py - The main script that processes everything and has the main loop with threads.
  • message.py - Different messages that the bot will send to users.
  • rude_phrases.py - Bot's responses to rude replies.
  • text_recognition.py - Returns text read from an image.
  • utils.py - Just the logging and other technical stuff.
  • __main__.py - The file that should be executed.
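
As an illustration of how the text returned from an image could be matched against the banned keywords, here is a minimal sketch. It is not the code from blacklist.py or analyze.py: the BLACKLIST entries and the find_banned function are invented for this example.

```python
import re

# Hypothetical entries; the real blacklist.py defines its own keywords.
BLACKLIST = ["old meme", "repost"]


def find_banned(text, blacklist=BLACKLIST):
    """Return the blacklist entries that occur in text, ignoring case and spacing."""
    normalized = re.sub(r"\s+", " ", text).lower()
    return [phrase for phrase in blacklist if phrase in normalized]
```

This sketch uses plain substring matching, so "repost" would also match "reposted". Whether that is desirable depends on the keyword list, and the real bot may match differently.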

Here is a short description of the .json and .txt files that we have:

  • ./users/users.json - this is a list of offenders with the number of illegal content posted
  • ./data-analyzation/checked.txt - stores checked comments.
  • ./data-analyzation/words.json - stores words and their respective frequency.
  • ./rudeness/checked.txt - stores old rude replies.
  • ./rudeness/rudeness_log.txt - stores the logs and other "rude" data.
  • ./violations/violations.json - stores all the violations.
  • ./logs/log.txt - stores errors and other debug information.
  • ./logs/violations_log.txt - debug data for violations.
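
A counter file such as users.json or words.json can be kept with the standard json module. The sketch below assumes a flat JSON object that maps a key (a username or a word) to a count. That layout is an assumption, the helper names are hypothetical, and the bot's real files may be structured differently.

```python
import json
from collections import Counter
from pathlib import Path


def load_counts(path):
    """Load a {key: count} JSON file, or return an empty Counter if it is missing."""
    path = Path(path)
    if not path.exists():
        return Counter()
    return Counter(json.loads(path.read_text(encoding="utf-8")))


def save_counts(path, counts):
    """Write the counts back as JSON, creating parent folders when needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(counts, indent=2, sort_keys=True), encoding="utf-8")


def record_violation(path, username):
    """Increment one user's offence count and return the new total."""
    counts = load_counts(path)
    counts[username] += 1
    save_counts(path, counts)
    return counts[username]
```

Calling record_violation("./users/users.json", "some_user") twice on a fresh file would leave that user with a count of 2.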

Built With

  • praw is the Python Reddit API Wrapper. It is the only package the bot uses to connect to Reddit's API and extract the desired data.
  • OpenCV's Python bindings are used for image transformations and computer vision tasks.
  • pytesseract is a Python wrapper for Google's Tesseract-OCR.
  • Pillow is the Python Imaging Library by Fredrik Lundh and Contributors.
  • tqdm is used for fancy progress bars.

Authors

  • Sagindyk Urazayev - Initial work - thecsw
  • Justin Schwaitzberg - Rewriting, structuring, and adding new features - Schwaitz

Acknowledgments

  • Fedora tip to Justin Schwaitzberg for greatly contributing to the code, structuring it and making it fancier.

License

This project is licensed under the GNU General Public License - see the LICENSE.md file for details.