eddyharrington / WhatSoup

Licence: MIT license
A web scraper that exports your entire WhatsApp chat history.

WhatSoup 🍲

A (deprecated) web scraper that exports your entire WhatsApp chat history.

DEPRECATED as of April 2021: I cannot maintain this repo any longer but feel free to fork and maintain it going forward.

Table of Contents

  1. Overview
  2. Demo
  3. Prerequisites
  4. Instructions
  5. Frequently Asked Questions

Overview

Problem

WhatsApp's built-in chat export has several limitations:

  1. Exports are limited to a maximum of 40,000 messages
  2. Exports drop the text portion of media messages, replacing the entire message with <Media omitted> instead of, for example, <Media omitted> My favorite selfie of us 😻🐶🤳
  3. Exports are limited to the .txt file format

Solution

WhatSoup solves these problems by loading the entire chat history in a browser, scraping the chat messages (text only, no media), and exporting them to .txt, .csv, or .html files.

Example output:

WhatsApp Chat with Bob Ross.txt

02/14/2021, 02:04 PM - Eddy Harrington: Hey Bob 👋 Let's move to Signal!
02/14/2021, 02:05 PM - Bob Ross: You can do anything you want. This is your world.
02/15/2021, 08:30 AM - Eddy Harrington: How about we use WhatSoup 🍲 to backup our cherished chats?
02/15/2021, 08:30 AM - Bob Ross: However you think it should be, that’s exactly how it should be.
02/15/2021, 08:31 AM - Eddy Harrington: You're the best, Bob ❤
02/19/2021, 11:24 AM - Bob Ross: <Media omitted> My latest happy 🌲 painting for you.
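
The line layout in the example output above can be reproduced with a short formatting helper. The following is only a minimal sketch: the dictionary shape (`datetime`, `sender`, `message`), the function names `to_txt_line` and `to_csv`, and the CSV column names are assumptions made for illustration and are not taken from WhatSoup's source.

```python
import csv
import io
from datetime import datetime

# Hypothetical in-memory shape for scraped messages; WhatSoup's real
# internal structure may differ.
messages = [
    {"datetime": datetime(2021, 2, 14, 14, 4), "sender": "Eddy Harrington",
     "message": "Hey Bob 👋 Let's move to Signal!"},
    {"datetime": datetime(2021, 2, 19, 11, 24), "sender": "Bob Ross",
     "message": "<Media omitted> My latest happy 🌲 painting for you."},
]


def to_txt_line(msg):
    """Render one message as 'MM/DD/YYYY, HH:MM AM/PM - Sender: text'."""
    stamp = msg["datetime"].strftime("%m/%d/%Y, %I:%M %p")
    return f"{stamp} - {msg['sender']}: {msg['message']}"


def to_csv(msgs):
    """Render messages as CSV text with a header row (column names are assumed)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["datetime", "sender", "message"])
    for msg in msgs:
        writer.writerow([msg["datetime"].isoformat(), msg["sender"], msg["message"]])
    return buffer.getvalue()


# Join the rendered lines into the body of a .txt export.
txt_export = "\n".join(to_txt_line(msg) for msg in messages)
```

Keeping the media caption inside `message` is what preserves text such as "My latest happy 🌲 painting for you." next to the <Media omitted> marker.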

Demo

Watch the video on YouTube

Prerequisites

  • You have a WhatsApp account
  • You have Chrome browser installed
  • You have some familiarity with setting up and running Python scripts
  • Your terminal supports Unicode (UTF-8) characters (for chat emoji)
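
One rough way to check the last prerequisite is to ask Python which encoding it uses for standard output. This is a sketch only; `stream_supports_utf8` is a hypothetical helper, not part of WhatSoup, and a UTF-8 encoding does not guarantee that your terminal font can draw every emoji.

```python
import sys


def stream_supports_utf8(stream=sys.stdout):
    """Return True if the given text stream reports a UTF-8 encoding."""
    encoding = getattr(stream, "encoding", None) or ""
    # Normalize spellings such as "UTF-8", "utf_8", and "utf8".
    return encoding.lower().replace("-", "").replace("_", "") == "utf8"


if not stream_supports_utf8():
    print("Warning: terminal encoding is not UTF-8; emoji may not display correctly.")
```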

Instructions

  1. Make sure your WhatsApp chat settings are set to English language. This needs to be done on your phone (instructions here). You can change it back afterwards, but for now the script relies on certain HTML elements/attributes that contain English characters/words.

  2. Clone the repo:

    git clone https://github.com/eddyharrington/WhatSoup.git
    
  3. Create a virtual environment:

    # Windows
    python -m venv env
    
    # Linux & Mac
    python3 -m venv env
    
  4. Activate the virtual environment:

    # Windows
    env\Scripts\activate
    
    # Linux & Mac
    source env/bin/activate
    
  5. Install the dependencies:

    # Windows
    pip install -r requirements.txt
    
    # Linux & Mac
    python3 -m pip install -r requirements.txt
    
  6. Set up your environment:

  • Download ChromeDriver and extract it to a local folder (such as the env folder)

  • Get your Chrome browser Profile Path by opening Chrome and entering chrome://version into the URL bar

  • Create a .env file with entries for DRIVER_PATH and CHROME_PROFILE that specify the paths to your ChromeDriver and your Chrome profile from the steps above:

    # Windows
    DRIVER_PATH = 'C:\path-to-your-driver\chromedriver.exe'
    CHROME_PROFILE = 'C:\Users\your-username\AppData\Local\Google\Chrome\User Data'
    
    # Linux & Mac
    DRIVER_PATH = '/Users/your-username/path-to-your-driver/chromedriver'
    CHROME_PROFILE = '/Users/your-username/Library/Application Support/Google/Chrome/Default'
    
  7. Run the script:

    # Windows
    python whatsoup.py
    
    # Linux & Mac
    python3 whatsoup.py
    

    Note for Mac users: you may get blocked when trying to run the script the first time with a message about chromedriver not being from an identified developer. This is normal. Follow these instructions to grant chromedriver an exception, then re-run the script.
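
If the script cannot start Chrome, a wrong path in the .env file from step 6 is a likely cause. The sketch below checks both entries using only the standard library. It is an illustration under stated assumptions: `read_env_file` and `missing_settings` are hypothetical helpers, the parser handles only simple KEY = 'value' lines, and WhatSoup itself may read the file differently.

```python
from pathlib import Path


def read_env_file(env_path):
    """Parse simple KEY = 'value' lines, skipping blanks and # comments."""
    settings = {}
    for raw_line in Path(env_path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and single or double quotes.
        settings[key.strip()] = value.strip().strip("'\"")
    return settings


def missing_settings(settings, required=("DRIVER_PATH", "CHROME_PROFILE")):
    """Return required keys that are absent or point to a path that does not exist."""
    return [
        key for key in required
        if not settings.get(key) or not Path(settings[key]).exists()
    ]
```

Calling `missing_settings(read_env_file(".env"))` from the repo folder returns an empty list when both paths exist, and otherwise names the entries to fix.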

Frequently Asked Questions

Does it download pictures / media?

No.

How large a chat can I load/export?

The most demanding part of the process is loading the entire chat in the browser. Performance depends heavily on how much memory your computer has and how well Chrome handles the large DOM. For reference, my largest chat (~50k messages) uses about 10 GB of RAM.

How long does it take to load/export?

It depends on the chat size and how fast your computer is, but the figures below give a ballpark range. For large chats, I recommend turning off your PC's sleep/power-saving settings and running the script in the evening or before bed so it loads overnight.

# of msgs in chat history    Load time
500                          1 min
5,000                        12 min
10,000                       35 min
25,000                       3.5 hrs
50,000                       8 hrs
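
For chat sizes between the rows above, a rough estimate can be interpolated from the same figures. This is a sketch only: `estimate_load_minutes` is a hypothetical helper, straight-line interpolation is an assumption (the figures grow faster than linearly), and real load times depend on your hardware.

```python
# Ballpark figures copied from the load-time table above: (messages, minutes).
BALLPARK = [(500, 1), (5_000, 12), (10_000, 35), (25_000, 210), (50_000, 480)]


def estimate_load_minutes(message_count):
    """Linearly interpolate between table rows; clamp outside the table's range."""
    if message_count <= BALLPARK[0][0]:
        return float(BALLPARK[0][1])
    if message_count >= BALLPARK[-1][0]:
        # Beyond the largest measured chat, treat this value as a lower bound.
        return float(BALLPARK[-1][1])
    for (low_n, low_t), (high_n, high_t) in zip(BALLPARK, BALLPARK[1:]):
        if low_n <= message_count <= high_n:
            fraction = (message_count - low_n) / (high_n - low_n)
            return low_t + fraction * (high_t - low_t)
```

For example, a chat of 7,500 messages falls halfway between the 5,000 and 10,000 rows, so the estimate is halfway between 12 and 35 minutes.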

Why is it so slow?!

Browsers bottleneck easily when loading massive amounts of rich data in WhatsApp, which is a WebSocket application that constantly sends and receives information and changes the HTML/DOM.

I'm open to ideas but most of the things I tried didn't help performance:

  • Chrome vs Firefox
  • Headless browsing
  • Disabling images
  • Removing elements from DOM
  • Changing 'experimental' browser settings to allocate more memory

Can I...

  1. Use Firefox instead of Chrome? Yes, not out of the box though. There are a few Selenium differences and nuances to get it working, which I can share if there's interest. TODO.

  2. Use headless? Yes, but I only got this to work with Firefox and not Chrome.

  3. Use WhatSoup to scrape a local WhatsApp HTML file? Yes, you'd just need to bypass a few functions from main() and load the HTML file into Selenium's driver, then run the scraping/exporting functions like the below. If there's enough interest I can look into adding this to WhatSoup myself. TODO.

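    # Load and scrape data from a local HTML file
    def local_scrape(driver):
        # Use a file:// URL with forward slashes; backslashes in a plain
        # string literal can be misread as escape sequences
        driver.get('file:///C:/your-WhatSoup-dir/source.html')
        scraped = scrape_chat(driver)
        scrape_is_exported("source", scraped)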
    
  4. Contribute to WhatSoup? Please do!
