digitalmethodsinitiative / 4cat

Licence: other
The 4CAT Capture and Analysis Toolkit provides modular data capture & analysis for a variety of social media platforms.

Programming Languages

python, javascript, HTML, CSS, PLpgSQL, shell, Dockerfile

Projects that are alternatives of or similar to 4cat

Socialreaper
Social media scraping / data collection library for Facebook, Twitter, Reddit, YouTube, Pinterest, and Tumblr APIs
Stars: ✭ 338 (+134.72%)
Mutual labels:  social-media, scraping
dmi-instascraper
A GUI for Instaloader to scrape users and hashtags with on Instagram
Stars: ✭ 21 (-85.42%)
Mutual labels:  scraping, digitalmethods
dtube
Decentralized video sharing & social media platform on Ethereum blockchain.
Stars: ✭ 70 (-51.39%)
Mutual labels:  social-media
next-share
Social media share buttons for your next React apps.
Stars: ✭ 145 (+0.69%)
Mutual labels:  social-media
plexus
Plexus - Interactive Emotion Visualization based on Social Media
Stars: ✭ 27 (-81.25%)
Mutual labels:  social-media
turtle
Instagram Photo Downloader
Stars: ✭ 15 (-89.58%)
Mutual labels:  scraping
socials
👨‍👩‍👦 Social account detection and extraction in Python, e.g. for crawling/scraping.
Stars: ✭ 37 (-74.31%)
Mutual labels:  scraping
gochanges
**[ARCHIVED]** website changes tracker 🔍
Stars: ✭ 12 (-91.67%)
Mutual labels:  scraping
vosonSML
R package for collecting social media data and creating networks for analysis.
Stars: ✭ 65 (-54.86%)
Mutual labels:  social-media
trafilatura
Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
Stars: ✭ 711 (+393.75%)
Mutual labels:  scraping
cloud-functions
OpenFaaS social functions
Stars: ✭ 27 (-81.25%)
Mutual labels:  social-media
scrapers
scrapers for building your own image databases
Stars: ✭ 46 (-68.06%)
Mutual labels:  scraping
social
A simple social media using MEAN Stack. Frontend: Angular 6.
Stars: ✭ 13 (-90.97%)
Mutual labels:  social-media
Architeuthis
MITM HTTP(S) proxy with integrated load-balancing, rate-limiting and error handling. Built for automated web scraping.
Stars: ✭ 35 (-75.69%)
Mutual labels:  scraping
NBA-Fantasy-Optimizer
NBA Daily Fantasy Lineup Optimizer for FanDuel Using Python
Stars: ✭ 21 (-85.42%)
Mutual labels:  scraping
html-table-to-json
Generate JSON representations of HTML tables
Stars: ✭ 39 (-72.92%)
Mutual labels:  scraping
double-agent
A test suite of common scraper detection techniques. See how detectable your scraper stack is.
Stars: ✭ 123 (-14.58%)
Mutual labels:  scraping
etf4u
📊 Python tool to scrape real-time information about ETFs from the web and mixing them together by proportionally distributing their assets allocation
Stars: ✭ 29 (-79.86%)
Mutual labels:  scraping
ts-ui
Telar Social Network using Reactjs
Stars: ✭ 35 (-75.69%)
Mutual labels:  social-media
media-roller
A self hosted server to download videos from social media with an iOS shortcut for on-click saving to camera roll
Stars: ✭ 52 (-63.89%)
Mutual labels:  social-media

4CAT: Capture and Analysis Toolkit

DOI: 10.5281/zenodo.4742622 | DOI: 10.5117/CCR2022.2.007.HAGE | License: MPL 2.0 | Requires Python 3.8

4CAT has a website at 4cat.nl.

[Screenshot: 4CAT's 'Create Dataset' interface]
[Screenshot: a network visualisation of a dataset in 4CAT]

4CAT is a research tool for analysing and processing data from online social platforms. Its goal is to make the capture and analysis of data from these platforms accessible through a web interface, without requiring any programming or web scraping skills. Our target audience is researchers, students and journalists interested in using Digital Methods in their work.

In 4CAT, you create a dataset from a given platform according to a given set of parameters; the result of this (usually a CSV or JSON file containing matching items) can then be downloaded or analysed further with a suite of analytical 'processors', which range from simple frequency charts to more advanced analyses such as the generation and visualisation of word embedding models.
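As a rough illustration of what a simple frequency analysis on a downloaded dataset might look like outside 4CAT, here is a minimal Python sketch. It is not 4CAT's own processor code; the column name `body` and the sample rows are assumptions made for the example, and a real export may use different columns.

```python
import csv
import io
from collections import Counter


def word_frequencies(csv_file, text_column="body", top_n=5):
    """Count the most common lowercase words in one column of a CSV file."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        # Treat a missing or empty column as an empty string
        text = row.get(text_column) or ""
        counts.update(text.lower().split())
    return counts.most_common(top_n)


# Hypothetical sample standing in for a downloaded dataset
sample = io.StringIO(
    "id,body\n"
    "1,open data is good\n"
    "2,open tools for open research\n"
)
print(word_frequencies(sample, top_n=1))
```

To use this on a real file, pass an open file handle instead of the in-memory sample and set `text_column` to whichever column holds the item text.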

4CAT ships with a growing number of data sources for popular platforms, and you can add your own using 4CAT's Python API. The following data sources are currently actively supported:

  • 4chan and 8kun
  • BitChute
  • Reddit
  • Telegram
  • Tumblr
  • Twitter API v2 (Academic and regular tracks)

The following platforms are supported through other tools, from which you can import data into 4CAT for analysis:

  • Facebook and Instagram (via CrowdTangle exports)
  • Instagram, TikTok and LinkedIn (via Zeeschuimer or CrowdTangle)

A number of other platforms have built-in support that is untested, or requires e.g. special API access. You can view the data sources in our wiki or review the data sources' code in the GitHub repository.

Install

You can install 4CAT locally or on a server, either via Docker or manually. To use Docker, copy our docker-compose.yml and .env files, then run

docker-compose up -d

This will pull the latest version from Docker Hub. Detailed and alternative installation instructions are available in our wiki. Currently, scraping 4chan, 8chan, and 8kun requires additional steps; please see the wiki.

Please check our issues and create one if you experience any problems (pull requests are also very welcome).

Components

4CAT consists of several components, each in a separate folder:

  • backend: A standalone daemon that collects and processes data, as queued via the tool's web interface or API.
  • webtool: A Flask app that provides a web front-end for searching and analysing the stored data.
  • common: Assets and libraries.
  • datasources: Data source definitions. Each is a set of configuration options, database definitions and Python scripts for processing that source's data. If you want to set up your own data sources, refer to the wiki.
  • processors: A collection of data processing scripts that can plug into 4CAT to manipulate or process datasets created with 4CAT. There is an API you can use to make your own processors.
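
The idea of processors that plug into a tool and operate on datasets can be sketched generically in Python as a small registry of functions. This is a toy illustration only: `PROCESSORS`, `register` and `run_processor` are hypothetical names and are not 4CAT's actual processor API, which is documented in the wiki. The `author` column is likewise an assumption.

```python
import csv
import io

# Hypothetical registry mapping processor names to functions.
# This is an illustration only and NOT 4CAT's actual processor API.
PROCESSORS = {}


def register(name):
    """Return a decorator that stores a function in PROCESSORS under `name`."""
    def decorator(func):
        PROCESSORS[name] = func
        return func
    return decorator


@register("count-items")
def count_items(rows):
    """Return how many items the dataset contains."""
    return len(rows)


@register("unique-authors")
def unique_authors(rows):
    """Return the sorted unique values of the (assumed) 'author' column."""
    return sorted({row["author"] for row in rows if row.get("author")})


def run_processor(name, csv_text):
    """Parse CSV text into rows and pass them to the named processor."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return PROCESSORS[name](rows)


# Hypothetical dataset standing in for a 4CAT export
dataset = "id,author,body\n1,alice,hello\n2,bob,hi\n3,alice,bye\n"
print(run_processor("count-items", dataset))
print(run_processor("unique-authors", dataset))
```

In this sketch, adding a new analysis only requires writing one decorated function; the code that loads the dataset and dispatches to a processor stays unchanged.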

Credits & License

4CAT was created at OILab and the Digital Methods Initiative at the University of Amsterdam. The tool was inspired by DMI-TCAT, a tool with comparable functionality that can be used to scrape and analyse Twitter data.

4CAT development is supported by the Dutch PDI-SSH foundation through the CAT4SMR project.

4CAT is licensed under the Mozilla Public License, 2.0. Refer to the LICENSE file for more information.
