datajournalism-resources
A compilation of links to datajournalism & OSINT tools, guides and resources I find useful to keep at hand. PRs welcome!
by r3mlab | License: CC-BY-NC 4.0
Legend:
- 🌐 = online tool/service/database
- 💻 = software
- 📖 = guide/tutorial
- 📝 = list of tools/resources
- 🐍 = Python module
- 💲 = paid or paid-only tool/service
Contents
- APIs
- Archival
- Breached Data
- Companies
- Data Analysis & Manipulation
- Emails
- Lists of tools & resources
- Location, Maps, Satellite Imagery
- Military/Weapons
- Multi-purpose tools
- News
- Phone numbers
- Pictures, Photos, Videos
- Social Networks
- Text & Documents
- Transportation
- Visualization
- Weather
- Websites
- Misc
APIs
- Postman 💻 - API development environment offering useful tools for crafting and debugging API requests.
- ProgrammableWeb 📝 - A good API directory.
- Public APIs 📝 - A categorized list of APIs.
Archival
- archive.today 🌐 - Saves pages as screenshots, useful for websites the Wayback Machine can't handle.
- Firefox Screenshots 💻 - Firefox can take a screenshot of a full page (i.e. a 'scrolling' screenshot).
- How to Archive Open Source Materials 📖 (Bellingcat)
- Hunch.ly 🌐 💲 - Web capture tool designed for online investigations ($129.99/y).
- Internet Archive Wayback Machine 🌐
  - waybackpack 💻 🐍 - Command-line utility & Python library to download content from the Wayback Machine. See this example.
- view-page-archive 💻 - Browser extension to search for a page's archives on 15+ web archival/caching sites.
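The Wayback Machine can also be queried without a browser. The sketch below assumes its public availability endpoint (https://archive.org/wayback/available), which returns the snapshot closest to a given timestamp as JSON. It builds the query URL and extracts the archived link from a response. The HTTP call is left out so the parsing can be tested offline, and the function names are hypothetical.

```python
import json
from urllib.parse import urlencode

# Assumption: endpoint and response layout of the Wayback Machine availability API
API = "https://archive.org/wayback/available"

def availability_url(url, timestamp=None):
    """Query URL for the snapshot closest to `timestamp` (YYYYMMDDhhmmss, may be truncated)."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return API + "?" + urlencode(params)

def closest_snapshot(payload):
    """Return the archived URL from an availability response, or None if nothing is archived."""
    data = json.loads(payload)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None

if __name__ == "__main__":
    # Fetching is left to the caller, e.g. urllib.request.urlopen(availability_url(...)).read()
    print(availability_url("example.com", "20060101"))
```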
Breached Data
- Breach Data Search Engines Comparison 📝 (IntelTechniques)
- CardPwn 💻 - Find out if a credit card number appears in a breach.
- Dehashed 🌐 💲 - Find cleartext & hashed passwords from data breaches (paid, $4/week, $11/mo).
- GhostProject 🌐 - Check if an email appears in a breach. Shows the first 3 characters of the password for free.
- h8mail 💻 - Find passwords through different breach and reconnaissance services. Can also search the BreachedCompilation torrent.
- Have I Been Pwned? 🌐 - Check if an email appears in a breach, set up alerts.
- pwndb.py 💻 - Command-line tool for searching leaked credentials using the Onion service of the same name.
- WhatBreach 💻 - Search for breached emails and their corresponding databases.
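Have I Been Pwned also runs a Pwned Passwords range API built on k-anonymity. Only the first five hex characters of a password's SHA-1 hash are sent (to https://api.pwnedpasswords.com/range/<prefix>), and the match is made locally against the returned SUFFIX:COUNT lines. The sketch below covers both halves offline and assumes that response format. The helper names are hypothetical.

```python
import hashlib

# Assumption: Pwned Passwords range endpoint; responses are "SUFFIX:COUNT" lines
RANGE_API = "https://api.pwnedpasswords.com/range/"

def split_hash(password):
    """SHA-1 the password; split the uppercase hex digest into a 5-char prefix and the rest."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]

def breach_count(suffix, range_response):
    """Find our suffix among the lines returned for the prefix; 0 if it is absent."""
    for line in range_response.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix:
            return int(count)
    return 0

prefix, suffix = split_hash("password")
# Only the prefix leaves your machine: GET RANGE_API + prefix
print(RANGE_API + prefix)
```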
Companies
- CompaniesHouse Short Guide 📖 (Bellingcat) - A guide to the UK online company registry.
- DocumentCloud Search 🌐 - Search public documents uploaded to DocumentCloud, a publishing platform used by many journalists and media outlets.
- ICIJ's Offshore Leaks Database 🌐 - Data on offshore companies, foundations and trusts from the Panama Papers, the Offshore Leaks, the Bahamas Leaks and the Paradise Papers investigations.
- List of company registers 📝 (Wikipedia) - A list of company registers, by country.
- OCCRP Data 🌐 - Fantastic search tool & resources made available by OCCRP. Public records, leaks, scraped business registers, and more.
- OCCRP Investigative Dashboard 📝 - Collection of the most useful public data sources for investigative reporting. Many business registries listed.
- OpenCorporates 🌐 - A very comprehensive company database. Has an API.
- Open Ownership Register 🌐 - Explore beneficial ownership data. Aggregates many datasets.
Data Analysis & Manipulation
See also: Visualization
- csvkit 💻 - A suite of command-line tools for converting to and working with CSV files.
- OpenRefine 💻 - Clean & transform messy data.
- pandas 🐍 - Powerful Python data analysis library. Best used in a Jupyter notebook.
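A first pass on a scraped or leaked table often comes down to counting rows per category. pandas' value_counts() and csvkit's csvstat both report this. The sketch below does the same step with the standard library only, so it runs without either tool installed. The column name and sample rows are hypothetical.

```python
import csv
import io
from collections import Counter

def count_by(csv_text, column):
    """Count rows per distinct value of `column`, most common first (like value_counts)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter((row[column] or "").strip() for row in reader)
    return counts.most_common()

# Hypothetical sample: company records with a jurisdiction column
sample = """name,jurisdiction
Acme Ltd,GB
Foo SA,PA
Bar Inc,PA
"""
print(count_by(sample, "jurisdiction"))
```

With pandas available, the equivalent is roughly pd.read_csv(path)["jurisdiction"].value_counts().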
Emails
See also: Breached Data
- emailrep.io 🌐 - Public email reputation search & API. Can find social media profiles.
- Infoga 💻 - Gather information on email accounts (IP, hostname, country, etc.) from different public sources.
- theHarvester 💻 - Python command-line tool to search several search engines for email addresses from a particular domain.
- The most complete guide to finding anyone's email 📖 (Blurbiz)
- Trumail 🌐 - Free email verification API.
Lists of tools & resources
- Citadel 💻 - A library of OSINT tools.
- IntelTechniques.com 📝 - Blog, podcast, and paid OSINT/privacy training courses.
- Guides 📖 (Bellingcat) - OSINT & datajournalism how-tos.
- Online Investigation Toolkit 📝 (Bellingcat)
- awesome-osint 📝 - A curated list of open source intelligence tools and resources.
- OSINT framework 📝 - Tree list of OSINT tools & resources.
- OSINT Collection 📝 - Collection of OSINT-related resources.
- I-Intelligence's Open Source Intelligence Tools and Resources Handbook 2018 📝 - Very complete list of OSINT tools & resources, organized by category. No descriptions.
- AutomatedOSINT.com 📖 - A blog about automating OSINT techniques using Python.
- netbootcamp 📝 - Custom search forms and lists of resources by theme.
- Week in OSINT 📝 - Fresh links to OSINT tools, resources and investigations every week.
Location, Maps, Satellite Imagery
Interpretation
- How To Use Google Earth’s Three Dimensional View 📖 (Bellingcat)
- Identify Burnt Villages on Satellite Imagery 📖 (Bellingcat)
- Photo Interpretation Student Handbook 📖 (US Defense Mapping Agency, 1996) - Old unclassified handbook on analyzing aerial & satellite imagery. General principles & specifics for buildings, industries, transportation & communication facilities.
- Using Time Lapse Satellite Imagery to Detect Infrastructure Changes 📖 (Bellingcat)
Mapping services & software
- Baidu Maps 🌐 - Streetview = Panorama (百度全景)
- Bing Maps 🌐
- GeoNames 🌐 - Geographical database.
- Google Maps 🌐
- Google Earth 🌐
  - Google Earth Outreach - Advanced Google Earth tutorials. Example: Image & Photos Overlay
  - Google Earth Engine - Datasets, case studies, etc.
  - GEarth Blog - Resources & how-tos about Google Earth
- Satellite imagery providers:
  - Copernicus Open Access Hub 🌐 - Free access to imagery from the European Sentinel satellites.
  - Descartes Labs 🌐 💲
  - DigitalGlobe Discover 🌐 - Search for satellite imagery of a particular location. Images can be downloaded (low resolution compared to Google Earth).
  - EOS Landviewer 🌐 💲
  - NASA EarthData & EarthViewer 🌐
  - USGS Earth Explorer 🌐 - NASA Landsat imagery
- Here WeGo 🌐
- SentinelHub 🌐 - Satellite imagery, historical data from several sources, vegetation infrared & index, image exports & comparison. 2 products:
  - Playground - Data discovery, playing around
  - EO Browser - Compare full resolution images from several sources (Landsat, Sentinel), make time lapses & export to GIF (free signup required).
  - See also the custom scripts to highlight fire, snow, metals, type of terrain, etc.
- Zoom Earth 🌐 - NASA satellite and aerial images of the Earth.
- Yandex Maps 🌐 - Has a "Streetview" feature.
Tools & techniques
- Geographic Bounding Box Drawing Tool 🌐 - Draw a rectangle over a map and get the coordinates of its corners & center.
- PeakFinder 🌐 - Show the names of all mountains and peaks visible from any coordinates, with a 360° panoramic mountain view.
- Shadows and Angles: Measuring Object Heights from Satellite Imagery 📖 (GISLounge)
- Shadows and Suncalc 📖 - Great tutorial on using Google Earth & SunCalc to determine the time of day from shadows.
- SunCalc 🌐 - Historical solar data (sun orientation & elevation, shadow length, etc.).
- TerraPattern 🌐 - Scan large geographical areas for specific visual features using machine learning. Only available for 7 cities.
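The shadow techniques above rest on simple trigonometry. On flat ground, an object of height h casts a shadow of length L when the sun is at elevation angle e, so h = L · tan(e). The sketch below assumes flat terrain, a shadow length measured on imagery (for example with Google Earth's ruler), and a sun elevation read from SunCalc for the capture date and time. The inverse function gives the elevation implied by a known height, which can then be matched against SunCalc to narrow down the time. Function names are hypothetical.

```python
import math

# Assumption: flat, level ground and a shadow measured from the base of the object
def height_from_shadow(shadow_length_m, sun_elevation_deg):
    """Object height: h = L * tan(elevation)."""
    return shadow_length_m * math.tan(math.radians(sun_elevation_deg))

def elevation_from_shadow(height_m, shadow_length_m):
    """Sun elevation (degrees) implied by a known height and a measured shadow."""
    return math.degrees(math.atan2(height_m, shadow_length_m))

# A 20 m shadow with the sun 45 degrees above the horizon implies a ~20 m tall object.
print(round(height_from_shadow(20.0, 45.0), 2))
```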
User generated content
See also: Social Networks
- EchoSec 🌐 💲 - Search and analyze social media data based on location ($499/mo).
- GeoCreepy 💻 - Geolocation information gathering through social networking platforms (discontinued).
- Kamerka 💻 - Create an interactive map of cameras, printers, tweets and photos based on your coordinates.
- OpenStreetMap 🌐 - User-generated locations & maps. Use taginfo and/or overpass-turbo.eu to search a location by key/value tags (see OSM's Wiki).
- Mapillary 🌐 - Interactive map of crowdsourced geotagged photographs.
- OpenStreetCam 🌐 - Map of crowdsourced street-level photographs.
- Social networks (see category)
- Surveillance under Surveillance 🌐 - User-contributed map of cameras and guards.
- Tourism & review websites: Foursquare, TripAdvisor, Yelp, etc. 🌐
- Vkontakte 🌐 - Use near:<coordinates> in a search.
- Wikimapia 🌐 - User-generated locations & descriptions. Has an API.
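OpenStreetMap tags can also be queried programmatically through the Overpass API, the engine behind overpass-turbo.eu. The sketch below builds an Overpass QL query for nodes carrying a given key/value tag inside a bounding box, for example man_made=surveillance for cameras. The public endpoint URL and the helper names are assumptions. Note that Overpass expects the box as (south, west, north, east).

```python
from urllib.parse import urlencode

# Assumption: one public Overpass instance; other mirrors exist
OVERPASS_URL = "https://overpass-api.de/api/interpreter"

def overpass_query(key, value, south, west, north, east, timeout=25):
    """Overpass QL: all nodes tagged key=value inside the bounding box, returned as JSON."""
    bbox = f"{south},{west},{north},{east}"
    return (
        f"[out:json][timeout:{timeout}];"
        f'node["{key}"="{value}"]({bbox});'
        "out body;"
    )

def request_url(query):
    """GET URL carrying the query in the `data` parameter."""
    return OVERPASS_URL + "?" + urlencode({"data": query})

# Hypothetical bounding box around central Paris
q = overpass_query("man_made", "surveillance", 48.85, 2.33, 48.87, 2.36)
print(q)
```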
Military/Weapons
- CalibreObscura 🌐 - A blog about weapons & their use in Middle East conflicts.
- CamoPedia 🌐 - Camouflage encyclopedia. Search & compare camouflage patterns.
- ENAAT Data Browser 🌐 - Browse EU arms export data.
- How to Digitally Verify Combatant Affiliation in Middle East Conflicts 📖 (Bellingcat)
- ICUS Camouflage Index 🌐
- International Encyclopedia of Uniform Insignia 🌐
- Investigating and Tracking the Global Arms Trade 📖 (Corruption Watch UK) - Good presentation, full of resources of all types on the arms trade.
- List of Comparative Military Ranks 🌐 (Wikipedia)
- Omega Research Foundation's Identification & Documentation guides 📖 - Guides on identifying and documenting police & military equipment.
- SEESAC Reports & Map 🌐 - Database of firearms-related incidents in South East Europe.
- SIPRI Arms Transfers Database 🌐 - Information on all transfers of major conventional weapons from 1950 to the current year.
- Sketchfab 🌐 - Platform for sharing user-made 3D models, with lots of weapons. Useful to compare, check different angles, etc.
- Small Arms Survey’s Weapon ID database 🌐 - Search for small arms by caliber, type, location, etc.
- Small Arms Survey: Documenting Small Arms and Light Weapons 📖 - International policy recap & identification guide.
- UN Comtrade Database 🌐 - Official international trade statistics, including arms trade.
- UNROCA 🌐 - UN Register of Conventional Arms. Country-level data on arms exports.
- World Army Pictures 🌐 - Pictures of armies from all over the world.
Multi-purpose tools
- Buscador 💻 - A very handy VM with plenty of pre-installed & pre-configured OSINT tools.
- DataSploit 💻 - A collection of Python scripts that automate open source intelligence searches about domain names, email addresses, IP addresses and usernames.
- IntelligenceX Tools 🌐 - Various search, email and domain tools.
- Maltego CE 💻 - Interactive data mining & mapping tool.
- Spiderfoot 💻 - Open source intelligence automation tool. Gathers intelligence about a given target, which may be an IP address, domain name, hostname, network subnet, ASN, email address or person's name.
News
- AllYouCanRead 📝 - Database of news outlets by country.
- NewsLookup 🌐 - News search engine with useful filters.
- NewsNow 🌐 - News search engine with useful filters.
- NewspaperMap 🌐 - Newspapers world map with feeds and automatic translation.
Phone numbers
- NumberWay 🌐 - International directory of white pages and yellow pages phone books.
- PhoneInfoga 💻 - Information gathering & OSINT reconnaissance tool for phone numbers.
- Using Phone Contact Book Apps For Digital Research 📖 (Bellingcat)
Pictures, Photos, Videos
Pictures Metadata
- Exiftool 💻 - Read and edit metadata. Linode Tutorial
- Exif Viewer (Firefox/Chrome) 💻
- FotoForensics 🌐 - Online picture metadata viewer.
- Ghiro 💻 - Automated image forensics tool.
- Jeffrey's Image Metadata Viewer 🌐
- mat2 💻 - Metadata removal tool.
- mat2-web 🌐 - Online version of mat2.
- StolenCameraFinder 🌐 - Search the web for pictures taken by a camera with a specific serial number.
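EXIF stores GPS coordinates as degrees, minutes and seconds plus a hemisphere reference (N/S, E/W), and metadata viewers usually display them in that form. Maps and most APIs expect signed decimal degrees instead. The sketch below applies the standard conversion (decimal = deg + min/60 + sec/3600, negated for S and W). The function name and sample coordinates are hypothetical.

```python
def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert EXIF-style degrees/minutes/seconds + hemisphere ref to signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    # Southern and western hemispheres are negative in decimal notation
    return -value if ref.upper() in ("S", "W") else value

# Hypothetical GPS tags read from a photo: 40° 26' 46" N, 79° 58' 56" W
lat = dms_to_decimal(40, 26, 46, "N")
lon = dms_to_decimal(79, 58, 56, "W")
print(round(lat, 6), round(lon, 6))
```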
Reverse search
- Bing Images 🌐 - Can search part of an image by resizing the selection on the fly.
- CitizenEvidence 🌐 - Google Images reverse search on YouTube thumbnails.
- EagleEye 💻 - Find Instagram, Facebook and Twitter profiles using image recognition and reverse image search.
- Google Images 🌐
- Search by Image 💻 - Browser extension to quickly reverse-search an image on 20+ search engines.
- TinEye 🌐
- Yandex Images 🌐
Search
- How to Conduct Comprehensive Video Collection 📖 (Bellingcat)
- PimEyes 🌐 - Face recognition search engine.
- SearchFace.ru 🌐 - Face recognition search engine for the Russian VK social network. See this guide from Bellingcat for a tutorial.
- SocialMapper 🌐 - Social media mapping tool that correlates profiles via facial recognition. Supports LinkedIn, Facebook, Twitter, Instagram, VKontakte, Weibo, Douban.
Verification & Analysis
- Advanced Guide on Verifying Video Content 📖 (Bellingcat)
- face_recognition 💻 🐍 - Command-line tool and Python library for recognizing known faces in a batch of pictures.
- How to verify photos and videos on social media networks 📖 (France24)
- InVID Verification Plugin 💻 - Verification “Swiss army knife” Firefox extension.
- Photo Verification Cheatsheet & Video Verification Cheatsheet 📖 (FirstDraftNews)
- Verification 101 📖 - Storyful’s advice for checking out material from social media, and putting it into practice.
- Verification Handbook 📖 - Handbook by the European Journalism Centre about verifying digital content in emergency coverage.
Social Networks
All/General
- EagleEye 💻 - Find Instagram, Facebook and Twitter profiles using image recognition and reverse image search.
- HashAtIt 🌐 - Hashtag search across Twitter, Instagram, Pinterest, Facebook and YouTube.
- Sherlock 💻 - Search for a username across 135 social media sites.
- SocialMapper 🌐 - Social media mapping tool that correlates profiles via facial recognition. Supports LinkedIn, Facebook, Twitter, Instagram, VKontakte, Weibo, Douban.
- WhatsMyName 💻 - Search for usernames on 180+ websites.
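Username checkers such as Sherlock and WhatsMyName work by inserting a username into known profile-URL patterns and then checking each site's response. The sketch below covers only the first half. Its three URL templates are examples chosen for illustration, not the tools' own site lists. Real checks need per-site rules, because many sites return a normal page even when a profile does not exist.

```python
import re

# Illustrative profile URL templates (assumption: not the tools' site lists)
TEMPLATES = {
    "GitHub": "https://github.com/{}",
    "Reddit": "https://www.reddit.com/user/{}",
    "Telegram": "https://t.me/{}",
}

def candidate_urls(username, templates=TEMPLATES):
    """Return {site: profile_url} for a username; reject characters no site would accept."""
    if not re.fullmatch(r"[A-Za-z0-9_.-]+", username):
        raise ValueError(f"unexpected characters in username: {username!r}")
    return {site: tpl.format(username) for site, tpl in templates.items()}

for site, url in candidate_urls("r3mlab").items():
    print(site, url)
```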
Discord
- dis.cool 🌐 - Discord search engine.
Facebook
- fb-search 🌐 - Simple Graph Search query crafter. Made after Facebook's sudden closure of Graph Search.
- FFFF Finds Facebook Friends 💻 - Builds a relationship graph of a target user. Partially reconstructs hidden friend lists. 🔥
GitHub
- gitrob 💻 - Find potentially sensitive files pushed to public repositories on GitHub. Requires a GitHub access token.
- Zen 💻 - Find emails of GitHub users.
Instagram
- instaloader 💻 - Download pictures (or videos) along with their captions and other metadata from Instagram.
- instagram-scraper 💻 - Scrape a user's photos and videos.
- searchmybio 🌐 - Search Instagram users' biographies.
LinkedIn
- An Investigative Guide To LinkedIn 📖 (Bellingcat)
- CrossLinked 💻 - LinkedIn enumeration tool to extract valid employee names from an organization.
- LinkedIn Operators Tip Sheet 📖
- raven 💻 - LinkedIn information gathering tool. Extracts employee data for a given company.
- The Endorser 💻 - Draw out relationships between people on LinkedIn via endorsements/skills.
Reddit
- Reddit Comment Search 🌐 - Search through the comments of a particular Reddit user.
- Reddit Insight 🌐 - Collect info on a Reddit profile, list all posts & comments.
- Reddit Investigator 🌐 - Collect info on a Reddit profile.
- Reddit Search 🌐 - Reddit search engine with filters.
- ReSavr 🌐 - Search deleted comments.
Snapchat
Telegram
- Buzz.im 🌐 - Search open Telegram messages.
- Lyzem 🌐 - Telegram search engine.
- Telegago 🌐 - Google Custom Search Engine for Telegram users & content. Can discover private groups.
- tlgrm.eu 🌐 - Search for Telegram channels.
- tgstat.ru 🌐 - Telegram analytics & search tool.
Twitter
- DMI-TCAT 💻 - PHP web interface to retrieve and analyze tweets.
- SocialBearing 🌐 - Statistics on keywords, hashtags, users.
- SpoonBill 🌐 - Track changes in Twitter profiles & bios. Requires a Twitter account.
- tinfoleak 💻 - Very complete open-source tool for Twitter intelligence analysis. Needs API credentials.
- twarc 💻 🐍 - A command-line tool and Python library for archiving Twitter in JSON format.
- Tweetdeck 🌐
- Tweetdeck Location Search Tutorial 📖
- Tweet Map 🌐 - Explore the world and find geo-tagged tweets.
- Tweets Analyzer 💻 - Twitter profile analyzer with tweet activity charts, locations, most used hashtags, etc. Can save tweets to JSON. Requires a Twitter API key.
- tweetsmapper 💻 - Generates a Leaflet map for a given user or from an existing collection of tweets. Can retrieve full timelines.
- TWINT (Twitter Intelligence Tool) 💻 - Advanced Twitter scraping tool, no API key needed. Can export to text, CSV, JSON, SQLite, Elasticsearch. Can detect emails, phone numbers, profiles.
- Who Tweeted It First? 🌐 - Find out who was the first person to tweet a link, video, quote or any piece of text.
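twarc (v1) writes one tweet per line as JSON, so hashtag statistics like those shown by SocialBearing or Tweets Analyzer can be reproduced locally. The sketch below assumes the Twitter API v1.1 tweet layout, where hashtags sit under entities.hashtags[].text. API v2 output from newer twarc versions is structured differently. The function name and sample tweets are hypothetical.

```python
import json
from collections import Counter

def top_hashtags(jsonl_lines, n=10):
    """Count hashtags (case-insensitive) across tweets stored one JSON object per line."""
    counts = Counter()
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        tweet = json.loads(line)
        # Assumption: v1.1 layout, entities.hashtags is a list of {"text": ...}
        for tag in tweet.get("entities", {}).get("hashtags", []):
            counts[tag["text"].lower()] += 1
    return counts.most_common(n)

# Usage with a twarc output file (hypothetical name):
# with open("tweets.jsonl", encoding="utf-8") as f:
#     print(top_hashtags(f))
```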
VKontakte
- SnRadar 🌐 - Search VKontakte content by location.
YouTube
- Unlisted Videos 🌐 - Search & submit unlisted YouTube videos. No registration required.
Text & Documents
Documents metadata
- Apache Tika 💻 - Extract metadata and text from over a thousand different file types.
- FOCA 🌐 💻 - Find metadata and hidden information in Microsoft Office, Open Office, or PDF files.
- ICIJ Extract 💻 - A command-line tool for parallelized, distributed content extraction.
Indexing & searching
- Aleph 💻 - A toolkit for data search, management and analysis in investigative reporting.
- Blacklight 💻 - Open-source discovery platform providing a user interface for Solr.
- Datashare 💻 - Index & search documents on your computer, automatically detect people, organizations and locations with NLP.
- DumpsterDiver 💻 - Analyze big volumes of various file types in search of secrets, credentials, etc.
- ICIJ Extract 💻 - A command-line tool for parallelized, distributed content extraction.
- searchbox 💻 - A simple out-of-the-box web interface to search through thousands of unstructured documents using Solr.
OCR
- NewOCR.com 🌐 - Recognizes several languages. Can resize images & has shortcuts to Google & Bing Translate.
- Tesseract 💻 - Open-source OCR engine.
- PDF Text Extraction with PyPDF2, Tika & PDF Miner 💻
- tabula 💻 - Tool for liberating data tables trapped inside PDF files.
Text Processing & Analysis
- topia 🐍 - Python module to determine important terms within a given piece of content.
- TXM 💻 - Lexicometry and statistical text analysis for large bodies of text.
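A crude baseline for "important terms" is word frequency after removing stopwords. The sketch below implements only that baseline. It is not topia's algorithm, which relies on part-of-speech tagging. The tiny stopword list is an assumption and would need extending, or replacing with a per-language list, for real use.

```python
import re
from collections import Counter

# Deliberately tiny English stopword list (assumption): extend it for real use.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on", "for",
             "is", "was", "by", "with", "from", "that", "this", "it"}

def key_terms(text, n=5, min_length=3):
    """Rank words by frequency, ignoring case, stopwords and very short tokens."""
    words = re.findall(r"[^\W\d_]+", text.lower())  # letters only, Unicode-aware
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) >= min_length)
    return counts.most_common(n)

text = "The shipment left the port. The port authority logged the shipment twice."
print(key_terms(text, n=2))
```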
Transportation
Containers & Shipments
- BIC Code Register 🌐 - Lookup of BIC codes (container owner codes registered with the Bureau International des Containers). The website also has other search tools and useful information on container markings.
- Prefix List 🌐 - Find the owner of a container from its prefix.
- track-trace 🌐 - Track parcels/shipments, air cargo, containers and post.
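Container numbers follow ISO 6346. Each one is a three-letter owner code (the prefix looked up above), an equipment category letter (U for freight containers), six serial digits and a check digit. Validating the check digit is a quick way to catch numbers misread from photos before searching for them. The sketch below follows the standard calculation: letters map to values from 10 to 38 while skipping multiples of 11, each character is weighted by 2 to the power of its position, and the sum is taken mod 11, with a remainder of 10 mapping to 0. Function names are hypothetical.

```python
import re
import string

def _letter_values():
    """ISO 6346 letter values: A=10 upward, skipping multiples of 11 (11, 22, 33)."""
    values, v = {}, 10
    for ch in string.ascii_uppercase:
        if v % 11 == 0:
            v += 1
        values[ch] = v
        v += 1
    return values

LETTER_VALUES = _letter_values()

def check_digit(first_ten):
    """Check digit for owner code + category letter + 6-digit serial, e.g. 'CSQU305438'."""
    total = 0
    for position, ch in enumerate(first_ten.upper()):
        value = LETTER_VALUES[ch] if ch.isalpha() else int(ch)
        total += value * 2 ** position  # weight doubles at each position
    return total % 11 % 10  # a remainder of 10 maps to 0

def is_valid_container(number):
    """Validate an 11-character container number such as 'CSQU 305438 3'."""
    number = number.replace(" ", "").upper()
    if not re.fullmatch(r"[A-Z]{4}\d{7}", number):
        return False
    return check_digit(number[:10]) == int(number[10])

number = "CSQU3054383"
print(is_valid_container(number), "owner code:", number[:3])
```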
Planes
- Flight tracking:
  - FlightAware 🌐
  - FlightRadar24 🌐
  - PlaneFinder 🌐
  - RadarBox 🌐
- PlaneMapper 🌐 - Flight, airport, airline and aircraft databases.
Ships
- Inmarsat Ships Directory 🌐 - Find contact details from a ship's name or number.
- Maritime Connector 🌐 - Maritime jobs listings & search.
- Maritime Database 🌐 - Lists and details of shipping-related businesses and ports of the world.
- Ship search & track:
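Ships are commonly identified by a seven-digit IMO number whose last digit is a checksum. Multiply the first six digits by 7, 6, 5, 4, 3 and 2, add the results, and the last digit of the sum must equal the seventh digit. The sketch below is a minimal validator, useful before searching a number read off a hull or a document. The function name is hypothetical.

```python
import re

def is_valid_imo(imo):
    """Validate an IMO ship number such as 'IMO 9074729' or '9074729'."""
    digits = re.sub(r"^IMO\s*", "", imo.strip(), flags=re.IGNORECASE)
    if not re.fullmatch(r"\d{7}", digits):
        return False
    # Weights 7, 6, 5, 4, 3, 2 applied to the first six digits
    total = sum(int(d) * w for d, w in zip(digits[:6], range(7, 1, -1)))
    return total % 10 == int(digits[6])

print(is_valid_imo("IMO 9074729"))
```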
Visualization
Graphs
- Data Visualisation Catalogue 📖 - Find which visualisation is right for what you want to show. Plenty of tips & resources.
- DataWrapper 🌐 💲 - Easy-to-use graph & map tool. Free plan available.
- Google Fusion Tables - Create maps & charts from data. Shut down in December 2019.
- Matplotlib 🐍 - Python 2D plotting library. Best used with pandas in a Jupyter notebook.
- RawGraph 🌐 💻 - Generate static graphs through a very user-friendly interface. Can be run locally.
Maps
- ArcGIS 💻 💲 - Mapping & analysis software (proprietary, paid, 21-day trial).
- Folium 🐍 - Python library to create Leaflet.js maps. Can be used in a Jupyter notebook to map data from pandas.
- Geopy 🐍 - Python geocoding library. Supports OSM Nominatim, Google, Bing, GeoNames & many more.
- Google:
  - MyMaps 🌐
  - Earth 🌐
  - Earth Pro 💻
  - Earth Studio 🌐 💻
- Humanitarian Data Exchange 🌐 - Useful source of shapefiles, especially for administrative boundaries.
- KML Interactive Sampler 🌐 - Lots of KML templates.
- QGIS 💻 - Free & open-source alternative to ArcGIS.
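Google Earth, Google My Maps and QGIS all import KML, so findings can be exchanged between them as a small file. The sketch below writes point placemarks using only the standard library and assumes the KML 2.2 namespace. KML orders coordinates as longitude,latitude[,altitude], the reverse of the usual lat/lon. The function name and sample point are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: KML 2.2 namespace, serialized as the default namespace
KML_NS = "http://www.opengis.net/kml/2.2"
ET.register_namespace("", KML_NS)

def placemarks_to_kml(points):
    """points: iterable of (name, lat, lon). Returns a KML document as a string."""
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for name, lat, lon in points:
        pm = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
        ET.SubElement(pm, f"{{{KML_NS}}}name").text = name
        point = ET.SubElement(pm, f"{{{KML_NS}}}Point")
        # KML wants lon,lat (x,y)
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")

print(placemarks_to_kml([("Sample point", 48.8584, 2.2945)]))
```

Saving the returned string to a .kml file should be enough for it to open in Google Earth or be dragged into QGIS.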
Mindmaps & Network graphs
- Draw.io 🌐 💻 - Open-source diagramming tool. Can be run locally.
- Gephi 💻 - Powerful visualization and exploration software.
- Visual Investigative Scenarios 🌐 (OCCRP)
- yEd Graph Editor 💻
Timelines
- Tik Tok 💻 - JavaScript tool to easily create simple, mobile-friendly, vertical timelines. Open-source.
- TimelineJS 💻
Weather
- timeanddate.com 🌐 - Weather history.
- Ventusky 🌐 - Live & past wind, rain and temperature maps.
- Wolfram Alpha 🌐 - Weather history. Example query: "What was the weather in New York on January 1st 2017?"
- Wunderground History 🌐 - Weather history.
Websites
See also: Archival
Dark Web & Onion services
- DarkSearch 🌐 - Dark web search engine.
- OnionScan 💻
- OSINT Tools for the Dark Web (Jake Creps) 📝 - Presentation of several tools to help investigate the dark web.
Scraping
- Photon 💻 - Crawl a website (or its archive from the Wayback Machine) and extract URLs, emails, social media accounts, files, keys, subdomains, etc.
- Python scraping libraries:
  - BeautifulSoup 🐍
  - cloudflare-scrape 🐍
  - Selenium 🐍
  - Scrapy 🐍
- Scrape Interactive Geospatial Data 📖 (Bellingcat)
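The libraries above add robustness: BeautifulSoup handles messy HTML, Selenium handles JavaScript-rendered pages, and Scrapy handles crawling at scale. The core of link extraction, as done by tools like Photon, still fits in the standard library. The sketch below uses html.parser on an already-downloaded page and resolves relative links against the page URL. It does not execute JavaScript. The class name and sample HTML are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collect absolute URLs from <a href> and <img src> attributes."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        wanted = {"a": "href", "img": "src"}.get(tag)
        for name, value in attrs:
            if name == wanted and value:
                # Resolve relative paths; absolute and mailto: links pass through
                self.links.append(urljoin(self.base_url, value))

html = '<a href="/about">About</a> <img src="logo.png"> <a href="mailto:tips@example.org">Tips</a>'
parser = LinkExtractor("https://example.org/news/")
parser.feed(html)
print(parser.links)
```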
Searches, info, related entities
- Advanced Google searches:
  - Google Search Operators 📖 (moz.com)
  - Mastering Google Search Operators in 67 steps 📖 (moz.com)
  - Google Hacking Database 📖 (Exploit-DB)
  - Google Search Operators: The Complete List 📖 (ahrefs.com)
- CarbonDate 💻 - Estimate the age of web resources. Has a non-HTTPS online version.
- crt.sh 🌐 - Certificate search.
- Domain_OSINT 📝 - Ph055a's list of tools to investigate domains & IoT devices.
- DNSDumpster 🌐 - Domain research tool that can discover hosts related to a domain.
- FinalRecon 💻 - All-in-one tool: whois, headers, SSL certificate details, image & link crawling.
- NerdyData Search 🌐 - Source code search engine.
- OpenLinkProfiler - Search & analyze the links of a website. Good replacement for Google's defunct link: operator.
- PublicWWW 🌐 - Search the source code of pages.
- pymeta 💻 - Find document files on a domain, download them and extract metadata.
- SpyOnWeb 🌐 - Search by URL, IP address, analytics codes. API with free plan. See this Bellingcat how-to for automation.
- Sublist3r 💻 - Subdomain enumeration tool.
- Unveiling hidden site connections with Google Analytics IDs 📖 (Bellingcat)
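crt.sh can return Certificate Transparency results as JSON when output=json is added to the query. With % as a wildcard (e.g. %.example.com), this makes it a quick passive source of subdomains. The sketch below builds that URL and pulls unique hostnames out of a response. It assumes each entry's name_value field holds one or more newline-separated names. Function names are hypothetical.

```python
import json
from urllib.parse import urlencode

def crtsh_url(domain):
    """crt.sh query for certificates covering any subdomain of `domain`, as JSON."""
    return "https://crt.sh/?" + urlencode({"q": f"%.{domain}", "output": "json"})

def hostnames(payload, domain):
    """Unique hostnames under `domain` in a crt.sh JSON response (wildcards stripped)."""
    domain = domain.lower()
    names = set()
    for entry in json.loads(payload):
        # Assumption: one certificate can list several names, newline-separated
        for name in entry.get("name_value", "").splitlines():
            name = name.strip().lower()
            if name.startswith("*."):
                name = name[2:]
            if name == domain or name.endswith("." + domain):
                names.add(name)
    return sorted(names)

print(crtsh_url("example.com"))
```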
Misc
- awesome-selfhosted 📝 - A list of Free Software network services and web applications which can be hosted locally.
- grayhatwarfare 🌐 - Search the content of open Amazon S3 buckets.
- Shodan 🌐 - Internet of Things search engine.
- World License Plates 🌐 - Pictures of license plates from all around the world.
License
This list is under the Creative Commons Attribution-NonCommercial 4.0 International Public License.