awesome-bigdata: A curated list of awesome big data frameworks, resources and other awesomeness.
fakenewsdata1: This repository contains two independent news datasets used in the 2017 study: "This Just In: Fake News Packs a Lot in Title, Uses Simpler, Repetitive Content in Text Body, More Similar to Satire than Real News"
faker: Generate massive amounts of fake data in the browser and node.js
sql-to-mongodb: A Node.js script to convert an SQL table to a MongoDB database.
kart: Distributed version-control for geospatial and tabular data
ipychart: The power of Chart.js with Python
sketch-data-faker: A Sketch plugin providing 130+ types of smart placeholder content for your mockups from Faker.js and other sources.
saddle: SADDLE: Scala Data Library
mutable: State containers with dirty checking and more
ccu-historian: CCU-Historian records the operating data of the HomeMatic home automation system from eQ-3.
pypely: Make your data processing easy
datasets: The primary repository for all of the CORGIS Datasets
copulae: Multivariate data modelling with Copulas in Python
irsync: rsync on an interval, via a command-line binary or Docker container. Server and IoT builds for pull- or push-based device content management.
neiss: Data from National Electronic Injury Surveillance System
harlan: Harlan is the modular system that lets you automate all of your registration-data governance from the cloud.
goseeder: Go database seeder inspired by the Laravel/Lumen seeder, and more
pyjanitor: Clean APIs for data cleaning. A Python implementation of the R package Janitor
mysql-random-data-generator: This is the easiest MySQL random test data generator tool. Load the procedure and execute it to auto-detect column types and load data.
COVID19Tweet: WNUT-2020 Task 2: Identification of informative COVID-19 English Tweets
cloud-data-analysis-at-scale: [Course-2020-2022] taught at Duke MIDS. This is also a Coursera Course that covers MLOps, ML Engineering and the foundations of Cloud Computing for Data Science.
pgsink: Logically replicate data out of Postgres into sinks (files, Google BigQuery, etc)
machine learning: A gentle introduction to machine learning: data handling, linear regression, naive Bayes, clustering
farolcovid: 🚦🏥 Monitoring tool and simulation of the risk of collapse in Brazilian municipalities' health systems due to Covid-19
data-mediator: A data mediator framework that binds callbacks to any property
rockhound: NOTICE: This library is no longer being developed. Use Ensaio instead (https://www.fatiando.org/ensaio). -- Download geophysical models/datasets and load them in Python
paper: Computer Foundations Practices
geoflow: R engine to orchestrate and run (meta)data workflows
simpleopendata: simple guidelines for publishing open data in useful formats
rastercube: rastercube is a python library for big data analysis of georeferenced time series data (e.g. MODIS NDVI)
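Data-Export: Data-Export supports exporting on-chain data to storage media such as MySQL and ES that are suited to big data processing, addressing the problems of complex querying, analysis, visualization, and processing of blockchain data.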
zpy: Synthetic data for computer vision. An open source toolkit using Blender and Python.
events: Materials related to events I might attend, and to talks I am giving
xfinity-data-usage: Fetch Xfinity data usage and serve it via an HTTP endpoint, publish it to MQTT or post it to a URL.
lore: WebGL engine for (big) data visualization.
widgets: Widgets for blockchain data visualizations
godmt: Tool that can parse Go files into an abstract syntax tree and translate it to several programming languages.
ESSE: Encrypted peer-to-peer system for data security. Own data, own privacy. (Rust+Flutter)
knime-r: KNIME Interactive R Statistics Integration