Pdpipe
Easy pipelines for pandas DataFrames.
Stars: ✭ 590 (+5800%)
Sklearn Classification
Data Science Notebook on a Classification Task, using sklearn and Tensorflow.
Stars: ✭ 518 (+5080%)
Datacurator Filetree
a standard filetree for /r/datacurator [ and r/datahoarder ]
Stars: ✭ 753 (+7430%)
Vad
Voice activity detection (VAD) toolkit including DNN, bDNN, LSTM and ACAM based VAD. We also provide our directly recorded dataset.
Stars: ✭ 622 (+6120%)
Udacity Data Engineering Projects
Few projects related to Data Engineering including Data Modeling, Infrastructure setup on cloud, Data Warehousing and Data Lake development.
Stars: ✭ 458 (+4480%)
Atscan
Advanced dork Search & Mass Exploit Scanner
Stars: ✭ 817 (+8070%)
Riceteacatpanda
repo with challenge material for riceteacatpanda (2020)
Stars: ✭ 18 (+80%)
Machine Learning Roadmap
A roadmap connecting many of the most important concepts in machine learning, how to learn them and what tools to use to perform them.
Stars: ✭ 5,277 (+52670%)
Metabase
The simplest, fastest way to get business intelligence and analytics to everyone in your company 😋
Stars: ✭ 26,803 (+267930%)
Pyjanitor
Clean APIs for data cleaning. Python implementation of R package Janitor
Stars: ✭ 647 (+6370%)
Web
React web interface for the OpenDota platform
Stars: ✭ 889 (+8790%)
Valid.js
📝 A library for data validation.
Stars: ✭ 604 (+5940%)
Gcamdata
The GCAM data system
Stars: ✭ 22 (+120%)
Panini
A super simple flat file generator.
Stars: ✭ 562 (+5520%)
Brasil.io
Backend of Brasil.IO (for the code of the data collection scripts, see the link on each dataset's page)
Stars: ✭ 780 (+7700%)
Countly Server
Countly helps you get insights from your application. Available self-hosted or on private cloud.
Stars: ✭ 4,857 (+48470%)
Agots
Anomaly Generator on Time Series
Stars: ✭ 24 (+140%)
Voice datasets
🔊 A comprehensive list of open-source datasets for voice and sound computing (50+ datasets).
Stars: ✭ 494 (+4840%)
Terriajs
A library for building rich, web-based geospatial data platforms.
Stars: ✭ 699 (+6890%)
Data Engineering Book
Accumulated knowledge and experience in the field of Data Engineering
Stars: ✭ 471 (+4610%)
Flight Prices Scraper
Automated Script to scrape flight prices from any website into a csv format
Stars: ✭ 17 (+70%)
Fetch
Simple & Efficient data access for Scala and Scala.js
Stars: ✭ 453 (+4430%)
Snowplow
The enterprise-grade behavioral data engine (web, mobile, server-side, webhooks), running cloud-natively on AWS and GCP
Stars: ✭ 5,935 (+59250%)
Octo Cli
CLI tool to expose data from any database as a serverless web service.
Stars: ✭ 653 (+6430%)
Isp Data Pollution
ISP Data Pollution to Protect Private Browsing History with Obfuscation
Stars: ✭ 425 (+4150%)
Bits
A bite sized library for dealing with bytes.
Stars: ✭ 16 (+60%)
Fsharp.data
F# Data: Library for Data Access
Stars: ✭ 631 (+6210%)
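Poetry
A very comprehensive collection of classical Chinese poetry data, containing more than 850,000 poems from the pre-Qin era to modern times.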
Stars: ✭ 920 (+9100%)
Datafusion
DataFusion has now been donated to the Apache Arrow project
Stars: ✭ 611 (+6010%)
Awesome Ai Ml Dl
Awesome Artificial Intelligence, Machine Learning and Deep Learning as we learn it. Study notes and a curated list of awesome resources of such topics.
Stars: ✭ 831 (+8210%)
Datasheets
Read data from, write data to, and modify the formatting of Google Sheets
Stars: ✭ 593 (+5830%)
Dendro
"Open-source Dropbox" with added description features. It is a data storage and description platform designed to help researchers and other users to describe their data files, built on Linked Open Data and ontologies. Users can use Dendro to publish data to CKAN, Zenodo, DSpace or EUDAT's B2Share and others.
Stars: ✭ 25 (+150%)
Iexfinance
Python SDK for IEX Cloud
Stars: ✭ 573 (+5630%)
Sensei Grid
Simple and lightweight data grid in JS/HTML
Stars: ✭ 808 (+7980%)
Machine Learning Mindmap
A mindmap summarising Machine Learning concepts, from Data Analysis to Deep Learning.
Stars: ✭ 5,339 (+53290%)
Lpfmpoints
Evolution of LPFM Stations
Stars: ✭ 19 (+90%)
Footballdata
A hodgepodge of JSON and CSV Football/Soccer data
Stars: ✭ 526 (+5160%)
Awesome Streamlit
The purpose of this project is to share knowledge on how awesome Streamlit is and can be
Stars: ✭ 769 (+7590%)
Disk.frame
Fast Disk-Based Parallelized Data Manipulation Framework for Larger-than-RAM Data
Stars: ✭ 517 (+5070%)
Databook
A facebook for data
Stars: ✭ 26 (+160%)
Knowledge Repo
A next-generation curated knowledge sharing platform for data scientists and other technical professions.
Stars: ✭ 4,956 (+49460%)
Rows
A common, beautiful interface to tabular data, no matter the format
Stars: ✭ 739 (+7290%)
Pybaseball
Pull current and historical baseball statistics using Python (Statcast, Baseball Reference, FanGraphs)
Stars: ✭ 484 (+4740%)
Mithril Data
A rich data model library for Mithril javascript framework
Stars: ✭ 17 (+70%)
Core2d
A multi-platform data driven 2D diagram editor.
Stars: ✭ 475 (+4650%)
Sheetjs
📗 SheetJS Community Edition -- Spreadsheet Data Toolkit
Stars: ✭ 28,479 (+284690%)
Rio
A Swiss-Army Knife for Data I/O
Stars: ✭ 467 (+4570%)
Pytest Patterns
A couple of examples showing how pytest and its plugins can be combined to solve real-world needs.
Stars: ✭ 24 (+140%)
Bogus
📇 A simple and sane fake data generator for C#, F#, and VB.NET. Based on and ported from the famed faker.js.
Stars: ✭ 5,083 (+50730%)
Tensorbase
TensorBase BE is building a high performance, cloud neutral bigdata warehouse for SMEs fully in Rust.
Stars: ✭ 440 (+4300%)
Z1p
Zip Codes Validation and Parse.
Stars: ✭ 17 (+70%)
Mcw
Microsoft Cloud Workshop Project
Stars: ✭ 677 (+6670%)
Graph
Graph is a semantic database that is used to create data-driven applications.
Stars: ✭ 855 (+8450%)
Modelassistant
Elegant library to manage the interactions between view and model in Swift
Stars: ✭ 26 (+160%)
Dztalkapp
Delphi non-visual component to communicate between applications
Stars: ✭ 23 (+130%)
Skdata
Python tools for data analysis
Stars: ✭ 16 (+60%)
Faker
Faker is a pure Elixir library for generating fake data.
Stars: ✭ 673 (+6630%)