nutter
Testing framework for Databricks notebooks
Stars: ✭ 152 (+280%)
Setl
A simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (+97.5%)
Getting Started
This repository is a getting started guide to Singer.
Stars: ✭ 734 (+1735%)
bandar-log
Monitoring tool to measure flow throughput of data sources and processing components that are part of Data Ingestion and ETL pipelines.
Stars: ✭ 20 (-50%)
dflib
In-memory Java DataFrame library
Stars: ✭ 50 (+25%)
blackbricks
Black for Databricks notebooks
Stars: ✭ 40 (+0%)
Ether sql
A Python library to push Ethereum blockchain data into an SQL database.
Stars: ✭ 41 (+2.5%)
Spark
.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
Stars: ✭ 1,721 (+4202.5%)
Bandar Log
Monitoring tool to measure flow throughput of data sources and processing components that are part of Data Ingestion and ETL pipelines.
Stars: ✭ 19 (-52.5%)
Airbyte
Airbyte is an open-source EL(T) platform that helps you replicate your data in your warehouses, lakes and databases.
Stars: ✭ 4,919 (+12197.5%)
Datacleaner
The premier open source Data Quality solution
Stars: ✭ 391 (+877.5%)
Eland
Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch
Stars: ✭ 235 (+487.5%)
whyqd
data wrangling simplicity, complete audit transparency, and at speed
Stars: ✭ 16 (-60%)
copulae
Multivariate data modelling with Copulas in Python
Stars: ✭ 96 (+140%)
website-old
The Frictionless Data website.
Stars: ✭ 31 (-22.5%)
csvplus
csvplus extends the standard Go encoding/csv package with a fluent interface, lazy stream operations, indices and joins.
Stars: ✭ 67 (+67.5%)
transbigdata
A Python package developed for transportation spatio-temporal big data processing, analysis and visualization.
Stars: ✭ 195 (+387.5%)
DQCS
Data quality control system
Stars: ✭ 34 (-15%)
plottr
A flexible plotting and data analysis tool.
Stars: ✭ 32 (-20%)
UliEngineering
A Python library for calculations performed in electronics engineering
Stars: ✭ 35 (-12.5%)
Data-Science-101
Notes and tutorials on how to use Python, pandas, seaborn, NumPy, matplotlib and SciPy for data science.
Stars: ✭ 19 (-52.5%)
labplot
LabPlot is a FREE, open source and cross-platform Data Visualization and Analysis software accessible to everyone.
Stars: ✭ 107 (+167.5%)
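OpenKettleWebUI
A web-based scheduling and control platform for data processing built on Kettle. It supports file repositories and database repositories, controls Kettle data transformations through the web platform, and can be integrated into existing systems as middleware.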
Stars: ✭ 138 (+245%)
hamilton
A scalable general-purpose micro-framework for defining dataflows. You can use it to create dataframes, NumPy matrices, Python objects, ML models, etc.
Stars: ✭ 612 (+1430%)
databricks-dbapi
DBAPI and SQLAlchemy dialect for Databricks Workspace and SQL Analytics clusters
Stars: ✭ 21 (-47.5%)
antz
ANTz immersive 3D data visualization engine
Stars: ✭ 25 (-37.5%)
dask-awkward
Native Dask collection for awkward arrays, and the library to use it.
Stars: ✭ 25 (-37.5%)
mlops-platforms
Compare MLOps Platforms. Breakdowns of SageMaker, VertexAI, AzureML, Dataiku, Databricks, h2o, kubeflow, mlflow...
Stars: ✭ 293 (+632.5%)
nebula
A distributed block-based data storage and compute engine
Stars: ✭ 127 (+217.5%)
kobe-every-shot-ever
A Los Angeles Times analysis of every shot in Kobe Bryant's NBA career
Stars: ✭ 66 (+65%)
python-notebooks
A collection of Jupyter Notebooks used in conferences or just to have some snippets.
Stars: ✭ 14 (-65%)
PracticalMachineLearning
A collection of ML-related material including notebooks, code and a curated list of useful resources such as books and software. Almost everything mentioned here is free (as in speech, not as in free food) or open-source.
Stars: ✭ 60 (+50%)
dsr
Introduction to Data Science with R (2017)
Stars: ✭ 25 (-37.5%)
spectrochempy
SpectroChemPy is a framework for processing, analyzing and modeling spectroscopic data for chemistry with Python.
Stars: ✭ 34 (-15%)
wasp
WASP is a framework to build complex real-time big data applications. It relies on a kind of Kappa/Lambda architecture, mainly leveraging Kafka and Spark. If you need to ingest huge amounts of heterogeneous data and analyze them through complex pipelines, this is the framework for you.
Stars: ✭ 19 (-52.5%)
Bitcoin Analysis-Python
Bitcoin is a widely used cryptocurrency for the digital market. It is decentralised, which means it is not owned by a government or any company. Transactions are simple and easy, as it doesn't belong to any country. Records are stored in the blockchain. The Bitcoin price is variable and it is widely used, so it is important to predict the price of it f…
Stars: ✭ 42 (+5%)
datajoint-python
Relational data pipelines for the science lab
Stars: ✭ 140 (+250%)
ipaddress
Data analysis of IP addresses and networks
Stars: ✭ 20 (-50%)
CVparser
CVparser is software for parsing or extracting data out of CVs/resumes.
Stars: ✭ 28 (-30%)
uptasticsearch
An Elasticsearch client tailored to data science workflows.
Stars: ✭ 47 (+17.5%)
xxhadoop
Data analysis using Hadoop/Spark/Storm/ElasticSearch/MachineLearning etc. These are my daily notes/code/demos. Don't fork, just star!
Stars: ✭ 37 (-7.5%)
sql-to-redis
🔄 Simple tool for ETL. From SQL to Redis.
Stars: ✭ 18 (-55%)
Datscan
DatScan is an initiative to build an open-source CMS capable of solving any problem using data analysis, with the help of various modules and a vast standardized module library.
Stars: ✭ 13 (-67.5%)
django-calaccess-raw-data
A Django app to download, extract and load campaign finance and lobbying activity data from the California Secretary of State's CAL-ACCESS database
Stars: ✭ 61 (+52.5%)
flock
Flock: A Low-Cost Streaming Query Engine on FaaS Platforms
Stars: ✭ 232 (+480%)
heidi
heidi: tidy data in Haskell
Stars: ✭ 24 (-40%)
sync-engine-example
Synchronization Algorithm Exploration: Techniques to synchronize a SQL database with external destinations.
Stars: ✭ 17 (-57.5%)
dogETL
A library to transform data between JDBC, CSV and JSON.
Stars: ✭ 15 (-62.5%)
snorkel
Snorkel - Bootstrap your Data Science
Stars: ✭ 24 (-40%)
CC33Z
Computer Science course
Stars: ✭ 50 (+25%)
blockchain-etl-streaming
Streaming Ethereum and Bitcoin blockchain data to Google Pub/Sub or Postgres in Kubernetes
Stars: ✭ 57 (+42.5%)