
665 Open source projects that are alternatives of or similar to architect_big_data_solutions_with_spark

Etl unicorn
Data visualization, data mining, data processing ETL
Stars: ✭ 156 (+290%)
Mutual labels:  etl, data-analysis
nutter
Testing framework for Databricks notebooks
Stars: ✭ 152 (+280%)
Mutual labels:  databricks, databricks-notebooks
Setl
A simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (+97.5%)
Mutual labels:  etl, data-analysis
Getting Started
This repository is a getting started guide to Singer.
Stars: ✭ 734 (+1735%)
Mutual labels:  etl, data-analysis
bandar-log
Monitoring tool to measure flow throughput of data sources and processing components that are part of Data Ingestion and ETL pipelines.
Stars: ✭ 20 (-50%)
Mutual labels:  etl, spark-streaming
dflib
In-memory Java DataFrame library
Stars: ✭ 50 (+25%)
Mutual labels:  etl, data-analysis
Awesome Business Intelligence
Actively curated list of awesome BI tools. PRs welcome!
Stars: ✭ 1,157 (+2792.5%)
Mutual labels:  etl, data-analysis
blackbricks
Black for Databricks notebooks
Stars: ✭ 40 (+0%)
Mutual labels:  databricks, databricks-notebooks
Ether sql
A python library to push ethereum blockchain data into an sql database.
Stars: ✭ 41 (+2.5%)
Mutual labels:  etl, data-analysis
Spark
.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
Stars: ✭ 1,721 (+4202.5%)
Mutual labels:  spark-streaming, databricks
Bandar Log
Monitoring tool to measure flow throughput of data sources and processing components that are part of Data Ingestion and ETL pipelines.
Stars: ✭ 19 (-52.5%)
Mutual labels:  etl, spark-streaming
Airbyte
Airbyte is an open-source EL(T) platform that helps you replicate your data in your warehouses, lakes and databases.
Stars: ✭ 4,919 (+12197.5%)
Mutual labels:  etl, data-analysis
Datacleaner
The premier open source Data Quality solution
Stars: ✭ 391 (+877.5%)
Mutual labels:  etl, data-analysis
Eland
Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch
Stars: ✭ 235 (+487.5%)
Mutual labels:  etl, data-analysis
dbt-databricks
A dbt adapter for Databricks.
Stars: ✭ 115 (+187.5%)
Mutual labels:  etl, databricks
whyqd
data wrangling simplicity, complete audit transparency, and at speed
Stars: ✭ 16 (-60%)
Mutual labels:  data-analysis
copulae
Multivariate data modelling with Copulas in Python
Stars: ✭ 96 (+140%)
Mutual labels:  data-analysis
website-old
The Frictionless Data website.
Stars: ✭ 31 (-22.5%)
Mutual labels:  data-analysis
csvplus
csvplus extends the standard Go encoding/csv package with fluent interface, lazy stream operations, indices and joins.
Stars: ✭ 67 (+67.5%)
Mutual labels:  etl
transbigdata
A Python package developed for transportation spatio-temporal big data processing, analysis, and visualization.
Stars: ✭ 195 (+387.5%)
Mutual labels:  data-analysis
DQCS
Data quality control system
Stars: ✭ 34 (-15%)
Mutual labels:  etl
plottr
A flexible plotting and data analysis tool.
Stars: ✭ 32 (-20%)
Mutual labels:  data-analysis
UliEngineering
A Python library for calculations performed in electronics engineering
Stars: ✭ 35 (-12.5%)
Mutual labels:  data-analysis
Data-Science-101
Notes and tutorials on how to use python, pandas, seaborn, numpy, matplotlib, scipy for data science.
Stars: ✭ 19 (-52.5%)
Mutual labels:  data-analysis
python-data-visualization
Curated Python Notebooks for Data Visualization
Stars: ✭ 22 (-45%)
Mutual labels:  data-analysis
labplot
LabPlot is a FREE, open source and cross-platform Data Visualization and Analysis software accessible to everyone.
Stars: ✭ 107 (+167.5%)
Mutual labels:  data-analysis
OpenKettleWebUI
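A Kettle-based web scheduling and control platform for data processing. It supports file repositories and database repositories, controls Kettle data transformations through the web platform, and can be integrated into existing systems as middleware.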
Stars: ✭ 138 (+245%)
Mutual labels:  etl
hamilton
A scalable general purpose micro-framework for defining dataflows. You can use it to create dataframes, numpy matrices, python objects, ML models, etc.
Stars: ✭ 612 (+1430%)
Mutual labels:  etl
databricks-dbapi
DBAPI and SQLAlchemy dialect for Databricks Workspace and SQL Analytics clusters
Stars: ✭ 21 (-47.5%)
Mutual labels:  databricks
antz
ANTz immersive 3D data visualization engine
Stars: ✭ 25 (-37.5%)
Mutual labels:  data-analysis
dask-awkward
Native Dask collection for awkward arrays, and the library to use it.
Stars: ✭ 25 (-37.5%)
Mutual labels:  data-analysis
mlops-platforms
Compare MLOps Platforms. Breakdowns of SageMaker, VertexAI, AzureML, Dataiku, Databricks, h2o, kubeflow, mlflow...
Stars: ✭ 293 (+632.5%)
Mutual labels:  databricks
nebula
A distributed block-based data storage and compute engine
Stars: ✭ 127 (+217.5%)
Mutual labels:  data-analysis
kobe-every-shot-ever
A Los Angeles Times analysis of Every shot in Kobe Bryant's NBA career
Stars: ✭ 66 (+65%)
Mutual labels:  data-analysis
python-notebooks
A collection of Jupyter Notebooks used in conferences or just to have some snippets.
Stars: ✭ 14 (-65%)
Mutual labels:  data-analysis
PracticalMachineLearning
A collection of ML-related material including notebooks, code, and a curated list of useful resources such as books and software. Almost everything mentioned here is free (as in speech, not free food) or open source.
Stars: ✭ 60 (+50%)
Mutual labels:  data-analysis
dsr
Introduction to Data Science with R (2017)
Stars: ✭ 25 (-37.5%)
Mutual labels:  data-analysis
python mozetl
ETL jobs for Firefox Telemetry
Stars: ✭ 25 (-37.5%)
Mutual labels:  etl
spectrochempy
SpectroChemPy is a framework for processing, analyzing and modeling spectroscopic data for chemistry with Python
Stars: ✭ 34 (-15%)
Mutual labels:  data-analysis
Udacity-Data-Analyst-Nanodegree
Repository for the projects needed to complete the Data Analyst Nanodegree.
Stars: ✭ 31 (-22.5%)
Mutual labels:  data-analysis
wasp
WASP is a framework to build complex real time big data applications. It relies on a kind of Kappa/Lambda architecture mainly leveraging Kafka and Spark. If you need to ingest huge amount of heterogeneous data and analyze them through complex pipelines, this is the framework for you.
Stars: ✭ 19 (-52.5%)
Mutual labels:  spark-streaming
Bitcoin Analysis-
Python. Bitcoin is a widely used cryptocurrency in the digital market. It is decentralised, meaning it is not owned by a government or any company. Transactions are simple and easy because it doesn't belong to any country. Records are stored in the blockchain. The Bitcoin price is variable and Bitcoin is widely used, so it is important to predict the price of it f…
Stars: ✭ 42 (+5%)
Mutual labels:  data-analysis
nasdaq-symbols
ETL for the NASDAQ symbol file
Stars: ✭ 13 (-67.5%)
Mutual labels:  etl
datajoint-python
Relational data pipelines for the science lab
Stars: ✭ 140 (+250%)
Mutual labels:  data-analysis
ipaddress
Data analysis of IP addresses and networks
Stars: ✭ 20 (-50%)
Mutual labels:  data-analysis
CVparser
CVparser is software for parsing or extracting data out of CV/resumes.
Stars: ✭ 28 (-30%)
Mutual labels:  etl
Data-Science-Resources
A guide to getting started with Data Science and ML.
Stars: ✭ 17 (-57.5%)
Mutual labels:  data-analysis
uptasticsearch
An Elasticsearch client tailored to data science workflows.
Stars: ✭ 47 (+17.5%)
Mutual labels:  etl
xxhadoop
Data Analysis Using Hadoop/Spark/Storm/ElasticSearch/MachineLearning etc. This is My Daily Notes/Code/Demo. Don't fork, Just star !
Stars: ✭ 37 (-7.5%)
Mutual labels:  spark-streaming
sql-to-redis
🔄 Simple tool for ETL. From SQL to Redis.
Stars: ✭ 18 (-55%)
Mutual labels:  etl
Datscan
DatScan is an initiative to build an open-source CMS that will have the capability to solve any problem using data Analysis just with the help of various modules and a vast standardized module library
Stars: ✭ 13 (-67.5%)
Mutual labels:  data-analysis
django-calaccess-raw-data
A Django app to download, extract and load campaign finance and lobbying activity data from the California Secretary of State's CAL-ACCESS database
Stars: ✭ 61 (+52.5%)
Mutual labels:  etl
flock
Flock: A Low-Cost Streaming Query Engine on FaaS Platforms
Stars: ✭ 232 (+480%)
Mutual labels:  etl
heidi
heidi : tidy data in Haskell
Stars: ✭ 24 (-40%)
Mutual labels:  data-analysis
sync-engine-example
Synchronization Algorithm Exploration: Techniques to synchronize a SQL database with external destinations.
Stars: ✭ 17 (-57.5%)
Mutual labels:  etl
BigInsights-on-Apache-Hadoop
Example projects for 'BigInsights for Apache Hadoop' on IBM Bluemix
Stars: ✭ 21 (-47.5%)
Mutual labels:  spark-streaming
dogETL
A library for converting data between JDBC, CSV, and JSON.
Stars: ✭ 15 (-62.5%)
Mutual labels:  etl
snorkel
Snorkel - Bootstrap your Data Science
Stars: ✭ 24 (-40%)
Mutual labels:  data-science-notebook
CC33Z
Computer Science course
Stars: ✭ 50 (+25%)
Mutual labels:  data-analysis
blockchain-etl-streaming
Streaming Ethereum and Bitcoin blockchain data to Google Pub/Sub or Postgres in Kubernetes
Stars: ✭ 57 (+42.5%)
Mutual labels:  etl
1-60 of 665 similar projects