1414 Open source projects that are alternatives of or similar to Dataspherestudio

Argo Workflows
Workflow engine for Kubernetes
Stars: ✭ 10,024 (+738.83%)
Mutual labels:  airflow, workflow
hive to es
A small tool for syncing data from a Hive data warehouse to Elasticsearch
Stars: ✭ 21 (-98.24%)
Mutual labels:  hive, hadoop
logparser
Easy parsing of Apache HTTPD and NGINX access logs with Java, Hadoop, Hive, Pig, Flink, Beam, Storm, Drill, ...
Stars: ✭ 139 (-88.37%)
Mutual labels:  hive, flink
the-apache-ignite-book
All code samples, scripts, and more in-depth examples for The Apache Ignite Book. Includes Apache Ignite 2.6 or above.
Stars: ✭ 65 (-94.56%)
Mutual labels:  hive, hadoop
prosto
Prosto is a data processing toolkit radically changing how data is processed by heavily relying on functions and operations with functions - an alternative to map-reduce and join-groupby
Stars: ✭ 54 (-95.48%)
Mutual labels:  workflow, spark
BigInsights-on-Apache-Hadoop
Example projects for 'BigInsights for Apache Hadoop' on IBM Bluemix
Stars: ✭ 21 (-98.24%)
Mutual labels:  hive, hadoop
polygon-etl
ETL (extract, transform and load) tools for ingesting Polygon blockchain data to Google BigQuery and Pub/Sub
Stars: ✭ 53 (-95.56%)
Mutual labels:  airflow, etl
cobra-policytool
Manage Apache Atlas and Ranger configuration for your Hadoop environment.
Stars: ✭ 16 (-98.66%)
Mutual labels:  hive, hadoop
spark-acid
ACID Data Source for Apache Spark based on Hive ACID
Stars: ✭ 91 (-92.38%)
Mutual labels:  spark, hive
ibis
IBIS is a workflow creation-engine that abstracts the Hadoop internals of ingesting RDBMS data.
Stars: ✭ 48 (-95.98%)
Mutual labels:  workflow, hadoop
qwery
A SQL-like language for performing ETL transformations.
Stars: ✭ 28 (-97.66%)
Mutual labels:  hive, etl
Around Dataengineering
A Data Engineering & Machine Learning Knowledge Hub
Stars: ✭ 257 (-78.49%)
Mutual labels:  airflow, spark
datalake-etl-pipeline
Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Stars: ✭ 39 (-96.74%)
Mutual labels:  hadoop, etl
Pulsar Spark
When Apache Pulsar meets Apache Spark
Stars: ✭ 55 (-95.4%)
Mutual labels:  spark, flink
Aws Ecs Airflow
Run Airflow in AWS ECS (Elastic Container Service) using Fargate tasks
Stars: ✭ 107 (-91.05%)
Mutual labels:  airflow, etl
GooglePlay-Web-Crawler
MapReduce project using Hadoop, Nutch, AWS EMR, Pig, Tez, and Hive
Stars: ✭ 18 (-98.49%)
Mutual labels:  hive, hadoop
TitanDataOperationSystem
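The best big data project: "Titan Data Operations System". This project is a full-stack, closed-loop system. It has a web system for data visualization, reads logs with flume-kafka-flume, designs the data warehouse in Hive, uses Spark code to transform data between warehouse tables and to migrate ADS-layer tables to MySQL, and uses Azkaban to schedule recurring jobs. Technologies used: Java/Scala, Hadoop, Spark, Hive, Kafka, Flume, Azkaban, SpringBoot, Bootstrap, ECharts, etc.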
Stars: ✭ 62 (-94.81%)
Mutual labels:  hive, hadoop
astro
Astro allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow.
Stars: ✭ 79 (-93.39%)
Mutual labels:  airflow, etl
EngineeringTeam
A place for organizing the materials of the YBIGTA engineering team.
Stars: ✭ 41 (-96.57%)
Mutual labels:  hive, hadoop
openverse-catalog
Identifies and collects data on CC-licensed content across web crawl data and public APIs.
Stars: ✭ 27 (-97.74%)
Mutual labels:  airflow, spark
Spline
Data Lineage Tracking And Visualization Solution
Stars: ✭ 306 (-74.39%)
Mutual labels:  spark, hadoop
Ytk Learn
Ytk-learn is a distributed machine learning library that implements most popular machine learning algorithms (GBDT, GBRT, Mixture Logistic Regression, Gradient Boosting Soft Tree, Factorization Machines, Field-aware Factorization Machines, Logistic Regression, Softmax).
Stars: ✭ 337 (-71.8%)
Mutual labels:  spark, hadoop
Kyuubi
Kyuubi is a unified multi-tenant JDBC interface for large-scale data processing and analytics, built on top of Apache Spark
Stars: ✭ 363 (-69.62%)
Mutual labels:  spark, hive
leaflet heatmap
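A simple visualization of Huzhou call data. Assuming the data volume is too large for the browser to draw the heatmap directly, the heatmap rendering step is moved offline for computation and analysis. Apache Spark first computes the data in parallel and then renders the heatmap; leafletjs then loads an OpenStreetMap layer and the heatmap layer for a smooth interactive experience. With the current Apache Spark rendering implementation, the parallel computation is slower than single-machine computation, possibly because Apache Spark is not well suited to this kind of computation or because the algorithm is not well designed. The Apache Spark heatmap rendering and computation code is here: https://github.com/yuanzhaokang/ParallelizeHeatmap.git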
Stars: ✭ 13 (-98.91%)
Mutual labels:  spark, hadoop
bitnami-docker-airflow-scheduler
Bitnami Docker Image for Apache Airflow Scheduler
Stars: ✭ 19 (-98.41%)
Mutual labels:  workflow, airflow
bigkube
Minikube for big data with Scala and Spark
Stars: ✭ 16 (-98.66%)
Mutual labels:  airflow, spark
AirflowDataPipeline
Example of an ETL Pipeline using Airflow
Stars: ✭ 24 (-97.99%)
Mutual labels:  airflow, etl
Bitnami Docker Airflow
Bitnami Docker Image for Apache Airflow
Stars: ✭ 89 (-92.55%)
Mutual labels:  airflow, workflow
Iceberg
Iceberg is a table format for large, slow-moving tabular data
Stars: ✭ 393 (-67.11%)
Mutual labels:  spark, hadoop
Docker Airflow
Docker Apache Airflow
Stars: ✭ 3,375 (+182.43%)
Mutual labels:  airflow, workflow
Elasticluster
Create clusters of VMs on the cloud and configure them with Ansible.
Stars: ✭ 298 (-75.06%)
Mutual labels:  spark, hadoop
Metorikku
A simplified, lightweight ETL Framework based on Apache Spark
Stars: ✭ 361 (-69.79%)
Mutual labels:  spark, etl
Dagster
An orchestration platform for the development, production, and observation of data assets.
Stars: ✭ 4,099 (+243.01%)
Mutual labels:  etl, workflow
Bigdl
Building Large-Scale AI Applications for Distributed Big Data
Stars: ✭ 3,813 (+219.08%)
Mutual labels:  spark, hadoop
Hive
Apache Hive
Stars: ✭ 4,031 (+237.32%)
Mutual labels:  hadoop, hive
Trino
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Stars: ✭ 4,581 (+283.35%)
Mutual labels:  hadoop, hive
Devops Python Tools
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Stars: ✭ 406 (-66.03%)
Mutual labels:  spark, hadoop
Big data architect skills
Skills a big data architect should master
Stars: ✭ 400 (-66.53%)
Mutual labels:  spark, hadoop
Hive Funnel Udf
Hive UDFs for funnel analysis
Stars: ✭ 72 (-93.97%)
Mutual labels:  hadoop, hive
Moonbox
Moonbox is a DVtaaS (Data Virtualization as a Service) Platform
Stars: ✭ 424 (-64.52%)
Mutual labels:  spark, hive
Featran
A Scala feature transformation library for data science and machine learning
Stars: ✭ 420 (-64.85%)
Mutual labels:  spark, flink
Yanagishima
Web UI for Trino, Presto, Hive, Elasticsearch, SparkSQL
Stars: ✭ 424 (-64.52%)
Mutual labels:  spark, hive
Agile data code 2
Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition
Stars: ✭ 413 (-65.44%)
Mutual labels:  airflow, spark
Data Science Ipython Notebooks
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Stars: ✭ 22,048 (+1745.02%)
Mutual labels:  spark, hadoop
Airflow
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Stars: ✭ 24,101 (+1916.82%)
Mutual labels:  airflow, workflow
Pdf
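Programming e-books, e-books, and programming books covering C, C#, Docker, Elasticsearch, Git, Hadoop, HeadFirst, Java, Javascript, jvm, Kafka, Linux, Maven, MongoDB, MyBatis, MySQL, Netty, Nginx, Python, RabbitMQ, Redis, Scala, Solr, Spark, Spring, SpringBoot, SpringCloud, TCPIP, Tomcat, Zookeeper, artificial intelligence, big data, concurrent programming, databases, data mining, new interview questions, architecture design, algorithms, computer science, design patterns, software testing, refactoring and optimization, and more categories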
Stars: ✭ 12,009 (+904.94%)
Mutual labels:  spark, hadoop
Yauaa
Yet Another UserAgent Analyzer
Stars: ✭ 472 (-60.5%)
Mutual labels:  flink, hive
Waimak
Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark.
Stars: ✭ 60 (-94.98%)
Mutual labels:  spark, hadoop
Cloudflow
Cloudflow enables users to quickly develop, orchestrate, and operate distributed streaming applications on Kubernetes.
Stars: ✭ 278 (-76.74%)
Mutual labels:  spark, flink
Marmaray
Generic Data Ingestion & Dispersal Library for Hadoop
Stars: ✭ 414 (-65.36%)
Mutual labels:  spark, hadoop
Bigdata
💎🔥 Big data study notes
Stars: ✭ 488 (-59.16%)
Mutual labels:  hadoop, hive
Sparta
Real Time Analytics and Data Pipelines based on Spark Streaming
Stars: ✭ 513 (-57.07%)
Mutual labels:  spark, workflow
Pyspark Example Project
Example project implementing best practices for PySpark ETL jobs and applications.
Stars: ✭ 633 (-47.03%)
Mutual labels:  spark, etl
Useractionanalyzeplatform
A big data platform for analyzing e-commerce user behavior
Stars: ✭ 645 (-46.03%)
Mutual labels:  spark, hadoop
H2o 3
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Stars: ✭ 5,656 (+373.31%)
Mutual labels:  spark, hadoop
Goodreads etl pipeline
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Stars: ✭ 793 (-33.64%)
Mutual labels:  airflow, spark
Databook
A facebook for data
Stars: ✭ 26 (-97.82%)
Mutual labels:  airflow, hive
Kylo
Kylo is a data lake management software platform and framework for enabling scalable enterprise-class data lakes on big data technologies such as Teradata, Apache Spark and/or Hadoop. Kylo is licensed under Apache 2.0. Contributed by Teradata Inc.
Stars: ✭ 916 (-23.35%)
Mutual labels:  spark, hadoop
Dockerfiles
50+ DockerHub public images for Docker & Kubernetes - Hadoop, Kafka, ZooKeeper, HBase, Cassandra, Solr, SolrCloud, Presto, Apache Drill, Nifi, Spark, Consul, Riak, TeamCity and DevOps tools built on the major Linux distros: Alpine, CentOS, Debian, Fedora, Ubuntu
Stars: ✭ 847 (-29.12%)
Mutual labels:  spark, hadoop
Zeppelin
Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
Stars: ✭ 5,513 (+361.34%)
Mutual labels:  spark, flink