1414 open-source projects that are alternatives to or similar to Dataspherestudio

Argo Workflows

Workflow engine for Kubernetes

Stars: ✭ 10,024 (+738.83%)

Mutual labels: airflow, workflow

hive to es

A small tool for syncing data from a Hive data warehouse to Elasticsearch

Stars: ✭ 21 (-98.24%)

Mutual labels: hive, hadoop

logparser

Easy parsing of Apache HTTPD and NGINX access logs with Java, Hadoop, Hive, Pig, Flink, Beam, Storm, Drill, ...

Stars: ✭ 139 (-88.37%)

Mutual labels: hive, flink

the-apache-ignite-book

All code samples, scripts and more in-depth examples for The Apache Ignite Book. Covers Apache Ignite 2.6 or above.

Stars: ✭ 65 (-94.56%)

Mutual labels: hive, hadoop

prosto

Prosto is a data processing toolkit that radically changes how data is processed by relying heavily on functions and operations with functions: an alternative to map-reduce and join-groupby.

Stars: ✭ 54 (-95.48%)

Mutual labels: workflow, spark

BigInsights-on-Apache-Hadoop

Example projects for 'BigInsights for Apache Hadoop' on IBM Bluemix

Stars: ✭ 21 (-98.24%)

Mutual labels: hive, hadoop

polygon-etl

ETL (extract, transform and load) tools for ingesting Polygon blockchain data to Google BigQuery and Pub/Sub

Stars: ✭ 53 (-95.56%)

Mutual labels: airflow, etl

cobra-policytool

Manage Apache Atlas and Ranger configuration for your Hadoop environment.

Stars: ✭ 16 (-98.66%)

Mutual labels: hive, hadoop

spark-acid

ACID Data Source for Apache Spark based on Hive ACID

Stars: ✭ 91 (-92.38%)

Mutual labels: spark, hive

ibis

IBIS is a workflow-creation engine that abstracts the Hadoop internals of ingesting RDBMS data.

Stars: ✭ 48 (-95.98%)

Mutual labels: workflow, hadoop

qwery

A SQL-like language for performing ETL transformations.

Stars: ✭ 28 (-97.66%)

Mutual labels: hive, etl

Around Dataengineering

A Data Engineering & Machine Learning Knowledge Hub

Stars: ✭ 257 (-78.49%)

Mutual labels: airflow, spark

datalake-etl-pipeline

Simplified ETL process in Hadoop using Apache Spark. Has a complete ETL pipeline for a data lake: SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations.

Stars: ✭ 39 (-96.74%)

Mutual labels: hadoop, etl

Pulsar Spark

When Apache Pulsar meets Apache Spark

Stars: ✭ 55 (-95.4%)

Mutual labels: spark, flink

Aws Ecs Airflow

Run Airflow in AWS ECS (Elastic Container Service) using Fargate tasks

Stars: ✭ 107 (-91.05%)

Mutual labels: airflow, etl

GooglePlay-Web-Crawler

MapReduce project using Hadoop, Nutch, AWS EMR, Pig, Tez, Hive

Stars: ✭ 18 (-98.49%)

Mutual labels: hive, hadoop

TitanDataOperationSystem

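The best big data project: "Titan Data Operation System". This project is a full-stack, closed-loop system. It has a web system for data visualization, uses flume-kafka-flume to read logs, designs the data warehouse in Hive, uses Spark code to transform data between warehouse tables and to migrate ADS-layer tables to MySQL, and uses Azkaban to schedule periodic jobs. Technologies used: Java/Scala, Hadoop, Spark, Hive, Kafka, Flume, Azkaban, SpringBoot, Bootstrap, ECharts, etc.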

Stars: ✭ 62 (-94.81%)

Mutual labels: hive, hadoop

astro

Astro allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow.

Stars: ✭ 79 (-93.39%)

Mutual labels: airflow, etl

EngineeringTeam

A place for organizing the materials of the YBIGTA (와이빅타) engineering team.

Stars: ✭ 41 (-96.57%)

Mutual labels: hive, hadoop

openverse-catalog

Identifies and collects data on CC-licensed content across web crawl data and public APIs.

Stars: ✭ 27 (-97.74%)

Mutual labels: airflow, spark

Spline

Data Lineage Tracking And Visualization Solution

Stars: ✭ 306 (-74.39%)

Mutual labels: spark, hadoop

Ytk Learn

Ytk-learn is a distributed machine learning library that implements most popular machine learning algorithms (GBDT, GBRT, Mixture Logistic Regression, Gradient Boosting Soft Tree, Factorization Machines, Field-aware Factorization Machines, Logistic Regression, Softmax).

Stars: ✭ 337 (-71.8%)

Mutual labels: spark, hadoop

Kyuubi

Kyuubi is a unified multi-tenant JDBC interface for large-scale data processing and analytics, built on top of Apache Spark

Stars: ✭ 363 (-69.62%)

Mutual labels: spark, hive

leaflet heatmap

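A simple visualization of Huzhou call data. Assuming the data volume is very large, the heatmap cannot be drawn directly in the browser, so the heatmap-drawing step is moved offline for computation and analysis. Apache Spark first computes the data in parallel and then draws the heatmap, after which leafletjs loads the OpenStreetMap layer and the heatmap layer to provide good interactivity. Drawing is currently implemented with Apache Spark; perhaps Apache Spark is not well suited to this kind of computation, or perhaps the algorithm is not well designed, but the parallel computation is slower than single-machine computation. The Apache Spark heatmap drawing and computation code is here: https://github.com/yuanzhaokang/ParallelizeHeatmap.git .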

Stars: ✭ 13 (-98.91%)

Mutual labels: spark, hadoop

bitnami-docker-airflow-scheduler

Bitnami Docker Image for Apache Airflow Scheduler

Stars: ✭ 19 (-98.41%)

Mutual labels: workflow, airflow

bigkube

Minikube for big data with Scala and Spark

Stars: ✭ 16 (-98.66%)

Mutual labels: airflow, spark

AirflowDataPipeline

Example of an ETL Pipeline using Airflow

Stars: ✭ 24 (-97.99%)

Mutual labels: airflow, etl

Bitnami Docker Airflow

Bitnami Docker Image for Apache Airflow

Stars: ✭ 89 (-92.55%)

Mutual labels: airflow, workflow

Iceberg

Iceberg is a table format for large, slow-moving tabular data

Stars: ✭ 393 (-67.11%)

Mutual labels: spark, hadoop

Docker Airflow

Docker Apache Airflow

Stars: ✭ 3,375 (+182.43%)

Mutual labels: airflow, workflow

Elasticluster

Create clusters of VMs on the cloud and configure them with Ansible.

Stars: ✭ 298 (-75.06%)

Mutual labels: spark, hadoop

Metorikku

A simplified, lightweight ETL Framework based on Apache Spark

Stars: ✭ 361 (-69.79%)

Mutual labels: spark, etl

Dagster

An orchestration platform for the development, production, and observation of data assets.

Stars: ✭ 4,099 (+243.01%)

Mutual labels: etl, workflow

Bigdl

Building Large-Scale AI Applications for Distributed Big Data

Stars: ✭ 3,813 (+219.08%)

Mutual labels: spark, hadoop

Hive

Apache Hive

Stars: ✭ 4,031 (+237.32%)

Mutual labels: hadoop, hive

Trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

Stars: ✭ 4,581 (+283.35%)

Mutual labels: hadoop, hive

Devops Python Tools

80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.

Stars: ✭ 406 (-66.03%)

Mutual labels: spark, hadoop

Big data architect skills

The skills a big data architect should master

Stars: ✭ 400 (-66.53%)

Mutual labels: spark, hadoop

Hive Funnel Udf

Hive UDFs for funnel analysis

Stars: ✭ 72 (-93.97%)

Mutual labels: hadoop, hive

Moonbox

Moonbox is a DVtaaS (Data Virtualization as a Service) Platform

Stars: ✭ 424 (-64.52%)

Mutual labels: spark, hive

Featran

A Scala feature transformation library for data science and machine learning

Stars: ✭ 420 (-64.85%)

Mutual labels: spark, flink

Yanagishima

Web UI for Trino, Presto, Hive, Elasticsearch, SparkSQL

Stars: ✭ 424 (-64.52%)

Mutual labels: spark, hive

Agile data code 2

Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition

Stars: ✭ 413 (-65.44%)

Mutual labels: airflow, spark

Data Science Ipython Notebooks

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

Stars: ✭ 22,048 (+1745.02%)

Mutual labels: spark, hadoop

Airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

Stars: ✭ 24,101 (+1916.82%)

Mutual labels: airflow, workflow

Pdf

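Programming e-books, e-books, programming books, including C, C#, Docker, Elasticsearch, Git, Hadoop, HeadFirst, Java, Javascript, jvm, Kafka, Linux, Maven, MongoDB, MyBatis, MySQL, Netty, Nginx, Python, RabbitMQ, Redis, Scala, Solr, Spark, Spring, SpringBoot, SpringCloud, TCPIP, Tomcat, Zookeeper, artificial intelligence, big data, concurrent programming, databases, data mining, new interview questions, architecture design, algorithms, computer science, design patterns, software testing, refactoring and optimization, and more categories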

Stars: ✭ 12,009 (+904.94%)

Mutual labels: spark, hadoop

Yauaa

Yet Another UserAgent Analyzer

Stars: ✭ 472 (-60.5%)

Mutual labels: flink, hive

Waimak

Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark.

Stars: ✭ 60 (-94.98%)

Mutual labels: spark, hadoop

Cloudflow

Cloudflow enables users to quickly develop, orchestrate, and operate distributed streaming applications on Kubernetes.

Stars: ✭ 278 (-76.74%)

Mutual labels: spark, flink

Marmaray

Generic Data Ingestion & Dispersal Library for Hadoop

Stars: ✭ 414 (-65.36%)

Mutual labels: spark, hadoop

Bigdata

💎🔥 Big data study notes

Stars: ✭ 488 (-59.16%)

Mutual labels: hadoop, hive

Sparta

Real Time Analytics and Data Pipelines based on Spark Streaming

Stars: ✭ 513 (-57.07%)

Mutual labels: spark, workflow

Pyspark Example Project

Example project implementing best practices for PySpark ETL jobs and applications.

Stars: ✭ 633 (-47.03%)

Mutual labels: spark, etl

Useractionanalyzeplatform

A big data platform for e-commerce user behavior analysis

Stars: ✭ 645 (-46.03%)

Mutual labels: spark, hadoop

H2o 3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

Stars: ✭ 5,656 (+373.31%)

Mutual labels: spark, hadoop

Goodreads etl pipeline

An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.

Stars: ✭ 793 (-33.64%)

Mutual labels: airflow, spark

Databook

A Facebook for data

Stars: ✭ 26 (-97.82%)

Mutual labels: airflow, hive

Kylo

Kylo is a data lake management software platform and framework for enabling scalable enterprise-class data lakes on big data technologies such as Teradata, Apache Spark and/or Hadoop. Kylo is licensed under Apache 2.0. Contributed by Teradata Inc.

Stars: ✭ 916 (-23.35%)

Mutual labels: spark, hadoop

Dockerfiles

50+ DockerHub public images for Docker & Kubernetes - Hadoop, Kafka, ZooKeeper, HBase, Cassandra, Solr, SolrCloud, Presto, Apache Drill, Nifi, Spark, Consul, Riak, TeamCity and DevOps tools built on the major Linux distros: Alpine, CentOS, Debian, Fedora, Ubuntu

Stars: ✭ 847 (-29.12%)

Mutual labels: spark, hadoop

Zeppelin

Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.

Stars: ✭ 5,513 (+361.34%)

Mutual labels: spark, flink

61-120 of 1414 similar projects