
368 open-source projects that are alternatives to, or similar to, learning-spark

Hadoop cookbook
Cookbook to install Hadoop 2.0+ using Chef
Stars: ✭ 82 (+192.86%)
Mutual labels:  hadoop
hadoop-crypto
Library for per-file client-side encryption in Hadoop FileSystems such as HDFS or S3.
Stars: ✭ 38 (+35.71%)
Mutual labels:  hadoop
datasqueeze
Hadoop utility to compact small files
Stars: ✭ 18 (-35.71%)
Mutual labels:  hadoop
Optimus
🚚 Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark
Stars: ✭ 986 (+3421.43%)
Mutual labels:  bigdata
Hadoop Pot
A scalable Apache Hadoop-based implementation of the Pooled Time Series video similarity algorithm, based on the CVPR 2015 paper by M. Ryoo et al.
Stars: ✭ 8 (-71.43%)
Mutual labels:  hadoop
Aws Auto Terminate Idle Emr
AWS Auto Terminate Idle AWS EMR Clusters Framework is an AWS-based solution that uses AWS CloudWatch, AWS Lambda, and a Python script built on Boto3 to terminate AWS EMR clusters that have been idle for a specified period of time.
Stars: ✭ 21 (-25%)
Mutual labels:  bigdata
ambari-hdp-docker
Dockerfiles and Docker Compose for HDP 2.6 with Blueprints
Stars: ✭ 23 (-17.86%)
Mutual labels:  hadoop
Docker Hadoop Cluster
Multi-node cluster on Docker for self-development.
Stars: ✭ 82 (+192.86%)
Mutual labels:  hadoop
wasp
WASP is a framework for building complex real-time big data applications. It relies on a kind of Kappa/Lambda architecture, mainly leveraging Kafka and Spark. If you need to ingest huge amounts of heterogeneous data and analyze them through complex pipelines, this is the framework for you.
Stars: ✭ 19 (-32.14%)
Mutual labels:  hadoop
Spark Streaming Monitoring With Lightning
Plot live stats as graphs from an Apache Spark application using Lightning-viz
Stars: ✭ 15 (-46.43%)
Mutual labels:  bigdata
Kylo
Kylo is a data lake management software platform and framework for enabling scalable enterprise-class data lakes on big data technologies such as Teradata, Apache Spark and/or Hadoop. Kylo is licensed under Apache 2.0. Contributed by Teradata Inc.
Stars: ✭ 916 (+3171.43%)
Mutual labels:  hadoop
10 Weeks
10 weeks of technology exploration
Stars: ✭ 22 (-21.43%)
Mutual labels:  bigdata
skein
A tool and library for easily deploying applications on Apache YARN
Stars: ✭ 128 (+357.14%)
Mutual labels:  hadoop
Coding Now
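Study notes, plus eBooks and video resources I have gone through, and blogs, websites, and tools I have collected and consider worthwhile. Covers the major big data components, Python machine learning and data analysis, Linux, operating systems, algorithms, networking, and more.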
Stars: ✭ 750 (+2578.57%)
Mutual labels:  bigdata
Szt Bigdata
Shenzhen Metro big data passenger flow analysis system 🚇🚄🌟
Stars: ✭ 826 (+2850%)
Mutual labels:  hadoop
Running Elasticsearch Fun Profit
A book about running Elasticsearch
Stars: ✭ 664 (+2271.43%)
Mutual labels:  bigdata
presto
Teradata Distribution of Presto -- A Distributed SQL Query Engine for Big Data
Stars: ✭ 91 (+225%)
Mutual labels:  hadoop
Cds
Data syncing in golang for ClickHouse.
Stars: ✭ 501 (+1689.29%)
Mutual labels:  bigdata
TiBigData
TiDB connectors for Flink/Hive/Presto
Stars: ✭ 192 (+585.71%)
Mutual labels:  bigdata
Tensorbase
TensorBase BE is building a high performance, cloud neutral bigdata warehouse for SMEs fully in Rust.
Stars: ✭ 440 (+1471.43%)
Mutual labels:  bigdata
Useractionanalyzeplatform
Big data platform for e-commerce user behavior analysis
Stars: ✭ 645 (+2203.57%)
Mutual labels:  hadoop
Circosjs
d3 library to build circular graphs
Stars: ✭ 436 (+1457.14%)
Mutual labels:  bigdata
Hive Jdbc Uber Jar
Hive JDBC "uber" or "standalone" jar based on the latest Apache Hive version
Stars: ✭ 188 (+571.43%)
Mutual labels:  hadoop
Sidekick
High Performance HTTP Sidecar Load Balancer
Stars: ✭ 366 (+1207.14%)
Mutual labels:  bigdata
Javapdf
🍣 100 Java eBooks and technical books in PDF (take pride in downloading and reading them, not in merely liking and bookmarking)
Stars: ✭ 609 (+2075%)
Mutual labels:  hadoop
Datawave
DataWave is an ingest/query framework that leverages Apache Accumulo to provide fast, secure data access.
Stars: ✭ 347 (+1139.29%)
Mutual labels:  bigdata
Camus
Mirror of Linkedin's Camus
Stars: ✭ 81 (+189.29%)
Mutual labels:  hadoop
hadoop-ecosystem
Visualizations of the Hadoop Ecosystem
Stars: ✭ 20 (-28.57%)
Mutual labels:  hadoop
Datafaker
Datafaker is a large-scale test data and flow test data generation tool. Datafaker fakes data and inserts it into various data sources. Test data generation tool.
Stars: ✭ 327 (+1067.86%)
Mutual labels:  bigdata
Dist Keras
Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.
Stars: ✭ 613 (+2089.29%)
Mutual labels:  hadoop
Janusgraph.cn
Chinese community for the distributed graph database JanusGraph: everything about JanusGraph
Stars: ✭ 273 (+875%)
Mutual labels:  bigdata
Deeplearning4j
Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for keras, tensorflow, and onnx/pytorch, a modular and tiny c++ library for running math code and a java based math library on top of the core c++ library. Also includes samediff: a pytorch/tensorflow like library for running deep learni…
Stars: ✭ 12,277 (+43746.43%)
Mutual labels:  hadoop
Ldetool
Code generator for fast log file parsers
Stars: ✭ 273 (+875%)
Mutual labels:  bigdata
Hadoop study
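Regularly updated documentation on commonly used big data components in the Hadoop ecosystem, focusing, in order, on: Flink, Solr, Spark SQL, ES, Scala, Kafka, HBase/Phoenix, Redis, Kerberos (the project includes Hadoop mind maps, Evernote notes, simple Scala demos, common utility classes, and de-identified train code; continuously updated!)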
Stars: ✭ 567 (+1925%)
Mutual labels:  hadoop
Big Data Rosetta Code
Code snippets for solving common big data problems in various platforms. Inspired by Rosetta Code
Stars: ✭ 254 (+807.14%)
Mutual labels:  bigdata
webhdfs
Node.js WebHDFS REST API client
Stars: ✭ 88 (+214.29%)
Mutual labels:  hadoop
jigsaw-seed
This is the seed project for the Jigsaw-七巧板 component library (https://github.com/rdkmaster/jigsaw). All new apps are encouraged to start from this project as their seed.
Stars: ✭ 17 (-39.29%)
Mutual labels:  bigdata
Gis Tools For Hadoop
The GIS Tools for Hadoop are a collection of GIS tools for spatial analysis of big data.
Stars: ✭ 485 (+1632.14%)
Mutual labels:  hadoop
proteic
Streaming and static data visualization for the modern web.
Stars: ✭ 37 (+32.14%)
Mutual labels:  bigdata
Bigdata docker
Big Data Ecosystem Docker
Stars: ✭ 161 (+475%)
Mutual labels:  hadoop
data processing course
Some class materials for a data processing course using PySpark
Stars: ✭ 50 (+78.57%)
Mutual labels:  bigdata
Pdf
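Programming eBooks, eBooks, and programming books, including C, C#, Docker, Elasticsearch, Git, Hadoop, HeadFirst, Java, Javascript, jvm, Kafka, Linux, Maven, MongoDB, MyBatis, MySQL, Netty, Nginx, Python, RabbitMQ, Redis, Scala, Solr, Spark, Spring, SpringBoot, SpringCloud, TCPIP, Tomcat, Zookeeper, artificial intelligence, big data, concurrent programming, databases, data mining, new interview questions, architecture design, algorithms, computer science, design patterns, software testing, refactoring and optimization, and many more categories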
Stars: ✭ 12,009 (+42789.29%)
Mutual labels:  hadoop
learning notes
Study notes
Stars: ✭ 18 (-35.71%)
Mutual labels:  bigdata
BigInsights-on-Apache-Hadoop
Example projects for 'BigInsights for Apache Hadoop' on IBM Bluemix
Stars: ✭ 21 (-25%)
Mutual labels:  hadoop
pulsar-user-group-loc-cn
Workspace for the local user group in China.
Stars: ✭ 19 (-32.14%)
Mutual labels:  bigdata
room-renting
Scrapes housing listings from Anjuke with Python and visualizes them with AMap (Gaode Maps)
Stars: ✭ 16 (-42.86%)
Mutual labels:  bigdata
Hadoop Common
Mirror of Apache Hadoop common
Stars: ✭ 155 (+453.57%)
Mutual labels:  hadoop
taller SparkR
SparkR workshop for the Jornadas de Usuarios de R (R Users Conference)
Stars: ✭ 12 (-57.14%)
Mutual labels:  bigdata
Devops Python Tools
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Stars: ✭ 406 (+1350%)
Mutual labels:  hadoop
SparkProgrammingInScala
Apache Spark Course Material
Stars: ✭ 57 (+103.57%)
Mutual labels:  bigdata
chatnoir-resiliparse
A robust web archive analytics toolkit
Stars: ✭ 26 (-7.14%)
Mutual labels:  bigdata
liquibase-impala
Liquibase extension to add Impala Database support
Stars: ✭ 23 (-17.86%)
Mutual labels:  hadoop
bigdata-tech-index
Big Data Technology Index
Stars: ✭ 24 (-14.29%)
Mutual labels:  bigdata
jmx exporter-cloudera-hadoop
Prometheus jmx_exporter configurations for Cloudera Hadoop
Stars: ✭ 33 (+17.86%)
Mutual labels:  hadoop
disq
A library for manipulating bioinformatics sequencing formats in Apache Spark
Stars: ✭ 29 (+3.57%)
Mutual labels:  hadoop
LogAnalyzeHelper
Data-cleaning program for a forum log analysis system (includes an IP rule library, UDF development, MapReduce programs, and log data)
Stars: ✭ 33 (+17.86%)
Mutual labels:  hadoop
greycat
GreyCat - Data Analytics, Temporal data, What-if, Live machine learning
Stars: ✭ 104 (+271.43%)
Mutual labels:  bigdata
optimus
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Stars: ✭ 1,351 (+4725%)
Mutual labels:  bigdata
Learn machine learning
Road to Machine Learning
Stars: ✭ 81 (+189.29%)
Mutual labels:  hadoop
hadoop-etl-udfs
The Hadoop ETL UDFs are the main way to load data from Hadoop into EXASOL
Stars: ✭ 17 (-39.29%)
Mutual labels:  hadoop