aut: The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Stars: ✭ 111 (+101.82%)
Koalas: pandas API on Apache Spark
Stars: ✭ 3,044 (+5434.55%)
Geni: A Clojure dataframe library that runs on Spark
Stars: ✭ 152 (+176.36%)
Mobius: C# and F# language binding and extensions to Apache Spark
Stars: ✭ 929 (+1589.09%)
Spark With Python: Fundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (+172.73%)
Datafusion: DataFusion has now been donated to the Apache Arrow project
Stars: ✭ 611 (+1010.91%)
Ballista: Distributed compute platform implemented in Rust, and powered by Apache Arrow.
Stars: ✭ 2,274 (+4034.55%)
Spark Daria: Essential Spark extensions and helper methods ✨😲
Stars: ✭ 553 (+905.45%)
Spark Redis: A connector for Spark that allows reading and writing to/from Redis cluster
Stars: ✭ 773 (+1305.45%)
Urhox: Urho3D extension library
Stars: ✭ 13 (-76.36%)
Sparkjni: A heterogeneous Apache Spark framework.
Stars: ✭ 11 (-80%)
Tedsds: Apache Spark - Turbofan Engine Degradation Simulation Data Set example in Apache Spark
Stars: ✭ 14 (-74.55%)
Snappydata: Project SnappyData - memory optimized analytics database, based on Apache Spark™ and Apache Geode™. Stream, Transact, Analyze, Predict in one cluster
Stars: ✭ 995 (+1709.09%)
Mlfeature: Feature engineering toolkit for Spark MLlib.
Stars: ✭ 12 (-78.18%)
Dockerfiles: 50+ DockerHub public images for Docker & Kubernetes - Hadoop, Kafka, ZooKeeper, HBase, Cassandra, Solr, SolrCloud, Presto, Apache Drill, Nifi, Spark, Consul, Riak, TeamCity and DevOps tools built on the major Linux distros: Alpine, CentOS, Debian, Fedora, Ubuntu
Stars: ✭ 847 (+1440%)
Spark Swagger: Spark (http://sparkjava.com/) support for Swagger (https://swagger.io/)
Stars: ✭ 25 (-54.55%)
Chronicler: Scala toolchain for InfluxDB
Stars: ✭ 24 (-56.36%)
Delta Architecture: Streaming data changes to a Data Lake with Debezium and Delta Lake pipeline
Stars: ✭ 43 (-21.82%)
Vagrant Projects: Vagrant projects for various use-cases with Spark, Zeppelin, IPython / Jupyter, SparkR
Stars: ✭ 34 (-38.18%)
Digitrecognizer: Java Convolutional Neural Network example for Handwritten Digit Recognition
Stars: ✭ 23 (-58.18%)
Kylo: Kylo is a data lake management software platform and framework for enabling scalable enterprise-class data lakes on big data technologies such as Teradata, Apache Spark and/or Hadoop. Kylo is licensed under Apache 2.0. Contributed by Teradata Inc.
Stars: ✭ 916 (+1565.45%)
Spark Flamegraph: Easy CPU Profiling for Apache Spark applications
Stars: ✭ 30 (-45.45%)
Foxcross: AsyncIO serving for data science models
Stars: ✭ 18 (-67.27%)
Flint: A Time Series Library for Apache Spark
Stars: ✭ 878 (+1496.36%)
Live log analyzer spark: Spark application that analyzes Apache access logs and detects anomalies, with an accompanying Medium article.
Stars: ✭ 14 (-74.55%)
Awesome Recommendation Engine: This tiny project brings together the know-how I learned in the Big Data Expert course from formacionhadoop.com. The idea is to show how to work with Apache Spark Streaming, Kafka, MongoDB, and Spark machine learning algorithms.
Stars: ✭ 47 (-14.55%)
Sparkling Titanic: Training models with Apache Spark and PySpark for the Titanic Kaggle competition
Stars: ✭ 12 (-78.18%)
Optimus: 🚚 Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark
Stars: ✭ 986 (+1692.73%)
Mare: MaRe leverages the power of Docker and Spark to run and scale your serial tools in MapReduce fashion.
Stars: ✭ 11 (-80%)
Spark Submit Ui: A Play Framework-based tool for submitting Spark applications
Stars: ✭ 53 (-3.64%)
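Bigdata Interview: 🎯 🌟 [Big data interview questions] A collection of big data interview questions gathered from around the web, together with the author's own answer summaries. Currently covers interview knowledge for the Hadoop/Hive/Spark/Flink/Hbase/Kafka/Zookeeper frameworks.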
Stars: ✭ 857 (+1458.18%)
Tiledb Vcf: Efficient variant-call data storage and retrieval library using the TileDB storage library.
Stars: ✭ 26 (-52.73%)
Spark Tda: SparkTDA is a package for Apache Spark providing Topological Data Analysis functionality.
Stars: ✭ 45 (-18.18%)
Pandas Ta: Technical Analysis Indicators - Pandas TA is an easy-to-use Python 3 Pandas Extension with 130+ Indicators
Stars: ✭ 962 (+1649.09%)
Docker Hadoop: A Docker container with a full Hadoop cluster setup with Spark and Zeppelin
Stars: ✭ 54 (-1.82%)
Boltzmannclean: Fill missing values in Pandas DataFrames using Restricted Boltzmann Machines
Stars: ✭ 23 (-58.18%)
Sparkmagic: Jupyter magics and kernels for working with remote Spark clusters
Stars: ✭ 954 (+1634.55%)
Spark Nkp: Natural Korean Processor for Apache Spark
Stars: ✭ 50 (-9.09%)
Gatk: Official code repository for GATK versions 4 and up
Stars: ✭ 1,002 (+1721.82%)
Pucket: Bucketing and partitioning system for Parquet
Stars: ✭ 29 (-47.27%)
Sparkling Water: Sparkling Water provides H2O functionality inside a Spark cluster
Stars: ✭ 887 (+1512.73%)
Dataframe: C++ DataFrame for statistical, financial, and ML analysis, written in modern C++ using native types and contiguous memory storage, with no pointers involved
Stars: ✭ 828 (+1405.45%)
Data Algorithms Book: MapReduce, Spark, Java, and Scala for Data Algorithms Book
Stars: ✭ 949 (+1625.45%)
Bigdataguide: Big data learning. Learn big data from scratch, including study videos for each stage of learning and interview materials.
Stars: ✭ 817 (+1385.45%)
Heracles: High performance HBase / Spark SQL engine
Stars: ✭ 27 (-50.91%)