Deeplearning4j: Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for Keras, TensorFlow, and ONNX/PyTorch, a modular and tiny C++ library for running math code, and a Java-based math library on top of the core C++ library. Also includes SameDiff: a PyTorch/TensorFlow-like library for running deep learni…
Stars: ✭ 12,277 (+5610.23%)
Alluxio: Data orchestration for analytics and machine learning in the cloud
Stars: ✭ 5,379 (+2401.86%)
Sparkle: Haskell on Apache Spark.
Stars: ✭ 419 (+94.88%)
Kafka Storm Starter: Code examples that show how to integrate Apache Kafka 0.8+ with Apache Storm 0.9+ and Apache Spark Streaming 1.1+, while using Apache Avro as the data serialization format.
Stars: ✭ 728 (+238.6%)
Spark Nkp: Natural Korean Processor for Apache Spark
Stars: ✭ 50 (-76.74%)
Docker Hadoop: A Docker container with a full Hadoop cluster setup with Spark and Zeppelin
Stars: ✭ 54 (-74.88%)
Pulsar Spark: When Apache Pulsar meets Apache Spark
Stars: ✭ 55 (-74.42%)
Marmaray: Generic Data Ingestion & Dispersal Library for Hadoop
Stars: ✭ 414 (+92.56%)
Moosefs: MooseFS – Open Source, Petabyte, Fault-Tolerant, Highly Performing, Scalable Network Distributed File System (Software-Defined Storage)
Stars: ✭ 1,025 (+376.74%)
Awesome Pulsar: A curated list of Pulsar tools, integrations and resources.
Stars: ✭ 57 (-73.49%)
Rsparkling: Use H2O Sparkling Water from R (Spark + R + Machine Learning)
Stars: ✭ 65 (-69.77%)
Geopyspark: GeoTrellis for PySpark
Stars: ✭ 167 (-22.33%)
Agile data code 2: Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition
Stars: ✭ 413 (+92.09%)
Live log analyzer spark: Spark application for analyzing Apache access logs and detecting anomalies. Accompanied by a Medium article.
Stars: ✭ 14 (-93.49%)
Sparkjni: A heterogeneous Apache Spark framework.
Stars: ✭ 11 (-94.88%)
Parquetviewer: Simple Windows desktop application for viewing & querying Apache Parquet files
Stars: ✭ 145 (-32.56%)
Spark Tda: SparkTDA is a package for Apache Spark providing Topological Data Analysis functionality.
Stars: ✭ 45 (-79.07%)
Waimak: Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark.
Stars: ✭ 60 (-72.09%)
Cleanframes: Type-class based data cleansing library for Apache Spark SQL
Stars: ✭ 75 (-65.12%)
Dataspherestudio: DataSphereStudio is a one-stop data application development & management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.
Stars: ✭ 1,195 (+455.81%)
Hydrograph: A visual ETL development and debugging tool for big data
Stars: ✭ 144 (-33.02%)
Docker Spark: 🚢 Docker image for Apache Spark
Stars: ✭ 78 (-63.72%)
Hadoop cookbook: Cookbook to install Hadoop 2.0+ using Chef
Stars: ✭ 82 (-61.86%)
Uproot4: ROOT I/O in pure Python and NumPy.
Stars: ✭ 80 (-62.79%)
Spark States: Custom state store providers for Apache Spark
Stars: ✭ 83 (-61.4%)
Repository: Personal study knowledge base covering data warehouse modeling, real-time computing, big data, Java, algorithms, and more.
Stars: ✭ 92 (-57.21%)
Labs: Research on distributed systems
Stars: ✭ 73 (-66.05%)
Setl: A simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (-63.26%)
Cuesheet: A framework for writing Spark 2.x applications in a pretty way
Stars: ✭ 86 (-60%)
Big Whale: Scheduling of offline tasks and monitoring of real-time tasks for Spark, Flink, and similar engines
Stars: ✭ 163 (-24.19%)
Whylogs Java: Profile and monitor your ML data pipeline end-to-end
Stars: ✭ 164 (-23.72%)
Logisland: Scalable stream processing platform for advanced real-time analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink is on the roadmap). The platform does complex event processing and is suitable for time series analysis. A large set of ready-to-use processors, data sources, and sinks is available.
Stars: ✭ 97 (-54.88%)
Spark On K8s Operator: Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
Stars: ✭ 1,780 (+727.91%)
Eel Sdk: Big Data Toolkit for the JVM
Stars: ✭ 140 (-34.88%)
Spark R Notebooks: R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 109 (-49.3%)
Sparktutorial: Source code for James Lee's Apache Spark with Java course
Stars: ✭ 105 (-51.16%)
Tennis Crystal Ball: Ultimate Tennis Statistics and Tennis Crystal Ball - Tennis Big Data Analysis and Prediction
Stars: ✭ 107 (-50.23%)
Sparkling Graph: SparklingGraph provides an easy-to-use set of features for processing large-scale graphs using Spark and GraphX.
Stars: ✭ 139 (-35.35%)
Genie: Distributed Big Data Orchestration Service
Stars: ✭ 1,544 (+618.14%)
Xlearning Xdml: Extremely distributed machine learning
Stars: ✭ 113 (-47.44%)
Ibis: A pandas-like deferred expression system, with first-class SQL support
Stars: ✭ 1,630 (+658.14%)
Lambda Arch: Applying Lambda Architecture with Spark, Kafka, and Cassandra.
Stars: ✭ 111 (-48.37%)
Asakusafw: Asakusa Framework
Stars: ✭ 114 (-46.98%)
Kotlin Spark Api: This project provides Kotlin bindings and several extensions for Apache Spark. We are looking to have this as a part of Apache Spark 3.x
Stars: ✭ 183 (-14.88%)
Hdfs Shell: HDFS Shell is an HDFS manipulation tool for working with functions integrated in Hadoop DFS
Stars: ✭ 117 (-45.58%)