Awesome Pulsar: A curated list of Pulsar tools, integrations and resources.
Stars: ✭ 57 (-83.38%)
Xsql: Unified SQL Analytics Engine Based on SparkSQL
Stars: ✭ 176 (-48.69%)
Pulsar Spark: When Apache Pulsar meets Apache Spark
Stars: ✭ 55 (-83.97%)
Utils4s: A collection of test cases and related materials gathered while using Scala and Spark
Stars: ✭ 1,070 (+211.95%)
Kraps Rpc: An RPC framework leveraging the Spark RPC module
Stars: ✭ 175 (-48.98%)
HybridBackend: Efficient training of deep recommenders on cloud.
Stars: ✭ 30 (-91.25%)
Spark Nlp: State of the Art Natural Language Processing
Stars: ✭ 2,518 (+634.11%)
Delta Architecture: Streaming data changes to a Data Lake with Debezium and Delta Lake pipeline
Stars: ✭ 43 (-87.46%)
Transmogrifai: TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning
Stars: ✭ 2,084 (+507.58%)
Gatk: Official code repository for GATK versions 4 and up
Stars: ✭ 1,002 (+192.13%)
Search Ads Web Service: Online search advertisement platform & Realtime Campaign Monitoring [Maybe Deprecated]
Stars: ✭ 30 (-91.25%)
Pixiedust: Python Helper library for Jupyter Notebooks
Stars: ✭ 998 (+190.96%)
Snappydata: Project SnappyData - memory optimized analytics database, based on Apache Spark™ and Apache Geode™. Stream, Transact, Analyze, Predict in one cluster
Stars: ✭ 995 (+190.09%)
Whylogs Java: Profile and monitor your ML data pipeline end-to-end
Stars: ✭ 164 (-52.19%)
Spark Hbase Connector: Connect Spark to HBase for reading and writing data with ease
Stars: ✭ 299 (-12.83%)
Data Algorithms Book: MapReduce, Spark, Java, and Scala for Data Algorithms Book
Stars: ✭ 949 (+176.68%)
Linkis: Linkis helps easily connect to various back-end computation/storage engines (Spark, Python, TiDB...) and exposes various interfaces (REST, JDBC, Java ...), with multi-tenancy, high performance, and resource control.
Stars: ✭ 2,323 (+577.26%)
Spark: Apache Spark - A unified analytics engine for large-scale data processing
Stars: ✭ 31,618 (+9118.08%)
openverse-catalog: Identifies and collects data on CC-licensed content across web crawl data and public APIs.
Stars: ✭ 27 (-92.13%)
Flint: A Time Series Library for Apache Spark
Stars: ✭ 878 (+155.98%)
Glow: An open-source toolkit for large-scale genomic analysis
Stars: ✭ 159 (-53.64%)
Live log analyzer spark: Spark application for analyzing Apache access logs and detecting anomalies, with an accompanying Medium article.
Stars: ✭ 14 (-95.92%)
basin: Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser.
Stars: ✭ 25 (-92.71%)
Sparkling Titanic: Training models with Apache Spark and PySpark for the Titanic Kaggle competition
Stars: ✭ 12 (-96.5%)
Handyspark: HandySpark - bringing pandas-like capabilities to Spark dataframes
Stars: ✭ 158 (-53.94%)
Mare: MaRe leverages the power of Docker and Spark to run and scale your serial tools in MapReduce fashion.
Stars: ✭ 11 (-96.79%)
awesome-AI-kubernetes: ❄️ 🐳 Awesome tools and libs for AI, Deep Learning, Machine Learning, Computer Vision, Data Science, Data Analytics and Cognitive Computing that are built to run natively on Kubernetes and Docker with Python, R, Scala, Java, C#, Go, Julia, C++, etc.
Stars: ✭ 95 (-72.3%)
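Bigdata Interview: 🎯 🌟 [Big data interview questions] A collection of big data interview questions gathered from the web, together with the author's own answer summaries. Currently covers interview knowledge for the Hadoop/Hive/Spark/Flink/Hbase/Kafka/Zookeeper frameworks.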
Stars: ✭ 857 (+149.85%)
Tiledb Vcf: Efficient variant-call data storage and retrieval library using the TileDB storage library.
Stars: ✭ 26 (-92.42%)
Mobius: C# and F# language bindings and extensions to Apache Spark
Stars: ✭ 929 (+170.85%)
Quill: Compile-time Language Integrated Queries for Scala
Stars: ✭ 1,998 (+482.51%)
ODSC India 2018: My presentation at ODSC India 2018 about Deep Learning with Apache Spark
Stars: ✭ 26 (-92.42%)
Kylo: Kylo is a data lake management software platform and framework for enabling scalable enterprise-class data lakes on big data technologies such as Teradata, Apache Spark and/or Hadoop. Kylo is licensed under Apache 2.0. Contributed by Teradata Inc.
Stars: ✭ 916 (+167.06%)
Powderkeg: Live-coding the cluster!
Stars: ✭ 152 (-55.69%)
dllib: dllib is a distributed deep learning library running on Apache Spark
Stars: ✭ 32 (-90.67%)
Almond: A Scala kernel for Jupyter
Stars: ✭ 1,354 (+294.75%)
Recommendationsystem: Book recommender system using collaborative filtering based on Spark
Stars: ✭ 244 (-28.86%)
Logisland: Scalable stream processing platform for advanced realtime analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink is on the roadmap). The platform does complex event processing and is suitable for time series analysis. A large set of ready-to-use processors, data sources and sinks is available.
Stars: ✭ 97 (-71.72%)
Sparklens: Qubole Sparklens tool for performance tuning Apache Spark
Stars: ✭ 345 (+0.58%)
Iql: An ad hoc query service based on the Spark SQL engine.
Stars: ✭ 341 (-0.58%)
Pystore: Fast data store for Pandas time-series data
Stars: ✭ 325 (-5.25%)
Spline: Data Lineage Tracking And Visualization Solution
Stars: ✭ 306 (-10.79%)
Sk Dist: Distributed scikit-learn meta-estimators in PySpark
Stars: ✭ 260 (-24.2%)
aut: The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Stars: ✭ 111 (-67.64%)