Pulsar SparkWhen Apache Pulsar meets Apache Spark
Stars: ✭ 55 (-68.39%)
dllibdllib is a distributed deep learning library running on Apache Spark
Stars: ✭ 32 (-81.61%)
Sparkling GraphSparklingGraph provides easy to use set of features that will give you ability to proces large scala graphs using Spark and GraphX.
Stars: ✭ 139 (-20.11%)
Utils4sscala、spark使用过程中,各种测试用例以及相关资料整理
Stars: ✭ 1,070 (+514.94%)
prostoProsto is a data processing toolkit radically changing how data is processed by heavily relying on functions and operations with functions - an alternative to map-reduce and join-groupby
Stars: ✭ 54 (-68.97%)
Lambda ArchApplying Lambda Architecture with Spark, Kafka, and Cassandra.
Stars: ✭ 111 (-36.21%)
confluent-spark-avroSpark UDFs to deserialize Avro messages with schemas stored in Schema Registry.
Stars: ✭ 18 (-89.66%)
blogblog entries
Stars: ✭ 39 (-77.59%)
spark-extensionA library that provides useful extensions to Apache Spark and PySpark.
Stars: ✭ 25 (-85.63%)
CasperA compiler for automatically re-targeting sequential Java code to Apache Spark.
Stars: ✭ 45 (-74.14%)
visionsType System for Data Analysis in Python
Stars: ✭ 136 (-21.84%)
QuicksqlA Flexible, Fast, Federated(3F) SQL Analysis Middleware for Multiple Data Sources
Stars: ✭ 1,821 (+946.55%)
autThe Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Stars: ✭ 111 (-36.21%)
Delta ArchitectureStreaming data changes to a Data Lake with Debezium and Delta Lake pipeline
Stars: ✭ 43 (-75.29%)
incubator-linkisLinkis helps easily connect to various back-end computation/storage engines(Spark, Python, TiDB...), exposes various interfaces(REST, JDBC, Java ...), with multi-tenancy, high performance, and resource control.
Stars: ✭ 2,459 (+1313.22%)
Parquet IndexSpark SQL index for Parquet tables
Stars: ✭ 109 (-37.36%)
Spark-PMoFSpark Shuffle Optimization with RDMA+AEP
Stars: ✭ 28 (-83.91%)
GatkOfficial code repository for GATK versions 4 and up
Stars: ✭ 1,002 (+475.86%)
leaflet heatmap简单的可视化湖州通话数据 假设数据量很大,没法用浏览器直接绘制热力图,把绘制热力图这一步骤放到线下计算分析。使用Apache Spark并行计算数据之后,再使用Apache Spark绘制热力图,然后用leafletjs加载OpenStreetMap图层和热力图图层,以达到良好的交互效果。现在使用Apache Spark实现绘制,可能是Apache Spark不擅长这方面的计算或者是我没有设计好算法,并行计算的速度比不上单机计算。Apache Spark绘制热力图和计算代码在这 https://github.com/yuanzhaokang/ParallelizeHeatmap.git .
Stars: ✭ 13 (-92.53%)
Spark.jlJulia binding for Apache Spark
Stars: ✭ 153 (-12.07%)
docker-sparkApache Spark docker container image (Standalone mode)
Stars: ✭ 34 (-80.46%)
PixiedustPython Helper library for Jupyter Notebooks
Stars: ✭ 998 (+473.56%)
Pyspark Cheatsheet🐍 Quick reference guide to common patterns & functions in PySpark.
Stars: ✭ 108 (-37.93%)
SnappydataProject SnappyData - memory optimized analytics database, based on Apache Spark™ and Apache Geode™. Stream, Transact, Analyze, Predict in one cluster
Stars: ✭ 995 (+471.84%)
Apache Spark NodeNode.js bindings for Apache Spark DataFrame APIs
Stars: ✭ 136 (-21.84%)
Search Ads Web ServiceOnline search advertisement platform & Realtime Campaign Monitoring [Maybe Deprecated]
Stars: ✭ 30 (-82.76%)
Flink Learningflink learning blog. http://www.54tianzhisheng.cn/ 含 Flink 入门、概念、原理、实战、性能调优、源码解析等内容。涉及 Flink Connector、Metrics、Library、DataStream API、Table API & SQL 等内容的学习案例,还有 Flink 落地应用的大型项目案例(PVUV、日志存储、百亿数据实时去重、监控告警)分享。欢迎大家支持我的专栏《大数据实时计算引擎 Flink 实战与性能优化》
Stars: ✭ 11,378 (+6439.08%)
openverse-catalogIdentifies and collects data on cc-licensed content across web crawl data and public apis.
Stars: ✭ 27 (-84.48%)
awesome-AI-kubernetes❄️ 🐳 Awesome tools and libs for AI, Deep Learning, Machine Learning, Computer Vision, Data Science, Data Analytics and Cognitive Computing that are baked in the oven to be Native on Kubernetes and Docker with Python, R, Scala, Java, C#, Go, Julia, C++ etc
Stars: ✭ 95 (-45.4%)
ODSC India 2018My presentation at ODSC India 2018 about Deep Learning with Apache Spark
Stars: ✭ 26 (-85.06%)
sparkar-voltsAn extensive non-reactive Typescript framework that eases the development experience in Spark AR
Stars: ✭ 15 (-91.38%)
LogigskA Linux based software package to control led's on Logitech G910, G810, G610 and G410.
Stars: ✭ 107 (-38.51%)
experimentsCode examples for my blog posts
Stars: ✭ 21 (-87.93%)
SparkmagicJupyter magics and kernels for working with remote Spark clusters
Stars: ✭ 954 (+448.28%)
splinkImplementation of Fellegi-Sunter's canonical model of record linkage in Apache Spark, including EM algorithm to estimate parameters
Stars: ✭ 181 (+4.02%)
Data Algorithms Book MapReduce, Spark, Java, and Scala for Data Algorithms Book
Stars: ✭ 949 (+445.4%)
Spark NlpState of the Art Natural Language Processing
Stars: ✭ 2,518 (+1347.13%)
TransmogrifaiTransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning
Stars: ✭ 2,084 (+1097.7%)
HandysparkHandySpark - bringing pandas-like capabilities to Spark dataframes
Stars: ✭ 158 (-9.2%)
Technology Talk汇总java生态圈常用技术框架、开源中间件,系统架构、数据库、大公司架构案例、常用三方类库、项目管理、线上问题排查、个人成长、思考等知识
Stars: ✭ 12,136 (+6874.71%)
ZparkioBoiler plate framework to use Spark and ZIO together.
Stars: ✭ 121 (-30.46%)
Lpa DetectorOptimize and improve the Label propagation algorithm
Stars: ✭ 75 (-56.9%)
Spark SolrTools for reading data from Solr as a Spark RDD and indexing objects from Spark into Solr using SolrJ.
Stars: ✭ 411 (+136.21%)