shamashAutoscaling for Google Cloud Dataproc
Stars: ✭ 31 (+106.67%)
KoalasKoalas: pandas API on Apache Spark
Stars: ✭ 3,044 (+20193.33%)
XsqlUnified SQL Analytics Engine Based on SparkSQL
Stars: ✭ 176 (+1073.33%)
kafka-compose🎼 Docker compose files for various kafka stacks
Stars: ✭ 32 (+113.33%)
Kraps RpcAn RPC framework leveraging the Spark RPC module
Stars: ✭ 175 (+1066.67%)
Every Single Day I TldrA daily digest of the articles or videos I've found interesting, that I want to share with you.
Stars: ✭ 249 (+1560%)
Spark NlpState of the Art Natural Language Processing
Stars: ✭ 2,518 (+16686.67%)
yuzhouwanCode Library for My Blog
Stars: ✭ 39 (+160%)
TransmogrifaiTransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning
Stars: ✭ 2,084 (+13793.33%)
Data AcceleratorData Accelerator for Apache Spark simplifies onboarding to Streaming of Big Data. It offers a rich, easy-to-use experience to help with creation, editing, and management of Spark jobs on Azure HDInsights or Databricks while enabling the full power of the Spark engine.
Stars: ✭ 247 (+1546.67%)
frovedisFramework of vectorized and distributed data analytics
Stars: ✭ 59 (+293.33%)
Neo4j Spark ConnectorNeo4j Connector for Apache Spark, which provides bi-directional read/write access to Neo4j from Spark, using the Spark DataSource APIs
Stars: ✭ 245 (+1533.33%)
Whylogs JavaProfile and monitor your ML data pipeline end-to-end
Stars: ✭ 164 (+993.33%)
spark-utillow-level helpers for Apache Spark libraries and tests
Stars: ✭ 16 (+6.67%)
LinkisLinkis helps easily connect to various back-end computation/storage engines (Spark, Python, TiDB...), exposes various interfaces (REST, JDBC, Java...), with multi-tenancy, high performance, and resource control.
Stars: ✭ 2,323 (+15386.67%)
RecommendationsystemBook recommender system using collaborative filtering based on Spark
Stars: ✭ 244 (+1526.67%)
GlowAn open-source toolkit for large-scale genomic analysis
Stars: ✭ 159 (+960%)
sentry-sparkApache Spark Sentry Integration
Stars: ✭ 14 (-6.67%)
HandysparkHandySpark - bringing pandas-like capabilities to Spark dataframes
Stars: ✭ 158 (+953.33%)
Hadoop DockerA Docker-based Hadoop development and testing environment, including Hadoop, Hive, HBase, and Spark
Stars: ✭ 238 (+1486.67%)
QuillCompile-time Language Integrated Queries for Scala
Stars: ✭ 1,998 (+13220%)
PowderkegLive-coding the cluster!
Stars: ✭ 152 (+913.33%)
tpch-sparkTPC-H queries in Apache Spark SQL using native DataFrames API
Stars: ✭ 63 (+320%)
MydatascienceportfolioApplying Data Science and Machine Learning to Solve Real World Business Problems
Stars: ✭ 227 (+1413.33%)
AztkAZTK powered by Azure Batch: On-demand, Dockerized, Spark Jobs on Azure
Stars: ✭ 152 (+913.33%)
spark-druid-olapSparkline BI Accelerator provides fast ad-hoc query capability over Logical Cubes. This has been folded into our SNAP Platform (http://bit.ly/2oBJSpP), an integrated BI platform on Apache Spark.
Stars: ✭ 286 (+1806.67%)
Cc PysparkProcess Common Crawl data with Python and Spark
Stars: ✭ 147 (+880%)
Spark WorkshopApache Spark™ and Scala Workshops
Stars: ✭ 224 (+1393.33%)
DatacompyPandas and Spark DataFrame comparison for humans
Stars: ✭ 147 (+880%)
spark-acidACID Data Source for Apache Spark based on Hive ACID
Stars: ✭ 91 (+506.67%)
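Technology TalkA collection of knowledge on commonly used technical frameworks in the Java ecosystem, open-source middleware, system architecture, databases, architecture case studies from large companies, common third-party libraries, project management, production troubleshooting, personal growth, and reflections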
Stars: ✭ 12,136 (+80806.67%)
Spark AuthorizerA Spark SQL extension which provides SQL Standard Authorization for Apache Spark
Stars: ✭ 141 (+840%)
swordfishOpen-source distributed workflow scheduling tool that also supports streaming tasks.
Stars: ✭ 35 (+133.33%)
Data science blogsA repository to keep track of all the code that I end up writing for my blog posts.
Stars: ✭ 139 (+826.67%)
GimelBig Data Processing Framework - Unified Data API or SQL on Any Storage
Stars: ✭ 216 (+1340%)
Isolation ForestA Spark/Scala implementation of the isolation forest unsupervised outlier detection algorithm.
Stars: ✭ 139 (+826.67%)
Spark-ArResources for Spark AR
Stars: ✭ 43 (+186.67%)
HorovodDistributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
Stars: ✭ 11,943 (+79520%)
Spark Knnk-Nearest Neighbors algorithm on Spark
Stars: ✭ 205 (+1266.67%)
spark-word2vecA parallel implementation of word2vec based on Spark
Stars: ✭ 24 (+60%)
MmlsparkSimple and Distributed Machine Learning
Stars: ✭ 2,899 (+19226.67%)
autThe Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Stars: ✭ 111 (+640%)
incubator-linkisLinkis helps easily connect to various back-end computation/storage engines (Spark, Python, TiDB...), exposes various interfaces (REST, JDBC, Java...), with multi-tenancy, high performance, and resource control.
Stars: ✭ 2,459 (+16293.33%)
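leaflet heatmapA simple visualization of Huzhou call data. Assuming the data volume is too large for the browser to draw the heatmap directly, the heatmap-drawing step is moved offline for computation and analysis. After the data is computed in parallel with Apache Spark, Apache Spark is also used to render the heatmap, and leafletjs then loads the OpenStreetMap layer and the heatmap layer for good interactivity. Rendering is currently implemented with Apache Spark; either because Apache Spark is not well suited to this kind of computation or because I did not design the algorithm well, the parallel computation is slower than single-machine computation. The Apache Spark heatmap rendering and computation code is at https://github.com/yuanzhaokang/ParallelizeHeatmap.git .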
Stars: ✭ 13 (-13.33%)
splinkImplementation of Fellegi-Sunter's canonical model of record linkage in Apache Spark, including EM algorithm to estimate parameters
Stars: ✭ 181 (+1106.67%)
ScannsA scalable nearest neighbor search library in Apache Spark
Stars: ✭ 190 (+1166.67%)