blogblog entries
Stars: ✭ 39 (+39.29%)
sparkar-voltsAn extensive non-reactive Typescript framework that eases the development experience in Spark AR
Stars: ✭ 15 (-46.43%)
twigsAlternate firmware for Mutable Instruments Branches synthesizer module
Stars: ✭ 21 (-25%)
Every Single Day I TldrA daily digest of the articles or videos I've found interesting, that I want to share with you.
Stars: ✭ 249 (+789.29%)
RemoteShuffleServiceCeleborn provides an elastic and high-performance service for shuffle and spilled data.
Stars: ✭ 262 (+835.71%)
awesome-AI-kubernetes❄️ 🐳 Awesome tools and libs for AI, Deep Learning, Machine Learning, Computer Vision, Data Science, Data Analytics and Cognitive Computing that are baked in the oven to be Native on Kubernetes and Docker with Python, R, Scala, Java, C#, Go, Julia, C++ etc
Stars: ✭ 95 (+239.29%)
Validate-DCBValidator for RDMA Configuration and Best Practices
Stars: ✭ 34 (+21.43%)
shamashAutoscaling for Google Cloud Dataproc
Stars: ✭ 31 (+10.71%)
splinkImplementation of Fellegi-Sunter's canonical model of record linkage in Apache Spark, including EM algorithm to estimate parameters
Stars: ✭ 181 (+546.43%)
Neo4j Spark ConnectorNeo4j Connector for Apache Spark, which provides bi-directional read/write access to Neo4j from Spark, using the Spark DataSource APIs
Stars: ✭ 245 (+775%)
Hadoop DockerA Docker-based Hadoop development and test environment, including Hadoop, Hive, HBase, and Spark
Stars: ✭ 238 (+750%)
sparkucxA high-performance, scalable and efficient ShuffleManager plugin for Apache Spark, utilizing UCX communication layer
Stars: ✭ 32 (+14.29%)
spark-word2vecA parallel implementation of word2vec based on Spark
Stars: ✭ 24 (-14.29%)
ksmbdksmbd kernel server(SMB/CIFS server)
Stars: ✭ 181 (+546.43%)
ODSC India 2018My presentation at ODSC India 2018 about Deep Learning with Apache Spark
Stars: ✭ 26 (-7.14%)
KoalasKoalas: pandas API on Apache Spark
Stars: ✭ 3,044 (+10771.43%)
sentry-sparkApache Spark Sentry Integration
Stars: ✭ 14 (-50%)
Data AcceleratorData Accelerator for Apache Spark simplifies onboarding to Streaming of Big Data. It offers a rich, easy to use experience to help with creation, editing and management of Spark jobs on Azure HDInsights or Databricks while enabling the full power of the Spark engine.
Stars: ✭ 247 (+782.14%)
experimentsCode examples for my blog posts
Stars: ✭ 21 (-25%)
RecommendationsystemBook recommender system using collaborative filtering based on Spark
Stars: ✭ 244 (+771.43%)
yuzhouwanCode Library for My Blog
Stars: ✭ 39 (+39.29%)
visualize-data-with-pythonA Jupyter notebook using some standard techniques for data science and data engineering to analyze data for the 2017 flooding in Houston, TX.
Stars: ✭ 60 (+114.29%)
MydatascienceportfolioApplying Data Science and Machine Learning to Solve Real World Business Problems
Stars: ✭ 227 (+710.71%)
openverse-catalogIdentifies and collects data on cc-licensed content across web crawl data and public apis.
Stars: ✭ 27 (-3.57%)
darpcDaRPC: Data Center Remote Procedure Call
Stars: ✭ 49 (+75%)
ksmbdksmbd kernel server(SMB/CIFS server)
Stars: ✭ 98 (+250%)
pDPMPassive Disaggregated Persistent Memory at USENIX ATC 2020.
Stars: ✭ 38 (+35.71%)
docker-sparkApache Spark docker container image (Standalone mode)
Stars: ✭ 34 (+21.43%)
ashuffleAutomatic library-wide shuffle for mpd.
Stars: ✭ 64 (+128.57%)
spark-druid-olapSparkline BI Accelerator provides fast ad-hoc query capability over Logical Cubes. This has been folded into our SNAP Platform (http://bit.ly/2oBJSpP), an integrated BI platform on Apache Spark.
Stars: ✭ 286 (+921.43%)
CoyoteFramework providing operating system abstractions and a range of shared networking (RDMA, TCP/IP) and memory services to common modern heterogeneous platforms.
Stars: ✭ 80 (+185.71%)
Turbo-TransposeTranspose: SIMD Integer+Floating Point Compression Filter
Stars: ✭ 50 (+78.57%)
swordfishOpen-source distributed workflow scheduling tool that also supports streaming tasks.
Stars: ✭ 35 (+25%)
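leaflet heatmapA simple visualization of Huzhou call data. Assuming the dataset is too large for the browser to draw a heatmap directly, the heatmap rendering step is moved to offline computation. The data is processed in parallel with Apache Spark, the heatmap is then rendered with Apache Spark, and leafletjs loads the OpenStreetMap layer and the heatmap layer to give good interactivity. The rendering is currently implemented with Apache Spark; either Apache Spark is not well suited to this kind of computation or I did not design the algorithm well, because the parallel computation is slower than the single-machine computation. The Apache Spark heatmap rendering and computation code is at https://github.com/yuanzhaokang/ParallelizeHeatmap.git .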
Stars: ✭ 13 (-53.57%)
Spark Fast TestsApache Spark testing helpers (dependency free & works with Scalatest, uTest, and MUnit)
Stars: ✭ 249 (+789.29%)
Spark-ArResources for Spark AR
Stars: ✭ 43 (+53.57%)
HyperspaceAn open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.
Stars: ✭ 246 (+778.57%)
Search Ads Web ServiceOnline search advertisement platform & Realtime Campaign Monitoring [Maybe Deprecated]
Stars: ✭ 30 (+7.14%)
DparkPython clone of Spark, a MapReduce-like framework in Python
Stars: ✭ 2,668 (+9428.57%)
fastdata-clusterFast Data Cluster (Apache Cassandra, Kafka, Spark, Flink, YARN and HDFS with Vagrant and VirtualBox)
Stars: ✭ 20 (-28.57%)
Azure Event Hubs☁️ Cloud-scale telemetry ingestion from any stream of data with Azure Event Hubs
Stars: ✭ 233 (+732.14%)
spark-stringmetricSpark functions to run popular phonetic and string matching algorithms
Stars: ✭ 51 (+82.14%)
rlibRLib is a header-only library for easier usage of RDMA.
Stars: ✭ 17 (-39.29%)
kafka-compose🎼 Docker compose files for various kafka stacks
Stars: ✭ 32 (+14.29%)
spark-acidACID Data Source for Apache Spark based on Hive ACID
Stars: ✭ 91 (+225%)
spark-utillow-level helpers for Apache Spark libraries and tests
Stars: ✭ 16 (-42.86%)