prosto: Prosto is a data processing toolkit that radically changes how data is processed by relying heavily on functions and operations with functions, an alternative to map-reduce and join-groupby
Stars: ✭ 54 (-80.85%)
Redash: Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
Stars: ✭ 20,147 (+7044.33%)
spark-druid-olap: Sparkline BI Accelerator provides fast ad-hoc query capability over Logical Cubes. This has been folded into our SNAP Platform (http://bit.ly/2oBJSpP), an integrated BI platform on Apache Spark.
Stars: ✭ 286 (+1.42%)
smolder: HL7 Apache Spark Datasource
Stars: ✭ 33 (-88.3%)
trembita: Model complex data transformation pipelines easily
Stars: ✭ 44 (-84.4%)
basin: Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser
Stars: ✭ 25 (-91.13%)
Succinct: Enabling queries on compressed data.
Stars: ✭ 257 (-8.87%)
aut: The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Stars: ✭ 111 (-60.64%)
arakat: ARAKAT - Big Data Analysis and Business Intelligence Application Development Platform
Stars: ✭ 23 (-91.84%)
pre-commit-dbt: 🎣 List of `pre-commit` hooks to ensure the quality of your `dbt` projects.
Stars: ✭ 149 (-47.16%)
kafka-compose: 🎼 Docker compose files for various kafka stacks
Stars: ✭ 32 (-88.65%)
Ad-Hoc-Report-Builder-.net-mvc: Open Source Reporting tool for .NET6/.NET Core/.NET Framework that you can embed in your application and generate dashboards and ad hoc reports
Stars: ✭ 43 (-84.75%)
bigdata-fun: A complete (distributed) BigData stack, running in containers
Stars: ✭ 14 (-95.04%)
Sk Dist: Distributed scikit-learn meta-estimators in PySpark
Stars: ✭ 260 (-7.8%)
Guitar: A Simple and Efficient Distributed Multidimensional BI Analysis Engine.
Stars: ✭ 86 (-69.5%)
dllib: dllib is a distributed deep learning library running on Apache Spark
Stars: ✭ 32 (-88.65%)
incubator-linkis: Linkis helps easily connect to various back-end computation/storage engines (Spark, Python, TiDB...) and exposes various interfaces (REST, JDBC, Java ...), with multi-tenancy, high performance, and resource control.
Stars: ✭ 2,459 (+771.99%)
Datavec: ETL Library for Machine Learning - data pipelines, data munging and wrangling
Stars: ✭ 272 (-3.55%)
spark-data-sources: Developing Spark External Data Sources using the V2 API
Stars: ✭ 36 (-87.23%)
sentry-spark: Apache Spark Sentry Integration
Stars: ✭ 14 (-95.04%)
bigkube: Minikube for big data with Scala and Spark
Stars: ✭ 16 (-94.33%)
spark-acid: ACID Data Source for Apache Spark based on Hive ACID
Stars: ✭ 91 (-67.73%)
spark-word2vec: A parallel implementation of word2vec based on Spark
Stars: ✭ 24 (-91.49%)
SparkV: 🤖⚡ | The most POWERFUL multipurpose chat/meme bot that will boost the activity in your server.
Stars: ✭ 24 (-91.49%)
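Book: This project collects some good books I have read or heard about over the years. I came across them while tidying up my files; deleting them seemed a pity, but keeping them just wasted space. So, in the spirit of sharing, I am making them available, both to provide these books to readers who need them and as a kind of knowledge-base accumulation.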
Stars: ✭ 47 (-83.33%)
spark-extension: A library that provides useful extensions to Apache Spark and PySpark.
Stars: ✭ 25 (-91.13%)
spark-http-stream: Spark Structured Streaming via HTTP communication
Stars: ✭ 17 (-93.97%)
Casper: A compiler for automatically re-targeting sequential Java code to Apache Spark.
Stars: ✭ 45 (-84.04%)
Blazer: Business intelligence made simple
Stars: ✭ 3,102 (+1000%)
visions: Type System for Data Analysis in Python
Stars: ✭ 136 (-51.77%)
daf-kylo: Kylo integration with PDND (previously DAF).
Stars: ✭ 20 (-92.91%)
spark-demos: Collection of different demo applications using Apache Spark
Stars: ✭ 15 (-94.68%)
Spark Jupyter Aws: A guide on how to set up Jupyter with PySpark painlessly on AWS EC2 clusters, with S3 I/O support
Stars: ✭ 259 (-8.16%)
tpch-spark: TPC-H queries in Apache Spark SQL using native DataFrames API
Stars: ✭ 63 (-77.66%)
frovedis: Framework of vectorized and distributed data analytics
Stars: ✭ 59 (-79.08%)
Hbase Rdd: Spark RDD to read, write and delete from HBase
Stars: ✭ 277 (-1.77%)
Spark-PMoF: Spark Shuffle Optimization with RDMA+AEP
Stars: ✭ 28 (-90.07%)
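leaflet heatmap: A simple visualization of Huzhou phone call data. Assuming the data volume is too large to draw a heatmap directly in the browser, the heatmap rendering step is moved offline for computation and analysis. After the data is computed in parallel with Apache Spark, Apache Spark is also used to draw the heatmap, and leafletjs then loads the OpenStreetMap layer and the heatmap layer for good interactivity. The rendering is currently implemented with Apache Spark; either Apache Spark is not well suited to this kind of computation or I did not design the algorithm well, because the parallel computation is slower than single-machine computation. The Apache Spark heatmap rendering and computation code is at https://github.com/yuanzhaokang/ParallelizeHeatmap.git .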
Stars: ✭ 13 (-95.39%)
Big Data Rosetta Code: Code snippets for solving common big data problems in various platforms. Inspired by Rosetta Code
Stars: ✭ 254 (-9.93%)
docker-spark: Apache Spark docker container image (Standalone mode)
Stars: ✭ 34 (-87.94%)
Covid19Tracker: A Robinhood style COVID-19 🦠 Android tracking app for the US. Open source and built with Kotlin.
Stars: ✭ 65 (-76.95%)
shamash: Autoscaling for Google Cloud Dataproc
Stars: ✭ 31 (-89.01%)
Helk: The Hunting ELK
Stars: ✭ 3,097 (+998.23%)
confluent-spark-avro: Spark UDFs to deserialize Avro messages with schemas stored in Schema Registry.
Stars: ✭ 18 (-93.62%)
Search Ads Web Service: Online search advertisement platform & Realtime Campaign Monitoring [Maybe Deprecated]
Stars: ✭ 30 (-89.36%)
tellery: Tellery lets you build metrics using SQL and bring them to your team. As easy as using a document. As powerful as a data modeling tool.
Stars: ✭ 219 (-22.34%)
Cloudflow: Cloudflow enables users to quickly develop, orchestrate, and operate distributed streaming applications on Kubernetes.
Stars: ✭ 278 (-1.42%)
Knowage Server: Knowage is the professional open source suite for modern business analytics over traditional sources and big data systems.
Stars: ✭ 276 (-2.13%)
Docker Spark Cluster: A simple Spark standalone cluster for your testing environment purposes
Stars: ✭ 261 (-7.45%)