Iotdb: Apache IoTDB
Stars: ✭ 1,221 (-77.85%)
Spark With Python: Fundamentals of Spark with Python (using PySpark), with code examples
Stars: ✭ 150 (-97.28%)
dockerfiles: Multiple Docker container images for the main Big Data tools (Hadoop, Spark, Kafka, HBase, Cassandra, Zookeeper, Zeppelin, Drill, Flink, Hive, Hue, Mesos, ...)
Stars: ✭ 29 (-99.47%)
aut: The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Stars: ✭ 111 (-97.99%)
Crate: CrateDB is a distributed SQL database that makes it simple to store and analyze massive amounts of data in real time.
Stars: ✭ 3,254 (-40.98%)
Nitrite Java: An embedded NoSQL document store for Java
Stars: ✭ 538 (-90.24%)
Dbreeze: C# .NET MONO NoSQL (embedded key-value store) ACID multi-paradigm database management system.
Stars: ✭ 383 (-93.05%)
merkle-db: A highly scalable analytics database built on immutable Merkle trees
Stars: ✭ 44 (-99.2%)
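As a rough illustration of what "built on immutable Merkle trees" means, the Python sketch below computes a Merkle root over a list of byte-string leaves. It is not merkle-db's code; SHA-256, the 0x00/0x01 leaf/inner-node prefixes, and pairing the last node with itself on odd-sized levels are assumptions chosen for simplicity. Because every parent hash depends on its children, changing any leaf produces a new root, so an old root keeps identifying the old version of the data.

```python
import hashlib
from typing import List


def _hash(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: List[bytes]) -> bytes:
    """Compute a Merkle root over byte-string leaves (illustrative sketch).

    Assumption: leaves and inner nodes get different one-byte prefixes so a
    leaf hash cannot be confused with an inner-node hash. On a level with an
    odd number of nodes, the last node is paired with itself.
    """
    if not leaves:
        return _hash(b"")
    level = [_hash(b"\x00" + leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the odd node out
        level = [_hash(b"\x01" + level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]


v1 = merkle_root([b"a", b"b", b"c"])
v2 = merkle_root([b"a", b"x", b"c"])  # one leaf changed
print(v1 != v2)  # True: the changed data gets a new root
```

A content-addressed store would typically key each node by its hash, which lets versions share unchanged subtrees; the sketch only computes the root.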
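leaflet heatmap: A simple visualization of Huzhou phone-call data. Assuming the data volume is too large to draw the heatmap directly in the browser, the heatmap rendering step is moved offline for computation and analysis. Apache Spark computes the data in parallel and then renders the heatmap; leafletjs then loads the OpenStreetMap layer and the heatmap layer to provide good interactivity. Rendering is currently implemented with Apache Spark, but the parallel computation is slower than single-machine computation, perhaps because Apache Spark is not well suited to this kind of computation or because my algorithm is not well designed. The Apache Spark heatmap rendering and computation code is here: https://github.com/yuanzhaokang/ParallelizeHeatmap.git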
Stars: ✭ 13 (-99.76%)
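The offline aggregation step described in the leaflet heatmap entry amounts to binning points into a grid and counting per cell. The sketch below shows that step on a single machine in plain Python; it is not the project's code, and the grid parameters (min_lat, min_lon, cell_size) and the [lat, lon, intensity] output shape (the form accepted by heatmap plugins such as Leaflet.heat) are assumptions. In Spark the same counting would typically be a map to (cell, 1) pairs followed by reduceByKey.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def bin_points(points: Iterable[Tuple[float, float]],
               min_lat: float, min_lon: float,
               cell_size: float) -> Counter:
    """Count points per grid cell; keys are (row, col) cell indices."""
    counts: Counter = Counter()
    for lat, lon in points:
        # Floor division maps each coordinate to the index of its cell.
        row = int((lat - min_lat) // cell_size)
        col = int((lon - min_lon) // cell_size)
        counts[(row, col)] += 1
    return counts


def to_heat_layer(counts: Counter, min_lat: float, min_lon: float,
                  cell_size: float) -> List[List[float]]:
    """Turn cell counts into [lat, lon, intensity] triples at cell centers.

    Intensity is normalized so the busiest cell gets 1.0.
    """
    if not counts:
        return []
    peak = max(counts.values())
    return [
        [min_lat + (row + 0.5) * cell_size,
         min_lon + (col + 0.5) * cell_size,
         n / peak]
        for (row, col), n in sorted(counts.items())
    ]


# Usage: three hypothetical call locations on a 0.1-degree grid.
calls = [(30.01, 120.01), (30.02, 120.03), (30.15, 120.01)]
cells = bin_points(calls, 30.0, 120.0, 0.1)
layer = to_heat_layer(cells, 30.0, 120.0, 0.1)
print(layer)
```

Precomputing per-cell counts keeps the browser's job small: it only draws one weighted point per occupied cell instead of every raw record.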
fastdata-cluster: Fast Data Cluster (Apache Cassandra, Kafka, Spark, Flink, YARN and HDFS with Vagrant and VirtualBox)
Stars: ✭ 20 (-99.64%)
Cloudflow: Cloudflow enables users to quickly develop, orchestrate, and operate distributed streaming applications on Kubernetes.
Stars: ✭ 278 (-94.96%)
Trino: Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Stars: ✭ 4,581 (-16.91%)
Corfudb: A cluster consistency platform
Stars: ✭ 539 (-90.22%)
Bedquilt Core: A JSON document store on PostgreSQL
Stars: ✭ 256 (-95.36%)
Hive: Apache Hive
Stars: ✭ 4,031 (-26.88%)
Bigdl: Building Large-Scale AI Applications for Distributed Big Data
Stars: ✭ 3,813 (-30.84%)
Ignite: Apache Ignite
Stars: ✭ 4,027 (-26.95%)
Awesome Elasticsearch: A curated list of the most important and useful resources about Elasticsearch: articles, videos, blogs, tips and tricks, use cases. All about Elasticsearch!
Stars: ✭ 4,168 (-24.4%)
Featran: A Scala feature transformation library for data science and machine learning
Stars: ✭ 420 (-92.38%)
Tinydb: TinyDB is a lightweight document-oriented database optimized for your happiness :)
Stars: ✭ 4,713 (-14.51%)
Orientdb: OrientDB is the most versatile DBMS supporting Graph, Document, Reactive, Full-Text and Geospatial models in one Multi-Model product. OrientDB can run distributed (Multi-Master), supports SQL, ACID Transactions, Full-Text indexing and Reactive Queries. OrientDB Community Edition is Open Source using a liberal Apache 2 license.
Stars: ✭ 4,394 (-20.3%)
Koalas: pandas API on Apache Spark
Stars: ✭ 3,044 (-44.79%)
Data Accelerator: Data Accelerator for Apache Spark simplifies onboarding to streaming of big data. It offers a rich, easy-to-use experience for creating, editing, and managing Spark jobs on Azure HDInsight or Databricks while enabling the full power of the Spark engine.
Stars: ✭ 247 (-95.52%)
pyspark-algorithms: PySpark Algorithms Book: https://www.amazon.com/dp/B07X4B2218/ref=sr_1_2
Stars: ✭ 72 (-98.69%)
Hyperspace: An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.
Stars: ✭ 246 (-95.54%)
spark-acid: ACID Data Source for Apache Spark based on Hive ACID
Stars: ✭ 91 (-98.35%)
awesome-AI-kubernetes: ❄️ 🐳 Awesome tools and libs for AI, Deep Learning, Machine Learning, Computer Vision, Data Science, Data Analytics and Cognitive Computing that are baked in the oven to be Native on Kubernetes and Docker with Python, R, Scala, Java, C#, Go, Julia, C++, etc.
Stars: ✭ 95 (-98.28%)
bigdata-fun: A complete (distributed) BigData stack, running in containers
Stars: ✭ 14 (-99.75%)
Gimel: Big Data Processing Framework - Unified Data API or SQL on Any Storage
Stars: ✭ 216 (-96.08%)
Couchdb: Seamless multi-master syncing database with an intuitive HTTP/JSON API, designed for reliability
Stars: ✭ 5,166 (-6.29%)
Inquiry Deprecated: [DEPRECATED] Prefer Room by Google, or SQLDelight by Square.
Stars: ✭ 264 (-95.21%)
Flink: Apache Flink is an open source project of The Apache Software Foundation (ASF). The Apache Flink project originated from the Stratosphere research project.
Stars: ✭ 17,781 (+222.53%)
Succinct: Enabling queries on compressed data.
Stars: ✭ 257 (-95.34%)
Concourse: Distributed database warehouse for transactions, search and analytics across time.
Stars: ✭ 310 (-94.38%)
Delta: An open-source storage layer that brings scalable, ACID transactions to Apache Spark™ and big data workloads.
Stars: ✭ 3,903 (-29.2%)
Sparkrdma: RDMA-accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark
Stars: ✭ 215 (-96.1%)
Metorikku: A simplified, lightweight ETL Framework based on Apache Spark
Stars: ✭ 361 (-93.45%)
Sparkler: Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
Stars: ✭ 362 (-93.43%)
Mongo: The MongoDB Database
Stars: ✭ 20,883 (+278.8%)
Sylph: Stream computing platform for big data
Stars: ✭ 362 (-93.43%)
Objectbox Java: ObjectBox is a superfast, lightweight database for objects
Stars: ✭ 3,950 (-28.35%)
Orc: Apache ORC - the smallest, fastest columnar storage for Hadoop workloads
Stars: ✭ 389 (-92.94%)
Kache: A simple in-memory cache written in Go
Stars: ✭ 349 (-93.67%)
Redislite: Redis in a Python module.
Stars: ✭ 464 (-91.58%)
Csharp Driver: DataStax C# Driver for Apache Cassandra
Stars: ✭ 477 (-91.35%)
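Bdp Dataplatform: Big data ecosystem solution data platform: a big data solution built on big data, data platforms, microservices, machine learning, an online store, automated operations, DevOps, a container deployment platform, and data platform collection, storage, computing, development, and application building.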
Stars: ✭ 456 (-91.73%)
Sleekdb: Pure PHP NoSQL database with no dependencies. Flat-file, JSON-based document database.
Stars: ✭ 450 (-91.84%)
Nodejs Firestore: Node.js client for Google Cloud Firestore: a NoSQL document database built for automatic scaling, high performance, and ease of application development.
Stars: ✭ 475 (-91.38%)
Data Science Ipython Notebooks: Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Stars: ✭ 22,048 (+299.93%)
Arangojs: The official ArangoDB JavaScript driver.
Stars: ✭ 503 (-90.88%)
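Sparkstreaming: 💥 🚀 Wraps Spark Streaming to adjust the batch time dynamically (computation runs whenever data arrives); 🚀 supports adding and removing topics while running; 🚀 wraps Spark Streaming 1.6 with Kafka 0.10 to support SSL.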
Stars: ✭ 179 (-96.75%)
Mmlspark: Simple and Distributed Machine Learning
Stars: ✭ 2,899 (-47.42%)
God Of Bigdata: Focused on big data learning and interview preparation; the road to big data mastery starts here. Flink/Spark/Hadoop/Hbase/Hive...
Stars: ✭ 6,008 (+8.98%)