Flink
Apache Flink is an open source project of The Apache Software Foundation (ASF).
The Apache Flink project originated from the Stratosphere research project.
Stars: ✭ 17,781 (+26438.81%)
Sylph
Stream computing platform for big data
Stars: ✭ 362 (+440.3%)
Zeppelin
Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
Stars: ✭ 5,513 (+8128.36%)
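Bigdata Interview
🎯 🌟 [Big data interview questions] A collection of big-data interview questions gathered from around the web, together with the author's own summarized answers. Currently covers interview questions and key points for the Hadoop/Hive/Spark/Flink/Hbase/Kafka/Zookeeper frameworks.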
Stars: ✭ 857 (+1179.1%)
K8s Ingress Claim
An admission control policy that safeguards against accidental duplicate claiming of Hosts/Domains.
Stars: ✭ 14 (-79.1%)
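Blog demos
GitHub of the CSDN blog expert 程序员欣宸 (Programmer Xinchen), with a detailed categorization and index of more than four hundred original articles and the corresponding source code, covering Java, Docker, Kubernetes, DevOPS, and related topics.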
Stars: ✭ 1,030 (+1437.31%)
Autodl
Automated Deep Learning without ANY human intervention. 1st Solution for AutoDL [email protected]
Stars: ✭ 854 (+1174.63%)
Attaca
Robust, distributed version control for large files.
Stars: ✭ 41 (-38.81%)
Awesome Scalability
The Patterns of Scalable, Reliable, and Performant Large-Scale Systems
Stars: ✭ 36,688 (+54658.21%)
Datumbox Framework
Datumbox is an open-source Machine Learning framework written in Java which allows the rapid development of Machine Learning and Statistical applications.
Stars: ✭ 1,063 (+1486.57%)
Dremio Oss
Dremio - the missing link in modern data
Stars: ✭ 862 (+1186.57%)
Hazelcast Jet
Distributed Stream and Batch Processing
Stars: ✭ 855 (+1176.12%)
Moosefs
MooseFS – Open Source, Petabyte, Fault-Tolerant, Highly Performing, Scalable Network Distributed File System (Software-Defined Storage)
Stars: ✭ 1,025 (+1429.85%)
Pyspark Setup Demo
Demo of PySpark and Jupyter Notebook with the Jupyter Docker Stacks
Stars: ✭ 24 (-64.18%)
Nabhash
An extremely fast Non-crypto-safe AES Based Hash algorithm for Big Data
Stars: ✭ 62 (-7.46%)
Bigdataguide
Big data learning: learn big data from scratch, including study videos for each stage of learning and interview materials.
Stars: ✭ 817 (+1119.4%)
Kibble 1
Apache Kibble - a tool to collect, aggregate and visualize data about any software project
Stars: ✭ 54 (-19.4%)
Rakam Api
📈 Collect customer event data from your apps. (Note that this project only includes the API collector, not the visualization platform)
Stars: ✭ 772 (+1052.24%)
Esper Tv
Esper instance for TV news analysis
Stars: ✭ 37 (-44.78%)
Spark Movie Lens
An on-line movie recommender using Spark, Python Flask, and the MovieLens dataset
Stars: ✭ 745 (+1011.94%)
Tweetmap
A real-time Tweet Trend Map and Sentiment Analysis web application with Kafka, Angular, Spring Boot, Flink, Elasticsearch, Kibana, Docker and Kubernetes, deployed on the cloud
Stars: ✭ 28 (-58.21%)
Oodt
Mirror of Apache OODT
Stars: ✭ 52 (-22.39%)
Spark
Apache Spark - A unified analytics engine for large-scale data processing
Stars: ✭ 31,618 (+47091.04%)
Ymcache
YMCache is a lightweight object caching solution for iOS and Mac OS X that is designed for highly parallel access scenarios.
Stars: ✭ 58 (-13.43%)
Phoenix
Mirror of Apache Phoenix
Stars: ✭ 867 (+1194.03%)
Trck
Query engine for TrailDB
Stars: ✭ 48 (-28.36%)
Sparkjni
A heterogeneous Apache Spark framework.
Stars: ✭ 11 (-83.58%)
Warp
Convert and analyze large data sets at light speed, on Mac and iOS.
Stars: ✭ 62 (-7.46%)
Accumulo
Apache Accumulo
Stars: ✭ 857 (+1179.1%)
Traildb
TrailDB is an efficient tool for storing and querying series of events
Stars: ✭ 1,029 (+1435.82%)
Dataflowjavasdk
Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
Stars: ✭ 854 (+1174.63%)
Pretzel
Javascript full-stack framework for Big Data visualisation and analysis
Stars: ✭ 26 (-61.19%)
Bandar Log
Monitoring tool to measure flow throughput of data sources and processing components that are part of Data Ingestion and ETL pipelines.
Stars: ✭ 19 (-71.64%)
Cloud Volume
Read and write Neuroglancer datasets programmatically.
Stars: ✭ 63 (-5.97%)
Hadoop For Geoevent
ArcGIS GeoEvent Server sample Hadoop connector for storing GeoEvents in HDFS.
Stars: ✭ 5 (-92.54%)
Egads
A Java package to automatically detect anomalies in large scale time-series data
Stars: ✭ 997 (+1388.06%)
Sqoop
Mirror of Apache Sqoop
Stars: ✭ 817 (+1119.4%)
Pulsar Spark
When Apache Pulsar meets Apache Spark
Stars: ✭ 55 (-17.91%)
Titanoboa
Titanoboa makes complex workflows easy. It is a low-code workflow orchestration platform for JVM - distributed, highly scalable and fault tolerant.
Stars: ✭ 787 (+1074.63%)
Storm
Mirror of Apache Storm
Stars: ✭ 6,297 (+9298.51%)
Verticapy
VerticaPy is a Python library that exposes sci-kit like functionality to conduct data science projects on data stored in Vertica, thus taking advantage of Vertica’s speed and built-in analytics and machine learning capabilities.
Stars: ✭ 59 (-11.94%)
Cython
The most widely used Python to C compiler
Stars: ✭ 6,588 (+9732.84%)
Metrics
Measure behavior of Java applications
Stars: ✭ 35 (-47.76%)
Lifion Kinesis
A native Node.js producer and consumer library for Amazon Kinesis Data Streams
Stars: ✭ 54 (-19.4%)
Samza
Mirror of Apache Samza
Stars: ✭ 676 (+908.96%)
Data Science Career
Career Resources for Data Science, Machine Learning, Big Data and Business Analytics Career Repository
Stars: ✭ 630 (+840.3%)
Sdc
Intel® Scalable Dataframe Compiler for Pandas*
Stars: ✭ 623 (+829.85%)
Skymap
High-throughput gene to knowledge mapping through massive integration of public sequencing data.
Stars: ✭ 29 (-56.72%)
Rsparkling
RSparkling: Use H2O Sparkling Water from R (Spark + R + Machine Learning)
Stars: ✭ 65 (-2.99%)