Stroom
Stroom is a highly scalable data storage, processing and analysis platform.
Stars: ✭ 344 (+166.67%)
Reef
Mirror of Apache REEF
Stars: ✭ 92 (-28.68%)
Ozone
Scalable, redundant, and distributed object store for Apache Hadoop
Stars: ✭ 330 (+155.81%)
Koalas
Koalas: pandas API on Apache Spark
Stars: ✭ 3,044 (+2259.69%)
Attaca
Robust, distributed version control for large files.
Stars: ✭ 41 (-68.22%)
Cboard
An easy-to-use, self-service, open BI reporting and dashboard platform.
Stars: ✭ 2,795 (+2066.67%)
Hyperspace
An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.
Stars: ✭ 246 (+90.7%)
Azuredatalake
Samples and Docs for Azure Data Lake Store and Analytics
Stars: ✭ 128 (-0.78%)
Trafodion
Apache Trafodion
Stars: ✭ 242 (+87.6%)
Uproot3
ROOT I/O in pure Python and NumPy.
Stars: ✭ 312 (+141.86%)
Selinon
An advanced distributed task flow management tool built on top of Celery
Stars: ✭ 237 (+83.72%)
Books
A collection of books covering C and C++, Git, Java, Keras, Linux, NLP, Python, Scala, TensorFlow, big data, recommender systems, databases, data mining, machine learning, deep learning, algorithms, and more.
Stars: ✭ 222 (+72.09%)
Mist
Serverless proxy for Spark cluster
Stars: ✭ 309 (+139.53%)
Nakedtensor
Bare-bones examples of machine learning in TensorFlow
Stars: ✭ 2,443 (+1793.8%)
Bitcoin Value Predictor
[NOT MAINTAINED] Predicting Bitcoin price using time series analysis and sentiment analysis of tweets about Bitcoin
Stars: ✭ 91 (-29.46%)
Gimel
Big Data Processing Framework - Unified Data API or SQL on Any Storage
Stars: ✭ 216 (+67.44%)
Helix
Mirror of Apache Helix
Stars: ✭ 304 (+135.66%)
Sparkrdma
RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark
Stars: ✭ 215 (+66.67%)
Metrics
Measure behavior of Java applications
Stars: ✭ 35 (-72.87%)
Calcite
Apache Calcite
Stars: ✭ 2,816 (+2082.95%)
Cloudbreak
A tool for provisioning and managing Apache Hadoop clusters in the cloud. Cloudbreak, as part of the Hortonworks Data Platform, makes it easy to provision, configure and elastically grow HDP clusters on cloud infrastructure. Cloudbreak can be used to provision Hadoop across cloud infrastructure providers including AWS, Azure, GCP and OpenStack.
Stars: ✭ 301 (+133.33%)
Couchdb Docker
Semi-official Apache CouchDB Docker images
Stars: ✭ 194 (+50.39%)
Ambari
Mirror of Apache Ambari
Stars: ✭ 1,576 (+1121.71%)
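Baize
Baize automated operations system: configuration management, network probing, asset management, business management, CMDB, CD, DevOps, job orchestration, task orchestration, and other features. Monitoring, alerting, log analysis, and big data analysis are planned for the future.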
Stars: ✭ 296 (+129.46%)
Presto Go Client
A Presto client for the Go programming language.
Stars: ✭ 183 (+41.86%)
Skymap
High-throughput gene to knowledge mapping through massive integration of public sequencing data.
Stars: ✭ 29 (-77.52%)
Bigdata Playground
A complete example of a big data application using: Kubernetes (kops/AWS), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL
Stars: ✭ 177 (+37.21%)
Crate
CrateDB is a distributed SQL database that makes it simple to store and analyze massive amounts of data in real-time.
Stars: ✭ 3,254 (+2422.48%)
Keyvi
Keyvi - a key-value index that powers the Cliqz search engine. It is an in-memory FST-based data structure highly optimized for size and lookup performance.
Stars: ✭ 171 (+32.56%)
Geopyspark
GeoTrellis for PySpark
Stars: ✭ 167 (+29.46%)
Oie Resources
A curated list of Open Information Extraction (OIE) resources: papers, code, data, etc.
Stars: ✭ 283 (+119.38%)
Fluo
Apache Fluo
Stars: ✭ 159 (+23.26%)
Awesome Scalability
The Patterns of Scalable, Reliable, and Performant Large-Scale Systems
Stars: ✭ 36,688 (+28340.31%)
Geni
A Clojure dataframe library that runs on Spark
Stars: ✭ 152 (+17.83%)
Knowage Server
Knowage is the professional open source suite for modern business analytics over traditional sources and big data systems.
Stars: ✭ 276 (+113.95%)
Datasciencevm
Tools and Docs on the Azure Data Science Virtual Machine (http://aka.ms/dsvm)
Stars: ✭ 153 (+18.6%)
Report
An automated-configuration reporting platform. Demo: http://58.87.112.247/report (account: visitor, password: 123456)
Stars: ✭ 123 (-4.65%)
Spark With Python
Fundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (+16.28%)
100daysofmlcode
My journey to learn and grow in the domain of Machine Learning and Artificial Intelligence by performing the #100DaysofMLCode Challenge.
Stars: ✭ 146 (+13.18%)
K8s Ingress Claim
An admission control policy that safeguards against accidental duplicate claiming of Hosts/Domains.
Stars: ✭ 14 (-89.15%)
Metamodel
Mirror of Apache Metamodel
Stars: ✭ 143 (+10.85%)
Succinct
Enabling queries on compressed data.
Stars: ✭ 257 (+99.22%)
Panoptes
A Global Scale Network Telemetry Ecosystem
Stars: ✭ 80 (-37.98%)
bandar-log
Monitoring tool to measure flow throughput of data sources and processing components that are part of Data Ingestion and ETL pipelines.
Stars: ✭ 20 (-84.5%)
Gaffer
A large-scale entity and relation database supporting aggregation of properties
Stars: ✭ 1,642 (+1172.87%)
Tajo
Mirror of Apache Tajo
Stars: ✭ 128 (-0.78%)
Richdem
High-performance Terrain and Hydrology Analysis
Stars: ✭ 127 (-1.55%)
Cmak
CMAK is a tool for managing Apache Kafka clusters
Stars: ✭ 10,544 (+8073.64%)
Warp
Convert and analyze large data sets at light speed, on Mac and iOS.
Stars: ✭ 62 (-51.94%)
Fit Sne
Fast Fourier Transform-accelerated Interpolation-based t-SNE (FIt-SNE)
Stars: ✭ 485 (+275.97%)
nebula
A distributed, fast open-source graph database featuring horizontal scalability and high availability
Stars: ✭ 8,196 (+6253.49%)