Bigdata Playground: A complete example of a big data application using Kubernetes (kops/AWS), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL
Stars: ✭ 177 (-86.15%)
Eel Sdk: Big Data Toolkit for the JVM
Stars: ✭ 140 (-89.05%)
Parquetviewer: Simple Windows desktop application for viewing & querying Apache Parquet files
Stars: ✭ 145 (-88.65%)
Drill: Apache Drill is a distributed MPP query layer for self-describing data
Stars: ✭ 1,619 (+26.68%)
Amazon S3 Find And Forget: Amazon S3 Find and Forget is a solution to handle data erasure requests from data lakes stored on Amazon S3, for example, pursuant to the European General Data Protection Regulation (GDPR)
Stars: ✭ 115 (-91%)
Awkward 0.x: Manipulate arrays of complex data structures as easily as NumPy.
Stars: ✭ 216 (-83.1%)
Gaffer: A large-scale entity and relation database supporting aggregation of properties
Stars: ✭ 1,642 (+28.48%)
Carbondata: Mirror of Apache CarbonData
Stars: ✭ 1,158 (-9.39%)
Lifion Kinesis: A native Node.js producer and consumer library for Amazon Kinesis Data Streams
Stars: ✭ 54 (-95.77%)
Oodt: Mirror of Apache OODT
Stars: ✭ 52 (-95.93%)
Trck: Query engine for TrailDB
Stars: ✭ 48 (-96.24%)
Flink Shaded: Apache Flink shaded artifacts repository
Stars: ✭ 67 (-94.76%)
Traildb: TrailDB is an efficient tool for storing and querying series of events
Stars: ✭ 1,029 (-19.48%)
Cloud Volume: Read and write Neuroglancer datasets programmatically.
Stars: ✭ 63 (-95.07%)
Attaca: Robust, distributed version control for large files.
Stars: ✭ 41 (-96.79%)
Uproot4: ROOT I/O in pure Python and NumPy.
Stars: ✭ 80 (-93.74%)
Labs: Research on distributed systems
Stars: ✭ 73 (-94.29%)
Warp: Convert and analyze large data sets at light speed, on Mac and iOS.
Stars: ✭ 62 (-95.15%)
Metrics: Measure behavior of Java applications
Stars: ✭ 35 (-97.26%)
Kibble 1: Apache Kibble - a tool to collect, aggregate and visualize data about any software project
Stars: ✭ 54 (-95.77%)
Countly Sdk Cordova: Countly Product Analytics SDK for Cordova, Icenium and Phonegap
Stars: ✭ 69 (-94.6%)
Macro ml: Course Website on Macroeconomic Analysis with Machine Learning and Big Data
Stars: ✭ 53 (-95.85%)
Datumbox Framework: Datumbox is an open-source Machine Learning framework written in Java which allows the rapid development of Machine Learning and Statistical applications.
Stars: ✭ 1,063 (-16.82%)
Node Parquet: NodeJS module to access Apache Parquet format files
Stars: ✭ 46 (-96.4%)
Panoptes: A Global Scale Network Telemetry Ecosystem
Stars: ✭ 80 (-93.74%)
Moosefs: MooseFS – Open Source, Petabyte, Fault-Tolerant, Highly Performing, Scalable Network Distributed File System (Software-Defined Storage)
Stars: ✭ 1,025 (-19.8%)
RSparkling: Use H2O Sparkling Water from R (Spark + R + Machine Learning)
Stars: ✭ 65 (-94.91%)
Quilt: Quilt is a self-organizing data hub for S3
Stars: ✭ 1,007 (-21.21%)
Cookbook: The Data Engineering Cookbook
Stars: ✭ 9,829 (+669.09%)
Egads: A Java package to automatically detect anomalies in large-scale time-series data
Stars: ✭ 997 (-21.99%)
Esper Tv: Esper instance for TV news analysis
Stars: ✭ 37 (-97.1%)
Bigdata File Viewer: A cross-platform (Windows, Mac, Linux) desktop application to view common big data binary formats like Parquet, ORC, Avro, etc. Supports local file system, HDFS, AWS S3, Azure Blob Storage, etc.
Stars: ✭ 86 (-93.27%)
Nabhash: An extremely fast, non-crypto-safe, AES-based hash algorithm for Big Data
Stars: ✭ 62 (-95.15%)
Pucket: Bucketing and partitioning system for Parquet
Stars: ✭ 29 (-97.73%)
Skymap: High-throughput gene to knowledge mapping through massive integration of public sequencing data.
Stars: ✭ 29 (-97.73%)
Bookkeeper: Apache BookKeeper
Stars: ✭ 1,178 (-7.82%)
Petastorm: The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark and can be used from pure Python code.
Stars: ✭ 1,108 (-13.3%)
Qcportal: A client interface to the QCArchive Project (read-only image of QCFractal)
Stars: ✭ 29 (-97.73%)
Awesome Scalability: The Patterns of Scalable, Reliable, and Performant Large-Scale Systems
Stars: ✭ 36,688 (+2770.74%)
Verticapy: VerticaPy is a Python library that exposes scikit-like functionality to conduct data science projects on data stored in Vertica, thus taking advantage of Vertica’s speed and built-in analytics and machine learning capabilities.
Stars: ✭ 59 (-95.38%)
Spark: Apache Spark - A unified analytics engine for large-scale data processing
Stars: ✭ 31,618 (+2374.02%)
K8s Ingress Claim: An admission control policy that safeguards against accidental duplicate claiming of Hosts/Domains.
Stars: ✭ 14 (-98.9%)
Iotdb: Apache IoTDB
Stars: ✭ 1,221 (-4.46%)
Attic Lens: Mirror of Apache Lens
Stars: ✭ 58 (-95.46%)
Phoenix: Mirror of Apache Phoenix
Stars: ✭ 867 (-32.16%)
Dremio Oss: Dremio - the missing link in modern data
Stars: ✭ 862 (-32.55%)
Ymcache: YMCache is a lightweight object caching solution for iOS and Mac OS X that is designed for highly parallel access scenarios.
Stars: ✭ 58 (-95.46%)
Sparkjni: A heterogeneous Apache Spark framework.
Stars: ✭ 11 (-99.14%)