Parquetviewer: Simple Windows desktop application for viewing & querying Apache Parquet files
Stars: ✭ 145 (-81.87%)
Gaffer: A large-scale entity and relation database supporting aggregation of properties
Stars: ✭ 1,642 (+105.25%)
Eel Sdk: Big Data Toolkit for the JVM
Stars: ✭ 140 (-82.5%)
Amazon S3 Find And Forget: Amazon S3 Find and Forget is a solution to handle data erasure requests from data lakes stored on Amazon S3, for example, pursuant to the European General Data Protection Regulation (GDPR)
Stars: ✭ 115 (-85.62%)
Bigdata Playground: A complete example of a big data application using: Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL
Stars: ✭ 177 (-77.88%)
Drill: Apache Drill is a distributed MPP query layer for self-describing data
Stars: ✭ 1,619 (+102.38%)
Awkward 0.x: Manipulate arrays of complex data structures as easily as NumPy.
Stars: ✭ 216 (-73%)
Nipype: Workflows and interfaces for neuroimaging packages
Stars: ✭ 557 (-30.37%)
Cortx: CORTX Community Object Storage is 100% open source object storage uniquely optimized for mass capacity storage devices.
Stars: ✭ 426 (-46.75%)
Datascience Ai Machinelearning Resources: Alex Castrounis' curated set of resources for artificial intelligence (AI), machine learning, data science, internet of things (IoT), and more.
Stars: ✭ 414 (-48.25%)
Devops Python Tools: 80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Stars: ✭ 406 (-49.25%)
Sdc: Intel® Scalable Dataframe Compiler for Pandas*
Stars: ✭ 623 (-22.12%)
Thrill: Thrill - An EXPERIMENTAL Algorithmic Distributed Big Data Batch Processing Framework in C++
Stars: ✭ 528 (-34%)
Mockneat: MockNeat is a Java 8+ library that facilitates the generation of arbitrary data for your applications.
Stars: ✭ 410 (-48.75%)
Beam: Apache Beam is a unified programming model for Batch and Streaming
Stars: ✭ 5,149 (+543.63%)
Orc: Apache ORC - the smallest, fastest columnar storage for Hadoop workloads
Stars: ✭ 389 (-51.37%)
Ignite: Apache Ignite
Stars: ✭ 4,027 (+403.38%)
Cython: The most widely used Python to C compiler
Stars: ✭ 6,588 (+723.5%)
H2o 3: H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Stars: ✭ 5,656 (+607%)
Magellan: Geo Spatial Data Analytics on Spark
Stars: ✭ 507 (-36.62%)
Choetl: ETL Framework for .NET / C# (Parser / Writer for CSV, Flat, Xml, JSON, Key-Value, Parquet, Yaml, Avro formatted files)
Stars: ✭ 372 (-53.5%)
Circosjs: d3 library to build circular graphs
Stars: ✭ 436 (-45.5%)
Pachyderm: Reproducible Data Science at Scale!
Stars: ✭ 5,305 (+563.13%)
Data Science Career: Career resources repository for Data Science, Machine Learning, Big Data and Business Analytics
Stars: ✭ 630 (-21.25%)
Opendata.cern.ch: Source code for the CERN Open Data portal
Stars: ✭ 411 (-48.62%)
Couchdb: Seamless multi-master syncing database with an intuitive HTTP/JSON API, designed for reliability
Stars: ✭ 5,166 (+545.75%)
Cogcomp Nlp: CogComp's Natural Language Processing libraries and demos
Stars: ✭ 410 (-48.75%)
Spark Movie Lens: An on-line movie recommender using Spark, Python Flask, and the MovieLens dataset
Stars: ✭ 745 (-6.87%)
Decentralized Internet: An SDK/library for decentralized web and distributed computing projects
Stars: ✭ 406 (-49.25%)
Arkime: Arkime (formerly Moloch) is an open source, large scale, full packet capturing, indexing, and database system.
Stars: ✭ 4,994 (+524.25%)
Iceberg: Iceberg is a table format for large, slow-moving tabular data
Stars: ✭ 393 (-50.87%)
Kafka Streams: equivalent to kafka-streams 🐙 for nodejs ✨🐢🚀✨
Stars: ✭ 613 (-23.37%)
Skale: High performance distributed data processing engine
Stars: ✭ 390 (-51.25%)
Onlinestats.jl: Single-pass algorithms for statistics
Stars: ✭ 507 (-36.62%)
Bigdl: Building Large-Scale AI Applications for Distributed Big Data
Stars: ✭ 3,813 (+376.63%)
Rakam Api: 📈 Collect customer event data from your apps. (Note that this project only includes the API collector, not the visualization platform)
Stars: ✭ 772 (-3.5%)
Pgm Index: 🏅 State-of-the-art learned data structure that enables fast lookup, predecessor, range searches and updates in arrays of billions of items using orders of magnitude less space than traditional indexes
Stars: ✭ 499 (-37.62%)
Hive: Apache Hive
Stars: ✭ 4,031 (+403.88%)
Halodb: A fast, log-structured key-value store.
Stars: ✭ 370 (-53.75%)
Metorikku: A simplified, lightweight ETL Framework based on Apache Spark
Stars: ✭ 361 (-54.87%)
Oozie: Mirror of Apache Oozie
Stars: ✭ 602 (-24.75%)
Stream Framework: Stream Framework is a Python library that allows you to build news feeds, activity streams and notification systems using Cassandra and/or Redis. The authors of Stream-Framework also provide a cloud service for feed technology.
Stars: ✭ 4,576 (+472%)
Sparkler: Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
Stars: ✭ 362 (-54.75%)
Sylph: Stream computing platform for big data
Stars: ✭ 362 (-54.75%)
Fit Sne: Fast Fourier Transform-accelerated Interpolation-based t-SNE (FIt-SNE)
Stars: ✭ 485 (-39.37%)
Bigtop: Mirror of Apache Bigtop
Stars: ✭ 356 (-55.5%)
Vespa: The open big data serving engine. https://vespa.ai
Stars: ✭ 3,747 (+368.38%)
Zeppelin: Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
Stars: ✭ 5,513 (+589.13%)
Redislite: Redis in a Python module.
Stars: ✭ 464 (-42%)
Devops Roadmap: DevOps methodology & roadmap for a DevOps developer in 2019. Interesting books to learn new technologies.
Stars: ✭ 349 (-56.37%)
Hazelcast: Open-source distributed computation and storage platform
Stars: ✭ 4,662 (+482.75%)
Oap: Optimized Analytics Package for Spark* Platform
Stars: ✭ 343 (-57.12%)