Fili
Easily make RESTful web services for time series reporting with Big Data analytics engines like Druid and SQL databases.
Stars: ✭ 151 (-29.77%)
Mobydq
🐳 Tool to automate data quality checks on data pipelines
Stars: ✭ 123 (-42.79%)
Flink Shaded
Apache Flink shaded artifacts repository
Stars: ✭ 67 (-68.84%)
Dvid
Distributed, Versioned, Image-oriented Dataservice
Stars: ✭ 174 (-19.07%)
Spark Infotheoretic Feature Selection
This package contains a generic implementation of greedy Information Theoretic Feature Selection (FS) methods. The implementation is based on the common theoretic framework presented by Gavin Brown. Implementations of mRMR, InfoGain, JMI and other commonly used FS filters are provided.
Stars: ✭ 123 (-42.79%)
Spark Bigquery
Google BigQuery support for Spark, Structured Streaming, SQL, and DataFrames with easy Databricks integration.
Stars: ✭ 65 (-69.77%)
Spark Tsne
Distributed t-SNE via Apache Spark
Stars: ✭ 151 (-29.77%)
W2v
Word2Vec models with Twitter data using Spark. Blog:
Stars: ✭ 64 (-70.23%)
Dynamometer
A tool for scale and performance testing of HDFS, with a specific focus on the NameNode.
Stars: ✭ 122 (-43.26%)
Pysparkgeoanalysis
🌐 Interactive Workshop on GeoAnalysis using PySpark
Stars: ✭ 63 (-70.7%)
Roffildlibrary
Library for MQL5 (MetaTrader) with Python, Java, Apache Spark, AWS
Stars: ✭ 63 (-70.7%)
Benchm Ml
A minimal benchmark for scalability, speed and accuracy of commonly used open source implementations (R packages, Python scikit-learn, H2O, xgboost, Spark MLlib etc.) of the top machine learning algorithms for binary classification (random forests, gradient boosted trees, deep neural networks etc.).
Stars: ✭ 1,835 (+753.49%)
Warp
Convert and analyze large data sets at light speed, on Mac and iOS.
Stars: ✭ 62 (-71.16%)
Deequ
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
Stars: ✭ 2,020 (+839.53%)
Nabhash
An extremely fast, non-crypto-safe, AES-based hash algorithm for Big Data
Stars: ✭ 62 (-71.16%)
Zparkio
Boilerplate framework to use Spark and ZIO together.
Stars: ✭ 121 (-43.72%)
Silex
Something to help you spark
Stars: ✭ 61 (-71.63%)
Spark Nlp
State of the Art Natural Language Processing
Stars: ✭ 2,518 (+1071.16%)
Hudi
Upserts, Deletes And Incremental Processing on Big Data.
Stars: ✭ 2,586 (+1102.79%)
Zemberek Nlp Server
REST server in Docker built on top of the Zemberek Turkish NLP Java library
Stars: ✭ 60 (-72.09%)
Verticapy
VerticaPy is a Python library that exposes scikit-like functionality to conduct data science projects on data stored in Vertica, thus taking advantage of Vertica’s speed and built-in analytics and machine learning capabilities.
Stars: ✭ 59 (-72.56%)
Likelike
An implementation of locality sensitive hashing with Hadoop
Stars: ✭ 58 (-73.02%)
Attic Lens
Mirror of Apache Lens
Stars: ✭ 58 (-73.02%)
Aztk
AZTK powered by Azure Batch: On-demand, Dockerized, Spark Jobs on Azure
Stars: ✭ 152 (-29.3%)
Sigmf
The Signal Metadata Format Specification
Stars: ✭ 120 (-44.19%)
Ymcache
YMCache is a lightweight object caching solution for iOS and Mac OS X that is designed for highly parallel access scenarios.
Stars: ✭ 58 (-73.02%)
Pyspark Examples
Code examples on Apache Spark using Python
Stars: ✭ 58 (-73.02%)
Teddy
Spark Streaming monitoring platform supporting job deployment, alerting, and auto-start
Stars: ✭ 120 (-44.19%)
Rumble
⛈️ Rumble 1.11.0 "Banyan Tree"🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more
Stars: ✭ 58 (-73.02%)
Data Science Live Book
An open source book to learn data science, data analysis and machine learning, suitable for all ages!
Stars: ✭ 193 (-10.23%)
Keyvi
Keyvi - a key value index that powers the Cliqz search engine. It is an in-memory FST-based data structure highly optimized for size and lookup performance.
Stars: ✭ 171 (-20.47%)
Kinesis Sql
Kinesis Connector for Structured Streaming
Stars: ✭ 120 (-44.19%)
Athenacli
AthenaCLI is a CLI tool for the AWS Athena service, with auto-completion and syntax highlighting.
Stars: ✭ 151 (-29.77%)
Elassandra
Elassandra = Elasticsearch + Apache Cassandra
Stars: ✭ 1,610 (+648.84%)
Sparkit Learn
PySpark + Scikit-learn = Sparkit-learn
Stars: ✭ 1,073 (+399.07%)
Kibble 1
Apache Kibble - a tool to collect, aggregate and visualize data about any software project
Stars: ✭ 54 (-74.88%)
Attic Predictionio
PredictionIO, a machine learning server for developers and ML engineers.
Stars: ✭ 12,522 (+5724.19%)
Albedo
A recommender system for discovering GitHub repos, built with Apache Spark
Stars: ✭ 149 (-30.7%)
Lifion Kinesis
A native Node.js producer and consumer library for Amazon Kinesis Data Streams
Stars: ✭ 54 (-74.88%)
Utils4s
A collection of test cases and related reference material gathered while working with Scala and Spark
Stars: ✭ 1,070 (+397.67%)
Spark Submit Ui
A Play Framework-based application for submitting Spark apps
Stars: ✭ 53 (-75.35%)
Cube.js
📊 Cube: Open-Source Analytics API for Building Data Apps
Stars: ✭ 11,983 (+5473.49%)
Macro ml
Course Website on Macroeconomic Analysis with Machine Learning and Big Data
Stars: ✭ 53 (-75.35%)
Oodt
Mirror of Apache OODT
Stars: ✭ 52 (-75.81%)
Cc Pyspark
Process Common Crawl data with Python and Spark
Stars: ✭ 147 (-31.63%)
Cmak
CMAK is a tool for managing Apache Kafka clusters
Stars: ✭ 10,544 (+4804.19%)
Awesome Spark
A curated list of awesome Apache Spark packages and resources.
Stars: ✭ 1,061 (+393.49%)
Datumbox Framework
Datumbox is an open-source Machine Learning framework written in Java which allows the rapid development of Machine Learning and Statistical applications.
Stars: ✭ 1,063 (+394.42%)
Datax
DataX is an open source universal ETL tool that supports Cassandra, ClickHouse, DBF, Hive, InfluxDB, Kudu, MySQL, Oracle, Presto(Trino), PostgreSQL, SQL Server
Stars: ✭ 116 (-46.05%)
Avro
Apache Avro is a data serialization system.
Stars: ✭ 2,005 (+832.56%)
Amazon S3 Find And Forget
Amazon S3 Find and Forget is a solution to handle data erasure requests from data lakes stored on Amazon S3, for example, pursuant to the European General Data Protection Regulation (GDPR)
Stars: ✭ 115 (-46.51%)