Giraph: Mirror of Apache Giraph
Stars: ✭ 569 (-64.85%)
Warp: Convert and analyze large data sets at light speed, on Mac and iOS.
Stars: ✭ 62 (-96.17%)
Scanner: Efficient video analysis at scale
Stars: ✭ 569 (-64.85%)
Beezig: 🐝 Beezig, the Hive plugin for 5zig.
Stars: ✭ 16 (-99.01%)
Kglab: Graph-based data science. An abstraction layer in Python for building knowledge graphs, integrated with popular graph libraries, atop Pandas, RDFlib, pySHACL, RAPIDS, NetworkX, iGraph, PyVis, pslpython, pyarrow, etc.
Stars: ✭ 98 (-93.95%)
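Hadoop study: Regularly updated documentation for big data components commonly used in the Hadoop ecosystem. Focus, in order: Flink, Solr, Spark SQL, ES, Scala, Kafka, HBase/Phoenix, Redis, Kerberos. (The project includes Hadoop mind maps, Evernote notes, simple Scala demos, common utility classes, and anonymized training code. Continuously updated!)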
Stars: ✭ 567 (-64.98%)
Nabhash: An extremely fast, non-cryptographically-secure, AES-based hash algorithm for big data
Stars: ✭ 62 (-96.17%)
Pachyderm: Reproducible Data Science at Scale!
Stars: ✭ 5,305 (+227.67%)
elm-drill: Drills for getting used to Elm by working through hands-on exercises.
Stars: ✭ 47 (-97.1%)
Sqoop: Mirror of Apache Sqoop
Stars: ✭ 817 (-49.54%)
Titanoboa: Titanoboa makes complex workflows easy. It is a low-code workflow orchestration platform for the JVM: distributed, highly scalable, and fault-tolerant.
Stars: ✭ 787 (-51.39%)
Nipype: Workflows and interfaces for neuroimaging packages
Stars: ✭ 557 (-65.6%)
spark-util: Low-level helpers for Apache Spark libraries and tests
Stars: ✭ 16 (-99.01%)
Storm: Mirror of Apache Storm
Stars: ✭ 6,297 (+288.94%)
alluxio-py: Alluxio Python client. Access any data source with Python.
Stars: ✭ 18 (-98.89%)
Countly Sdk Cordova: Countly Product Analytics SDK for Cordova, Icenium, and PhoneGap
Stars: ✭ 69 (-95.74%)
mutant-swarm: Mutation testing framework and code coverage for Hive SQL
Stars: ✭ 20 (-98.76%)
Spark Movie Lens: An online movie recommender using Spark, Python Flask, and the MovieLens dataset
Stars: ✭ 745 (-53.98%)
Schemer: Schema registry for CSV, TSV, JSON, Avro, and Parquet schemas. Supports schema inference and a GraphQL API.
Stars: ✭ 97 (-94.01%)
Waimak: Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark.
Stars: ✭ 60 (-96.29%)
Couchdb: Seamless multi-master syncing database with an intuitive HTTP/JSON API, designed for reliability
Stars: ✭ 5,166 (+219.09%)
meepo: Data migration between heterogeneous storage systems
Stars: ✭ 29 (-98.21%)
Cython: The most widely used Python-to-C compiler
Stars: ✭ 6,588 (+306.92%)
fastdata-cluster: Fast Data Cluster (Apache Cassandra, Kafka, Spark, Flink, YARN, and HDFS with Vagrant and VirtualBox)
Stars: ✭ 20 (-98.76%)
Atsd: Axibase Time Series Database Documentation
Stars: ✭ 68 (-95.8%)
pypar: Efficient and scalable parallelism using the Message Passing Interface (MPI) to handle big data and highly computational problems.
Stars: ✭ 66 (-95.92%)
Scriptis: Scriptis is for interactive data analysis, with script development (SQL, PySpark, HiveQL), task submission (Spark, Hive), UDF, function, and resource management, and intelligent diagnosis.
Stars: ✭ 696 (-57.01%)
jumbo: 🐘 A local Hadoop cluster bootstrapper using Vagrant, Ansible, and Ambari.
Stars: ✭ 17 (-98.95%)
Samza: Mirror of Apache Samza
Stars: ✭ 676 (-58.25%)
pytorch kmeans: Implementation of the k-means algorithm in PyTorch that works for large datasets
Stars: ✭ 38 (-97.65%)
nimble-orm: A flexible, lightweight ORM based on Spring jdbcTemplate
Stars: ✭ 36 (-97.78%)
Spark Py Notebooks: Apache Spark & Python (PySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 1,338 (-17.36%)
liferay-portal-oracledb-support: Liferay Portal 7 Community Edition Oracle Database Support. ** NO LONGER MAINTAINED **. Refer to this repository: https://github.com/amusarra/liferay-portal-database-all-in-one-support
Stars: ✭ 13 (-99.2%)
Tony: TonY is a framework to natively run deep learning frameworks on Apache Hadoop.
Stars: ✭ 626 (-61.33%)
hyper-engine: Python library for Bayesian hyperparameter optimization
Stars: ✭ 80 (-95.06%)
Src: A lightweight distributed stream computing framework for Go
Stars: ✭ 67 (-95.86%)
Thrill: An EXPERIMENTAL Algorithmic Distributed Big Data Batch Processing Framework in C++
Stars: ✭ 528 (-67.39%)
Pythondata: Repo for code published on pythondata.com
Stars: ✭ 113 (-93.02%)
Java Jdbc: OpenTracing Instrumentation for JDBC
Stars: ✭ 60 (-96.29%)
Ragtime: Database-independent migration library
Stars: ✭ 519 (-67.94%)
dlux open token: DLUX distributed deterministic finite state automata. Built for HIVE to take advantage of free transactions, using multi-sig and escrow for security.
Stars: ✭ 16 (-99.01%)
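v6.dooring.public: A solution for large-screen data visualization dashboards. It provides a visual editing engine that helps individuals and businesses easily build their own large-screen visualization applications.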
Stars: ✭ 323 (-80.05%)
Arkime: Arkime (formerly Moloch) is an open source, large scale, full packet capturing, indexing, and database system.
Stars: ✭ 4,994 (+208.46%)
Petastorm: The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark, and can be used from pure Python code.
Stars: ✭ 1,108 (-31.56%)
Beam: Apache Beam is a unified programming model for batch and streaming
Stars: ✭ 5,149 (+218.04%)
Drone: 🍰 The missing library manager for Android developers
Stars: ✭ 512 (-68.38%)
Reef: Mirror of Apache REEF
Stars: ✭ 92 (-94.32%)
Verticapy: VerticaPy is a Python library that exposes scikit-like functionality for conducting data science projects on data stored in Vertica, taking advantage of Vertica’s speed and built-in analytics and machine learning capabilities.
Stars: ✭ 59 (-96.36%)
Onlinestats.jl: Single-pass algorithms for statistics
Stars: ✭ 507 (-68.68%)
Magellan: Geospatial Data Analytics on Spark
Stars: ✭ 507 (-68.68%)
Likelike: An implementation of locality-sensitive hashing with Hadoop
Stars: ✭ 58 (-96.42%)