Graph-Based Data Science: an abstraction layer in Python for building knowledge graphs, integrated with popular graph libraries – atop Pandas, RDFlib, pySHACL, RAPIDS, NetworkX, iGraph, PyVis, pslpython, pyarrow, etc.

Stars: ✭ 98 (-93.95%)

Mutual labels: parquet

Hadoop study

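Regularly updated documentation for commonly used big data components in the Hadoop ecosystem, focusing in order on: Flink, Solr, Sparksql, ES, Scala, Kafka, Hbase/phoenix, Redis, Kerberos (the project includes Hadoop mind maps, Evernote notes, simple Scala demos, common utility classes, and desensitized train code; continuously updated!)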

Stars: ✭ 567 (-64.98%)

Mutual labels: hadoop

Nabhash

An extremely fast Non-crypto-safe AES Based Hash algorithm for Big Data

Stars: ✭ 62 (-96.17%)

Mutual labels: big-data

Pachyderm

Reproducible Data Science at Scale!

Stars: ✭ 5,305 (+227.67%)

Mutual labels: big-data

elm-drill

Drills for getting used to Elm through hands-on practice.

Stars: ✭ 47 (-97.1%)

Mutual labels: drill

Sqoop

Mirror of Apache Sqoop

Stars: ✭ 817 (-49.54%)

Mutual labels: big-data

Big Data Engineering Coursera Yandex

Big Data for Data Engineers Coursera Specialization from Yandex

Stars: ✭ 71 (-95.61%)

Mutual labels: big-data

AverageShiftedHistograms.jl

⚡ Lightning fast density estimation in Julia ⚡

Stars: ✭ 52 (-96.79%)

Mutual labels: big-data

Titanoboa

Titanoboa makes complex workflows easy. It is a low-code workflow orchestration platform for JVM - distributed, highly scalable and fault tolerant.

Stars: ✭ 787 (-51.39%)

Mutual labels: big-data

Nipype

Workflows and interfaces for neuroimaging packages

Stars: ✭ 557 (-65.6%)

Mutual labels: big-data

spark-util

low-level helpers for Apache Spark libraries and tests

Stars: ✭ 16 (-99.01%)

Mutual labels: hadoop

Storm

Mirror of Apache Storm

Stars: ✭ 6,297 (+288.94%)

Mutual labels: big-data

alluxio-py

Alluxio Python client - Access Any Data Source with Python

Stars: ✭ 18 (-98.89%)

Mutual labels: big-data

Countly Sdk Cordova

Countly Product Analytics SDK for Cordova, Icenium and Phonegap

Stars: ✭ 69 (-95.74%)

Mutual labels: big-data

mutant-swarm

Mutation testing framework and code coverage for Hive SQL

Stars: ✭ 20 (-98.76%)

Mutual labels: hive

Spark Movie Lens

An on-line movie recommender using Spark, Python Flask, and the MovieLens dataset

Stars: ✭ 745 (-53.98%)

Mutual labels: big-data

Schemer

Schema registry for CSV, TSV, JSON, AVRO and Parquet schema. Supports schema inference and GraphQL API.

Stars: ✭ 97 (-94.01%)

Mutual labels: parquet

Waimak

Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark.

Stars: ✭ 60 (-96.29%)

Mutual labels: hadoop

Couchdb

Seamless multi-master syncing database with an intuitive HTTP/JSON API, designed for reliability

Stars: ✭ 5,166 (+219.09%)

Mutual labels: big-data

meepo

Data migration across heterogeneous storage systems

Stars: ✭ 29 (-98.21%)

Mutual labels: parquet

Cython

The most widely used Python to C compiler

Stars: ✭ 6,588 (+306.92%)

Mutual labels: big-data

fastdata-cluster

Fast Data Cluster (Apache Cassandra, Kafka, Spark, Flink, YARN and HDFS with Vagrant and VirtualBox)

Stars: ✭ 20 (-98.76%)

Mutual labels: hadoop

Atsd

Axibase Time Series Database Documentation

Stars: ✭ 68 (-95.8%)

Mutual labels: hadoop

pypar

Efficient and scalable parallelism using the message passing interface (MPI) to handle big data and highly computational problems.

Stars: ✭ 66 (-95.92%)

Mutual labels: big-data

Scriptis

Scriptis is for interactive data analysis with script development (SQL, Pyspark, HiveQL), task submission (Spark, Hive), UDF, function, resource management and intelligent diagnosis.

Stars: ✭ 696 (-57.01%)

Mutual labels: hive

LPU-Java-2022-1

LPU Java JEE Sessions 2022 Batch 1

Stars: ✭ 30 (-98.15%)

Mutual labels: jdbc

Attic Predictionio Sdk Java

PredictionIO Java SDK

Stars: ✭ 107 (-93.39%)

Mutual labels: big-data

jumbo

🐘 A local Hadoop cluster bootstrapper using Vagrant, Ansible, and Ambari.

Stars: ✭ 17 (-98.95%)

Mutual labels: hadoop

Samza

Mirror of Apache Samza

Stars: ✭ 676 (-58.25%)

Mutual labels: big-data

pytorch kmeans

Implementation of the k-means algorithm in PyTorch that works for large datasets

Stars: ✭ 38 (-97.65%)

Mutual labels: big-data

Hazelcast Cpp Client

Hazelcast IMDG C++ Client

Stars: ✭ 67 (-95.86%)

Mutual labels: big-data

nimble-orm

A flexible, lightweight ORM based on Spring jdbcTemplate

Stars: ✭ 36 (-97.78%)

Mutual labels: jdbc

Useractionanalyzeplatform

Big data platform for e-commerce user behavior analysis

Stars: ✭ 645 (-60.16%)

Mutual labels: hadoop

Spark Py Notebooks

Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks

Stars: ✭ 1,338 (-17.36%)

Mutual labels: big-data

liferay-portal-oracledb-support

Liferay Portal 7 Community Edition Oracle Database Support ** NO LONGER MAINTAINED **. Refer to this repository: https://github.com/amusarra/liferay-portal-database-all-in-one-support

Stars: ✭ 13 (-99.2%)

Mutual labels: jdbc

Tony

TonY is a framework to natively run deep learning frameworks on Apache Hadoop.

Stars: ✭ 626 (-61.33%)

Mutual labels: hadoop

hyper-engine

Python library for Bayesian hyper-parameters optimization

Stars: ✭ 80 (-95.06%)

Mutual labels: big-data

Src

A light-weight distributed stream computing framework for Golang

Stars: ✭ 67 (-95.86%)

Mutual labels: hadoop

Thrill

Thrill - An EXPERIMENTAL Algorithmic Distributed Big Data Batch Processing Framework in C++

Stars: ✭ 528 (-67.39%)

Mutual labels: big-data

Pythondata

repo for code published on pythondata.com

Stars: ✭ 113 (-93.02%)

Mutual labels: big-data

Mysql perf analyzer

MySQL performance monitoring and analysis.

Stars: ✭ 1,423 (-12.11%)

Mutual labels: big-data

Hadoop Yarn Api Python Client

Python client for Hadoop® YARN API

Stars: ✭ 91 (-94.38%)

Mutual labels: hadoop

Java Jdbc

OpenTracing Instrumentation for JDBC

Stars: ✭ 60 (-96.29%)

Mutual labels: jdbc

Ragtime

Database-independent migration library

Stars: ✭ 519 (-67.94%)

Mutual labels: jdbc

dlux open token

DLUX distributed deterministic finite state automata. Built for HIVE to take advantage of free transactions using multi-sig and escrow for security.

Stars: ✭ 16 (-99.01%)

Mutual labels: hive

v6.dooring.public

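A large-screen data visualization solution that provides a visual editing engine, helping individuals and enterprises easily build their own large-screen visualization applications.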

Stars: ✭ 323 (-80.05%)

Mutual labels: big-data

Arkime

Arkime (formerly Moloch) is an open source, large scale, full packet capturing, indexing, and database system.

Stars: ✭ 4,994 (+208.46%)

Mutual labels: big-data

Petastorm

Petastorm library enables single machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as Tensorflow, Pytorch, and PySpark and can be used from pure Python code.

Stars: ✭ 1,108 (-31.56%)

Mutual labels: parquet

Beam

Apache Beam is a unified programming model for Batch and Streaming

Stars: ✭ 5,149 (+218.04%)

Mutual labels: big-data

Drone

🍰 The missing library manager for Android Developers

Stars: ✭ 512 (-68.38%)

Mutual labels: hive

Reef

Mirror of Apache REEF

Stars: ✭ 92 (-94.32%)

Mutual labels: big-data

Verticapy

VerticaPy is a Python library that exposes scikit-like functionality to conduct data science projects on data stored in Vertica, thus taking advantage of Vertica’s speed and built-in analytics and machine learning capabilities.

Stars: ✭ 59 (-96.36%)

Mutual labels: big-data

Onlinestats.jl

Single-pass algorithms for statistics

Stars: ✭ 507 (-68.68%)

Mutual labels: big-data

Magellan

Geo Spatial Data Analytics on Spark

Stars: ✭ 507 (-68.68%)

Mutual labels: big-data

Likelike

An implementation of locality sensitive hashing with Hadoop

Stars: ✭ 58 (-96.42%)

Mutual labels: hadoop
