saisoku: Saisoku is a Python module that helps you build complex pipelines of batch file/directory transfer/sync jobs.
Stars: ✭ 40 (-99.74%)
Hadoop Common: Mirror of Apache Hadoop common
Stars: ✭ 155 (-98.98%)
Airflow Pipeline: An Airflow docker image preconfigured to work well with Spark and Hadoop/EMR
Stars: ✭ 128 (-99.16%)
V2: Staffjoy V2 - all microservices in a monorepo
Stars: ✭ 1,586 (-89.58%)
Tone.js: A Web Audio framework for making interactive music in the browser.
Stars: ✭ 11,352 (-25.44%)
Hadoopcryptoledger: Hadoop Crypto Ledger - Analyzing CryptoLedgers, such as Bitcoin Blockchain, on Big Data platforms, such as Hadoop/Spark/Flink/Hive
Stars: ✭ 126 (-99.17%)
Hive Jdbc Uber Jar: Hive JDBC "uber" or "standalone" jar based on the latest Apache Hive version
Stars: ✭ 188 (-98.77%)
Zeebe: Distributed Workflow Engine for Microservices Orchestration
Stars: ✭ 2,165 (-85.78%)
Datax: DataX is an open source universal ETL tool that supports Cassandra, ClickHouse, DBF, Hive, InfluxDB, Kudu, MySQL, Oracle, Presto(Trino), PostgreSQL, SQL Server
Stars: ✭ 116 (-99.24%)
Xlearning: AI on Hadoop
Stars: ✭ 1,709 (-88.78%)
Deeplearning4j: Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for keras, tensorflow, and onnx/pytorch, a modular and tiny c++ library for running math code and a java based math library on top of the core c++ library. Also includes samediff: a pytorch/tensorflow like library for running deep learni…
Stars: ✭ 12,277 (-19.37%)
Quartzite: Quartzite is a thin idiomatic Clojure layer on top of the Quartz Scheduler
Stars: ✭ 194 (-98.73%)
Griffon Vm: Griffon Data Science Virtual Machine
Stars: ✭ 128 (-99.16%)
Presto: The official home of the Presto distributed SQL query engine for big data
Stars: ✭ 12,957 (-14.9%)
Shifu: An end-to-end machine learning and data mining framework on Hadoop
Stars: ✭ 207 (-98.64%)
Drill: Apache Drill is a distributed MPP query layer for self describing data
Stars: ✭ 1,619 (-89.37%)
Hadoop Hdfs: Mirror of Apache Hadoop HDFS
Stars: ✭ 152 (-99%)
Taskpacker: 🎒 Simple schedule optimization library for Python
Stars: ✭ 115 (-99.24%)
Optaplanner: AI constraint solver in Java to optimize the vehicle routing problem, employee rostering, task assignment, maintenance scheduling, conference scheduling and other planning problems.
Stars: ✭ 2,454 (-83.88%)
Parquet Rs: Apache Parquet implementation in Rust
Stars: ✭ 144 (-99.05%)
Xlearning Xdml: Extremely distributed machine learning
Stars: ✭ 113 (-99.26%)
Avro Hadoop Starter: Example MapReduce jobs in Java, Hive, Pig, and Hadoop Streaming that work on Avro data.
Stars: ✭ 110 (-99.28%)
Smart Industry: 🏭 Open Source Manufacturing Execution System for job-shop type manufacturers.
Stars: ✭ 138 (-99.09%)
Pai: Resource scheduling and cluster management for AI
Stars: ✭ 2,223 (-85.4%)
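Awesome Learning: Source code for the hands-on exercises is at https://github.com/jast90/bigdata. Search for Jast on WeChat and follow the official account to get the latest technical posts 😯.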
Stars: ✭ 197 (-98.71%)
Big Whale: Scheduling of offline jobs such as Spark and Flink, and monitoring of real-time jobs
Stars: ✭ 163 (-98.93%)
Gaffer: A large-scale entity and relation database supporting aggregation of properties
Stars: ✭ 1,642 (-89.22%)
Spydra: Ephemeral Hadoop clusters using Google Compute Platform
Stars: ✭ 128 (-99.16%)
Bookstore: 📚 Notebook storage and publishing workflows for the masses
Stars: ✭ 162 (-98.94%)
Hivedscheduler: Kubernetes Scheduler for Deep Learning
Stars: ✭ 126 (-99.17%)
Nutch: Apache Nutch is an extensible and scalable web crawler
Stars: ✭ 2,277 (-85.05%)
Parquet4s: Read and write Parquet in Scala. Use Scala classes as schema. No need to start a cluster.
Stars: ✭ 125 (-99.18%)
Cylc Flow: Cylc, a workflow engine for cycling systems. Repository master branch: core meta-scheduler component of cylc-8 (in development); Repository 7.8.x branch: full cylc-7 system.
Stars: ✭ 154 (-98.99%)
Dynamometer: A tool for scale and performance testing of HDFS with a specific focus on the NameNode.
Stars: ✭ 122 (-99.2%)
Sparkrdma: RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark
Stars: ✭ 215 (-98.59%)
Hdfs Shell: HDFS Shell is an HDFS manipulation tool to work with functions integrated in Hadoop DFS
Stars: ✭ 117 (-99.23%)
Movie recommend: A Spark-based movie recommendation system, including a crawler project, a website, an admin back end, and the Spark recommender
Stars: ✭ 2,092 (-86.26%)
Ibis: A pandas-like deferred expression system, with first-class SQL support
Stars: ✭ 1,630 (-89.29%)
Unified Hosts Autoupdate: Quickly and easily install, uninstall, and set up automatic updates for any of Steven Black's unified hosts files.
Stars: ✭ 185 (-98.78%)
Spark With Python: Fundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (-99.01%)
Asakusafw: Asakusa Framework
Stars: ✭ 114 (-99.25%)
Parquet Go: Go package to read and write Parquet files. Parquet is a file format to store nested data structures in a flat columnar data format. It can be used in the Hadoop ecosystem and with tools such as Presto and AWS Athena.
Stars: ✭ 114 (-99.25%)
Hadoop: Apache Hadoop
Stars: ✭ 12,177 (-20.02%)
Liteflow: liteflow is a distributed task-flow scheduling system built on task versions
Stars: ✭ 112 (-99.26%)
Bigdata Playground: A complete example of a big data application using Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL
Stars: ✭ 177 (-98.84%)
Nn dataflow: Explore the energy-efficient dataflow scheduling for neural networks.
Stars: ✭ 141 (-99.07%)
Hadoop Connectors: Libraries and tools for interoperability between Hadoop-related open-source software and Google Cloud Platform.
Stars: ✭ 218 (-98.57%)
Calcite: Apache Calcite
Stars: ✭ 2,816 (-81.51%)
Saturn: vip.com's distributed job scheduling platform.
Stars: ✭ 2,141 (-85.94%)