Mara Example Project 2: An example mini data warehouse for Python project stats, template for new projects
Stars: ✭ 154 (-29.36%)
Hadoopcryptoledger: Hadoop Crypto Ledger - Analyzing CryptoLedgers, such as Bitcoin Blockchain, on Big Data platforms, such as Hadoop/Spark/Flink/Hive
Stars: ✭ 126 (-42.2%)
Drill: Apache Drill is a distributed MPP query layer for self-describing data
Stars: ✭ 1,619 (+642.66%)
Gaffer: A large-scale entity and relation database supporting aggregation of properties
Stars: ✭ 1,642 (+653.21%)
Dynamometer: A tool for scale and performance testing of HDFS with a specific focus on the NameNode.
Stars: ✭ 122 (-44.04%)
Hive Jdbc Uber Jar: Hive JDBC "uber" or "standalone" jar based on the latest Apache Hive version
Stars: ✭ 188 (-13.76%)
Hadoop: Apache Hadoop
Stars: ✭ 12,177 (+5485.78%)
Asakusafw: Asakusa Framework
Stars: ✭ 114 (-47.71%)
Avro Hadoop Starter: Example MapReduce jobs in Java, Hive, Pig, and Hadoop Streaming that work on Avro data.
Stars: ✭ 110 (-49.54%)
Deeplearning4j: Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for keras, tensorflow, and onnx/pytorch, a modular and tiny c++ library for running math code and a java based math library on top of the core c++ library. Also includes samediff: a pytorch/tensorflow like library for running deep learni…
Stars: ✭ 12,277 (+5531.65%)
Spydra: Ephemeral Hadoop clusters using Google Compute Platform
Stars: ✭ 128 (-41.28%)
Geomancer: Automated feature engineering for geospatial data
Stars: ✭ 194 (-11.01%)
Spark Bigquery Connector: BigQuery data source for Apache Spark: Read data from BigQuery into DataFrames, write DataFrames into BigQuery tables.
Stars: ✭ 126 (-42.2%)
Presto: The official home of the Presto distributed SQL query engine for big data
Stars: ✭ 12,957 (+5843.58%)
Beast: Load data from Kafka to any data warehouse
Stars: ✭ 119 (-45.41%)
Shifu: An end-to-end machine learning and data mining framework on Hadoop
Stars: ✭ 207 (-5.05%)
Cube.js: 📊 Cube - Open-Source Analytics API for Building Data Apps
Stars: ✭ 11,983 (+5396.79%)
Hadoop Hdfs: Mirror of Apache Hadoop HDFS
Stars: ✭ 152 (-30.28%)
Parquet Go: Go package to read and write Parquet files. Parquet is a file format to store nested data structures in a flat columnar data format. It can be used in the Hadoop ecosystem and with tools such as Presto and AWS Athena.
Stars: ✭ 114 (-47.71%)
Quix: Quix Notebook Manager
Stars: ✭ 184 (-15.6%)
Eel Sdk: Big Data Toolkit for the JVM
Stars: ✭ 140 (-35.78%)
Waterdrop: Production Ready Data Integration Product, documentation:
Stars: ✭ 1,856 (+751.38%)
Go Bqstreamer: Stream data into Google BigQuery concurrently using InsertAll()
Stars: ✭ 133 (-38.99%)
Bitcoin Etl: ETL scripts for Bitcoin, Litecoin, Dash, Zcash, Doge, Bitcoin Cash. Available in Google BigQuery https://goo.gl/oY5BCQ
Stars: ✭ 174 (-20.18%)
Awesome Learning: Practice source code repository: https://github.com/jast90/bigdata. Search for Jast on WeChat and follow the official account to get the latest technical posts 😯.
Stars: ✭ 197 (-9.63%)
Airflow Pipeline: An Airflow Docker image preconfigured to work well with Spark and Hadoop/EMR
Stars: ✭ 128 (-41.28%)
Big Whale: Scheduling of offline (batch) tasks for Spark, Flink, and similar engines, plus monitoring of real-time tasks
Stars: ✭ 163 (-25.23%)
Griffon Vm: Griffon Data Science Virtual Machine
Stars: ✭ 128 (-41.28%)
Mprove: Open source Business Intelligence tool 🎉
Stars: ✭ 212 (-2.75%)
Parquet4s: Read and write Parquet in Scala. Use Scala classes as schema. No need to start a cluster.
Stars: ✭ 125 (-42.66%)
Gpt2 Bert Reddit Bot: A bot that generates realistic replies using a combination of pretrained GPT-2 and BERT models
Stars: ✭ 158 (-27.52%)
Mais: Universalizing access to data in Brazil. Docs: https://basedosdados.github.io/mais/
Stars: ✭ 122 (-44.04%)
Nutch: Apache Nutch is an extensible and scalable web crawler
Stars: ✭ 2,277 (+944.5%)
Professional Services: Common solutions and tools developed by Google Cloud's Professional Services team
Stars: ✭ 1,923 (+782.11%)
Hadoop Common: Mirror of Apache Hadoop common
Stars: ✭ 155 (-28.9%)
Hdfs Shell: HDFS Shell is an HDFS manipulation tool to work with functions integrated in Hadoop DFS
Stars: ✭ 117 (-46.33%)
Calcite: Apache Calcite
Stars: ✭ 2,816 (+1191.74%)
Ibis: A pandas-like deferred expression system, with first-class SQL support
Stars: ✭ 1,630 (+647.71%)
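Movie recommend: A Spark-based movie recommendation system, including a crawler project, a website, an admin back-end system, and the Spark recommendation system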
Stars: ✭ 2,092 (+859.63%)
Datax: DataX is an open source universal ETL tool that supports Cassandra, ClickHouse, DBF, Hive, InfluxDB, Kudu, MySQL, Oracle, Presto (Trino), PostgreSQL, SQL Server
Stars: ✭ 116 (-46.79%)
Bigquery Grafana: Google BigQuery Datasource Plugin for Grafana.
Stars: ✭ 188 (-13.76%)
Spark With Python: Fundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (-31.19%)
Xlearning Xdml: Extremely distributed machine learning
Stars: ✭ 113 (-48.17%)
Parquet Rs: Apache Parquet implementation in Rust
Stars: ✭ 144 (-33.94%)
Haproxy Configs: 80+ HAProxy Configs for Hadoop, Big Data, NoSQL, Docker, Elasticsearch, SolrCloud, HBase, MySQL, PostgreSQL, Apache Drill, Hive, Presto, Impala, Hue, ZooKeeper, SSH, RabbitMQ, Redis, Riak, Cloudera, OpenTSDB, InfluxDB, Prometheus, Kibana, Graphite, Rancher etc.
Stars: ✭ 106 (-51.38%)
Scio: A Scala API for Apache Beam and Google Cloud Dataflow.
Stars: ✭ 2,247 (+930.73%)
Xlearning: AI on Hadoop
Stars: ✭ 1,709 (+683.94%)
Sparkrdma: RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark
Stars: ✭ 215 (-1.38%)
Bigdata Playground: A complete example of a big data application using: Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL
Stars: ✭ 177 (-18.81%)