Bitcoin EtlETL scripts for Bitcoin, Litecoin, Dash, Zcash, Doge, Bitcoin Cash. Available in Google BigQuery https://goo.gl/oY5BCQ
Stars: ✭ 174 (-20.18%)
Ethereum Etl AirflowAirflow DAGs for exporting, loading, and parsing the Ethereum blockchain data. What datasets do you want to be added to Ethereum ETL? Vote here: https://blockchain-etl.convas.io.
Stars: ✭ 89 (-59.17%)
Hadoop cookbookCookbook to install Hadoop 2.0+ using Chef
Stars: ✭ 82 (-62.39%)
Awesome Learning实践源码库:https://github.com/jast90/bigdata 。 微信搜索Jast关注公众号,获取最新技术分享😯。
Stars: ✭ 197 (-9.63%)
CamusMirror of Linkedin's Camus
Stars: ✭ 81 (-62.84%)
Airflow PipelineAn Airflow docker image preconfigured to work well with Spark and Hadoop/EMR
Stars: ✭ 128 (-41.28%)
Docker Spark🚢 Docker image for Apache Spark
Stars: ✭ 78 (-64.22%)
Big WhaleSpark、Flink等离线任务的调度以及实时任务的监控
Stars: ✭ 163 (-25.23%)
Tf YarnTrain TensorFlow models on YARN in just a few lines of code!
Stars: ✭ 76 (-65.14%)
Griffon VmGriffon Data Science Virtual Machine
Stars: ✭ 128 (-41.28%)
Docker HadoopApache Hadoop docker image
Stars: ✭ 1,190 (+445.87%)
MproveOpen source Business Intelligence tool 🎉
Stars: ✭ 212 (-2.75%)
Parquet4sRead and write Parquet in Scala. Use Scala classes as schema. No need to start a cluster.
Stars: ✭ 125 (-42.66%)
Sql RunnerRun templatable playbooks of SQL scripts in series and parallel on Redshift, PostgreSQL, BigQuery and Snowflake
Stars: ✭ 68 (-68.81%)
Gpt2 Bert Reddit Bota bot that generates realistic replies using a combination of pretrained GPT-2 and BERT models
Stars: ✭ 158 (-27.52%)
SrcA light-weight distributed stream computing framework for Golang
Stars: ✭ 67 (-69.27%)
MaisUniversalizando o acesso a dados no Brasil. Docs: https://basedosdados.github.io/mais/
Stars: ✭ 122 (-44.04%)
JumbuneJumbune, an open source BigData APM & Data Quality Management Platform for Data Clouds. Enterprise feature offering is available at http://jumbune.com. More details of open source offering are at,
Stars: ✭ 64 (-70.64%)
NutchApache Nutch is an extensible and scalable web crawler
Stars: ✭ 2,277 (+944.5%)
LikelikeAn implementation of locality sensitive hashing with Hadoop
Stars: ✭ 58 (-73.39%)
Professional ServicesCommon solutions and tools developed by Google Cloud's Professional Services team
Stars: ✭ 1,923 (+782.11%)
Docker HadoopA Docker container with a full Hadoop cluster setup with Spark and Zeppelin
Stars: ✭ 54 (-75.23%)
Hadoop CommonMirror of Apache Hadoop common
Stars: ✭ 155 (-28.9%)
Hadoop SolrCode to index HDFS to Solr using MapReduce
Stars: ✭ 51 (-76.61%)
Hdfs ShellHDFS Shell is a HDFS manipulation tool to work with functions integrated in Hadoop DFS
Stars: ✭ 117 (-46.33%)
MoosefsMooseFS – Open Source, Petabyte, Fault-Tolerant, Highly Performing, Scalable Network Distributed File System (Software-Defined Storage)
Stars: ✭ 1,025 (+370.18%)
CalciteApache Calcite
Stars: ✭ 2,816 (+1191.74%)
Nagios Plugins450+ AWS, Hadoop, Cloud, Kafka, Docker, Elasticsearch, RabbitMQ, Redis, HBase, Solr, Cassandra, ZooKeeper, HDFS, Yarn, Hive, Presto, Drill, Impala, Consul, Spark, Jenkins, Travis CI, Git, MySQL, Linux, DNS, Whois, SSL Certs, Yum Security Updates, Kubernetes, Cloudera etc...
Stars: ✭ 1,000 (+358.72%)
IbisA pandas-like deferred expression system, with first-class SQL support
Stars: ✭ 1,630 (+647.71%)
Movie recommend基于Spark的电影推荐系统,包含爬虫项目、web网站、后台管理系统以及spark推荐系统
Stars: ✭ 2,092 (+859.63%)
Ethereum EtlPython scripts for ETL (extract, transform and load) jobs for Ethereum blocks, transactions, ERC20 / ERC721 tokens, transfers, receipts, logs, contracts, internal transactions. Data is available in Google BigQuery https://goo.gl/oY5BCQ
Stars: ✭ 956 (+338.53%)
DataxDataX is an open source universal ETL tool that support Cassandra, ClickHouse, DBF, Hive, InfluxDB, Kudu, MySQL, Oracle, Presto(Trino), PostgreSQL, SQL Server
Stars: ✭ 116 (-46.79%)
Pg2bqExport PostgreSQL tables to Google BigQuery
Stars: ✭ 30 (-86.24%)
Bigquery GrafanaGoogle BigQuery Datasource Plugin for Grafana.
Stars: ✭ 188 (-13.76%)
Storm Camel ExampleReal-time analysis and visualization with Storm-AMQ-Camel-Websockets-Highcharts integration.
Stars: ✭ 28 (-87.16%)
Spark With PythonFundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (-31.19%)
Bigdata Interview🎯 🌟[大数据面试题]分享自己在网络上收集的大数据相关的面试题以及自己的答案总结.目前包含Hadoop/Hive/Spark/Flink/Hbase/Kafka/Zookeeper框架的面试题知识总结
Stars: ✭ 857 (+293.12%)
Xlearning Xdmlextremely distributed machine learning
Stars: ✭ 113 (-48.17%)
Hadoop PotA scalable Apache Hadoop-based implementation of the Pooled Time Series video similarity algorithm based on M. Ryoo et al paper CVPR 2015.
Stars: ✭ 8 (-96.33%)
KyloKylo is a data lake management software platform and framework for enabling scalable enterprise-class data lakes on big data technologies such as Teradata, Apache Spark and/or Hadoop. Kylo is licensed under Apache 2.0. Contributed by Teradata Inc.
Stars: ✭ 916 (+320.18%)
Parquet RsApache Parquet implementation in Rust
Stars: ✭ 144 (-33.94%)
Hadoop For GeoeventArcGIS GeoEvent Server sample Hadoop connector for storing GeoEvents in HDFS.
Stars: ✭ 5 (-97.71%)
Haproxy Configs80+ HAProxy Configs for Hadoop, Big Data, NoSQL, Docker, Elasticsearch, SolrCloud, HBase, MySQL, PostgreSQL, Apache Drill, Hive, Presto, Impala, Hue, ZooKeeper, SSH, RabbitMQ, Redis, Riak, Cloudera, OpenTSDB, InfluxDB, Prometheus, Kibana, Graphite, Rancher etc.
Stars: ✭ 106 (-51.38%)
Winutilswinutils.exe hadoop.dll and hdfs.dll binaries for hadoop windows
Stars: ✭ 657 (+201.38%)
ScioA Scala API for Apache Beam and Google Cloud Dataflow.
Stars: ✭ 2,247 (+930.73%)
SparkrdmaRDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark
Stars: ✭ 215 (-1.38%)
Bigdata PlaygroundA complete example of a big data application using : Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLib, Apache Flink, Scala, Python, Apache Kafka, Apache Hbase, Apache Parquet, Apache Avro, Apache Storm, Twitter Api, MongoDB, NodeJS, Angular, GraphQL
Stars: ✭ 177 (-18.81%)