Parquet4s: Read and write Parquet in Scala. Use Scala classes as schema. No need to start a cluster.
Stars: ✭ 125 (-19.35%)
Dockerfiles: 50+ DockerHub public images for Docker & Kubernetes - Hadoop, Kafka, ZooKeeper, HBase, Cassandra, Solr, SolrCloud, Presto, Apache Drill, Nifi, Spark, Consul, Riak, TeamCity and DevOps tools built on the major Linux distros: Alpine, CentOS, Debian, Fedora, Ubuntu
Stars: ✭ 847 (+446.45%)
Wifi: A big data query and analysis system based on information captured over Wi-Fi
Stars: ✭ 93 (-40%)
Stormtweetssentimentd3viz: Computes and visualizes the sentiment analysis of tweets of US States in real-time using Storm.
Stars: ✭ 25 (-83.87%)
Hadoop For Geoevent: ArcGIS GeoEvent Server sample Hadoop connector for storing GeoEvents in HDFS.
Stars: ✭ 5 (-96.77%)
Hdfs Shell: HDFS Shell is an HDFS manipulation tool to work with functions integrated in Hadoop DFS
Stars: ✭ 117 (-24.52%)
Winutils: winutils.exe, hadoop.dll and hdfs.dll binaries for Hadoop on Windows
Stars: ✭ 657 (+323.87%)
Tony: TonY is a framework to natively run deep learning frameworks on Apache Hadoop.
Stars: ✭ 626 (+303.87%)
Hadoop: Apache Hadoop
Stars: ✭ 12,177 (+7756.13%)
H2o 3: H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Stars: ✭ 5,656 (+3549.03%)
Alluxio: Alluxio, data orchestration for analytics and machine learning in the cloud
Stars: ✭ 5,379 (+3370.32%)
Ibis: A pandas-like deferred expression system, with first-class SQL support
Stars: ✭ 1,630 (+951.61%)
Bigdata: 💎🔥 Big data study notes
Stars: ✭ 488 (+214.84%)
Chukwa: Mirror of Apache Chukwa
Stars: ✭ 77 (-50.32%)
School Of Sre: At LinkedIn, we are using this curriculum for onboarding our entry-level talents into the SRE role.
Stars: ✭ 5,141 (+3216.77%)
Data Science Ipython Notebooks: Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Stars: ✭ 22,048 (+14124.52%)
Dataspherestudio: DataSphereStudio is a one-stop data application development & management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.
Stars: ✭ 1,195 (+670.97%)
Marmaray: Generic Data Ingestion & Dispersal Library for Hadoop
Stars: ✭ 414 (+167.1%)
Asakusafw: Asakusa Framework
Stars: ✭ 114 (-26.45%)
Apache Spark Hands On: Educational notes, hands-on problems w/ solutions for the Hadoop ecosystem
Stars: ✭ 74 (-52.26%)
Iceberg: Iceberg is a table format for large, slow-moving tabular data
Stars: ✭ 393 (+153.55%)
Hadoop Hdfs: Mirror of Apache Hadoop HDFS
Stars: ✭ 152 (-1.94%)
Ignite: Apache Ignite
Stars: ✭ 4,027 (+2498.06%)
Atsd: Axibase Time Series Database Documentation
Stars: ✭ 68 (-56.13%)
Hive: Apache Hive
Stars: ✭ 4,031 (+2500.65%)
Parquet Go: Go package to read and write Parquet files. Parquet is a file format to store nested data structures in a flat columnar data format. It can be used in the Hadoop ecosystem and with tools such as Presto and AWS Athena.
Stars: ✭ 114 (-26.45%)
Ytk Learn: Ytk-learn is a distributed machine learning library which implements most of the popular machine learning algorithms (GBDT, GBRT, Mixture Logistic Regression, Gradient Boosting Soft Tree, Factorization Machines, Field-aware Factorization Machines, Logistic Regression, Softmax).
Stars: ✭ 337 (+117.42%)
Jumbune: Jumbune, an open source BigData APM & Data Quality Management Platform for Data Clouds. Enterprise feature offering is available at http://jumbune.com. More details of open source offering are at,
Stars: ✭ 64 (-58.71%)
Gather Deployment: Gathers scalable tensorflow and infrastructure deployment
Stars: ✭ 326 (+110.32%)
Airflow Pipeline: An Airflow docker image preconfigured to work well with Spark and Hadoop/EMR
Stars: ✭ 128 (-17.42%)
Tez: Apache Tez
Stars: ✭ 313 (+101.94%)
Likelike: An implementation of locality sensitive hashing with Hadoop
Stars: ✭ 58 (-62.58%)
Spline: Data Lineage Tracking And Visualization Solution
Stars: ✭ 306 (+97.42%)
Avro Hadoop Starter: Example MapReduce jobs in Java, Hive, Pig, and Hadoop Streaming that work on Avro data.
Stars: ✭ 110 (-29.03%)
Elasticluster: Create clusters of VMs on the cloud and configure them with Ansible.
Stars: ✭ 298 (+92.26%)
Docker Hadoop: A Docker container with a full Hadoop cluster setup with Spark and Zeppelin
Stars: ✭ 54 (-65.16%)
Android Nosql: Lightweight, simple structured NoSQL database for Android
Stars: ✭ 284 (+83.23%)
Eel Sdk: Big Data Toolkit for the JVM
Stars: ✭ 140 (-9.68%)
Hadoop Mini Clusters: hadoop-mini-clusters provides an easy way to test Hadoop projects directly in your IDE
Stars: ✭ 265 (+70.97%)
Base: https://www.researchgate.net/profile/Rajah_Iyer
Stars: ✭ 48 (-69.03%)
pulse: phData Pulse application log aggregation and monitoring
Stars: ✭ 13 (-91.61%)
Waterdrop: Production Ready Data Integration Product, documentation:
Stars: ✭ 1,856 (+1097.42%)
knit: Deprecated, please use https://github.com/jcrist/skein or https://github.com/dask/dask-yarn instead
Stars: ✭ 53 (-65.81%)
Nagios Plugins: 450+ AWS, Hadoop, Cloud, Kafka, Docker, Elasticsearch, RabbitMQ, Redis, HBase, Solr, Cassandra, ZooKeeper, HDFS, Yarn, Hive, Presto, Drill, Impala, Consul, Spark, Jenkins, Travis CI, Git, MySQL, Linux, DNS, Whois, SSL Certs, Yum Security Updates, Kubernetes, Cloudera etc...
Stars: ✭ 1,000 (+545.16%)
bigdata-fun: A complete (distributed) BigData stack, running in containers
Stars: ✭ 14 (-90.97%)
Griffon Vm: Griffon Data Science Virtual Machine
Stars: ✭ 128 (-17.42%)
Movie recommend: A Spark-based movie recommendation system, including a crawler project, a website, an admin back-end system, and a Spark recommendation system
Stars: ✭ 2,092 (+1249.68%)
Spark With Python: Fundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (-3.23%)
Xlearning: AI on Hadoop
Stars: ✭ 1,709 (+1002.58%)
Hadoopcryptoledger: Hadoop Crypto Ledger - Analyzing CryptoLedgers, such as Bitcoin Blockchain, on Big Data platforms, such as Hadoop/Spark/Flink/Hive
Stars: ✭ 126 (-18.71%)
Data Algorithms Book: MapReduce, Spark, Java, and Scala for Data Algorithms Book
Stars: ✭ 949 (+512.26%)