gomrjob: a Go framework for Hadoop MapReduce jobs
Stars: ✭ 39 (-73.29%)
Data Science Ipython Notebooks: Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Stars: ✭ 22,048 (+15001.37%)
Sparkrdma: RDMA-accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark
Stars: ✭ 215 (+47.26%)
rail: Scalable RNA-seq analysis
Stars: ✭ 74 (-49.32%)
big data: A collection of tutorials on Hadoop, MapReduce, Spark, Docker
Stars: ✭ 34 (-76.71%)
aut: The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Stars: ✭ 111 (-23.97%)
Src: A light-weight distributed stream computing framework for Golang
Stars: ✭ 67 (-54.11%)
datalake-etl-pipeline: Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Stars: ✭ 39 (-73.29%)
sparkucx: A high-performance, scalable and efficient ShuffleManager plugin for Apache Spark, utilizing UCX communication layer
Stars: ✭ 32 (-78.08%)
Spark: .NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
Stars: ✭ 1,721 (+1078.77%)
Behemoth: Behemoth is an open source platform for large scale document analysis based on Apache Hadoop.
Stars: ✭ 286 (+95.89%)
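leaflet heatmap: A simple visualization of call data from Huzhou. Assuming the data volume is too large to draw the heatmap directly in the browser, the heatmap rendering step is moved to offline computation and analysis. Apache Spark computes the data in parallel and then renders the heatmap, and leafletjs loads the OpenStreetMap layer and the heatmap layer for good interactivity. Rendering is currently implemented with Apache Spark; either Apache Spark is not well suited to this kind of computation or I have not designed the algorithm well, because the parallel computation is slower than the single-machine computation. The Apache Spark heatmap rendering and computation code is at https://github.com/yuanzhaokang/ParallelizeHeatmap.git .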
Stars: ✭ 13 (-91.1%)
basin: Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser
Stars: ✭ 25 (-82.88%)
Avro Hadoop Starter: Example MapReduce jobs in Java, Hive, Pig, and Hadoop Streaming that work on Avro data.
Stars: ✭ 110 (-24.66%)
connected-component: Map Reduce Implementation of Connected Component on Apache Spark
Stars: ✭ 68 (-53.42%)
Bigdata Playground: A complete example of a big data application using: Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL
Stars: ✭ 177 (+21.23%)
Bigdata: 💎🔥 Big data study notes
Stars: ✭ 488 (+234.25%)
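Bigdata Interview: 🎯 🌟 [Big data interview questions] A collection of big data interview questions I gathered online, together with my own summarized answers. Currently covers interview questions and notes for the Hadoop/Hive/Spark/Flink/Hbase/Kafka/Zookeeper frameworks.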
Stars: ✭ 857 (+486.99%)
Asakusafw: Asakusa Framework
Stars: ✭ 114 (-21.92%)
Repository: A personal learning knowledge base covering data warehouse modeling, real-time computing, big data, Java, algorithms, and more.
Stars: ✭ 92 (-36.99%)
Dist Keras: Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.
Stars: ✭ 613 (+319.86%)
Mobius: C# and F# language binding and extensions to Apache Spark
Stars: ✭ 929 (+536.3%)
Griffon Vm: Griffon Data Science Virtual Machine
Stars: ✭ 128 (-12.33%)
Cascading: Cascading is a feature-rich API for defining and executing complex and fault-tolerant data processing flows locally or on a cluster. See https://github.com/Cascading/cascading for the release repository.
Stars: ✭ 318 (+117.81%)
Data Algorithms Book: MapReduce, Spark, Java, and Scala for Data Algorithms Book
Stars: ✭ 949 (+550%)
DaFlow: Apache Spark-based data flow (ETL) framework that supports multiple read and write destinations of different types, as well as multiple categories of transformation rules.
Stars: ✭ 24 (-83.56%)
Spark With Python: Fundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (+2.74%)
fink-broker: Astronomy Broker based on Apache Spark
Stars: ✭ 18 (-87.67%)
docker-hadoop: Docker image for main Apache Hadoop components (Yarn/Hdfs)
Stars: ✭ 59 (-59.59%)
Aws Data Wrangler: Pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
Stars: ✭ 2,385 (+1533.56%)
streamsx.kafka: Repository for integration with Apache Kafka
Stars: ✭ 13 (-91.1%)
teraslice: Scalable data processing pipelines in JavaScript
Stars: ✭ 48 (-67.12%)
Openemr: The most popular open source electronic health records and medical practice management solution.
Stars: ✭ 1,762 (+1106.85%)
JavaFramework: Simple Java framework designed for easily developing Spring-based Java programs. Supports big data and metadata management, a common Elasticsearch query tool, and so on.
Stars: ✭ 16 (-89.04%)
freehealth: Free and open source Electronic Health Record
Stars: ✭ 39 (-73.29%)
openPDC: Open Source Phasor Data Concentrator
Stars: ✭ 109 (-25.34%)
beanszoo: Distributed Java micro-services using ZooKeeper
Stars: ✭ 12 (-91.78%)
sensu-plugins-aws: This plugin provides native AWS instrumentation for monitoring and metrics collection, including: health and metrics for various AWS services, such as EC2, RDS, ELB, and more, as well as handlers for EC2, SES, and SNS.
Stars: ✭ 79 (-45.89%)
healthcare: Open Source Healthcare ERP / Management System
Stars: ✭ 68 (-53.42%)
orion: Management and automation platform for Stateful Distributed Systems
Stars: ✭ 77 (-47.26%)
tscharts: Django REST framework-based Digital Patient Registration and EMR backend
Stars: ✭ 14 (-90.41%)
pdd-graph: PDD Graph: Bridging MIMIC-III and Linked Data Cloud
Stars: ✭ 31 (-78.77%)
hadoop-ansible: Install a Hadoop cluster with Ansible
Stars: ✭ 35 (-76.03%)
sbt-lighter: SBT plugin for Apache Spark on AWS EMR
Stars: ✭ 57 (-60.96%)
KeywordAnalysis: Word analysis, by domain, on the Common Crawl data set for the purpose of finding industry trends
Stars: ✭ 49 (-66.44%)
terraform-emr-spark-example: An example Terraform project that will configure a Secure and Customizable Spark Cluster on Amazon EMR.
Stars: ✭ 43 (-70.55%)
awesome-tools: Curated list of awesome tools and libraries for specific domains
Stars: ✭ 31 (-78.77%)
webhdfs: Node.js WebHDFS REST API client
Stars: ✭ 88 (-39.73%)
TonY: TonY is a framework to natively run deep learning frameworks on Apache Hadoop.
Stars: ✭ 687 (+370.55%)
isarn-sketches-spark: Routines and data structures for using isarn-sketches idiomatically in Apache Spark
Stars: ✭ 28 (-80.82%)
app: Web application for ANDES
Stars: ✭ 12 (-91.78%)