Hops Examples: Examples for Deep Learning/Feature Store/Spark/Flink/Hive/Kafka jobs and Jupyter notebooks on Hops
Stars: ✭ 84 (-12.5%)
Spark Nkp: Natural Korean Processor for Apache Spark
Stars: ✭ 50 (-47.92%)
Labs: Research on distributed systems
Stars: ✭ 73 (-23.96%)
Awesome Recommendation Engine: The purpose of this tiny project is to bring together the know-how I learned in the Big Data Expert course from formacionhadoop.com. The idea is to show how to work with Apache Spark Streaming, Kafka, MongoDB, and Spark machine learning algorithms.
Stars: ✭ 47 (-51.04%)
Bitcoin Value Predictor: [NOT MAINTAINED] Predicting Bitcoin price using time series analysis and sentiment analysis of tweets about Bitcoin
Stars: ✭ 91 (-5.21%)
Luigi Warehouse: A Luigi-powered analytics / warehouse stack
Stars: ✭ 72 (-25%)
Delta Architecture: Streaming data changes to a Data Lake with a Debezium and Delta Lake pipeline
Stars: ✭ 43 (-55.21%)
Hadoop cookbook: Cookbook to install Hadoop 2.0+ using Chef
Stars: ✭ 82 (-14.58%)
Gatk: Official code repository for GATK versions 4 and up
Stars: ✭ 1,002 (+943.75%)
Pixiedust: Python helper library for Jupyter Notebooks
Stars: ✭ 998 (+939.58%)
Kontextfrei: Writing application logic for Spark jobs that can be unit-tested without a SparkContext
Stars: ✭ 67 (-30.21%)
Mleap: Deploy ML Pipelines to Production
Stars: ✭ 1,232 (+1183.33%)
Spark Flamegraph: Easy CPU profiling for Apache Spark applications
Stars: ✭ 30 (-68.75%)
Rsparkling: Use H2O Sparkling Water from R (Spark + R + Machine Learning)
Stars: ✭ 65 (-32.29%)
Ammonite Spark: Run Spark calculations from Ammonite
Stars: ✭ 88 (-8.33%)
Pucket: Bucketing and partitioning system for Parquet
Stars: ✭ 29 (-69.79%)
Spark Gbtlr: Hybrid model of Gradient Boosting Trees and Logistic Regression (GBDT+LR) on Spark
Stars: ✭ 81 (-15.62%)
Rcnn Relation Extraction: TensorFlow implementation of a Recurrent Convolutional Neural Network for relation extraction
Stars: ✭ 64 (-33.33%)
Data Algorithms Book: MapReduce, Spark, Java, and Scala code for the Data Algorithms book
Stars: ✭ 949 (+888.54%)
Heracles: High-performance HBase / Spark SQL engine
Stars: ✭ 27 (-71.87%)
Tre: [AKBC 19] Improving Relation Extraction by Pre-trained Language Representations
Stars: ✭ 95 (-1.04%)
Big Data: 🔧 Use dplyr to analyze Big Data 🐘
Stars: ✭ 93 (-3.12%)
Spark Nlp Models: Models and pipelines for the Spark NLP library
Stars: ✭ 88 (-8.33%)
Setl: A simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (-17.71%)
Spark: Apache Spark, a unified analytics engine for large-scale data processing
Stars: ✭ 31,618 (+32835.42%)
Rex: REx: Relation Extraction. A modernized rewrite of the code from the master's thesis "Relation Extraction using Distant Supervision, SVMs, and Probabilistic First-Order Logic"
Stars: ✭ 21 (-78.12%)
Pytorch Nre: Neural Relation Extraction in PyTorch
Stars: ✭ 20 (-79.17%)
Flint: A Time Series Library for Apache Spark
Stars: ✭ 878 (+814.58%)
Docker Spark: 🚢 Docker image for Apache Spark
Stars: ✭ 78 (-18.75%)
Roffildlibrary: Library for MQL5 (MetaTrader) with Python, Java, Apache Spark, and AWS
Stars: ✭ 63 (-34.37%)
Tedsds: Turbofan Engine Degradation Simulation Data Set example in Apache Spark
Stars: ✭ 14 (-85.42%)
Silex: Something to help you Spark
Stars: ✭ 61 (-36.46%)
Urhox: Urho3D extension library
Stars: ✭ 13 (-86.46%)
Home: ApacheCN open-source organization: announcements, introduction, members, activities, and ways to get in touch
Stars: ✭ 1,199 (+1148.96%)
Waimak: Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark.
Stars: ✭ 60 (-37.5%)
Mlfeature: Feature engineering toolkit for Spark MLlib.
Stars: ✭ 12 (-87.5%)
Mare: MaRe leverages the power of Docker and Spark to run and scale your serial tools in MapReduce fashion.
Stars: ✭ 11 (-88.54%)
Sparkjni: A heterogeneous Apache Spark framework.
Stars: ✭ 11 (-88.54%)
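Bigdata Interview: 🎯 🌟 [Big data interview questions] Big-data interview questions I collected online, together with my own summarized answers. Currently covers interview-question summaries for the Hadoop/Hive/Spark/Flink/HBase/Kafka/ZooKeeper frameworks.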
Stars: ✭ 857 (+792.71%)
Distre: [ACL 19] Fine-tuning Pre-Trained Transformer Language Models to Distantly Supervised Relation Extraction
Stars: ✭ 75 (-21.87%)
Zemberek Nlp Server: A REST server in Docker built on top of the Zemberek Turkish NLP Java library
Stars: ✭ 60 (-37.5%)
Knowledge Graphs: A collection of research on knowledge graphs
Stars: ✭ 845 (+780.21%)
Dockerfiles: 50+ public DockerHub images for Docker & Kubernetes (Hadoop, Kafka, ZooKeeper, HBase, Cassandra, Solr, SolrCloud, Presto, Apache Drill, NiFi, Spark, Consul, Riak, TeamCity, and DevOps tools), built on the major Linux distros: Alpine, CentOS, Debian, Fedora, and Ubuntu
Stars: ✭ 847 (+782.29%)
Petastorm: The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark and can be used from pure Python code.
Stars: ✭ 1,108 (+1054.17%)
Tiledb Vcf: Efficient variant-call data storage and retrieval library using the TileDB storage library.
Stars: ✭ 26 (-72.92%)
Spark Swagger: Spark (http://sparkjava.com/) support for Swagger (https://swagger.io/)
Stars: ✭ 25 (-73.96%)