Ytk Learn: Ytk-learn is a distributed machine learning library that implements most popular machine learning algorithms (GBDT, GBRT, Mixture Logistic Regression, Gradient Boosting Soft Tree, Factorization Machines, Field-aware Factorization Machines, Logistic Regression, Softmax).
Stars: ✭ 337 (+316.05%)
Spark Bigquery: Google BigQuery support for Spark, Structured Streaming, SQL, and DataFrames with easy Databricks integration.
Stars: ✭ 65 (-19.75%)
Spark Nkp: Natural Korean Processor for Apache Spark.
Stars: ✭ 50 (-38.27%)
Thingsboard: Open-source IoT platform for device management, data collection, processing, and visualization.
Stars: ✭ 10,526 (+12895.06%)
Spark Submit Ui: A tool based on the Play Framework for submitting Spark applications.
Stars: ✭ 53 (-34.57%)
Spark Tda: SparkTDA is a package for Apache Spark that provides Topological Data Analysis functionality.
Stars: ✭ 45 (-44.44%)
Gatk: Official code repository for GATK versions 4 and up.
Stars: ✭ 1,002 (+1137.04%)
Fast Mrmr: An improved implementation of the classical feature selection method minimum Redundancy and Maximum Relevance (mRMR).
Stars: ✭ 67 (-17.28%)
Awesome Pulsar: A curated list of Pulsar tools, integrations, and resources.
Stars: ✭ 57 (-29.63%)
Apache Spark Hands On: Educational notes and hands-on problems with solutions for the Hadoop ecosystem.
Stars: ✭ 74 (-8.64%)
Docker Hadoop: A Docker container with a full Hadoop cluster setup with Spark and Zeppelin.
Stars: ✭ 54 (-33.33%)
Coursera Ml Py: Python programming assignments for Machine Learning by Prof. Andrew Ng on Coursera.
Stars: ✭ 1,140 (+1307.41%)
Brihaspati: A collection of implementations and code for machine learning, deep learning, and computer vision ✨💥
Stars: ✭ 53 (-34.57%)
Awesome Recommendation Engine: The purpose of this small project is to bring together the know-how I learned in the Big Data Expert course from formacionhadoop.com. The idea is to show how to work with Apache Spark Streaming, Kafka, MongoDB, and Spark machine learning algorithms.
Stars: ✭ 47 (-41.98%)
Delta Architecture: Streaming data changes to a data lake with a Debezium and Delta Lake pipeline.
Stars: ✭ 43 (-46.91%)
Kamu Cli: Next-generation tool for decentralized exchange and transformation of semi-structured data.
Stars: ✭ 69 (-14.81%)
Silex: Something to help you Spark.
Stars: ✭ 61 (-24.69%)
Pixiedust: Python helper library for Jupyter Notebooks.
Stars: ✭ 998 (+1132.1%)
Dist Lr: A distributed logistic regression system based on ps-lite.
Stars: ✭ 39 (-51.85%)
Zemberek Nlp Server: A REST server in Docker built on the Zemberek Turkish NLP Java library.
Stars: ✭ 60 (-25.93%)
Rumble: ⛈️ Rumble 1.11.0 "Banyan Tree" 🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more
Stars: ✭ 58 (-28.4%)
Dataspherestudio: DataSphereStudio is a one-stop data application development and management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.
Stars: ✭ 1,195 (+1375.31%)
Kontextfrei: Writing application logic for Spark jobs that can be unit-tested without a SparkContext.
Stars: ✭ 67 (-17.28%)
Ds and ml projects: Data science and machine learning projects and tutorials in Python, from beginner to advanced level.
Stars: ✭ 56 (-30.86%)
Home: ApacheCN open-source organization: announcements, introduction, members, activities, and ways to get in touch.
Stars: ✭ 1,199 (+1380.25%)
Pulsar Spark: When Apache Pulsar meets Apache Spark.
Stars: ✭ 55 (-32.1%)
Deeplearning: Deep Learning From Scratch.
Stars: ✭ 66 (-18.52%)
Utils4s: A collection of test cases and related materials gathered while using Scala and Spark.
Stars: ✭ 1,070 (+1220.99%)
Lpa Detector: Optimizes and improves the label propagation algorithm.
Stars: ✭ 75 (-7.41%)
25daysinmachinelearning: I will keep updating this repository with statistics content and materials for learning machine learning with Python.
Stars: ✭ 53 (-34.57%)
Rsparkling: RSparkling: Use H2O Sparkling Water from R (Spark + R + Machine Learning).
Stars: ✭ 65 (-19.75%)
Docker Spark: 🚢 Docker image for Apache Spark.
Stars: ✭ 78 (-3.7%)
W2v: Word2Vec models with Twitter data using Spark. Blog:
Stars: ✭ 64 (-20.99%)
Labs: Research on distributed systems.
Stars: ✭ 73 (-9.88%)
Zfverify: A CAPTCHA recognition tool for the Zhengfang (正方) system, offering multiple approaches.
Stars: ✭ 44 (-45.68%)
Pysparkgeoanalysis: 🌐 Interactive workshop on geoanalysis using PySpark.
Stars: ✭ 63 (-22.22%)
Cleanframes: Type-class-based data cleansing library for Apache Spark SQL.
Stars: ✭ 75 (-7.41%)
Roffildlibrary: Library for MQL5 (MetaTrader) with Python, Java, Apache Spark, and AWS.
Stars: ✭ 63 (-22.22%)
Luigi Warehouse: A Luigi-powered analytics/warehouse stack.
Stars: ✭ 72 (-11.11%)
Snappydata: Project SnappyData, a memory-optimized analytics database based on Apache Spark™ and Apache Geode™. Stream, transact, analyze, and predict in one cluster.
Stars: ✭ 995 (+1128.4%)
Waimak: Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark.
Stars: ✭ 60 (-25.93%)
Setl: A simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (-2.47%)
Ml code: A repository for recording machine learning code.
Stars: ✭ 75 (-7.41%)
Ds Cheatsheets: List of data science cheatsheets to rule the world.
Stars: ✭ 9,452 (+11569.14%)