Spark Infotheoretic Feature SelectionThis package contains a generic implementation of greedy Information Theoretic Feature Selection (FS) methods. The implementation is based on the common theoretic framework presented by Gavin Brown. Implementations of mRMR, InfoGain, JMI and other commonly used FS filters are provided.
Stars: ✭ 123 (+623.53%)
HailScalable genomic data analysis.
Stars: ✭ 706 (+4052.94%)
yuzhouwanCode Library for My Blog
Stars: ✭ 39 (+129.41%)
ScriptisScriptis is for interactive data analysis with script development(SQL, Pyspark, HiveQL), task submission(Spark, Hive), UDF, function, resource management and intelligent diagnosis.
Stars: ✭ 696 (+3994.12%)
DeequDeequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
Stars: ✭ 2,020 (+11782.35%)
Pyspark Example ProjectExample project implementing best practices for PySpark ETL jobs and applications.
Stars: ✭ 633 (+3623.53%)
Dev SetupmacOS development environment setup: Easy-to-understand instructions with automated setup scripts for developer tools like Vim, Sublime Text, Bash, iTerm, Python data analysis, Spark, Hadoop MapReduce, AWS, Heroku, JavaScript web development, Android development, common data stores, and dev-based OS X defaults.
Stars: ✭ 5,590 (+32782.35%)
DatafusionDataFusion has now been donated to the Apache Arrow project
Stars: ✭ 611 (+3494.12%)
Mongo SparkThe MongoDB Spark Connector
Stars: ✭ 588 (+3358.82%)
TeddySpark Streaming监控平台,支持任务部署与告警、自启动
Stars: ✭ 120 (+605.88%)
SparklearningLearning Apache spark,including code and data .Most part can run local.
Stars: ✭ 558 (+3182.35%)
Spark Knnk-Nearest Neighbors algorithm on Spark
Stars: ✭ 205 (+1105.88%)
JustenoughscalaforsparkA tutorial on the most important features and idioms of Scala that you need to use Spark's Scala APIs.
Stars: ✭ 538 (+3064.71%)
ElassandraElassandra = Elasticsearch + Apache Cassandra
Stars: ✭ 1,610 (+9370.59%)
SpartaReal Time Analytics and Data Pipelines based on Spark Streaming
Stars: ✭ 513 (+2917.65%)
spark-utillow-level helpers for Apache Spark libraries and tests
Stars: ✭ 16 (-5.88%)
MagellanGeo Spatial Data Analytics on Spark
Stars: ✭ 507 (+2882.35%)
Spark LucenerddSpark RDD with Lucene's query and entity linkage capabilities
Stars: ✭ 114 (+570.59%)
Pdf编程电子书,电子书,编程书籍,包括C,C#,Docker,Elasticsearch,Git,Hadoop,HeadFirst,Java,Javascript,jvm,Kafka,Linux,Maven,MongoDB,MyBatis,MySQL,Netty,Nginx,Python,RabbitMQ,Redis,Scala,Solr,Spark,Spring,SpringBoot,SpringCloud,TCPIP,Tomcat,Zookeeper,人工智能,大数据类,并发编程,数据库类,数据挖掘,新面试题,架构设计,算法系列,计算机类,设计模式,软件测试,重构优化,等更多分类
Stars: ✭ 12,009 (+70541.18%)
MmlsparkSimple and Distributed Machine Learning
Stars: ✭ 2,899 (+16952.94%)
Bdp Dataplatform大数据生态解决方案数据平台:基于大数据、数据平台、微服务、机器学习、商城、自动化运维、DevOps、容器部署平台、数据平台采集、数据平台存储、数据平台计算、数据平台开发、数据平台应用搭建的大数据解决方案。
Stars: ✭ 456 (+2582.35%)
Bigdataie大数据博客、笔试题、教程、项目、面经的整理
Stars: ✭ 445 (+2517.65%)
spark-demosCollection of different demo applications using Apache Spark
Stars: ✭ 15 (-11.76%)
Python BigdataData science and Big Data with Python
Stars: ✭ 112 (+558.82%)
Dji Firmware ToolsTools for handling firmwares of DJI products, with focus on quadcopters.
Stars: ✭ 424 (+2394.12%)
BallistaDistributed compute platform implemented in Rust, and powered by Apache Arrow.
Stars: ✭ 2,274 (+13276.47%)
FeatranA Scala feature transformation library for data science and machine learning
Stars: ✭ 420 (+2370.59%)
ElephasDistributed Deep learning with Keras & Spark
Stars: ✭ 1,521 (+8847.06%)
Agile data code 2Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition
Stars: ✭ 413 (+2329.41%)
WaterdropProduction Ready Data Integration Product, documentation:
Stars: ✭ 1,856 (+10817.65%)
MarmarayGeneric Data Ingestion & Dispersal Library for Hadoop
Stars: ✭ 414 (+2335.29%)
Js SparkRealtime calculation distributed system. AKA distributed lodash
Stars: ✭ 187 (+1000%)
Devops Python Tools80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Stars: ✭ 406 (+2288.24%)
BigdataclassTwo-day workshop that covers how to use R to interact databases and Spark
Stars: ✭ 110 (+547.06%)
Covid19TrackerA Robinhood style COVID-19 🦠 Android tracking app for the US. Open source and built with Kotlin.
Stars: ✭ 65 (+282.35%)
RedashMake Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
Stars: ✭ 20,147 (+118411.76%)
BigdlBuilding Large-Scale AI Applications for Distributed Big Data
Stars: ✭ 3,813 (+22329.41%)
Kotlin Spark ApiThis projects gives Kotlin bindings and several extensions for Apache Spark. We are looking to have this as a part of Apache Spark 3.x
Stars: ✭ 183 (+976.47%)
WedatasphereWeDataSphere is a financial level one-stop open-source suitcase for big data platforms. Currently the source code of Scriptis and Linkis has already been released to the open-source community. WeDataSphere, Big Data Made Easy!
Stars: ✭ 372 (+2088.24%)
HnswlibJava library for approximate nearest neighbors search using Hierarchical Navigable Small World graphs
Stars: ✭ 108 (+535.29%)
SparkmeasureThis is the development repository of SparkMeasure, a tool for performance troubleshooting of Apache Spark workloads. It simplifies the collection and analysis of Spark task metrics data.
Stars: ✭ 368 (+2064.71%)
spark-druid-olapSparkline BI Accelerator provides fast ad-hoc query capability over Logical Cubes. This has been folded into our SNAP Platform(http://bit.ly/2oBJSpP) an Integrated BI platform on Apache Spark.
Stars: ✭ 286 (+1582.35%)
Seldon ServerMachine Learning Platform and Recommendation Engine built on Kubernetes
Stars: ✭ 1,435 (+8341.18%)
basinBasin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser
Stars: ✭ 25 (+47.06%)
dllibdllib is a distributed deep learning library running on Apache Spark
Stars: ✭ 32 (+88.24%)
prostoProsto is a data processing toolkit radically changing how data is processed by heavily relying on functions and operations with functions - an alternative to map-reduce and join-groupby
Stars: ✭ 54 (+217.65%)
spark-extensionA library that provides useful extensions to Apache Spark and PySpark.
Stars: ✭ 25 (+47.06%)
DparkPython clone of Spark, a MapReduce alike framework in Python
Stars: ✭ 2,668 (+15594.12%)
Nd4jFast, Scientific and Numerical Computing for the JVM (NDArrays)
Stars: ✭ 1,742 (+10147.06%)