Sparklyr: R interface for Apache Spark
Stars: ✭ 775 (+6945.45%)
Lopq: Training of Locally Optimized Product Quantization (LOPQ) models for approximate nearest neighbor search of high-dimensional data in Python and Spark.
Stars: ✭ 530 (+4718.18%)
Sparta: Real Time Analytics and Data Pipelines based on Spark Streaming
Stars: ✭ 513 (+4563.64%)
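Coding Now: Study notes, plus eBooks and video resources I have gone through and a collection of blogs, websites, and tools I consider worthwhile. Covers the major big data components, Python machine learning and data analysis, Linux, operating systems, algorithms, networking, and more.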
Stars: ✭ 750 (+6718.18%)
Bigdata: 💎🔥 Big data study notes
Stars: ✭ 488 (+4336.36%)
Sparkctr: CTR prediction model based on Spark (LR, GBDT, DNN)
Stars: ✭ 740 (+6627.27%)
Edge: Extreme-scale Discontinuous Galerkin Environment (EDGE)
Stars: ✭ 18 (+63.64%)
Spark: Cross-platform real-time collaboration client optimized for business and organizations.
Stars: ✭ 471 (+4181.82%)
Cdhproject: Usage of the various Hadoop components, continuously updated
Stars: ✭ 733 (+6563.64%)
Poliastro: 🚀 Astrodynamics in Python
Stars: ✭ 462 (+4100%)
Dockerfiles: 50+ DockerHub public images for Docker & Kubernetes - Hadoop, Kafka, ZooKeeper, HBase, Cassandra, Solr, SolrCloud, Presto, Apache Drill, Nifi, Spark, Consul, Riak, TeamCity and DevOps tools built on the major Linux distros: Alpine, CentOS, Debian, Fedora, Ubuntu
Stars: ✭ 847 (+7600%)
Frameless: Expressive types for Spark.
Stars: ✭ 717 (+6418.18%)
Datafusion: DataFusion has now been donated to the Apache Arrow project
Stars: ✭ 611 (+5454.55%)
Mlpack: a scalable C++ machine learning library
Stars: ✭ 3,859 (+34981.82%)
Bigdataie: A collection of big data blogs, written-test questions, tutorials, projects, and interview experiences
Stars: ✭ 445 (+3945.45%)
Hail: Scalable genomic data analysis.
Stars: ✭ 706 (+6318.18%)
Boxx: Tool-box for efficient build and debug in Python. Especially for Scientific Computing and Computer Vision.
Stars: ✭ 429 (+3800%)
Reflow: A language and runtime for distributed, incremental data processing in the cloud
Stars: ✭ 706 (+6318.18%)
Deepxde: Deep learning library for solving differential equations and more
Stars: ✭ 420 (+3718.18%)
Librmath.js: Pure JavaScript implementation of the statistical R "core" numerical library libRmath.so
Stars: ✭ 425 (+3763.64%)
Featran: A Scala feature transformation library for data science and machine learning
Stars: ✭ 420 (+3718.18%)
Sidekick: High Performance HTTP Sidecar Load Balancer
Stars: ✭ 366 (+3227.27%)
Agile data code 2: Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition
Stars: ✭ 413 (+3654.55%)
Ocaml Odepack: Binding to the ODEPACK FORTRAN library
Stars: ✭ 6 (-45.45%)
Marmaray: Generic Data Ingestion & Dispersal Library for Hadoop
Stars: ✭ 414 (+3663.64%)
Corral: 🐎 A serverless MapReduce framework written for AWS Lambda
Stars: ✭ 648 (+5790.91%)
Devops Python Tools: 80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Stars: ✭ 406 (+3590.91%)
Pyspark Example Project: Example project implementing best practices for PySpark ETL jobs and applications.
Stars: ✭ 633 (+5654.55%)
Amgcl: C++ library for solving large sparse linear systems with the algebraic multigrid method
Stars: ✭ 390 (+3445.45%)
Bigdataguide: Big data learning from scratch, including study videos for each learning stage and interview materials
Stars: ✭ 817 (+7327.27%)
Redash: Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
Stars: ✭ 20,147 (+183054.55%)
Vexcl: VexCL is a C++ vector expression template library for OpenCL/CUDA/OpenMP
Stars: ✭ 626 (+5590.91%)
Bigdl: Building Large-Scale AI Applications for Distributed Big Data
Stars: ✭ 3,813 (+34563.64%)
Core: The core source repository for the Cherab project.
Stars: ✭ 26 (+136.36%)
Wedatasphere: WeDataSphere is a financial-grade, one-stop open-source suite for big data platforms. Currently the source code of Scriptis and Linkis has already been released to the open-source community. WeDataSphere, Big Data Made Easy!
Stars: ✭ 372 (+3281.82%)
H2o 3: H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Stars: ✭ 5,656 (+51318.18%)
Sparkmeasure: This is the development repository of SparkMeasure, a tool for performance troubleshooting of Apache Spark workloads. It simplifies the collection and analysis of Spark task metrics data.
Stars: ✭ 368 (+3245.45%)
Itk: Insight Toolkit (ITK) -- Official Repository. ITK builds on a proven, spatially-oriented architecture for processing, segmentation, and registration of scientific images in two, three, or more dimensions.
Stars: ✭ 801 (+7181.82%)
Loopy: A code generator for array-based code on CPUs and GPUs
Stars: ✭ 367 (+3236.36%)
Zeppelin: Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
Stars: ✭ 5,513 (+50018.18%)
Owl: OCaml Scientific and Engineering Computing @ http://ocaml.xyz
Stars: ✭ 919 (+8254.55%)
Kyuubi: Kyuubi is a unified multi-tenant JDBC interface for large-scale data processing and analytics, built on top of Apache Spark
Stars: ✭ 363 (+3200%)
Goodreads etl pipeline: An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Stars: ✭ 793 (+7109.09%)
Mongo Spark: The MongoDB Spark Connector
Stars: ✭ 588 (+5245.45%)
Metorikku: A simplified, lightweight ETL Framework based on Apache Spark
Stars: ✭ 361 (+3181.82%)
Sparkler: Spark-Crawler, an Apache Nutch-like crawler that runs on Apache Spark.
Stars: ✭ 362 (+3190.91%)
Pyopencl: OpenCL integration for Python, plus shiny features
Stars: ✭ 790 (+7081.82%)
Pygam: [HELP REQUESTED] Generalized Additive Models in Python
Stars: ✭ 569 (+5072.73%)
Sparklearning: Learning Apache Spark, including code and data. Most parts can run locally.
Stars: ✭ 558 (+4972.73%)