SmileStatistical Machine Intelligence & Learning Engine
Stars: ✭ 5,412 (+921.13%)
graphgroveA framework for building (and incrementally growing) graph-based data structures used in hierarchical or DAG-structured clustering and nearest neighbor search
Stars: ✭ 29 (-94.53%)
ScannsA scalable nearest neighbor search library in Apache Spark
Stars: ✭ 190 (-64.15%)
ElasticlusterCreate clusters of VMs on the cloud and configure them with Ansible.
Stars: ✭ 298 (-43.77%)
YanagishimaWeb UI for Trino, Presto, Hive, Elasticsearch, SparkSQL
Stars: ✭ 424 (-20%)
RedashMake Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
Stars: ✭ 20,147 (+3701.32%)
DogvscatSample Docker Swarm cluster stack of tools
Stars: ✭ 377 (-28.87%)
Lidar for ad referencesA list of references on lidar point cloud processing for autonomous driving
Stars: ✭ 456 (-13.96%)
FeatranA Scala feature transformation library for data science and machine learning
Stars: ✭ 420 (-20.75%)
SparkmeasureThis is the development repository of SparkMeasure, a tool for performance troubleshooting of Apache Spark workloads. It simplifies the collection and analysis of Spark task metrics data.
Stars: ✭ 368 (-30.57%)
TutorialA summary of the Java full-stack knowledge architecture
Stars: ✭ 407 (-23.21%)
God Of BigdataFocused on big data learning and interviews; start on the road to big data mastery. Flink/Spark/Hadoop/Hbase/Hive...
Stars: ✭ 6,008 (+1033.58%)
CdpCode for our ECCV 2018 work.
Stars: ✭ 391 (-26.23%)
HazelcastOpen-source distributed computation and storage platform
Stars: ✭ 4,662 (+779.62%)
BigdlBuilding Large-Scale AI Applications for Distributed Big Data
Stars: ✭ 3,813 (+619.43%)
Dji Firmware ToolsTools for handling firmwares of DJI products, with focus on quadcopters.
Stars: ✭ 424 (-20%)
WedatasphereWeDataSphere is a financial-grade, one-stop open-source suite for big data platforms. The source code of Scriptis and Linkis has already been released to the open-source community. WeDataSphere, Big Data Made Easy!
Stars: ✭ 372 (-29.81%)
Tensorflow BookAccompanying source code for Machine Learning with TensorFlow. Refer to the book for step-by-step explanations.
Stars: ✭ 4,448 (+739.25%)
SidekickHigh Performance HTTP Sidecar Load Balancer
Stars: ✭ 366 (-30.94%)
MetorikkuA simplified, lightweight ETL Framework based on Apache Spark
Stars: ✭ 361 (-31.89%)
SparklerSpark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
Stars: ✭ 362 (-31.7%)
AkkatectureA CQRS and event sourcing framework for .NET Core using Akka.NET
Stars: ✭ 414 (-21.89%)
OapOptimized Analytics Package for Spark* Platform
Stars: ✭ 343 (-35.28%)
Devops Python Tools80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Stars: ✭ 406 (-23.4%)
BigdataieA collection of big data blogs, written test questions, tutorials, projects, and interview experiences
Stars: ✭ 445 (-16.04%)
MoaMOA is an open source framework for Big Data stream mining. It includes a collection of machine learning algorithms (classification, regression, clustering, outlier detection, concept drift detection and recommender systems) and tools for evaluation.
Stars: ✭ 409 (-22.83%)
SparkCross-platform real-time collaboration client optimized for business and organizations.
Stars: ✭ 471 (-11.13%)
IcebergIceberg is a table format for large, slow-moving tabular data
Stars: ✭ 393 (-25.85%)
Stats Maths With PythonGeneral statistics, mathematical programming, and numerical/scientific computing scripts and notebooks in Python
Stars: ✭ 381 (-28.11%)
DtaidistanceTime series distances: Dynamic Time Warping (DTW)
Stars: ✭ 499 (-5.85%)
Docker practiceLearn and understand Docker technologies, with real DevOps practice!
Stars: ✭ 19,768 (+3629.81%)
ShamanSmall, lightweight, API-driven DNS server.
Stars: ✭ 426 (-19.62%)
PynndescentA Python nearest neighbor descent for approximate nearest neighbors
Stars: ✭ 377 (-28.87%)
N2TOROS N2 - lightweight approximate Nearest Neighbor library which runs fast even with large datasets
Stars: ✭ 457 (-13.77%)
TensorflowonsparkTensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters.
Stars: ✭ 3,748 (+607.17%)
MoonboxMoonbox is a DVtaaS (Data Virtualization as a Service) Platform
Stars: ✭ 424 (-20%)
CdapAn open source framework for building data analytic applications.
Stars: ✭ 509 (-3.96%)
Mlpackmlpack: a scalable C++ machine learning library
Stars: ✭ 3,859 (+628.11%)
LearningsparkScala examples for learning to use Spark
Stars: ✭ 421 (-20.57%)
KyuubiKyuubi is a unified multi-tenant JDBC interface for large-scale data processing and analytics, built on top of Apache Spark
Stars: ✭ 363 (-31.51%)
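Bdp DataplatformA data platform for big data ecosystem solutions: a big data solution built on big data, data platforms, microservices, machine learning, e-commerce, automated operations, DevOps, a container deployment platform, and data platform collection, storage, computing, development, and application building.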
Stars: ✭ 456 (-13.96%)
Protoactor GoProto Actor - Ultra fast distributed actors for Go, C# and Java/Kotlin
Stars: ✭ 3,934 (+642.26%)
SparkleHaskell on Apache Spark.
Stars: ✭ 419 (-20.94%)
SparkstreamingReal-time log analysis and statistics with Spark Streaming+Flume+Kafka+HBase+Hadoop+Zookeeper; data visualization with SpringBoot+Echarts
Stars: ✭ 349 (-34.15%)
PointblankData validation and organization of metadata for data frames and database tables
Stars: ✭ 480 (-9.43%)
Agile data code 2Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition
Stars: ✭ 413 (-22.08%)
SparklensQubole Sparklens tool for performance tuning Apache Spark
Stars: ✭ 345 (-34.91%)
ScalnetA Scala wrapper for Deeplearning4j, inspired by Keras. Scala + DL + Spark + GPUs
Stars: ✭ 342 (-35.47%)
IqlAn ad hoc query service based on the Spark SQL engine.
Stars: ✭ 341 (-35.66%)
Data Science Ipython NotebooksData science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Stars: ✭ 22,048 (+4060%)
Enterprise gatewayA lightweight, multi-tenant, scalable and secure gateway that enables Jupyter Notebooks to share resources across distributed clusters such as Apache Spark, Kubernetes and others.
Stars: ✭ 412 (-22.26%)
Ytk LearnYtk-learn is a distributed machine learning library that implements most of the popular machine learning algorithms (GBDT, GBRT, Mixture Logistic Regression, Gradient Boosting Soft Tree, Factorization Machines, Field-aware Factorization Machines, Logistic Regression, Softmax).
Stars: ✭ 337 (-36.42%)
WirbelsturmWirbelsturm is a Vagrant and Puppet based tool to perform 1-click local and remote deployments, with a focus on big data tech like Kafka.
Stars: ✭ 332 (-37.36%)
MarmarayGeneric Data Ingestion & Dispersal Library for Hadoop
Stars: ✭ 414 (-21.89%)
SpartaReal Time Analytics and Data Pipelines based on Spark Streaming
Stars: ✭ 513 (-3.21%)