369 open source projects that are alternatives to or similar to Bigtop

Cloudbreak

A tool for provisioning and managing Apache Hadoop clusters in the cloud. Cloudbreak, as part of the Hortonworks Data Platform, makes it easy to provision, configure and elastically grow HDP clusters on cloud infrastructure. Cloudbreak can be used to provision Hadoop across cloud infrastructure providers including AWS, Azure, GCP and OpenStack.

Stars: ✭ 301 (-15.45%)

Mutual labels: big-data

check-engine

Data validation library for PySpark 3.0.0

Stars: ✭ 29 (-91.85%)

Mutual labels: big-data

pipeline

OONI data processing pipeline

Stars: ✭ 36 (-89.89%)

Mutual labels: big-data

classifai

🔥 One of the most comprehensive open-source data annotation platforms.

Stars: ✭ 99 (-72.19%)

Mutual labels: big-data

Beeva Best Practices

Best Practices and Style Guides in BEEVA

Stars: ✭ 335 (-5.9%)

Mutual labels: big-data

storm-ml

An online learning algorithm library for Storm

Stars: ✭ 18 (-94.94%)

Mutual labels: big-data

NiFi-Rule-engine-processor

Drools processor for Apache NiFi

Stars: ✭ 34 (-90.45%)

Mutual labels: big-data

OnlineStatsBase.jl

Base types for OnlineStats.

Stars: ✭ 26 (-92.7%)

Mutual labels: big-data

Baize

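Baize automated operations and maintenance system: configuration management, network probing, asset management, business management, CMDB, CD, DevOps, job orchestration, task orchestration, and other features. Monitoring, alerting, log analysis, and big data analysis are planned for the future.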

Stars: ✭ 296 (-16.85%)

Mutual labels: big-data

ByteSlice

"Byteslice: Pushing the envelop of main memory data processing with a new storage layout" (SIGMOD'15)

Stars: ✭ 24 (-93.26%)

Mutual labels: big-data

leaflet heatmap

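A simple visualization of Huzhou call data. Assuming the data volume is too large to draw a heatmap directly in the browser, the heatmap rendering step is moved to offline computation. After the data is computed in parallel with Apache Spark, Apache Spark is also used to draw the heatmap, and leafletjs then loads the OpenStreetMap layer and the heatmap layer to give good interactivity. The rendering is currently implemented with Apache Spark; either Apache Spark is not well suited to this kind of computation or I did not design the algorithm well, because the parallel computation is slower than single-machine computation. The Apache Spark heatmap rendering and computation code is at https://github.com/yuanzhaokang/ParallelizeHeatmap.git .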

Stars: ✭ 13 (-96.35%)

Mutual labels: big-data

Movies-Analytics-in-Spark-and-Scala

Data cleaning, pre-processing, and Analytics on a million movies using Spark and Scala.

Stars: ✭ 47 (-86.8%)

Mutual labels: big-data

Stroom

Stroom is a highly scalable data storage, processing and analysis platform.

Stars: ✭ 344 (-3.37%)

Mutual labels: big-data

meetups-archivos

Slides, code, and videos from the meetups, data science days, video calls, and workshops. Data Science Research is a non-profit organization that seeks to spread and decentralize knowledge of Data Science and Artificial Intelligence in Peru, giving opportunities to new talent through MeetUps, Workshops and Semilleros …

Stars: ✭ 60 (-83.15%)

Mutual labels: big-data

lens

Mirror of Apache Lens

Stars: ✭ 57 (-83.99%)

Mutual labels: big-data

bigquery-kafka-connect

☁️ Node.js Kafka Connect connector for Google BigQuery

Stars: ✭ 17 (-95.22%)

Mutual labels: big-data

Crate

CrateDB is a distributed SQL database that makes it simple to store and analyze massive amounts of data in real-time.

Stars: ✭ 3,254 (+814.04%)

Mutual labels: big-data

LoL-Match-Prediction

Win probability predictions for League of Legends matches using neural networks

Stars: ✭ 34 (-90.45%)

Mutual labels: big-data

AverageShiftedHistograms.jl

⚡ Lightning fast density estimation in Julia ⚡

Stars: ✭ 52 (-85.39%)

Mutual labels: big-data

insightedge

InsightEdge Core

Stars: ✭ 22 (-93.82%)

Mutual labels: big-data

Uproot3

ROOT I/O in pure Python and NumPy.

Stars: ✭ 312 (-12.36%)

Mutual labels: big-data

incubator-liminal

Apache Liminal's goal is to operationalise the machine learning process, allowing data scientists to quickly transition from a successful experiment to an automated pipeline of model training, validation, deployment and inference in production. Liminal provides a Domain Specific Language to build ML workflows on top of Apache Airflow.

Stars: ✭ 117 (-67.13%)

Mutual labels: big-data

hadoop-data-ingestion-tool

OLAP and ETL of Big Data

Stars: ✭ 17 (-95.22%)

Mutual labels: big-data

beekeeper

Service for automatically managing and cleaning up unreferenced data

Stars: ✭ 43 (-87.92%)

Mutual labels: big-data

Oie Resources

A curated list of Open Information Extraction (OIE) resources: papers, code, data, etc.

Stars: ✭ 283 (-20.51%)

Mutual labels: big-data

siembol

An open-source, real-time Security Information & Event Management tool based on big data technologies, providing a scalable, advanced security analytics framework.

Stars: ✭ 153 (-57.02%)

Mutual labels: big-data

alluxio-py

Alluxio Python client - Access Any Data Source with Python

Stars: ✭ 18 (-94.94%)

Mutual labels: big-data

airavata-php-gateway

Mirror of Apache Airavata PHP Gateway

Stars: ✭ 15 (-95.79%)

Mutual labels: big-data

Devops Roadmap

DevOps methodology & roadmap for a DevOps developer in 2019. Interesting books to learn new technologies.

Stars: ✭ 349 (-1.97%)

Mutual labels: big-data

CS Book

🔥 Latest computer science e-books. Provides downloads of the latest technical e-books. "All I want is to out-compete every one of you, or be out-competed by you!"

Stars: ✭ 40 (-88.76%)

Mutual labels: big-data

predictionio-template-similar-product

PredictionIO Similar Product Engine Template (Scala-based parallelized engine)

Stars: ✭ 50 (-85.96%)

Mutual labels: big-data

spark-records

Bulletproof Apache Spark jobs with fast root cause analysis of failures.

Stars: ✭ 67 (-81.18%)

Mutual labels: big-data

Knowage Server

Knowage is the professional open source suite for modern business analytics over traditional sources and big data systems.

Stars: ✭ 276 (-22.47%)

Mutual labels: big-data

RemoteShuffleService

Celeborn provides an elastic and high-performance service for shuffle and spilled data.

Stars: ✭ 262 (-26.4%)

Mutual labels: big-data

pypar

Efficient and scalable parallelism using the message passing interface (MPI) to handle big data and highly computational problems.

Stars: ✭ 66 (-81.46%)

Mutual labels: big-data

terraform-aws-kinesis-firehose

This code creates a Kinesis Firehose in AWS to send CloudWatch log data to S3.

Stars: ✭ 25 (-92.98%)

Mutual labels: big-data

Mist

Serverless proxy for Spark cluster

Stars: ✭ 309 (-13.2%)

Mutual labels: big-data

dxram

A distributed in-memory key-value storage for billions of small objects.

Stars: ✭ 25 (-92.98%)

Mutual labels: big-data

pytorch kmeans

Implementation of the k-means algorithm in PyTorch that works for large datasets

Stars: ✭ 38 (-89.33%)

Mutual labels: big-data

img2dataset

Easily turn large sets of image URLs into an image dataset. Can download, resize and package 100M URLs in 20h on one machine.

Stars: ✭ 1,173 (+229.49%)

Mutual labels: big-data

Attic Predictionio Sdk Php

PredictionIO PHP SDK

Stars: ✭ 272 (-23.6%)

Mutual labels: big-data

GDLibrary

Matlab library for gradient descent algorithms: Version 1.0.1

Stars: ✭ 50 (-85.96%)

Mutual labels: big-data

pyspark-cheatsheet

PySpark Cheat Sheet - example code to help you learn PySpark and develop apps faster

Stars: ✭ 115 (-67.7%)

Mutual labels: big-data

lcbo-api

A crawler and API server for Liquor Control Board of Ontario retail data

Stars: ✭ 152 (-57.3%)

Mutual labels: big-data

Ozone

Scalable, redundant, and distributed object store for Apache Hadoop

Stars: ✭ 330 (-7.3%)

Mutual labels: big-data

gan deeplearning4j

Automatic feature engineering with Generative Adversarial Networks, using Deeplearning4j and Apache Spark.

Stars: ✭ 19 (-94.66%)

Mutual labels: big-data

hyper-engine

Python library for Bayesian hyper-parameters optimization

Stars: ✭ 80 (-77.53%)

Mutual labels: big-data

FlameStream

Distributed stream processing model and its implementation

Stars: ✭ 14 (-96.07%)

Mutual labels: big-data

Succinct

Enabling queries on compressed data.

Stars: ✭ 257 (-27.81%)

Mutual labels: big-data

ngm

swissgeol.ch gives you insight into geoscientific data - above and below the surface.

Stars: ✭ 23 (-93.54%)

Mutual labels: big-data

big-data-lite

Samples for the Oracle Big Data Lite VM

Stars: ✭ 41 (-88.48%)

Mutual labels: big-data

automile-net

Automile offers a simple, smart, cutting-edge telematics solution for businesses to track and manage their business vehicles.

Stars: ✭ 24 (-93.26%)

Mutual labels: big-data

Helix

Mirror of Apache Helix

Stars: ✭ 304 (-14.61%)

Mutual labels: big-data

falcon

Mirror of Apache Falcon

Stars: ✭ 95 (-73.31%)

Mutual labels: big-data

Vespa

The open big data serving engine. https://vespa.ai

Stars: ✭ 3,747 (+952.53%)

Mutual labels: big-data

Attic Apex Core

Mirror of Apache Apex core

Stars: ✭ 346 (-2.81%)

Mutual labels: big-data

Grouparoo

🦘 The Grouparoo Monorepo - open source customer data sync framework

Stars: ✭ 334 (-6.18%)

Mutual labels: big-data

Morpheus

Morpheus brings the leading graph query language, Cypher, onto the leading distributed processing platform, Spark.

Stars: ✭ 303 (-14.89%)

Mutual labels: big-data

mmtf-workshop-2018

Structural Bioinformatics Training Workshop & Hackathon 2018

Stars: ✭ 50 (-85.96%)

Mutual labels: big-data

couchdb-mango

Mirror of Apache CouchDB Mango

Stars: ✭ 34 (-90.45%)

Mutual labels: big-data

61-120 of 369 similar projects
