497 open-source projects that are alternatives to or similar to check-engine

Gimel
Big Data Processing Framework - Unified Data API or SQL on Any Storage
Stars: ✭ 216 (+644.83%)
Mutual labels:  big-data, pyspark
mmtf-workshop-2018
Structural Bioinformatics Training Workshop & Hackathon 2018
Stars: ✭ 50 (+72.41%)
Mutual labels:  big-data, pyspark
Spark Py Notebooks
Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 1,338 (+4513.79%)
Mutual labels:  big-data, pyspark
Mmlspark
Simple and Distributed Machine Learning
Stars: ✭ 2,899 (+9896.55%)
Mutual labels:  big-data, pyspark
aut
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Stars: ✭ 111 (+282.76%)
Mutual labels:  big-data, pyspark
Spark With Python
Fundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (+417.24%)
Mutual labels:  big-data, pyspark
Pyspark Setup Demo
Demo of PySpark and Jupyter Notebook with the Jupyter Docker Stacks
Stars: ✭ 24 (-17.24%)
Mutual labels:  big-data, pyspark
pyspark-cheatsheet
PySpark Cheat Sheet - example code to help you learn PySpark and develop apps faster
Stars: ✭ 115 (+296.55%)
Mutual labels:  big-data, pyspark
Bitcoin Value Predictor
[NOT MAINTAINED] Predicting Bitcoin price using time series analysis and sentiment analysis of tweets about Bitcoin
Stars: ✭ 91 (+213.79%)
Mutual labels:  big-data, pyspark
big data
A collection of tutorials on Hadoop, MapReduce, Spark, Docker
Stars: ✭ 34 (+17.24%)
Mutual labels:  big-data, pyspark
pyspark-algorithms
PySpark Algorithms Book: https://www.amazon.com/dp/B07X4B2218/ref=sr_1_2
Stars: ✭ 72 (+148.28%)
Mutual labels:  big-data, pyspark
datalake-etl-pipeline
Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Stars: ✭ 39 (+34.48%)
Mutual labels:  big-data, pyspark
soda-spark
Soda Spark is a PySpark library that helps you with testing your data in Spark Dataframes
Stars: ✭ 58 (+100%)
Mutual labels:  pyspark, data-quality
SynapseML
Simple and Distributed Machine Learning
Stars: ✭ 3,355 (+11468.97%)
Mutual labels:  big-data, pyspark
siembol
An open-source, real-time Security Information & Event Management tool based on big data technologies, providing a scalable, advanced security analytics framework.
Stars: ✭ 153 (+427.59%)
Mutual labels:  big-data
bigquery-kafka-connect
☁️ nodejs kafka connect connector for Google BigQuery
Stars: ✭ 17 (-41.38%)
Mutual labels:  big-data
airavata-php-gateway
Mirror of Apache Airavata PHP Gateway
Stars: ✭ 15 (-48.28%)
Mutual labels:  big-data
azure-big-data-starter
A boilerplate project for Azure Big Data PaaS services
Stars: ✭ 13 (-55.17%)
Mutual labels:  big-data
MLBD
Materials for "Machine Learning on Big Data" course
Stars: ✭ 20 (-31.03%)
Mutual labels:  big-data
phrase-at-scale
Detect common phrases in large amounts of text using a data-driven approach. Size of discovered phrases can be arbitrary. Can be used in languages other than English
Stars: ✭ 115 (+296.55%)
Mutual labels:  pyspark
beam-site
Apache Beam Site
Stars: ✭ 28 (-3.45%)
Mutual labels:  big-data
ceja
PySpark phonetic and string matching algorithms
Stars: ✭ 24 (-17.24%)
Mutual labels:  pyspark
LoL-Match-Prediction
Win probability predictions for League of Legends matches using neural networks
Stars: ✭ 34 (+17.24%)
Mutual labels:  big-data
pyspark-ML-in-Colab
Pyspark in Google Colab: A simple machine learning (Linear Regression) model
Stars: ✭ 32 (+10.34%)
Mutual labels:  pyspark
IoT-system-PLC-data-to-InfluxDB
This project aims to provide free software to fetch data from PLCs (Siemens S7-300/400/1200/1500) and store it. The stack used is completely open source. I used InfluxDB as data storage, so the application follows the Big Data paradigm.
Stars: ✭ 26 (-10.34%)
Mutual labels:  big-data
versatile-data-kit
Versatile Data Kit (VDK) is an open source framework that enables anybody with basic SQL or Python knowledge to create their own data pipelines.
Stars: ✭ 144 (+396.55%)
Mutual labels:  data-quality
Big-Data-Demo
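A data visualization showcase project based on Vue, three.js, and echarts, including features such as 3D model import and interaction and 3D model annotation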
Stars: ✭ 146 (+403.45%)
Mutual labels:  big-data
leila
Library for data quality assessment and for interaction with the datos.gov.co portal
Stars: ✭ 56 (+93.1%)
Mutual labels:  data-quality
pyspark-for-data-processing
Code for my presentation: Using PySpark to Process Boat Loads of Data
Stars: ✭ 20 (-31.03%)
Mutual labels:  pyspark
databricks-notebooks
Collection of Databricks and Jupyter Notebooks
Stars: ✭ 19 (-34.48%)
Mutual labels:  pyspark
xcast
A High-Performance Data Science Toolkit for the Earth Sciences
Stars: ✭ 28 (-3.45%)
Mutual labels:  big-data
rastercube
rastercube is a python library for big data analysis of georeferenced time series data (e.g. MODIS NDVI)
Stars: ✭ 15 (-48.28%)
Mutual labels:  big-data
OnlineStatsBase.jl
Base types for OnlineStats.
Stars: ✭ 26 (-10.34%)
Mutual labels:  big-data
python mozetl
ETL jobs for Firefox Telemetry
Stars: ✭ 25 (-13.79%)
Mutual labels:  pyspark
jupyterlab-sparkmonitor
JupyterLab extension that enables monitoring launched Apache Spark jobs from within a notebook
Stars: ✭ 78 (+168.97%)
Mutual labels:  pyspark
CS Book
🔥 Latest computer science e-books. Provides downloads of the latest technical e-books. "All I want is to out-compete every one of you, or be out-competed by you!"
Stars: ✭ 40 (+37.93%)
Mutual labels:  big-data
osm-data-classification
Migrated to: https://gitlab.com/Oslandia/osm-data-classification
Stars: ✭ 23 (-20.69%)
Mutual labels:  data-quality
spark-records
Bulletproof Apache Spark jobs with fast root cause analysis of failures.
Stars: ✭ 67 (+131.03%)
Mutual labels:  big-data
arrow-datafusion
Apache Arrow DataFusion SQL Query Engine
Stars: ✭ 2,360 (+8037.93%)
Mutual labels:  big-data
scarf
Toolkit for highly memory-efficient analysis of single-cell RNA-Seq, scATAC-Seq and CITE-Seq data. Analyze atlas-scale datasets with millions of cells on a laptop.
Stars: ✭ 54 (+86.21%)
Mutual labels:  big-data
ByteSlice
"Byteslice: Pushing the envelop of main memory data processing with a new storage layout" (SIGMOD'15)
Stars: ✭ 24 (-17.24%)
Mutual labels:  big-data
RemoteShuffleService
Celeborn provides an elastic and high-performance service for shuffle and spilled data.
Stars: ✭ 262 (+803.45%)
Mutual labels:  big-data
pyspark-k8s-boilerplate
Boilerplate for PySpark on Cloud Kubernetes
Stars: ✭ 24 (-17.24%)
Mutual labels:  pyspark
terraform-aws-kinesis-firehose
This code creates a Kinesis Firehose in AWS to send CloudWatch log data to S3.
Stars: ✭ 25 (-13.79%)
Mutual labels:  big-data
classifai
🔥 One of the most comprehensive open-source data annotation platforms.
Stars: ✭ 99 (+241.38%)
Mutual labels:  big-data
Sparkora
Powerful rapid automatic EDA and feature engineering library with a very easy to use API 🌟
Stars: ✭ 51 (+75.86%)
Mutual labels:  pyspark
SGDLibrary
MATLAB/Octave library for stochastic optimization algorithms: Version 1.0.20
Stars: ✭ 165 (+468.97%)
Mutual labels:  big-data
spark-root
Apache Spark Data Source for ROOT File Format
Stars: ✭ 28 (-3.45%)
Mutual labels:  big-data
dxram
A distributed in-memory key-value storage for billions of small objects.
Stars: ✭ 25 (-13.79%)
Mutual labels:  big-data
Movies-Analytics-in-Spark-and-Scala
Data cleaning, pre-processing, and Analytics on a million movies using Spark and Scala.
Stars: ✭ 47 (+62.07%)
Mutual labels:  big-data
insightedge
InsightEdge Core
Stars: ✭ 22 (-24.14%)
Mutual labels:  big-data
nebula
A distributed, fast open-source graph database featuring horizontal scalability and high availability
Stars: ✭ 8,196 (+28162.07%)
Mutual labels:  big-data
img2dataset
Easily turn large sets of image URLs into an image dataset. Can download, resize and package 100M URLs in 20h on one machine.
Stars: ✭ 1,173 (+3944.83%)
Mutual labels:  big-data
cloudberry
Big Data Visualization
Stars: ✭ 89 (+206.9%)
Mutual labels:  big-data
Real Time Social Media Mining
DevOps pipeline for Real Time Social/Web Mining
Stars: ✭ 22 (-24.14%)
Mutual labels:  big-data
GDLibrary
Matlab library for gradient descent algorithms: Version 1.0.1
Stars: ✭ 50 (+72.41%)
Mutual labels:  big-data
storm-ml
an online learning algorithm library for Storm
Stars: ✭ 18 (-37.93%)
Mutual labels:  big-data
IATI.cloud
The open-source IATI datastore for IATI data, with a RESTful web API providing XML, JSON, and CSV output. It extracts and parses IATI XML files referenced in the IATI Registry and is powered by Apache Solr.
Stars: ✭ 35 (+20.69%)
Mutual labels:  data-validation
incubator-liminal
Apache Liminal's goal is to operationalise the machine learning process, allowing data scientists to quickly transition from a successful experiment to an automated pipeline of model training, validation, deployment and inference in production. Liminal provides a Domain Specific Language to build ML workflows on top of Apache Airflow.
Stars: ✭ 117 (+303.45%)
Mutual labels:  big-data
objectiv-analytics
Powerful product analytics for data teams, with full control over data & models.
Stars: ✭ 399 (+1275.86%)
Mutual labels:  data-validation