GimelBig Data Processing Framework - Unified Data API or SQL on Any Storage
Stars: ✭ 216 (+644.83%)
mmtf-workshop-2018Structural Bioinformatics Training Workshop & Hackathon 2018
Stars: ✭ 50 (+72.41%)
Spark Py NotebooksApache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 1,338 (+4513.79%)
MmlsparkSimple and Distributed Machine Learning
Stars: ✭ 2,899 (+9896.55%)
autThe Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Stars: ✭ 111 (+282.76%)
Spark With PythonFundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (+417.24%)
Pyspark Setup DemoDemo of PySpark and Jupyter Notebook with the Jupyter Docker Stacks
Stars: ✭ 24 (-17.24%)
pyspark-cheatsheetPySpark Cheat Sheet - example code to help you learn PySpark and develop apps faster
Stars: ✭ 115 (+296.55%)
Bitcoin Value Predictor[NOT MAINTAINED] Predicting Bit coin price using Time series analysis and sentiment analysis of tweets on bitcoin
Stars: ✭ 91 (+213.79%)
big dataA collection of tutorials on Hadoop, MapReduce, Spark, Docker
Stars: ✭ 34 (+17.24%)
pyspark-algorithmsPySpark Algorithms Book: https://www.amazon.com/dp/B07X4B2218/ref=sr_1_2
Stars: ✭ 72 (+148.28%)
datalake-etl-pipelineSimplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Stars: ✭ 39 (+34.48%)
soda-sparkSoda Spark is a PySpark library that helps you with testing your data in Spark Dataframes
Stars: ✭ 58 (+100%)
SynapseMLSimple and Distributed Machine Learning
Stars: ✭ 3,355 (+11468.97%)
siembolAn open-source, real-time Security Information & Event Management tool based on big data technologies, providing a scalable, advanced security analytics framework.
Stars: ✭ 153 (+427.59%)
MLBDMaterials for "Machine Learning on Big Data" course
Stars: ✭ 20 (-31.03%)
phrase-at-scaleDetect common phrases in large amounts of text using a data-driven approach. Size of discovered phrases can be arbitrary. Can be used in languages other than English
Stars: ✭ 115 (+296.55%)
beam-siteApache Beam Site
Stars: ✭ 28 (-3.45%)
cejaPySpark phonetic and string matching algorithms
Stars: ✭ 24 (-17.24%)
LoL-Match-PredictionWin probability predictions for League of Legends matches using neural networks
Stars: ✭ 34 (+17.24%)
pyspark-ML-in-ColabPyspark in Google Colab: A simple machine learning (Linear Regression) model
Stars: ✭ 32 (+10.34%)
IoT-system-PLC-data-to-InfluxDBThis project aim is to provide free software to fetch data from plcs (Siemens S7-300/400/1200/1500) and store it. Used stack is completly opensource. I used InfluDB as data storage, so application principle is following Big Data paradigm.
Stars: ✭ 26 (-10.34%)
versatile-data-kitVersatile Data Kit (VDK) is an open source framework that enables anybody with basic SQL or Python knowledge to create their own data pipelines.
Stars: ✭ 144 (+396.55%)
Big-Data-Demo基于Vue、three.js、echarts,数据可视化展示项目,包含三维模型导入交互、三维模型标注等功能
Stars: ✭ 146 (+403.45%)
leilaLibrería para la evaluación de calidad de datos, e interacción con el portal de datos.gov.co
Stars: ✭ 56 (+93.1%)
xcastA High-Performance Data Science Toolkit for the Earth Sciences
Stars: ✭ 28 (-3.45%)
rastercuberastercube is a python library for big data analysis of georeferenced time series data (e.g. MODIS NDVI)
Stars: ✭ 15 (-48.28%)
python mozetlETL jobs for Firefox Telemetry
Stars: ✭ 25 (-13.79%)
jupyterlab-sparkmonitorJupyterLab extension that enables monitoring launched Apache Spark jobs from within a notebook
Stars: ✭ 78 (+168.97%)
CS Book🔥 Latest computer science e-books。提供最新技术类电子书下载, “我无非就是想卷死各位,或者被各位卷死!”
Stars: ✭ 40 (+37.93%)
spark-recordsBulletproof Apache Spark jobs with fast root cause analysis of failures.
Stars: ✭ 67 (+131.03%)
arrow-datafusionApache Arrow DataFusion SQL Query Engine
Stars: ✭ 2,360 (+8037.93%)
scarfToolkit for highly memory efficient analysis of single-cell RNA-Seq, scATAC-Seq and CITE-Seq data. Analyze atlas scale datasets with millions of cells on laptop.
Stars: ✭ 54 (+86.21%)
ByteSlice"Byteslice: Pushing the envelop of main memory data processing with a new storage layout" (SIGMOD'15)
Stars: ✭ 24 (-17.24%)
RemoteShuffleServiceCeleborn provides an elastic and high-performance service for shuffle and spilled data.
Stars: ✭ 262 (+803.45%)
classifai🔥 One of the most comprehensive open-source data annotation platform.
Stars: ✭ 99 (+241.38%)
SparkoraPowerful rapid automatic EDA and feature engineering library with a very easy to use API 🌟
Stars: ✭ 51 (+75.86%)
SGDLibraryMATLAB/Octave library for stochastic optimization algorithms: Version 1.0.20
Stars: ✭ 165 (+468.97%)
spark-rootApache Spark Data Source for ROOT File Format
Stars: ✭ 28 (-3.45%)
dxramA distributed in-memory key-value storage for billions of small objects.
Stars: ✭ 25 (-13.79%)
nebulaA distributed, fast open-source graph database featuring horizontal scalability and high availability
Stars: ✭ 8,196 (+28162.07%)
img2datasetEasily turn large sets of image urls to an image dataset. Can download, resize and package 100M urls in 20h on one machine.
Stars: ✭ 1,173 (+3944.83%)
cloudberryBig Data Visualization
Stars: ✭ 89 (+206.9%)
GDLibraryMatlab library for gradient descent algorithms: Version 1.0.1
Stars: ✭ 50 (+72.41%)
storm-mlan online learning algorithm library for Storm
Stars: ✭ 18 (-37.93%)
IATI.cloudThe open-source IATI datastore for IATI data with RESTful web API providing XML, JSON, CSV output. It extracts and parses IATI XML files referenced in the IATI Registry and powered by Apache Solr.
Stars: ✭ 35 (+20.69%)
incubator-liminalApache Liminals goal is to operationalise the machine learning process, allowing data scientists to quickly transition from a successful experiment to an automated pipeline of model training, validation, deployment and inference in production. Liminal provides a Domain Specific Language to build ML workflows on top of Apache Airflow.
Stars: ✭ 117 (+303.45%)
objectiv-analyticsPowerful product analytics for data teams, with full control over data & models.
Stars: ✭ 399 (+1275.86%)