Bigdl
Building Large-Scale AI Applications for Distributed Big Data
Stars: ✭ 3,813 (+4726.58%)
Labs
Research on distributed systems
Stars: ✭ 73 (-7.59%)
Neuraxle
A Sklearn-like Framework for Hyperparameter Tuning and AutoML in Deep Learning projects. Finally have the right abstractions and design patterns to properly do AutoML. Let your pipeline steps have hyperparameter spaces. Enable checkpoints to cut duplicate calculations. Go from research to production environment easily.
Stars: ✭ 377 (+377.22%)
Zeppelin
Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
Stars: ✭ 5,513 (+6878.48%)
Mlbox
MLBox is a powerful Automated Machine Learning python library.
Stars: ✭ 1,199 (+1417.72%)
Magellan
Geo Spatial Data Analytics on Spark
Stars: ✭ 507 (+541.77%)
Knowledge Repo
A next-generation curated knowledge sharing platform for data scientists and other technical professions.
Stars: ✭ 4,956 (+6173.42%)
Awesome R
A curated list of awesome R packages, frameworks and software.
Stars: ✭ 4,858 (+6049.37%)
Luigi Warehouse
A Luigi-powered analytics / warehouse stack
Stars: ✭ 72 (-8.86%)
Nipype
Workflows and interfaces for neuroimaging packages
Stars: ✭ 557 (+605.06%)
Data Science
Collection of useful data science topics along with code and articles
Stars: ✭ 315 (+298.73%)
Pdpipe
Easy pipelines for pandas DataFrames.
Stars: ✭ 590 (+646.84%)
Imbalanced Learn
A Python Package to Tackle the Curse of Imbalanced Datasets in Machine Learning
Stars: ✭ 5,617 (+7010.13%)
Alluxio
Alluxio, data orchestration for analytics and machine learning in the cloud
Stars: ✭ 5,379 (+6708.86%)
Hyperlearn
50% faster, 50% less RAM Machine Learning. Numba rewritten Sklearn. SVD, NNMF, PCA, LinearReg, RidgeReg, Randomized, Truncated SVD/PCA, CSR Matrices all 50+% faster
Stars: ✭ 1,204 (+1424.05%)
Elki
ELKI Data Mining Toolkit
Stars: ✭ 613 (+675.95%)
Tsrepr
TSrepr: R package for time series representations
Stars: ✭ 75 (-5.06%)
Nfstream
NFStream: a Flexible Network Data Analysis Framework.
Stars: ✭ 622 (+687.34%)
Go Streams
A lightweight stream processing library for Go
Stars: ✭ 615 (+678.48%)
Dataproofer
A proofreader for your data
Stars: ✭ 628 (+694.94%)
Data Science Career
Career Resources for Data Science, Machine Learning, Big Data and Business Analytics Career Repository
Stars: ✭ 630 (+697.47%)
Spark Movie Lens
An on-line movie recommender using Spark, Python Flask, and the MovieLens dataset
Stars: ✭ 745 (+843.04%)
Getting Started
This repository is a getting started guide to Singer.
Stars: ✭ 734 (+829.11%)
Goodreads etl pipeline
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Stars: ✭ 793 (+903.8%)
Prefect
The easiest way to automate your data
Stars: ✭ 7,956 (+9970.89%)
Osint collection
Maintained collection of OSINT-related resources. (All Free & Actionable)
Stars: ✭ 809 (+924.05%)
Dataframe
C++ DataFrame for statistical, financial, and ML analysis in modern C++, using native types and contiguous memory storage, with no pointers involved
Stars: ✭ 828 (+948.1%)
Skdata
Python tools for data analysis
Stars: ✭ 16 (-79.75%)
Wedatasphere
WeDataSphere is a financial level one-stop open-source suitcase for big data platforms. Currently the source code of Scriptis and Linkis has already been released to the open-source community. WeDataSphere, Big Data Made Easy!
Stars: ✭ 372 (+370.89%)
Cookbook 2nd
IPython Cookbook, Second Edition, by Cyrille Rossant, Packt Publishing 2018
Stars: ✭ 704 (+791.14%)
Datastream.io
An open-source framework for real-time anomaly detection using Python, ElasticSearch and Kibana
Stars: ✭ 814 (+930.38%)
Pretzel
JavaScript full-stack framework for Big Data visualisation and analysis
Stars: ✭ 26 (-67.09%)
Tiledb Vcf
Efficient variant-call data storage and retrieval library using the TileDB storage library.
Stars: ✭ 26 (-67.09%)
Cookbook
The Data Engineering Cookbook
Stars: ✭ 9,829 (+12341.77%)
Resources
PyMC3 educational resources
Stars: ✭ 930 (+1077.22%)
Socrat
A Dynamic Web Toolbox for Interactive Data Processing, Analysis, and Visualization
Stars: ✭ 26 (-67.09%)
Autodl
Automated Deep Learning without ANY human intervention. 1st Solution for AutoDL [email protected]
Stars: ✭ 854 (+981.01%)
Pandas Profiling
Create HTML profiling reports from pandas DataFrame objects
Stars: ✭ 8,329 (+10443.04%)
Sparkjni
A heterogeneous Apache Spark framework.
Stars: ✭ 11 (-86.08%)
Dataspherestudio
DataSphereStudio is a one-stop data application development & management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.
Stars: ✭ 1,195 (+1412.66%)
Mobius
C# and F# language binding and extensions to Apache Spark
Stars: ✭ 929 (+1075.95%)
Steppy Toolkit
Curated set of transformers that make your work with steppy faster and more effective 🔭
Stars: ✭ 21 (-73.42%)
Tedsds
Apache Spark - Turbofan Engine Degradation Simulation Data Set example in Apache Spark
Stars: ✭ 14 (-82.28%)
Spark
Apache Spark - A unified analytics engine for large-scale data processing
Stars: ✭ 31,618 (+39922.78%)
Janitor
Simple tools for data cleaning in R
Stars: ✭ 981 (+1141.77%)
Dataconfs
A list of conferences connected with data worldwide.
Stars: ✭ 36 (-54.43%)
Countly Sdk Cordova
Countly Product Analytics SDK for Cordova, Icenium and Phonegap
Stars: ✭ 69 (-12.66%)
Mlcourse.ai
Open Machine Learning Course
Stars: ✭ 7,963 (+9979.75%)
Mlj.jl
A Julia machine learning framework
Stars: ✭ 982 (+1143.04%)
Pixiedust
Python Helper library for Jupyter Notebooks
Stars: ✭ 998 (+1163.29%)
Attaca
Robust, distributed version control for large files.
Stars: ✭ 41 (-48.1%)
Ether sql
A Python library to push Ethereum blockchain data into an SQL database.
Stars: ✭ 41 (-48.1%)