taller SparkR
SparkR workshop for the Jornadas de Usuarios de R (R Users Conference)
Stars: ✭ 12 (-90.77%)
Deepgraph
Analyze Data with Pandas-based Networks.
Stars: ✭ 232 (+78.46%)
Lagoujob
Job data mining repo for lagou.com
Stars: ✭ 256 (+96.92%)
Urs
Universal Reddit Scraper - A comprehensive Reddit scraping command-line tool written in Python.
Stars: ✭ 275 (+111.54%)
Sourced Ce
source{d} Community Edition (CE)
Stars: ✭ 153 (+17.69%)
genieclust
Genie++ Fast and Robust Hierarchical Clustering with Noise Point Detection - for Python and R
Stars: ✭ 34 (-73.85%)
Dataproofer
A proofreader for your data
Stars: ✭ 628 (+383.08%)
Dataflowjavasdk
Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
Stars: ✭ 854 (+556.92%)
Rightmove webscraper.py
Python class to scrape data from rightmove.co.uk and return listings in a pandas DataFrame object
Stars: ✭ 125 (-3.85%)
Data Science Resources
👨🏽🏫 You can learn about what data science is and why it's important in today's modern world. Are you interested in data science? 🔋
Stars: ✭ 171 (+31.54%)
Nmflibrary
MATLAB library for non-negative matrix factorization (NMF): Version 1.8.1
Stars: ✭ 153 (+17.69%)
genie
Genie: A Fast and Robust Hierarchical Clustering Algorithm (this R package has now been superseded by genieclust)
Stars: ✭ 21 (-83.85%)
Datascience
Curated list of Python resources for data science.
Stars: ✭ 3,051 (+2246.92%)
Elki
ELKI Data Mining Toolkit
Stars: ✭ 613 (+371.54%)
Liteflow
liteflow is a distributed task-flow scheduling system built around task versions.
Stars: ✭ 112 (-13.85%)
Cookbook 2nd
IPython Cookbook, Second Edition, by Cyrille Rossant, Packt Publishing 2018
Stars: ✭ 704 (+441.54%)
Countly Sdk Cordova
Countly Product Analytics SDK for Cordova, Icenium and Phonegap
Stars: ✭ 69 (-46.92%)
Tsrepr
TSrepr: R package for time series representations
Stars: ✭ 75 (-42.31%)
Flyte
Accelerate your ML and Data workflows to production. Flyte is a production-grade orchestration system for your Data and ML workloads. It has been battle-tested at Lyft, Spotify, Freenome and others, and is truly open source.
Stars: ✭ 1,242 (+855.38%)
Dex
Dex: The Data Explorer -- A data visualization tool written in Java/Groovy/JavaFX capable of powerful ETL and publishing web visualizations.
Stars: ✭ 1,238 (+852.31%)
Optimus
🚚 Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark
Stars: ✭ 986 (+658.46%)
Pycm
Multi-class confusion matrix library in Python
Stars: ✭ 1,076 (+727.69%)
Pipeline
The `pipeline` shell command
Stars: ✭ 168 (+29.23%)
Amazing Feature Engineering
Feature engineering is the process of using domain knowledge to extract features from raw data via data mining techniques. These features can be used to improve the performance of machine learning algorithms. Feature engineering can be considered as applied machine learning itself.
Stars: ✭ 218 (+67.69%)
Pydataroad
Open-source repository for the WeChat official account (ID: PyDataLab)
Stars: ✭ 302 (+132.31%)
python-notebooks
A collection of Jupyter Notebooks used at conferences or kept as handy snippets.
Stars: ✭ 14 (-89.23%)
heidi
heidi: tidy data in Haskell
Stars: ✭ 24 (-81.54%)
PracticalMachineLearning
A collection of ML-related material including notebooks, code, and a curated list of useful resources such as books and software. Almost everything mentioned here is free (as in speech, not as in free food) or open source.
Stars: ✭ 60 (-53.85%)
Tennis Crystal Ball
Ultimate Tennis Statistics and Tennis Crystal Ball - Tennis Big Data Analysis and Prediction
Stars: ✭ 107 (-17.69%)
twitter-analytics-wrapper
A simple Python wrapper to download tweet data from the Twitter Analytics platform. Particularly useful for the impression metrics that are unavailable through the current Twitter API. Also works for video data.
Stars: ✭ 44 (-66.15%)
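leaflet heatmap
A simple visualization of Huzhou phone-call data. Assuming the data volume is too large for the browser to draw the heatmap directly, the heatmap rendering step is moved to offline computation. Apache Spark first processes the data in parallel and then renders the heatmap; leafletjs then loads the OpenStreetMap layer and the heatmap layer to provide good interactivity. The rendering is currently implemented with Apache Spark, but the parallel computation is slower than a single machine, either because Apache Spark is not well suited to this kind of computation or because I did not design the algorithm well. The Apache Spark heatmap rendering and computation code is at https://github.com/yuanzhaokang/ParallelizeHeatmap.git .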
Stars: ✭ 13 (-90%)
Arvados
An open source platform for managing and analyzing biomedical big data
Stars: ✭ 274 (+110.77%)
Vectorbt
Ultimate Python library for time series analysis and backtesting at scale
Stars: ✭ 855 (+557.69%)
Cookbook 2nd Code
Code of the IPython Cookbook, Second Edition, by Cyrille Rossant, Packt Publishing 2018 [read-only repository]
Stars: ✭ 541 (+316.15%)
Pyod
A Python Toolbox for Scalable Outlier Detection (Anomaly Detection)
Stars: ✭ 5,083 (+3810%)
Nfstream
NFStream: a Flexible Network Data Analysis Framework.
Stars: ✭ 622 (+378.46%)
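Ai Learn
An artificial intelligence learning roadmap that collects nearly 200 hands-on cases and projects, with free accompanying teaching materials, suitable for complete beginners and aimed at job-ready practice. Covers popular areas including Python, mathematics, machine learning, data analysis, deep learning, computer vision, natural language processing, and PyTorch. Keywords: tensorflow, machine-learning, deep-learning, data-analysis, data-mining, mathematics, data-science, artificial-intelligence, python, tensorflow, tensorflow2, caffe, keras, pytorch, algorithm, numpy, pandas, matplotlib, seaborn, nlp, cv.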
Stars: ✭ 4,387 (+3274.62%)
Model Describer
model-describer: Making machine learning interpretable to humans
Stars: ✭ 22 (-83.08%)
Knowage Server
Knowage is the professional open source suite for modern business analytics over traditional sources and big data systems.
Stars: ✭ 276 (+112.31%)
Drugs Recommendation Using Reviews
Analyzing drug descriptions, conditions, and reviews, then recommending drugs for each health condition of a patient using deep learning models.
Stars: ✭ 35 (-73.08%)
Spark Py Notebooks
Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 1,338 (+929.23%)
Spark R Notebooks
R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 109 (-16.15%)
Deeptime
Deep learning meets molecular dynamics.
Stars: ✭ 123 (-5.38%)
Tokencaps
A middleware framework and persistence layer to aggregate and normalize crypto-currency data.
Stars: ✭ 118 (-9.23%)
Kddcup 2020
6th Solution for 2020-KDDCUP Debiasing Challenge
Stars: ✭ 118 (-9.23%)
Vbmc
Variational Bayesian Monte Carlo (VBMC) algorithm for posterior and model inference in MATLAB
Stars: ✭ 123 (-5.38%)
Chain.jl
A Julia package for piping a value through a series of transformation expressions using a more convenient syntax than Julia's native piping functionality.
Stars: ✭ 118 (-9.23%)
Concord
Concord - workflow orchestration and continuous deployment management
Stars: ✭ 117 (-10%)