DatscanDatScan is an initiative to build an open-source CMS that aims to solve any problem through data analysis, using pluggable modules and a large standardized module library.
Stars: ✭ 13 (-90.44%)
ZatZeek Analysis Tools (ZAT): Processing and analysis of Zeek network data with Pandas, scikit-learn, Kafka and Spark
Stars: ✭ 303 (+122.79%)
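Ai LearnAn AI learning roadmap that collects nearly 200 hands-on cases and projects, with free accompanying course materials, taking beginners from zero to job-ready practice! Covers popular areas including: Python, mathematics, machine learning, data analysis, deep learning, computer vision, natural language processing, PyTorch tensorflow machine-learning, deep-learning data-analysis data-mining mathematics data-science artificial-intelligence python tensorflow tensorflow2 caffe keras pytorch algorithm numpy pandas matplotlib seaborn nlp cv, and more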
Stars: ✭ 4,387 (+3125.74%)
Seaborn TutorialThis repository is my attempt to help Data Science aspirants gain necessary Data Visualization skills required to progress in their career. It includes all the types of plot offered by Seaborn, applied on random datasets.
Stars: ✭ 114 (-16.18%)
Mlcourse.aiOpen Machine Learning Course
Stars: ✭ 7,963 (+5755.15%)
Pyda 2e Zh📖 [Translation] Python for Data Analysis, 2nd Edition
Stars: ✭ 866 (+536.76%)
Data AnalysisMainly a summary of web-scraping and data analysis projects, plus modeling, machine learning, and model evaluation.
Stars: ✭ 142 (+4.41%)
100 Pandas Puzzles100 data puzzles for pandas, ranging from short and simple to super tricky (60% complete)
Stars: ✭ 1,382 (+916.18%)
Data Science HacksData Science Hacks consists of tips and tricks to help you become a better data scientist. The hacks are for everyone, from beginner to advanced, and cover Python, Jupyter Notebook, pandas, and so on.
Stars: ✭ 273 (+100.74%)
Awkward 1.0Manipulate JSON-like data with NumPy-like idioms.
Stars: ✭ 203 (+49.26%)
Data Science Ipython NotebooksData science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Stars: ✭ 22,048 (+16111.76%)
TypologySwift type checking and semantic analysis for developer tools
Stars: ✭ 68 (-50%)
kobe-every-shot-everA Los Angeles Times analysis of every shot in Kobe Bryant's NBA career
Stars: ✭ 66 (-51.47%)
PracticalMachineLearningA collection of ML-related material including notebooks, code, and a curated list of useful resources such as books and software. Almost everything mentioned here is free (as in speech, not free food) or open-source.
Stars: ✭ 60 (-55.88%)
covid-19Data ETL & Analysis on the global and Mexican datasets of the COVID-19 pandemic.
Stars: ✭ 14 (-89.71%)
whyqddata wrangling simplicity, complete audit transparency, and at speed
Stars: ✭ 16 (-88.24%)
tempoAPI for manipulating time series on top of Apache Spark: lagged time values, rolling statistics (mean, avg, sum, count, etc), AS OF joins, downsampling, and interpolation
Stars: ✭ 212 (+55.88%)
Dominando-PandasThis repository is intended for learning the Pandas library.
Stars: ✭ 22 (-83.82%)
Machine-LearningThis repository contains notebooks that will help you understand basic ML algorithms, as well as basic NumPy exercises. 💥 🌈 🌈
Stars: ✭ 15 (-88.97%)
datascienvdatascienv is a package that sets up your environment in a single line of code with all dependencies; it also includes pyforest, which imports all required ML libraries in a single line.
Stars: ✭ 53 (-61.03%)
KoalasKoalas: pandas API on Apache Spark
Stars: ✭ 3,044 (+2138.24%)
HandysparkHandySpark - bringing pandas-like capabilities to Spark dataframes
Stars: ✭ 158 (+16.18%)
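leaflet heatmapA simple visualization of Huzhou call data. Assuming the data volume is too large to draw the heatmap directly in the browser, the heatmap-drawing step is moved offline for computation and analysis. After the data is computed in parallel with Apache Spark, the heatmap is drawn with Apache Spark as well, and leafletjs then loads the OpenStreetMap layer and the heatmap layer for good interactivity. The drawing is currently implemented with Apache Spark; either Apache Spark is not well suited to this kind of computation or I did not design the algorithm well, because the parallel computation is slower than the single-machine computation. The Apache Spark heatmap drawing and computation code is at https://github.com/yuanzhaokang/ParallelizeHeatmap.git .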
Stars: ✭ 13 (-90.44%)
DatacompyPandas and Spark DataFrame comparison for humans
Stars: ✭ 147 (+8.09%)
Data-Wrangling-with-PythonSimplify your ETL processes with these hands-on data sanitation tips, tricks, and best practices
Stars: ✭ 90 (-33.82%)
Cape PythonCollaborate on privacy-preserving policy for data science projects in Pandas and Apache Spark
Stars: ✭ 125 (-8.09%)
EngeznyEngezny is a Python package that quickly generates all possible charts from your dataframe and saves them for you. It currently supports only uni-parameter visualization, using pie, bar, and barh charts.
Stars: ✭ 25 (-81.62%)
Data-Science-101Notes and tutorials on how to use python, pandas, seaborn, numpy, matplotlib, scipy for data science.
Stars: ✭ 19 (-86.03%)
saddleSADDLE: Scala Data Library
Stars: ✭ 23 (-83.09%)
CC33ZComputer Science Course
Stars: ✭ 50 (-63.24%)
pandas-workshopAn introductory workshop on pandas with notebooks and exercises for following along.
Stars: ✭ 161 (+18.38%)
hamiltonA scalable general purpose micro-framework for defining dataflows. You can use it to create dataframes, numpy matrices, python objects, ML models, etc.
Stars: ✭ 612 (+350%)
DataSciPyData Science with Python
Stars: ✭ 15 (-88.97%)
Python BigdataData science and Big Data with Python
Stars: ✭ 112 (-17.65%)
tutorialsShort programming tutorials pertaining to data analysis.
Stars: ✭ 14 (-89.71%)
ml-workflow-automationPython Machine Learning (ML) project that demonstrates the archetypal ML workflow within a Jupyter notebook, with automated model deployment as a RESTful service on Kubernetes.
Stars: ✭ 44 (-67.65%)
online-course-recommendation-systemBuilt on data fetched from Pluralsight's course API. Uses a model trained with the K-means unsupervised clustering algorithm.
Stars: ✭ 31 (-77.21%)
datasets🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
Stars: ✭ 13,870 (+10098.53%)
datatileA library for managing, validating, summarizing, and visualizing data.
Stars: ✭ 419 (+208.09%)
Information-RetrievalInformation Retrieval algorithms developed in python. To follow the blog posts, click on the link:
Stars: ✭ 103 (-24.26%)
valinvestA value investing tool based on the ideas of Warren Buffett, Joseph Piotroski and Benjamin Graham
Stars: ✭ 84 (-38.24%)
spyndexAwesome Spectral Indices in Python.
Stars: ✭ 56 (-58.82%)
DataProfilerWhat's in your data? Extract schema, statistics and entities from datasets
Stars: ✭ 843 (+519.85%)
vinumVinum is a SQL processor for Python, designed for data analysis workflows and in-memory analytics.
Stars: ✭ 57 (-58.09%)
Python-campNo description or website provided.
Stars: ✭ 34 (-75%)
Product-Categorization-NLPMulti-Class Text Classification for products based on their description with Machine Learning algorithms and Neural Networks (MLP, CNN, Distilbert).
Stars: ✭ 30 (-77.94%)
Data-Scientist-In-PythonThis repository contains notes and projects from the Dataquest Data Scientist track coursework.
Stars: ✭ 23 (-83.09%)