TrainingByPackt / Data-Wrangling-with-Python

Licence: MIT license
Simplify your ETL processes with these hands-on data sanitation tips, tricks, and best practices

Programming Languages

Jupyter Notebook
HTML
PHP
Hack

Projects that are alternatives of or similar to Data-Wrangling-with-Python

Udacity-Data-Analyst-Nanodegree
Repository for the projects needed to complete the Data Analyst Nanodegree.
Stars: ✭ 31 (-65.56%)
Mutual labels:  numpy, pandas, data-analytics, data-wrangling
grailer
web scraping tool for grailed.com
Stars: ✭ 30 (-66.67%)
Mutual labels:  pandas, web-scraping, beautifulsoup
Data-Analyst-Nanodegree
Kai Sheng Teh - Udacity Data Analyst Nanodegree
Stars: ✭ 42 (-53.33%)
Mutual labels:  numpy, pandas, data-wrangling
data-analysis-using-python
Data Analysis Using Python: A Beginner’s Guide Featuring NYC Open Data
Stars: ✭ 81 (-10%)
Mutual labels:  numpy, pandas, data-analytics
Web Database Analytics
Web scraping and related analytics using Python tools
Stars: ✭ 175 (+94.44%)
Mutual labels:  regular-expression, web-scraping, data-wrangling
The-Data-Visualization-Workshop
A New, Interactive Approach to Learning Data Visualization
Stars: ✭ 59 (-34.44%)
Mutual labels:  numpy, pandas, data-wrangling
Tensorflow Ml Nlp
Getting started with natural language processing using TensorFlow and machine learning (from logistic regression to a Transformer chatbot)
Stars: ✭ 176 (+95.56%)
Mutual labels:  numpy, pandas
Data Science Types
Mypy stubs, i.e., type information, for numpy, pandas and matplotlib
Stars: ✭ 180 (+100%)
Mutual labels:  numpy, pandas
Fashion Recommendation
A clothing retrieval and visual recommendation model for fashion images.
Stars: ✭ 193 (+114.44%)
Mutual labels:  numpy, pandas
Data Science Notebook
📖 Every great idea and action has a humble beginning
Stars: ✭ 196 (+117.78%)
Mutual labels:  numpy, pandas
Data Science Projects With Python
A Case Study Approach to Successful Data Science Projects Using Python, Pandas, and Scikit-Learn
Stars: ✭ 198 (+120%)
Mutual labels:  numpy, pandas
Awkward 1.0
Manipulate JSON-like data with NumPy-like idioms.
Stars: ✭ 203 (+125.56%)
Mutual labels:  numpy, pandas
Mars
Mars is a tensor-based unified framework for large-scale data computation which scales numpy, pandas, scikit-learn and Python functions.
Stars: ✭ 2,308 (+2464.44%)
Mutual labels:  numpy, pandas
Ditching Excel For Python
Functionalities in Excel translated to Python
Stars: ✭ 172 (+91.11%)
Mutual labels:  numpy, pandas
Andrew Ng Notes
This is Andrew NG Coursera Handwritten Notes.
Stars: ✭ 180 (+100%)
Mutual labels:  numpy, pandas
Panthera
Data-frames & arrays on Clojure
Stars: ✭ 168 (+86.67%)
Mutual labels:  numpy, pandas
Xarray
N-D labeled arrays and datasets in Python
Stars: ✭ 2,353 (+2514.44%)
Mutual labels:  numpy, pandas
Bootcamp python
Bootcamp to learn Python for Machine Learning
Stars: ✭ 228 (+153.33%)
Mutual labels:  numpy, pandas
Python Wechat Itchat
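WeChat bot: feature examples built on the Python itchat interface. 01: fetch articles shared by WeChat friends or groups; 02: fetch WeChat official account articles; 03: monitor articles sent by official accounts; 04: monitor messages recalled by groups or friends; 05: fetch friend information and compare it in charts; 06: print the friends who have deleted you; 07: auto-reply to friends; 08: word cloud of friends' personal signatures; 09: gender ratio of friends; 10: intercept recalled messages from groups or friends; 11: forward messages between groups or friends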
Stars: ✭ 216 (+140%)
Mutual labels:  numpy, pandas
Jetson Containers
Machine Learning Containers for NVIDIA Jetson and JetPack-L4T
Stars: ✭ 223 (+147.78%)
Mutual labels:  numpy, pandas

Data Wrangling with Python by Packt

Data is the new oil, and it shapes modern life through smart tools and transformative technologies. But oil does not come out of the rig in its final form; it has to be refined through a complex processing network. Similarly, data needs to be curated, massaged, and refined before it can be used in intelligent algorithms and consumer products. This process is called wrangling, and (according to Forbes) good data scientists spend roughly 60-80% of their time on it, on every project. It involves scraping raw data from multiple sources (including the web and database tables), then imputing, formatting, and transforming it so that it is ready for the modeling process.

This course aims to teach you the core ideas behind this process and to equip you with knowledge of the most popular tools and techniques in the domain. As the programming framework, we have chosen Python, one of the most widely used languages for data science. We work through real-life examples, not toy datasets. By the end of this course, you will be confident handling a wide array of sources to extract, clean, transform, and format your data for the great machine learning app you are thinking of building. Hop on and be part of this exciting journey.

What you will learn

  • Manipulate simple and complex data structures using Python and its built-in functions
  • Use fundamental and advanced features of Pandas DataFrames and numpy.array, and manipulate them at run time
  • Extract and format data from various textual formats: plain text files, SQL, CSV, Excel, JSON, and XML
  • Perform web scraping using Python libraries such as BeautifulSoup4 and html5lib
  • Perform advanced string search and manipulation using Python and regular expressions
  • Handle outliers, apply advanced programming tricks, and perform data imputation using Pandas
  • Apply basic descriptive statistics and plotting techniques in Python for quick examination of data
  • Practice data wrangling and modeling using random data generation techniques (bonus topic)
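
To give a feel for several of the topics above (reading CSV text, regular-expression cleanup, and imputing a missing value), here is a minimal sketch. It is not taken from the course material: the sample data, the `wrangle` function, and the choice of median imputation are all assumptions made for illustration, and it uses only the Python standard library, whereas the course itself works with Pandas.

```python
import csv
import io
import re
import statistics

# Hypothetical raw data: stray whitespace, mixed case, a currency symbol,
# and one missing price.
RAW_CSV = """name,price
 Widget A ,$10.50
widget b,
WIDGET C,$7.25
"""


def wrangle(raw_text):
    """Parse CSV text, normalise names, and impute missing prices with the median."""
    rows = list(csv.DictReader(io.StringIO(raw_text)))
    for row in rows:
        # Collapse runs of whitespace, trim the ends, and use title case.
        row["name"] = re.sub(r"\s+", " ", row["name"]).strip().title()
        # Pull the first number out of the price field, if there is one.
        match = re.search(r"\d+(?:\.\d+)?", row["price"] or "")
        row["price"] = float(match.group()) if match else None

    # Median of the known prices; this raises StatisticsError if none are known.
    known = [row["price"] for row in rows if row["price"] is not None]
    fill = statistics.median(known)
    for row in rows:
        if row["price"] is None:
            row["price"] = fill
    return rows


cleaned = wrangle(RAW_CSV)
for row in cleaned:
    print(row["name"], row["price"])
```

With Pandas, the same steps would typically be expressed on a DataFrame instead of a list of dictionaries; the plain-Python version is used here only to keep the sketch free of third-party dependencies.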

Hardware requirements

For an optimal student experience, we recommend the following hardware configuration:

  • OS: Windows 7 SP1 64-bit, Windows 8.1 64-bit or Windows 10 64-bit, Ubuntu Linux, or the latest version of macOS
  • Processor: Intel Core i5 or equivalent
  • Memory: 8GB RAM or more
  • Hard disk: 40GB or more
  • Stable Internet connection

Software requirements

You'll also need the following software installed in advance:

  • Browser: the latest version of Google Chrome or Mozilla Firefox
  • Python 3.4+ (preferably Python 3.6)
  • Python libraries as needed (Jupyter, NumPy, Pandas, Matplotlib, BeautifulSoup4, and so on)
  • A text editor such as Notepad++, Sublime Text, or Atom (latest version), or a similar application

The following Python libraries are needed:

  • NumPy
  • Pandas
  • SciPy
  • scikit-learn
  • Matplotlib
  • BeautifulSoup4
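
One way to confirm that the environment meets these requirements is a short check like the one below. This is a sketch, not part of the course material: the helper names `missing_libraries` and `python_is_supported` are hypothetical, and the mapping relies on the usual import names, which differ from the package names in two cases (scikit-learn is imported as `sklearn`, BeautifulSoup4 as `bs4`).

```python
import importlib.util
import sys

# Display name -> import name for the libraries listed above.
REQUIRED = {
    "NumPy": "numpy",
    "Pandas": "pandas",
    "SciPy": "scipy",
    "scikit-learn": "sklearn",
    "Matplotlib": "matplotlib",
    "BeautifulSoup4": "bs4",
}


def missing_libraries(required=REQUIRED):
    """Return the display names of libraries that cannot be found for import."""
    return [
        name
        for name, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


def python_is_supported(version_info=sys.version_info, minimum=(3, 4)):
    """Check the interpreter version against the stated minimum (Python 3.4)."""
    return tuple(version_info[:2]) >= minimum


print("Python version OK:", python_is_supported())
print("Missing libraries:", missing_libraries() or "none")
```

Any library reported as missing can then be installed with the package manager of your choice before starting the course.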