bkamins / Julia Dataframes Tutorial

Licence: MIT
A tutorial on the Julia DataFrames package

An Introduction to DataFrames

Bogumił Kamiński, November 2020

The tutorial is for DataFrames 0.22.1

A brief introduction to basic usage of DataFrames.

The tutorial ships with a specification of the project environment under which it should be run. To prepare this environment, run the following command from the project folder before opening the tutorial notebooks:

julia -e 'using Pkg; Pkg.activate("."); Pkg.instantiate()'

Tested under Julia 1.5.3. The project dependencies are the following:

  [69666777] Arrow v1.0.1
  [6e4b80f9] BenchmarkTools v0.5.0
  [336ed68f] CSV v0.8.2
  [324d7699] CategoricalArrays v0.9.0
  [944b1d66] CodecZlib v0.7.0
  [a93c6f00] DataFrames v0.22.1
  [1313f7d8] DataFramesMeta v0.6.0
  [5789e2e9] FileIO v1.4.4
  [da1fdf0e] FreqTables v0.4.2
  [7073ff75] IJulia v1.23.0
  [babc3d20] JDF v0.2.20
  [9da8a3cd] JLSO v2.4.0
  [b9914132] JSONTables v1.0.0
  [86f7a689] NamedArrays v0.9.4
  [b98c9c47] Pipe v1.3.0
  [2dfb63ee] PooledArrays v0.5.3
  [f3b207a7] StatsPlots v0.14.17
  [bd369af6] Tables v1.2.1
  [a5390f91] ZipFile v0.9.3
  [9a3f8284] Random
  [10745b16] Statistics

I will try to keep the material up to date as the packages evolve.

This tutorial covers DataFrames and CategoricalArrays, as they constitute the core of DataFrames, along with selected file reading and writing packages.

The last part, extras, mentions selected functionalities of packages that I find useful for data manipulation. Currently those are: FreqTables, DataFramesMeta (pending its update to support the DataFrames.jl 0.22 release), and StatsPlots.

Setting up Jupyter Notebook for work with DataFrames.jl

By default Jupyter Notebook will limit the number of rows and columns when displaying a data frame to roughly fit the screen size (like in the REPL).

You can override this behavior by setting the ENV["COLUMNS"] or ENV["LINES"] variables to hold the maximum width and height of output in characters, respectively, when running a notebook. Alternatively, you can add the entry "COLUMNS": "1000", "LINES": "100" to the "env" variable in your Jupyter kernel file. See the Jupyter documentation for information about the location and specification of Jupyter kernels.
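
As a minimal sketch of the first option, the cell below sets both variables from inside a running notebook. It assumes the cell is executed before any data frame is displayed, and the values simply reuse the 1000 and 100 from the kernel file entry above; adjust them to your screen.

```julia
# Raise the display limits for the current notebook session.
# The values reuse the "COLUMNS": "1000", "LINES": "100" example from the text.
ENV["COLUMNS"] = "1000"  # maximum output width in characters
ENV["LINES"] = "100"     # maximum output height in characters
```

Setting the variables this way affects only the current session, while the kernel file entry applies to every notebook started with that kernel.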

TOC

File Topic
01_constructors.ipynb Creating DataFrame and conversion
02_basicinfo.ipynb Getting summary information
03_missingvalues.ipynb Handling missing values
04_loadsave.ipynb Loading and saving DataFrames
05_columns.ipynb Working with columns of DataFrame
06_rows.ipynb Working with rows of DataFrame
07_factors.ipynb Working with categorical data
08_joins.ipynb Joining DataFrames
09_reshaping.ipynb Reshaping DataFrames
10_transforms.ipynb Transforming DataFrames
11_performance.ipynb Performance tips
12_pitfalls.ipynb Possible pitfalls
13_extras.ipynb Additional interesting packages

Changelog:

Date Changes
2017-12-05 Initial release
2017-12-06 Added description of insert!, merge!, empty!, categorical!, delete!, DataFrames.index
2017-12-09 Added performance tips
2017-12-10 Added pitfalls
2017-12-18 Added additional worthwhile packages: FreqTables and DataFramesMeta
2017-12-29 Added description of filter and filter!
2017-12-31 Added description of conversion to Matrix
2018-04-06 Added example of extracting a row from a DataFrame
2018-04-21 Major update of whole tutorial
2018-05-01 Added byrow! example
2018-05-13 Added StatPlots package to extras
2018-05-23 Improved comments in sections 1 to 5 by Jane Herriman
2018-07-25 Update to 0.11.7 release
2018-08-25 Update to Julia 1.0 release: sections 1 to 10
2018-08-29 Update to Julia 1.0 release: sections 11, 12 and 13
2018-09-05 Update to Julia 1.0 release: FreqTables section
2018-09-10 Added CSVFiles section to chapter on load/save
2018-09-26 Updated to DataFrames 0.14.0
2018-10-04 Updated to DataFrames 0.14.1, added haskey and repeat
2018-12-08 Updated to DataFrames 0.15.2
2019-01-03 Updated to DataFrames 0.16.0, added serialization instructions
2019-01-18 Updated to DataFrames 0.17.0, added passmissing
2019-01-27 Added Feather.jl file read/write
2019-01-30 Renamed StatPlots.jl to StatsPlots.jl and added Tables.jl
2019-02-08 Added groupvars and groupindices functions
2019-04-27 Updated to DataFrames 0.18.0, dropped JLD2.jl
2019-04-30 Updated handling of missing values description
2019-07-16 Updated to DataFrames 0.19.0
2019-08-14 Added JSONTables.jl and Tables.columnindex
2019-08-16 Added Project.toml and Manifest.toml
2019-08-26 Update to Julia 1.2 and DataFrames 0.19.3
2019-08-29 Add example how to compress/decompress CSV file using CodecZlib
2019-08-30 Add examples of JLSO.jl and ZipFile.jl by xiaodaigh
2019-11-03 Add examples of JDF.jl by xiaodaigh
2019-12-08 Updated to DataFrames 0.20.0
2020-05-06 Updated to DataFrames 0.21.0 (except load/save and extras)
2020-11-20 Updated to DataFrames 0.22.0 (except DataFramesMeta.jl which does not work yet)
2020-11-26 Updated to DataFramesMeta.jl 0.6; update by @pdeffebach

Core functions summary

  1. Constructors: DataFrame, DataFrame!, Tables.rowtable, Tables.columntable, Matrix, eachcol, eachrow, Tables.namedtupleiterator, empty, empty!
  2. Getting summary: size, nrow, ncol, describe, names, eltypes, first, last, getindex, setindex!, @view, isapprox
  3. Handling missing: missing (singleton instance of Missing), ismissing, nonmissingtype, skipmissing, replace, replace!, coalesce, allowmissing, allowmissing!, disallowmissing, disallowmissing!, completecases, dropmissing, dropmissing!, passmissing
  4. Loading and saving: CSV (package), CSVFiles (package), Serialization (module), CSV.read, CSV.write, save, load, serialize, deserialize, Arrow.write, Arrow.Table (from Arrow.jl package), JSONTables (package), arraytable, objecttable, jsontable, CodecZlib (module), GzipCompressorStream, GzipDecompressorStream, JDF.jl (package), JDF.savejdf, JDF.loadjdf, JLSO.jl (package), JLSO.save, JLSO.load, ZipFile.jl (package), ZipFile.reader, ZipFile.writer, ZipFile.addfile
  5. Working with columns: rename, rename!, hcat, insertcols!, categorical!, columnindex, hasproperty, select, select!, transform, transform!, combine, Not, All, Between, ByRow, AsTable
  6. Working with rows: sort!, sort, issorted, append!, vcat, push!, view, filter, filter!, delete!, unique, nonunique, unique!, repeat, parent, parentindices, flatten, @pipe (from Pipe package), only
  7. Working with categorical: categorical, cut, isordered, ordered!, levels, unique, levels!, droplevels!, get, recode, recode!
  8. Joining: innerjoin, leftjoin, rightjoin, outerjoin, semijoin, antijoin, crossjoin
  9. Reshaping: stack, unstack
  10. Transforming: groupby, mapcols, parent, groupcols, valuecols, groupindices, keys (for GroupedDataFrame), combine, select, select!, transform, transform!, @pipe (from Pipe package)
  11. Extras:
    • FreqTables: freqtable, prop, Name
    • DataFramesMeta: @with, @where, @select, @transform, @orderby, @linq, @by, @combine, @eachrow, @newcol, ^, cols
    • StatsPlots: @df, plot, density, histogram, boxplot, violin
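
A minimal sketch touching a few of the functions listed above (nrow, ncol, transform, ByRow, groupby, combine). It assumes DataFrames 0.22 as pinned by the tutorial; the data frame and its column names (group, x) are made up for illustration and do not come from the notebooks.

```julia
using DataFrames
using Statistics

# Hypothetical toy data; the column names are illustrative only.
df = DataFrame(group = ["a", "b", "a", "b"], x = [1, 2, 3, 4])

# Getting summary information: number of rows and columns.
dims = (nrow(df), ncol(df))

# Working with columns: add a column computed row by row.
df2 = transform(df, :x => ByRow(v -> 2v) => :x2)

# Transforming: split by group, then aggregate each group.
gdf = groupby(df2, :group)
res = combine(gdf, :x => sum => :x_sum, :x2 => mean => :x2_mean)
```

Here transform returns a new data frame and leaves df untouched; the in-place variant listed above is transform!.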