SuperCowPowers / Data_hacking

Licence: MIT
Data Hacking Project

Projects that are alternatives to or similar to Data hacking

Featexp
Feature exploration for supervised learning
Stars: ✭ 688 (-2.41%)
Mutual labels:  jupyter-notebook
Madewithml
Learn how to responsibly deliver value with ML.
Stars: ✭ 29,253 (+4049.36%)
Mutual labels:  jupyter-notebook
Ai Series
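📚 [.md & .ipynb] Series of Artificial Intelligence & Deep Learning, including Mathematics Fundamentals, Python Practices, NLP Application, etc. 💫 Artificial intelligence and deep learning in practice: mathematical statistics | machine learning | deep learning | natural language processing | tools in practice (Scikit, TensorFlow, PyTorch) | industry applications & course notes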
Stars: ✭ 702 (-0.43%)
Mutual labels:  jupyter-notebook
Opencv Machine Learning
M. Beyeler (2017). Machine Learning for OpenCV: Intelligent image processing with Python. Packt Publishing Ltd., ISBN 978-178398028-4.
Stars: ✭ 693 (-1.7%)
Mutual labels:  jupyter-notebook
Csp
High-level Semantic Feature Detection: A New Perspective for Pedestrian Detection, CVPR, 2019
Stars: ✭ 695 (-1.42%)
Mutual labels:  jupyter-notebook
Stockpriceprediction
Stock Price Prediction using Machine Learning Techniques
Stars: ✭ 700 (-0.71%)
Mutual labels:  jupyter-notebook
Ntu Machine Learning
Machine learning course by Professor Hung-yi Lee of National Taiwan University
Stars: ✭ 684 (-2.98%)
Mutual labels:  jupyter-notebook
Cookbook 2nd
IPython Cookbook, Second Edition, by Cyrille Rossant, Packt Publishing 2018
Stars: ✭ 704 (-0.14%)
Mutual labels:  jupyter-notebook
H1st
The AI Application Platform We All Need. Human AND Machine Intelligence. Based on experience building AI solutions at Panasonic: robotics predictive maintenance, cold-chain energy optimization, Gigafactory battery mfg, avionics, automotive cybersecurity, and more.
Stars: ✭ 697 (-1.13%)
Mutual labels:  jupyter-notebook
Panama Papers Dataset 2016
Structured data about Panama papers collected from official ICIJ website
Stars: ✭ 701 (-0.57%)
Mutual labels:  jupyter-notebook
Deepbayes 2019
Practical assignments of the Deep|Bayes summer school 2019
Stars: ✭ 694 (-1.56%)
Mutual labels:  jupyter-notebook
Sciblog support
Support content for my blog
Stars: ✭ 694 (-1.56%)
Mutual labels:  jupyter-notebook
Caffenet Benchmark
Evaluation of the CNN design choices performance on ImageNet-2012.
Stars: ✭ 700 (-0.71%)
Mutual labels:  jupyter-notebook
Pytorch Segmentation Detection
Image Segmentation and Object Detection in Pytorch
Stars: ✭ 692 (-1.84%)
Mutual labels:  jupyter-notebook
Intro To Dl
Resources for "Introduction to Deep Learning" course.
Stars: ✭ 703 (-0.28%)
Mutual labels:  jupyter-notebook
Pytorch Multi Style Transfer
Neural Style and MSG-Net
Stars: ✭ 687 (-2.55%)
Mutual labels:  jupyter-notebook
Analytics Handbook
Getting started with soccer analytics
Stars: ✭ 699 (-0.85%)
Mutual labels:  jupyter-notebook
Machine Learning
A repository made to help machine learning beginners and people preparing for a study group.
Stars: ✭ 705 (+0%)
Mutual labels:  jupyter-notebook
Polyrnn Pp
Inference Code for Polygon-RNN++ (CVPR 2018)
Stars: ✭ 704 (-0.14%)
Mutual labels:  jupyter-notebook
Network Analysis Made Simple
An introduction to network analysis and applied graph theory using Python and NetworkX
Stars: ✭ 700 (-0.71%)
Mutual labels:  jupyter-notebook

data_hacking

Welcome to the Data Hacking Project

"Hacking in the sense of deconstructing an idea, hardware, anything and getting it to do something it wasn’t intended or to better understand how something works." (BSides CFP)

So hacking here means we want to quickly deconstruct data, understand what we've got and how to best utilize it for the problem at hand.

The primary motivation for these exercises is to explore the nexus of IPython, Pandas and Scikit Learn on security data of various kinds. The exercises will often intentionally show common missteps, warts in the data, paths that didn't work out that well and results that could definitely be improved upon. In general we're trying to capture what worked and what didn't; that is not only more realistic but often much more informative to the reader. :)

Python Modules Used:

  • IPython: Architecture for interactive computing and presentation
  • Pandas: Python Data Analysis Library
  • Scikit Learn: Machine Learning in Python, Pedregosa et al., JMLR 12, pp. 2825-2830, 2011.
  • Matplotlib: Python 2D plotting library

Exercises:

Related Notebooks

Setup:

  • Required packages:

    • Brew/apt-get
      • graphviz, freetype, zmq
    • Python
      • ipython, pygraphviz, pandas, matplotlib, networkx, pyzmq, jinja2, scipy, patsy, statsmodels, pefile, macholib
  • Some of the exercises use packages from the data_hacking repository. To install those packages into your Python site-packages:

     %> sudo python setup.py install
  
  • To uninstall:
     %> sudo pip uninstall data_hacking
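
Before running the exercises it can help to check which of the Python packages listed above are already importable. The snippet below is a minimal sketch, not part of the data_hacking repository: the helper name missing_modules is hypothetical, it assumes Python 3.4 or later, and it only checks that each module can be found, not which version is installed.

```python
import importlib.util

# Import names for the Python packages listed above. Two differ from
# their install names: ipython -> IPython, pyzmq -> zmq.
REQUIRED_MODULES = [
    "IPython", "pygraphviz", "pandas", "matplotlib", "networkx", "zmq",
    "jinja2", "scipy", "patsy", "statsmodels", "pefile", "macholib",
]


def missing_modules(names):
    """Return the names that cannot be found on the import path.

    Hypothetical helper for illustration; it does not import the modules.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All required modules were found.")
```

Anything reported as missing can then be installed with pip, or with brew/apt-get for the system libraries.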
  

Install IPython:

There are quite a few Google results for this, and we actually have mixed feelings about the IPython install instructions on the IPython page. The directions work, but they direct you to download and install Anaconda or the free edition of Enthought Canopy. Both of these are prepackaged Python distributions with a bunch of stuff like Numpy, Scipy, IPython, Matplotlib, Pandas, ... Occasionally these will have a hitch, and then you might be a bit SOL because StackOverflow is going to say "WTF are those things? Just do '$ pip install blah' or '$ brew install blah'".

So we recommend you be brave and do it the normal way... in particular this guy seems to have a pretty good write up for Mac installs:

Running the Notebooks:

Most of the notebooks will have relative paths to some resources, data files or images. In general the easiest way we found to run ipython on the notebooks is to change into that project directory and run ipython with this alias (put in your .bashrc or whatever):

alias ipython='ipython notebook --FileNotebookManager.notebook_dir=`pwd`'
$ cd data_hacking/fun_with_syslog
$ ipython   # as aliased above
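
If you'd rather not touch your shell config, the same idea can be sketched in Python. This is only an illustration under assumptions: the function names notebook_command and launch_notebook are hypothetical, and the only option used is the --FileNotebookManager.notebook_dir flag from the alias above, so it will only work with an IPython version that still accepts that flag.

```python
import os
import subprocess


def notebook_command(project_dir):
    """Build the command the alias expands to for a given project directory."""
    notebook_dir = os.path.abspath(project_dir)
    return [
        "ipython",
        "notebook",
        "--FileNotebookManager.notebook_dir=" + notebook_dir,
    ]


def launch_notebook(project_dir):
    """Start the notebook server with project_dir as the working directory,
    so the relative paths inside the notebooks resolve."""
    project_dir = os.path.abspath(project_dir)
    return subprocess.call(notebook_command(project_dir), cwd=project_dir)


if __name__ == "__main__":
    # Only print the command here; call launch_notebook(...) to actually run it.
    print(" ".join(notebook_command(".")))
```

For example, launch_notebook("data_hacking/fun_with_syslog") would mirror the cd-then-ipython steps shown above.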