
cgivre / data-exploration-with-apache-drill

Licence: other
Data Exploration with Apache Drill

Programming Languages

Jupyter Notebook

Projects that are alternatives of or similar to data-exploration-with-apache-drill

sugarcube
Monoidal data processes.
Stars: ✭ 32 (+28%)
Mutual labels:  data-mining
Heart disease prediction
Heart Disease prediction using 5 algorithms
Stars: ✭ 43 (+72%)
Mutual labels:  data-mining
awesome-Python-data-science-books
Probably the best curated list of data science books in Python
Stars: ✭ 331 (+1224%)
Mutual labels:  data-mining
Asclepius
Open Price Comparison for US Hospitals
Stars: ✭ 20 (-20%)
Mutual labels:  data-mining
perke
A keyphrase extractor for Persian
Stars: ✭ 60 (+140%)
Mutual labels:  data-mining
MetQy
Repository for R package MetQy (read related publication here: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6247936/)
Stars: ✭ 17 (-32%)
Mutual labels:  data-mining
scikit-hubness
A Python package for hubness analysis and high-dimensional data mining
Stars: ✭ 41 (+64%)
Mutual labels:  data-mining
KaliIntelligenceSuite
Kali Intelligence Suite (KIS) shall aid in the fast, autonomous, central, and comprehensive collection of intelligence by executing standard penetration testing tools. The collected data is internally stored in a structured manner to allow the fast identification and visualisation of the collected information.
Stars: ✭ 58 (+132%)
Mutual labels:  data-mining
boxball
Prebuilt Docker images with Retrosheet's complete baseball history data for many analytical frameworks. Includes Postgres, cstore_fdw, MySQL, SQLite, Clickhouse, Drill, Parquet, and CSV.
Stars: ✭ 79 (+216%)
Mutual labels:  apache-drill
corpusexplorer2.0
Corpus linguistics has never been this easy...
Stars: ✭ 16 (-36%)
Mutual labels:  data-mining
Data-Mining-on-Social-Media
Python scripts to extract tweets and facebook posts from public users.
Stars: ✭ 99 (+296%)
Mutual labels:  data-mining
EasyMiner
Easy association rule mining and classification on the web
Stars: ✭ 14 (-44%)
Mutual labels:  data-mining
dh-core
Functional data science
Stars: ✭ 123 (+392%)
Mutual labels:  data-mining
imbalanced-ensemble
Class-imbalanced / Long-tailed ensemble learning in Python. Modular, flexible, and extensible. | A modular, flexible, and easily extensible machine learning library for class-imbalanced / long-tailed data
Stars: ✭ 199 (+696%)
Mutual labels:  data-mining
scikit-cycling
Tools to analyze cycling data
Stars: ✭ 25 (+0%)
Mutual labels:  data-mining
Semantic-Bus
object flow treatment, data transformation
Stars: ✭ 49 (+96%)
Mutual labels:  data-mining
TextClassification
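Text classification of Sina News articles with scikit-learn. The dataset contains 1,000,000 documents in 10 classes, split 1:1 into training and test sets. The classifiers are SVM and Bayes, with Bayes as the baseline.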
Stars: ✭ 86 (+244%)
Mutual labels:  data-mining
sciblox
sciblox - Easier Data Science and Machine Learning
Stars: ✭ 48 (+92%)
Mutual labels:  data-mining
hierarchical-clustering
A Python implementation of divisive and hierarchical clustering algorithms. The algorithms were tested on the Human Gene DNA Sequence dataset and dendrograms were plotted.
Stars: ✭ 62 (+148%)
Mutual labels:  data-mining
conferencias matutinas amlo
CSVs of the stenographic transcripts of President Andres Manuel López Obrador's morning press conferences (Mañaneras AMLO)
Stars: ✭ 25 (+0%)
Mutual labels:  data-mining


Data Exploration with Apache Drill

Prerequisites

Drill uses standard ANSI SQL as its main user interface. This course focuses on how to use Drill rather than on the details of SQL syntax, so a good understanding of SQL will help you in this class. If you are looking for a good tutorial, I recommend SQL for Mere Mortals by John Viescas and Michael J. Hernandez, which is available here: http://amzn.to/2lfhXNL.
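Because Drill's interface is ordinary SQL, the same statement can be typed into the Drill shell, the web UI, or sent from a program. The sketch below assumes a local Drill instance with its web server on the default port 8047 and sends a query to Drill's REST endpoint `/query.json` using only Python's standard library. The names `DRILL_URL`, `build_drill_request`, and `run_drill_query` are hypothetical helpers for illustration, not part of the course materials; `cp.`employee.json`` is the sample file Drill ships on its classpath.

```python
import json
import urllib.request

# Assumption: Drill runs locally with its web server on the default port 8047.
# Adjust this URL for the course VM or your own cluster.
DRILL_URL = "http://localhost:8047/query.json"


def build_drill_request(sql: str, url: str = DRILL_URL) -> urllib.request.Request:
    """Wrap an ANSI SQL statement in the JSON body Drill's REST API expects."""
    body = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_drill_query(sql: str, url: str = DRILL_URL) -> list:
    """Send the query and return the result rows (requires a running Drill)."""
    with urllib.request.urlopen(build_drill_request(sql, url)) as resp:
        return json.load(resp)["rows"]


# Plain SQL over a raw JSON file: no table definition is created beforehand.
QUERY = "SELECT full_name, salary FROM cp.`employee.json` LIMIT 5"
```

With Drill running, `run_drill_query(QUERY)` should return a list of row dictionaries keyed by column name. Keeping request construction separate from sending makes it easy to inspect the payload without a live server.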

Required Software

For this class, I will provide you with a virtual machine with Drill and all the required software pre-configured. I highly recommend using the virtual machine. The virtual machine needs 8GB of RAM and 30GB of hard disk space. You will also need to:

If you are not using the VM, you will need to install Drill on your local machine.

Links

Instructor

Charles Givre has always been interested in solving problems in unique ways, and has worked to make a career of it as a data scientist at Booz Allen Hamilton. At Booz Allen, Mr. Givre worked as a technical leader on various large government projects. Mr. Givre enjoys sharing his passion for data science with others and has worked to develop comprehensive data science training programs at his firm. Prior to joining Booz Allen, Mr. Givre worked as a counterterrorism analyst at the Central Intelligence Agency for nearly five years.

Mr. Givre got interested in Apache Drill several years ago, and is co-author of the first O'Reilly book about Drill. He has delivered numerous workshops about Drill and has contributed to the codebase. Mr. Givre is a sought-after speaker and has delivered training and talks at international conferences such as BlackHat, Strata + Hadoop World, Open Data Science Conference (ODSC) and others. Mr. Givre holds a Master of Arts in Middle Eastern Studies from Brandeis University, as well as a Bachelor of Science in Computer Science and a Bachelor of Music, both from the University of Arizona. Mr. Givre also holds a CISSP, Security+ and various other certifications. Mr. Givre blogs at thedataist.com, and in his nonexistent spare time, he enjoys spending time with his family and restoring classic cars.

Code of Conduct

Since this is an official O'Reilly training, we will adhere to the O'Reilly conferences Code of Conduct.

"At O'Reilly, we assume that most people are intelligent and well-intended, and we're not inclined to tell people what to do. However, we want every O'Reilly conference to be a safe and productive environment for everyone. To that end, this code of conduct spells out the behavior we support and don't support at conferences."
