organisciak / Text Mining Course

Course Notes for Text Mining - Prof. Peter Organisciak

Text Mining

Assignments | Lab Worksheets | Syllabus

Overview

This course introduces students to the knowledge discovery process and methods used to mine patterns from a collection of text. We will critically review text mining methods developed in the knowledge discovery and databases, information science, and computational linguistics communities. Students will develop proficiency with modeling text through individual projects.

How can computers read? When we look at a paragraph of text, we have a set of skills to understand and interpret it: What is the message? Is it an argument? What is the sentiment? Computers don't have the same context or literacy. Their language is quantitative. Through text mining, this course will equip you with the skills to understand text through computing.
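
To make "their language is quantitative" concrete, here is a minimal sketch of turning a sentence into numbers a computer can work with: a table of word counts. The function name `word_counts`, the simple regular-expression tokenizer, and the sample sentence are illustrative assumptions, not taken from the course materials.

```python
import re
from collections import Counter


def word_counts(text):
    """Lowercase the text, pull out word tokens, and count each one."""
    # Hypothetical tokenizer: runs of letters and apostrophes count as words.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


sentence = "The cat sat on the mat. The dog sat too."
counts = word_counts(sentence)

# The two most frequent words and how often each appears.
print(counts.most_common(2))  # [('the', 3), ('sat', 2)]
```

The counts throw away word order and meaning, which is exactly the trade-off described here: the computer loses our literacy but gains a representation it can compare across very many texts.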

Text mining is most useful in the new affordances that it allows. In most cases, the tools of text mining aren't meant to replace 'close reading'; they give us new ways to ask questions - about literature, news, scholarship, correspondence, etc. - and are best applied in service of that novelty. Computing allows for:

  • Scale: Computers compare poorly to us in their ability to interpret meaning, but the things they can do may be applied at enormous scales. If you're interested in hundreds of books, thousands of web pages, or millions of tweets, simply reading them is infeasible.
  • Re-contextualization: With text mining, you take apart texts and put them together in new ways. These give you new ways to understand information in a text or appreciate a book. Likewise, breaking down text into data also provides new comparative or critical tools. For example, we can understand what makes Jane Austen's books different from those of her contemporaries, or attribute authorship for anonymous or pseudonymous writing.
  • Summarization: Aggregation, extraction, and visualization all serve to report patterns to you. For example, text summarization models can extract the takeaway points from a set of medical literature.

A few final notes on course philosophy.

First, the broad view of text mining can encompass many disciplinary approaches. This course hews closely to the sub-area referred to as text analysis, intended to treat text mining in the service of qualitative questions. This is closest to the treatments in the digital humanities and computational social sciences.

For this course, you will be expected to learn new programming skills. Note that this is not a programming course. We will cover a subset of skills in Python that pertain to data science. Most of the time, your needs will be served by tinkering with and modifying code examples that I provide for you.

I understand the time constraints of being a student. To account for the time you will spend in this course learning new tools and writing code, I have tried to keep reading and writing loads reasonable.

Success in this course comes through many little steps. The assignments are small but frequent. If you look at the entire outline of ideas and skills in this course, it may seem overwhelming. However, if you go one step at a time, learning the language of text mining won't be scary.

Pre- and Co-requisites

An introductory-level database and programming course, or permission of the instructor.

Required Texts

This course incorporates readings from a variety of sources. Readings will be openly accessible and posted on or linked from the course website. In addition to individual essays and papers, we will also return repeatedly to the following texts:

Schedule

  • Week 1: Introduction
  • Week 2: Fundamentals
  • Week 3: Features
  • Week 4: Text Mining for Art and Criticism
  • Week 5: Documentation Access; Natural Language Processing 1 - Part of Speech Tagging
  • Week 6: Natural Language Processing 2 - Information Extraction and Dependency Parsing
  • Week 7: Classification 1
  • Week 8: Classification 2
  • Week 9: Clustering
  • Week 10: Topic Modeling and Dimensionality Reduction 1
  • Week 11: Topic Modeling 2; Sentiment Analysis
  • Week 12: Visualization
  • Week 13: Word Embeddings
  • Week 14: What's Next: Remainder Notes from Text Mining

The week-to-week syllabus, with readings, slides, and schedule notes, is on the Syllabus page.

Assignments

  • 30% Lab Tasks - Due Weekly
  • 20% Small Assignments
    • 10% - Twitter Bot Assignment
    • 10% - Topic Modelling Assignment
  • 35% Text Mining Project
    • 5% Problem Statement
    • 5% Literature review + 5% Data collection
    • 20% Final report
  • 15% Participation
    • 5% Attendance
    • 10% Forum posts, comments, class engagement

Details are on the Assignments page.