KangboLu / UC Davis CS Exams Analysis

Licence: MIT

📈 Regression and Classification with UC Davis student quiz data and exam data

Labels

machine-learning nlp testing statistics regex unsupervised-learning training text-mining web-scraping logistic-regression linear-regression probability statistical-analysis

Probabilistic and Statistical Modeling Project

First release of the ECS 132 term project:

ProblemA.R performs a statistical analysis of the quiz averages of ECS132, ECS145, and ECS154 students at the University of California, Davis.
ProblemB.R gathers, cleans, and organizes the training data (previous exams) into a document-term matrix, then fits 9 logistic models, one for each of 9 courses. The 9 models are used to predict which course a test exam belongs to.

Task A

Here you will do some statistical analysis on my undergrad quiz data, with a Description goal. Here are the details:

I have made available data on the following for each student:
- Course name. We will have ECS 132, 145 and 158.
- Year/quarter offered. E.g. 2012.1 is Winter 2012, 2015.3 is Fall 2015. This data will be used to determine whether there has been some time trend in my quiz grades in recent years.
- Student major (CS, CSE only).
- Overall quiz average.
Please note: A very important part of your job will be to take the data in the form I provide it, and create one big R data frame, with columns 'course name', 'year offered', 'major' and 'quiz average'. Use R's read.table() or some other R function to read the original data from our Web site, then other R code to create the data frame and work with it. You are required to use R for all aspects of this, and explain in your report what you did in this regard.
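One way the data-frame step might look is sketched below. The raw tables, the short column names (`course`, `year`, `major`, `quizavg`) and the helper `read_course()` are all hypothetical; the real files on the course Web site may be laid out differently, in which case the `read.table()` arguments would need to change.

```r
# Hypothetical raw data: one small table per course, standing in for the
# files on the course Web site (their real layout is not shown here).
raw <- list(
  ECS132 = "year major quizavg
2012.1 CS 3.1
2015.3 CSE 2.7",
  ECS145 = "year major quizavg
2013.2 CS 3.4
2014.1 CS 2.9"
)

# Read one course's table and tag every row with the course name.
read_course <- function(course) {
  d <- read.table(text = raw[[course]], header = TRUE,
                  stringsAsFactors = FALSE)
  d$course <- course
  d
}

# Stack the per-course tables into one big data frame.
quiz <- do.call(rbind, lapply(names(raw), read_course))
rownames(quiz) <- NULL

# Column order requested in the assignment: course, year, major, quiz average.
quiz <- quiz[, c("course", "year", "major", "quizavg")]
print(quiz)
```

For real data, `read.table()` would be given a file path or URL instead of the `text` argument.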
Do the following analyses:
- Assuming no time trend, find approximate 95% confidence intervals for the population mean quiz average for each of the four courses. Comment.
- Assuming no time trend, find an approximate 95% confidence interval for the difference in population mean quiz averages for ECS 132 and 145. Comment.
- Assuming no time trend, find an approximate 95% confidence interval for the difference in population mean quiz averages in ECS 145 between the two majors. Comment.
- Fit a linear regression model in which quiz average is predicted from year, course and major. For the last two, create dummy variables (Sec. 21.12). Use this to determine whether there is a substantial time trend. Also use it to compare ECS 132 and 145, and CS majors to CSE. (This is different from above, because now we are adjusting for a possible time trend.)
- Do an analysis of your choice (justified!) that investigates whether there is a time trend in the proportion of CS majors in our department, based on this data.
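The first four analyses could be sketched roughly as follows. The data are simulated, the column names are assumptions, and treating the year/quarter code as a plain number is a simplification, so this only illustrates the mechanics: a normal-approximation interval for a mean, one for a difference of two independent means, and `lm()` with `factor()` supplying the dummy variables.

```r
set.seed(1)

# Simulated stand-in for the real quiz data (every value here is made up).
n <- 120
quiz <- data.frame(
  course = sample(c("ECS132", "ECS145", "ECS158"), n, replace = TRUE),
  year   = sample(c(2012.1, 2013.2, 2014.3, 2015.3), n, replace = TRUE),
  major  = sample(c("CS", "CSE"), n, replace = TRUE),
  stringsAsFactors = FALSE
)
quiz$quizavg <- round(pmin(4, pmax(0, rnorm(n, mean = 2.8, sd = 0.6))), 2)

# Approximate 95% CI for a population mean: xbar +/- 1.96 * s / sqrt(n).
mean_ci <- function(x) {
  half <- qnorm(0.975) * sd(x) / sqrt(length(x))
  c(lower = mean(x) - half, upper = mean(x) + half)
}
ci_by_course <- t(sapply(split(quiz$quizavg, quiz$course), mean_ci))

# Approximate 95% CI for the difference of two independent means.
diff_ci <- function(x, y) {
  se <- sqrt(var(x) / length(x) + var(y) / length(y))
  d  <- mean(x) - mean(y)
  c(lower = d - qnorm(0.975) * se, upper = d + qnorm(0.975) * se)
}
ci_132_vs_145 <- diff_ci(quiz$quizavg[quiz$course == "ECS132"],
                         quiz$quizavg[quiz$course == "ECS145"])

# Regression on year plus dummies for course and major;
# wrapping a column in factor() makes lm() create the dummy variables.
fit <- lm(quizavg ~ year + factor(course) + factor(major), data = quiz)

print(ci_by_course)
print(ci_132_vs_145)
print(confint(fit))   # intervals for the year, course and major coefficients
```

The interval for the `year` coefficient is what speaks to a time trend, while the course and major coefficients give the comparisons adjusted for year.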

Task B

Here you will do some predictive modeling (machine learning), involving text data. One active branch of this field is text classification, e.g. sentiment analysis. We will be less ambitious here, but the principles are the same. Here are the details.

The data consists of all files in my course Web page site with names of the form *1/Exams/tex, *2/Exams/tex or **50/Exams/tex. (Go into one directory level within *Exams.)
As in Problem A, provide and explain your complete R code for fetching the data and for your analyses.
This will be a classification problem, as in Chapter 16. The classes here will be...classes! You will predict the class, i.e. one of ECS 50, 132, 145, 152A, 154A, 154B, 156, 158, 188 and 256 from the words present in an exam.
You will use the logistic model, fitting 10 logit models, one for each class. The predictor variables are counts of specific words. For a given new case, you plug the word counts into the logit function, giving you an estimated conditional probability of that class. Whichever class has the highest conditional probability, you guess this case to be in that class.
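Written out, the rule above looks roughly like this. The symbols are introduced here for illustration only: $X_1,\dots,X_p$ are the word counts, $Y_k$ is the indicator for class $k$, and $\beta_{jk}$ are the coefficients of the logit model for class $k$.

```latex
% Logit model for class k, with word counts X_1, ..., X_p as predictors
P(Y_k = 1 \mid X_1, \dots, X_p)
  = \frac{1}{1 + \exp\!\bigl(-(\beta_{0k} + \beta_{1k} X_1 + \dots + \beta_{pk} X_p)\bigr)}

% Guess the class whose estimated conditional probability is largest
\hat{k} = \arg\max_{k} \; \widehat{P}(Y_k = 1 \mid X_1, \dots, X_p)
```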
The criterion here is prediction accuracy: What proportion of new cases is predicted correctly? To simulate having new cases, it is customary to divide one's data into a training set and a test set. We fit our models to the training set, then predict the test set, pretending that we don't know the classes of the test set. We of course do know their classes, so we can evaluate the proportion of our predictions that come out correct. There are something like 293 exams in the above directories; you will choose 50 at random for your test set (sometimes called the holdout set).
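A minimal sketch of the holdout split and the accuracy criterion, assuming exactly 293 exams (the text only says "something like 293"). The helper `accuracy()` and the toy labels are illustrative.

```r
set.seed(132)

n_exams <- 293   # approximate count quoted above; the real number may differ
n_test  <- 50

# Choose 50 exams at random for the test (holdout) set; the rest are training.
test_idx  <- sample(n_exams, n_test)
train_idx <- setdiff(seq_len(n_exams), test_idx)

# Prediction accuracy: proportion of cases whose predicted class is correct.
accuracy <- function(predicted, actual) mean(predicted == actual)

# Toy check with made-up labels: three of the four guesses are right.
actual    <- c("ECS50", "ECS132", "ECS145", "ECS132")
predicted <- c("ECS50", "ECS132", "ECS132", "ECS132")
print(accuracy(predicted, actual))   # 0.75
```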
You will use R's glm() function to fit the logit models, as in Chapter 11. Your data will consist of an R matrix or data frame, one row per exam in the training set. All but one of the columns will be word counts, with the remaining one being an indicator variable for the class of interest (1 for being in the class, 0 not).
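The layout described above, one row per exam with word-count columns plus a 0/1 indicator, might be fitted like this. The three classes, the three words and all counts are simulated; the real project would use counts taken from the exams.

```r
set.seed(1)

classes <- c("ECS50", "ECS132", "ECS145")
n       <- 150
label   <- sample(classes, n, replace = TRUE)

# Made-up word counts: each word is more common in one course,
# but the distributions overlap so the classes are not perfectly separable.
train <- data.frame(
  register = rpois(n, ifelse(label == "ECS50",  3, 1)),
  variance = rpois(n, ifelse(label == "ECS132", 3, 1)),
  python   = rpois(n, ifelse(label == "ECS145", 3, 1))
)

# One logit model per class; the response is the indicator for that class.
fits <- lapply(classes, function(k) {
  d   <- train
  d$y <- as.integer(label == k)   # 1 if the exam is in class k, 0 if not
  glm(y ~ ., data = d, family = binomial)
})

# For a new exam, estimate each class probability and take the largest.
new_exam <- data.frame(register = 0, variance = 4, python = 1)
probs <- sapply(fits, predict, newdata = new_exam, type = "response")
guess <- classes[which.max(probs)]
print(round(probs, 3))
print(guess)
```

Because each model is fitted separately, the estimated probabilities need not sum to 1; only their ranking is used for the guess.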
A major issue is how to get the word counts. You will use R's tm package, which removes punctuation, white space etc. You can get counts from the output. You will decide what to remove and what not, including the issue of whether to remove the LaTeX keywords. There are lots of tutorials on tm on the Web. Explain your decision on this thoroughly in your report.
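To make the cleaning decisions concrete, here is a base-R sketch of the kind of processing involved. It is not the tm API; it only mimics lower-casing, punctuation removal and counting on a made-up exam fragment, with a hypothetical switch `strip_latex` for the LaTeX-keyword decision.

```r
# Made-up exam fragment containing LaTeX markup.
exam_text <- "\\begin{enumerate} \\item Find the variance of X. \\item Find the mean of X."

strip_latex <- TRUE   # the judgment call discussed above

txt <- tolower(exam_text)
if (strip_latex) {
  # Drop commands such as \item, plus one optional {...} argument.
  txt <- gsub("\\\\[a-z]+([{][^}]*[}])?", " ", txt)
}
txt <- gsub("[^a-z]+", " ", txt)            # drop punctuation, digits, braces

# Split on white space and count each word.
words  <- strsplit(trimws(txt), " +")[[1]]
counts <- table(words)
print(counts)
```

Setting `strip_latex` to `FALSE` keeps words such as `item` and `enumerate` in the counts, which shows how much the LaTeX decision can change the predictors.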
The other major issue is which words to use. This is hard. A rough rule of thumb is to use no more than sqrt(n) predictor variables, where n is the number of cases in the training set, thus no more than sqrt(n) word counts here. But which ones? Explain your decision on this thoroughly in your report.
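The sqrt(n) rule can be worked through with the counts quoted earlier: 293 exams minus 50 held out leaves 243 training cases, so at most 15 word counts. The selection step below, keeping the most frequent words overall, is just one simple heuristic assumed for illustration, and the document-term matrix is simulated.

```r
n_train   <- 293 - 50                 # 243 training exams, per the counts above
max_words <- floor(sqrt(n_train))     # sqrt(243) is about 15.6, so 15 words

# Hypothetical document-term matrix: rows are exams, columns are words.
set.seed(7)
vocab  <- paste0("word", 1:40)
lambda <- rep(seq(0.1, 4, length.out = length(vocab)), each = n_train)
dtm <- matrix(rpois(n_train * length(vocab), lambda),
              nrow = n_train, dimnames = list(NULL, vocab))

# Assumed heuristic: keep the words with the largest total counts.
totals    <- colSums(dtm)
keep      <- names(sort(totals, decreasing = TRUE))[seq_len(max_words)]
dtm_small <- dtm[, keep, drop = FALSE]

print(max_words)
print(dim(dtm_small))
```

Frequency alone ignores how well a word separates the courses, so a report would still need to justify whatever criterion is actually used.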
This is another of those assignments in which you at first will have little or no idea as to what to do. Give it a lot of thought, and discuss it vigorously in your group. Your solution will gradually take shape. Of course, feel free to ask Robin or me if you get stuck and you are not sure about something.

Stars: ✭ 33
