All Projects → corollari → Linusrants

corollari / Linusrants

Dataset of Linus Torvalds' rants classified by negativity using sentiment analysis

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Linusrants

Pytreebank
😡😇 Stanford Sentiment Treebank loader in Python
Stars: ✭ 93 (-68.04%)
Mutual labels:  dataset, sentiment-analysis
Awesome Twitter Data
A list of Twitter datasets and related resources.
Stars: ✭ 533 (+83.16%)
Mutual labels:  dataset, sentiment-analysis
Indonesian Nlp Resources
data resource untuk NLP bahasa indonesia
Stars: ✭ 143 (-50.86%)
Mutual labels:  dataset, sentiment-analysis
French Sentiment Analysis Dataset
A collection of over 1.5 Million tweets data translated to French, with their sentiment.
Stars: ✭ 35 (-87.97%)
Mutual labels:  dataset, sentiment-analysis
Nlp bahasa resources
A Curated List of Dataset and Usable Library Resources for NLP in Bahasa Indonesia
Stars: ✭ 158 (-45.7%)
Mutual labels:  dataset, sentiment-analysis
Exclusively Dark Image Dataset
Exclusively Dark (ExDARK) dataset which to the best of our knowledge, is the largest collection of low-light images taken in very low-light environments to twilight (i.e 10 different conditions) to-date with image class and object level annotations.
Stars: ✭ 274 (-5.84%)
Mutual labels:  dataset
Tehran Stocks
A python package to access tsetmc data
Stars: ✭ 282 (-3.09%)
Mutual labels:  dataset
Semantic Kitti Api
SemanticKITTI API for visualizing dataset, processing data, and evaluating results.
Stars: ✭ 272 (-6.53%)
Mutual labels:  dataset
Twitter Sent Dnn
Deep Neural Network for Sentiment Analysis on Twitter
Stars: ✭ 270 (-7.22%)
Mutual labels:  sentiment-analysis
Datascience course
Curso de Data Science em Português
Stars: ✭ 294 (+1.03%)
Mutual labels:  dataset
Text2sql Data
A collection of datasets that pair questions with SQL queries.
Stars: ✭ 287 (-1.37%)
Mutual labels:  dataset
Meglass
An eyeglass face dataset collected and cleaned for face recognition evaluation, CCBR 2018.
Stars: ✭ 281 (-3.44%)
Mutual labels:  dataset
Data Science Hacks
Data Science Hacks consists of tips, tricks to help you become a better data scientist. Data science hacks are for all - beginner to advanced. Data science hacks consist of python, jupyter notebook, pandas hacks and so on.
Stars: ✭ 273 (-6.19%)
Mutual labels:  dataset
Realsr
Toward Real-World Single Image Super-Resolution: A New Benchmark and A New Model (ICCV 2019)
Stars: ✭ 282 (-3.09%)
Mutual labels:  dataset
Anime Face Dataset
🖼 A collection of high-quality anime faces.
Stars: ✭ 272 (-6.53%)
Mutual labels:  dataset
Bert For Rrc Absa
code for our NAACL 2019 paper: "BERT Post-Training for Review Reading Comprehension and Aspect-based Sentiment Analysis"
Stars: ✭ 286 (-1.72%)
Mutual labels:  sentiment-analysis
Covid19canada
Epidemiological Data from the COVID-19 Epidemic in Canada
Stars: ✭ 272 (-6.53%)
Mutual labels:  dataset
Dstc8 Schema Guided Dialogue
The Schema-Guided Dialogue Dataset
Stars: ✭ 277 (-4.81%)
Mutual labels:  dataset
Bertweet
BERTweet: A pre-trained language model for English Tweets (EMNLP-2020)
Stars: ✭ 282 (-3.09%)
Mutual labels:  sentiment-analysis
Fx 1 Minute Data
HISTDATA - Full Dataset composed of 68 FX trading pairs / Simple API to retrieve 1 Minute data Historical FX Prices (up to June 2019).
Stars: ✭ 278 (-4.47%)
Mutual labels:  dataset

linusrants

Just a collection of all the rants from Linus Torvalds on the kernel mailing list from 2012 to 2015 classified by the amount of hate and sorted by it.

Data

Rant Hate
Kay, this needs to be fixed. ... Of course, I'd also suggest that whoever was the genius who thought it was a good idea to read things ONE F*CKING BYTE AT A TIME with system calls for each byte should be retroactively aborted. Who the f*ck does idiotic things like that? How did they noty die as babies, considering that they were likely too stupid to find a tit to suck on? 0.621010024824
So get your act together, and push back on the people you are supposed to manage. Because this is not acceptable for post-rc5, and I'm giving this single warning. Next time, I'll just ignore the sh*t you send me. Comprende? 0.701378112737
What the F*CK, guys? This piece-of-shit commit is marked for stable, but you clearly never even test-compiled it, did you? ...The declaration for gate_desc is very very different for 32-bit and 64-bit x86 for whatever braindamaged reasons. Seriously, WTF? I made the mistake of doing multiple merges back-to-back with the intention of not doing a full allmodconfig build in between them, and now I have to undo them all because this pull request was full of unbelievable shit. And why the hell was this marked for stable even IF it hadn't been complete and utter tripe? It even has a comment in the commit message about how this probably doesn't matter. So it's doubly crap: it's wrong, and it didn't actually fix anything to begin with. There aren't enough swear-words in the English language, so now I'll have to call you perkeleen vittupää just to express my disgust and frustration with this crap. 0.813954024912
Stop this idiotic "blame gcc bug" crap. Which part of my explanation for why it was NOT a compiler bug did you not understand? ... Stop the f*cking around already! The whole "we expect ww_ctx to be null" thing shows that YOU DO NOT SEEM TO UNDERSTAND WHAT THE TEST ACTUALLY IS! ... Christ, can you really not understand that? NO NO NO NO. No a f*cking thousand times. It's not "too broken in gcc". It's too broken in the source code, and the fact that you don't even understand that is sad. You wrote the code, and you seem to be unable to admit that your code was buggy. It's not a compiler bug. It's your bug. Stand up like a man, instead of trying to flail around and blame anything else but yourself. So guys, get your act together, and stop blaming the compiler already. 0.92104254021

If you're interested in the complete list of rants, it is available in table form, json format and pickle format.

If you'd like to do some analysis or plotting of the data, the best resource for that is the full dataset in tsv format, which includes a ton of metadata about each rant. Here's an extract of the data that you will find there:

hate markmail permalink type(code, personal, both, unsure) date timestamp mail excerpt
0.6210100248237 http://markmail.org/message/qwv74fvg3pqnk4eo B 06/07/12 13:29 1341574178 Kay, this needs to be fixed...
0.701378112736595 http://markmail.org/message/zh5hmrt35h7aw7qp C 15/10/13 10:47 1381826864 So get your act together...
0.813954024912302 http://markmail.org/message/6h3gtqeizljyjqio C 13/07/13 15:40 1373722801 What the F*CK, guys?...
0.921042540209971 http://markmail.org/message/gxmmhefjyijrucsd B 27/10/13 12:22 1382872960 Stop this idiotic...

Analysis

Plot of hate levels vs time (x axis is not properly scaled):

Build

To build it yourself just run:

python classify.py

Essentially, all I did was take a dataset of Linus rants already available[1] and send it through a sentiment analysis API[2], aggregating and sorting the results.


[1] Original raw dataset of Linus Torvalds rants can be found at https://data.world/jboutros/linus-rants

[2] http://text-processing.com/docs/sentiment.html

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].