
kevalmorabia97 / SEDTWik-Event-Detection-from-Tweets

Licence: MIT license
Segmentation based event detection from Tweets. Published at NAACL SRW 2019

Programming Languages

python
139335 projects - #7 most used programming language
Jupyter Notebook
11667 projects

Projects that are alternatives of or similar to SEDTWik-Event-Detection-from-Tweets

Text-Classification-LSTMs-PyTorch
This repository presents a baseline model for text classification: an LSTM-based model implemented in PyTorch. To illustrate the model, it uses a Tweets dataset provided by Kaggle.
Stars: ✭ 45 (-22.41%)
Mutual labels:  text-mining, tweets
TwEater
A Python Bot for Scraping Conversations from Twitter
Stars: ✭ 16 (-72.41%)
Mutual labels:  text-mining, tweets
nitter scraper
Scrape Twitter API without authentication using Nitter.
Stars: ✭ 31 (-46.55%)
Mutual labels:  tweets
tweet-delete-bot
A bot that deletes and un-favourites tweets that are more than 10 days old. Schedule this to run once a day to become an ephemeral tweep, just like http://twitter.com/JacksonBates
Stars: ✭ 39 (-32.76%)
Mutual labels:  tweets
tweet-delete
Self-destructing Tweets so you too can be cool 😎
Stars: ✭ 68 (+17.24%)
Mutual labels:  tweets
TeleTweet
🦉 A Telegram Twitter bot that lets you send tweets!
Stars: ✭ 34 (-41.38%)
Mutual labels:  tweets
Twitter-Sentiment-Analyzer
Twitter Sentiment Analyzer
Stars: ✭ 13 (-77.59%)
Mutual labels:  text-mining
Twitter Activated Crypto Trading Bot
Buys crypto through keyword detection in new tweets. Executes buy in 1 second and holds for a given time (e.g. Elon tweets 'doge', buys Dogecoin and sells after 5 minutes). Tested on Kraken and Binance exchanges
Stars: ✭ 92 (+58.62%)
Mutual labels:  tweets
odinson
Odinson is a powerful and highly optimized open-source framework for rule-based information extraction. Odinson couples a simple, yet powerful pattern language that can operate over multiple representations of text, with a runtime system that operates in near real time.
Stars: ✭ 59 (+1.72%)
Mutual labels:  text-mining
Udacity-Data-Analyst-Nanodegree
Repository for the projects needed to complete the Data Analyst Nanodegree.
Stars: ✭ 31 (-46.55%)
Mutual labels:  text-mining
SMMT
Social Media Mining Toolkit (SMMT) main repository
Stars: ✭ 116 (+100%)
Mutual labels:  tweets
deduce
Deduce: de-identification method for Dutch medical text
Stars: ✭ 40 (-31.03%)
Mutual labels:  text-mining
malay-dataset
Text corpus for Bahasa Malaysia, https://malaya.readthedocs.io/en/latest/Dataset.html
Stars: ✭ 189 (+225.86%)
Mutual labels:  text-mining
covid19.swift
🌐 Small iOS app to show some COVID-19 health, data, news and tweets
Stars: ✭ 25 (-56.9%)
Mutual labels:  tweets
PubMed-Best-Match
Machine-learning based pipeline relying on LambdaMART currently used in PubMed for relevance (Best Match) searches
Stars: ✭ 36 (-37.93%)
Mutual labels:  text-mining
trumptweets
Download data on all of Donald Trump's (@RealDonaldTrump) tweets
Stars: ✭ 39 (-32.76%)
Mutual labels:  tweets
teanaps
An open-source Python library for natural language processing and text analysis.
Stars: ✭ 91 (+56.9%)
Mutual labels:  text-mining
block-twitter-promoted
Block promoted content (tweets, trends, and follows), hide other annoying content, and switch the Twitter home page to Latest Tweets.
Stars: ✭ 25 (-56.9%)
Mutual labels:  tweets
arabic-sentiment-analysis
Sentiment Analysis in Arabic tweets
Stars: ✭ 64 (+10.34%)
Mutual labels:  tweets
command-line-tweeter
Tweets in from a pipe
Stars: ✭ 70 (+20.69%)
Mutual labels:  tweets

SEDTWik: Event Detection from Tweets


Implementation of my paper, "SEDTWik: Segmentation-based Event Detection from Tweets using Wikipedia", which is available at https://www.aclweb.org/anthology/N19-3011.

SEDTWik Architecture

The process of event detection can be divided into four parts:

1. Tweet Segmentation

Split a given tweet into non-overlapping, meaningful segments, giving more weight to hashtags (weight 𝐻). Discard words that do not appear as a Wikipedia page title.

Examples of tweet segmentation (with 𝐻 = 3):

Tweet: Joe Biden and Paul Ryan will be seated at the debate tonight #VpDebate
Segmentation: [joe biden], [paul ryan], [seated], [debate], [tonight], [vp debate]x3

Tweet: Amanda Todd took her own life due to cyber bullying #RipAmandaTodd #NoMoreBullying
Segmentation: [amanda todd], [cyber bullying], [rip amanda todd]x3, [no more bullying]x3
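The segmentation examples above can be approximated with a short sketch. This is an illustration only, not the repository's code: it assumes a greedy longest-match lookup against a set of Wikipedia titles, CamelCase splitting for hashtags, and that hashtag segments skip the Wikipedia filter. The names `split_hashtag`, `segment_tweet`, `wiki_titles`, and `max_len` are hypothetical.

```python
import re
from collections import Counter


def split_hashtag(tag):
    """Split a CamelCase hashtag such as '#VpDebate' into 'vp debate'."""
    body = tag.lstrip("#")
    words = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", body)
    return " ".join(w.lower() for w in words)


def segment_tweet(text, wiki_titles, hashtag_weight=3, max_len=3):
    """Return a Counter mapping each segment to its weight in this tweet.

    Assumption: greedy longest match against wiki_titles, which is a
    simplification chosen for this sketch.
    """
    tokens = text.split()
    hashtags = [t for t in tokens if t.startswith("#")]
    words = [re.sub(r"\W", "", t).lower() for t in tokens if not t.startswith("#")]
    words = [w for w in words if w]

    counts = Counter()
    i = 0
    while i < len(words):
        # Try the longest candidate first, then shorter ones.
        for n in range(min(max_len, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if candidate in wiki_titles:
                counts[candidate] += 1
                i += n
                break
        else:
            i += 1  # no Wikipedia title starts here, so drop the word

    # Hashtags count H times (hashtag_weight); assumed to bypass the filter.
    for tag in hashtags:
        counts[split_hashtag(tag)] += hashtag_weight
    return counts


# Hypothetical miniature title set standing in for all Wikipedia page titles.
titles = {"joe biden", "paul ryan", "seated", "debate", "tonight"}
segments = segment_tweet(
    "Joe Biden and Paul Ryan will be seated at the debate tonight #VpDebate",
    titles,
)
```

With this toy title set, words such as "and" or "will" are dropped because they are not in `titles`, while the hashtag segment receives weight 3.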

2. Bursty Segment Extraction

Score each segment using its bursty probability (𝑃𝑏), follower count (𝑓𝑐), retweet count (𝑟𝑐), and the number of unique users who used it (𝑢). Select the top 𝐾 = √(𝑁𝑡) segments by 𝑆𝑐𝑜𝑟𝑒, where 𝑁𝑡 is the total number of tweets in the current time window.

𝑃𝑏(𝑠) measures how frequently a segment 𝑠 occurs compared to its expected probability of occurrence.

𝑆𝑐𝑜𝑟𝑒(𝑠) = 𝑃𝑏(𝑠) × log(𝑢ₛ) × log(𝑟𝑐ₛ) × log(log(𝑓𝑐ₛ))
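As a rough sketch of this scoring step, the formula and the top-𝐾 selection could look like the following. The function names are hypothetical, the bursty probability is taken as a precomputed input, and rounding √(𝑁𝑡) down to an integer is an assumption, since the text does not say how 𝐾 is rounded.

```python
import math


def segment_score(p_bursty, unique_users, retweet_count, follower_count):
    """Score(s) = Pb(s) * log(u_s) * log(rc_s) * log(log(fc_s)).

    The formula is only defined for unique_users > 0, retweet_count > 0
    and follower_count > 1; handling of smaller values is not covered
    here and is left to the caller.
    """
    return (
        p_bursty
        * math.log(unique_users)
        * math.log(retweet_count)
        * math.log(math.log(follower_count))
    )


def top_bursty_segments(scores, num_tweets):
    """Keep the K = floor(sqrt(N_t)) highest-scoring segments (floor is assumed)."""
    k = math.isqrt(num_tweets)
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Made-up numbers, for illustration only.
scores = {
    "vp debate": segment_score(0.9, 120, 300, 50000),
    "joe biden": segment_score(0.7, 80, 150, 20000),
    "tonight": segment_score(0.1, 40, 10, 900),
}
selected = top_bursty_segments(scores, num_tweets=4)
```

The double logarithm on the follower count dampens the influence of accounts with very large followings, so a single celebrity tweet does not dominate the ranking.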

3. Bursty Segment Clustering

A variation of the Jarvis-Patrick clustering algorithm.

Segments are treated as nodes in a graph, and two segments belong to the same cluster if each is among the 𝑘 nearest neighbours (𝑘-NN) of the other.

Segment similarity: 𝑡𝑓−𝑖𝑑𝑓 similarity between contents of tweets containing the segment.

After creating candidate event clusters, discard those whose newsworthiness value falls below a threshold.
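The mutual 𝑘-NN rule can be sketched as follows. This is a minimal illustration under stated assumptions: the pairwise similarity matrix `sim` is given (for example, from tf-idf similarity), clusters are taken to be the connected components of the mutual 𝑘-NN graph, and the newsworthiness filter is not shown. The function names are hypothetical.

```python
def k_nearest(sim, i, k):
    """Indices of the k segments most similar to segment i (excluding i)."""
    others = [j for j in range(len(sim)) if j != i]
    others.sort(key=lambda j: sim[i][j], reverse=True)
    return set(others[:k])


def mutual_knn_clusters(sim, k):
    """Group segments linked by mutual k-NN edges into clusters."""
    n = len(sim)
    neighbours = [k_nearest(sim, i, k) for i in range(n)]

    # Union-find over segments; an edge exists only if the k-NN
    # relation holds in both directions.
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in neighbours[i]:
            if i in neighbours[j]:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Toy similarity matrix: segments 0 and 1 are close, segment 2 is
# closest to 0, but 0 prefers 1, so 2 stays in its own cluster.
sim = [
    [1.0, 0.9, 0.5],
    [0.9, 1.0, 0.2],
    [0.5, 0.2, 1.0],
]
clusters = mutual_knn_clusters(sim, k=1)
```

Requiring the neighbour relation in both directions keeps a loosely related segment from being pulled into a cluster just because the cluster happens to be its nearest option.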

4. Event Summarization

Use all tweets containing segments of the event cluster and apply any text summarization algorithm to them to obtain a summary of the event.

Event Summarization in itself is a big research area and many sophisticated methods are available to summarize text.

A simple way to do this is to use the LexRank algorithm (extractive text summarization).

We leave this part to the user to use any appropriate Summarization method.
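For readers who want a starting point, the sketch below ranks the tweets of one event cluster in a LexRank-like way. It is a simplification and not the full LexRank algorithm: similarity is plain term-frequency cosine (no idf weighting, no similarity threshold), and the scores come from a fixed number of PageRank-style iterations. The function names and parameter defaults are assumptions.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def lexrank_like_summary(tweets, top_n=1, damping=0.85, iterations=50):
    """Return the top_n most central tweets, in their original order."""
    n = len(tweets)
    vectors = [Counter(t.lower().split()) for t in tweets]
    sim = [
        [cosine(vectors[i], vectors[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    row_sums = [sum(row) for row in sim]

    rank = [1.0 / n] * n
    for _ in range(iterations):
        new_rank = []
        for i in range(n):
            # Weight flowing into tweet i from every tweet j that resembles it.
            incoming = sum(
                sim[j][i] / row_sums[j] * rank[j] for j in range(n) if row_sums[j]
            )
            new_rank.append((1 - damping) / n + damping * incoming)
        rank = new_rank

    best = sorted(range(n), key=lambda i: rank[i], reverse=True)[:top_n]
    return [tweets[i] for i in sorted(best)]


# Made-up tweets for one event cluster, plus one unrelated tweet.
cluster_tweets = [
    "biden and ryan debate tonight",
    "biden ryan debate was heated tonight",
    "i like pizza",
]
summary = lexrank_like_summary(cluster_tweets, top_n=2)
```

Tweets that share vocabulary with many other tweets in the cluster rise to the top, while off-topic tweets receive only the baseline score.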

Cite

@inproceedings{morabia-etal-2019-sedtwik,
    title = "{SEDTW}ik: Segmentation-based Event Detection from Tweets Using {W}ikipedia",
    author = "Morabia, Keval  and
      Bhanu Murthy, Neti Lalita  and
      Malapati, Aruna  and
      Samant, Surender",
    booktitle = "Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Student Research Workshop",
    month = jun,
    year = "2019",
    address = "Minneapolis, Minnesota",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/N19-3011",
    pages = "77--85",
}

Abstract:
Event Detection has been one of the research areas in Text Mining that has attracted attention during this decade due to the widespread availability of social media data, specifically Twitter data. Twitter has become a major source for information about real-world events because of the use of hashtags and the small word limit of Twitter that ensures concise presentation of events. Previous works on event detection from tweets are either applicable to detect localized events or breaking news only or miss out on many important events. This paper presents the problems associated with event detection from tweets and a tweet-segmentation based system for event detection called SEDTWik, an extension to a previous work, that is able to detect newsworthy events occurring at different locations of the world from a wide range of categories. The main idea is to split each tweet and hash-tag into segments, extract bursty segments, cluster them, and summarize them. We evaluated our results on the well-known Events2012 corpus and achieved state-of-the-art results.
