koheiw / workshop-IJTA

Licence: other

Introduction to Japanese Text Analysis with R

Projects that are alternatives to, or similar to, workshop-IJTA

LSX

A word embeddings-based semi-supervised model for document scaling

Stars: ✭ 42 (+68%)

Mutual labels: text-analysis, quanteda

quanteda.corpora

A collection of corpora for quanteda

Stars: ✭ 17 (-32%)

Mutual labels: text-analysis, quanteda

Smltar

Manuscript of the book "Supervised Machine Learning for Text Analysis in R" by Emil Hvitfeldt and Julia Silge

Stars: ✭ 125 (+400%)

Mutual labels: text-analysis

jmdict-simplified

JMdict, JMnedict, Kanjidic, KRADFILE/RADKFILE in JSON format

Stars: ✭ 96 (+284%)

Mutual labels: japanese-language

Woke

✊ Detect non-inclusive language in your source code.

Stars: ✭ 190 (+660%)

Mutual labels: text-analysis

Qdap

Quantitative Discourse Analysis Package: Bridging the gap between qualitative data and quantitative analysis

Stars: ✭ 146 (+484%)

Mutual labels: text-analysis

Shifterator

Interpretable data visualizations for understanding how texts differ at the word level

Stars: ✭ 209 (+736%)

Mutual labels: text-analysis

Ml Dl Scripts

The repository provides useful Python scripts for ML and data analysis

Stars: ✭ 119 (+376%)

Mutual labels: text-analysis

Convert-Numbers-to-Japanese

Converts Arabic numerals, or 'western' style numbers, to a Japanese context.

Stars: ✭ 33 (+32%)

Mutual labels: japanese-language

Textvec

Text vectorization tool to outperform TFIDF for classification tasks

Stars: ✭ 167 (+568%)

Mutual labels: text-analysis

OleanderStemmingLibrary

Porter stemming library (C++)

Stars: ✭ 37 (+48%)

Mutual labels: text-analysis

Textclean

Tools for cleaning and normalizing text data

Stars: ✭ 159 (+536%)

Mutual labels: text-analysis

Wikitextparser

A simple WikiText parsing library for MediaWiki

Stars: ✭ 149 (+496%)

Mutual labels: text-analysis

wordhoard

This Python module can be used to obtain antonyms, synonyms, hypernyms, hyponyms, homophones and definitions.

Stars: ✭ 78 (+212%)

Mutual labels: text-analysis

Stanza Old

Stanford NLP group's shared Python tools.

Stars: ✭ 142 (+468%)

Mutual labels: text-analysis

text-analysis

Weaving analytical stories from text data

Stars: ✭ 12 (-52%)

Mutual labels: text-analysis

Padatious

A neural network intent parser

Stars: ✭ 124 (+396%)

Mutual labels: text-analysis

Awesome Text Classification

Awesome-Text-Classification: projects, papers, tutorials.

Stars: ✭ 158 (+532%)

Mutual labels: text-analysis

Fake news detection

Fake News Detection in Python

Stars: ✭ 194 (+676%)

Mutual labels: text-analysis

IncredibleTextAdventure

No description or website provided.

Stars: ✭ 19 (-24%)

Mutual labels: text-analysis

Introduction to Japanese Text Analysis with R

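An introduction, written in Japanese, to using quanteda for people starting out in text analysis with R. It uses media analysis as a case study, and the sample data consists of Asahi Shimbun articles from 2016 on politics and diplomacy. This document is based on materials from workshops held at Waseda University and Kobe University in 2017. For an overview of quantitative text analysis, see 『日本語の量的テキスト分析』 (Quantitative Text Analysis of Japanese).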

Added on 17 May 2020: for more refined preprocessing of Japanese text, see 『経済学における量的テキスト分析入門』 (Introduction to Quantitative Text Analysis in Economics).

About quanteda

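quanteda is an R package for quantitative text analysis aimed at the social sciences. Kenneth Benoit of the London School of Economics and Political Science (LSE) began developing it around 2012 with support from the European Union (EU). It is now developed as open source by an international team that includes Japanese and other Asian members, and it is popular among political scientists in Europe and North America.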

The name quanteda is short for "quantitative analysis of textual data"; in katakana it can be written クオンティーダ.

Features of quanteda

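It emphasizes compatibility with other R packages, which allows more flexible statistical analysis than existing text analysis tools for Japanese.
Text is processed internally in accordance with Unicode, so it can be used with any language; in particular, Japanese and Chinese sentences can be segmented into words without morphological analysis.
Because the core of the program is implemented in C++, it can perform the same processing in half the execution time and memory usage of a system implemented in Python.
It does not depend on external programs or libraries, so it is easy to install, and it runs on Windows, macOS, and Linux.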
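
As a rough illustration of the second point, the following minimal sketch tokenizes Japanese text with quanteda alone. It assumes the quanteda package is already installed; the two sentences are made up for this example and are not the workshop's Asahi Shimbun sample data, and the object names (`txt`, `corp`, `toks`, `dfmat`) are arbitrary.

```r
# Assumes quanteda is installed, e.g. via install.packages("quanteda")
library(quanteda)

# Two made-up sentences for illustration (not the workshop's sample data)
txt <- c(
  doc1 = "日本の政治について新聞記事を分析する。",
  doc2 = "外交に関する記事をRで分析する。"
)

# Build a corpus; the names of the character vector become document names
corp <- corpus(txt)

# tokens() segments the text using Unicode word-boundary rules,
# so no external morphological analyzer is required
toks <- tokens(corp, remove_punct = TRUE)
print(toks)

# Build a document-feature matrix and list the most frequent tokens
dfmat <- dfm(toks)
print(topfeatures(dfmat, 5))
```

The exact token boundaries depend on the Unicode segmentation rules shipped with the installed version, so the printed tokens may differ from what a morphological analyzer would produce.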

For the differences between quanteda and other R text analysis packages, the explanation by Len Greski of Johns Hopkins University is easy to follow.
