PSLmodels / Pci China

Licence: agpl-3.0
Policy Change Index for China (PCI-China)

Website: policychangeindex.org

Authors: Julian TszKin Chan and Weifeng Zhong

Please email all comments/questions to julian.chan [AT] policychangeindex.org or weifeng.zhong [AT] policychangeindex.org

What is the Policy Change Index for China (PCI-China)?

China's industrialization process has long been a product of government direction, be it coercive central planning or ambitious industrial policy. For the first time in the literature, we develop a quantitative indicator of China's policy priorities over a long period of time, which we call the Policy Change Index for China (PCI-China). The PCI-China is a leading indicator that runs from 1951 to the most recent quarter and can be updated in the future. In other words, the PCI-China not only helps us understand the past of China's industrialization but also allows us to make short-term predictions about its future directions.

The design of the PCI-China has two building blocks: (1) it takes as input data the full text of the People's Daily --- the official newspaper of the Communist Party of China --- since it was founded in 1946; (2) it employs a set of machine learning techniques to "read" the articles and detect changes in the way the newspaper prioritizes policy issues.

The source of the PCI-China's predictive power rests on the fact that the People's Daily is at the nerve center of China's propaganda system and that propaganda changes often precede policy changes. Before the great transformation from the central planning under Mao to the economic reform program after Mao, for example, considerable efforts were made by the Chinese government to promote the idea of reform, move public opinion, and mobilize resources toward the new agenda. Therefore, by detecting (real-time) changes in propaganda, the PCI-China is, effectively, predicting (future) changes in policy.

For details about the methodology and findings of this project, please see the following research paper:

Disclaimer

Results will change as the underlying models improve. A fundamental reason for adopting open source methods in this project is so that people from all backgrounds can contribute to the models that our society uses to assess and predict changes in public policy; when community-contributed improvements are incorporated, the model will produce better results.

Getting Started

The first step for everyone (users and developers) is to open a free GitHub account. You can then choose how to "watch" the PCI-China repository by clicking the Watch button in the upper-right corner of the repository's main page.

The second step is to get familiar with the PCI-China repository by reading the documentation.

If you want to ask a question or report a bug, create a new issue here and post your question or tell us what you think is wrong with the repository.

If you want to request an enhancement, create a new issue here and provide details on what you think should be added to the repository.

Installation Guide

First, install the dependencies and set up the proper environment by running the following command in the shell:

./PCI-China>conda env create -f environment.yml

Second, activate the new environment pci_env:

./PCI-China>conda activate pci_env

Third, run the following in the pci_env environment:

./PCI-China>sh run_all.sh

The above command will perform the following tasks: (1) processing data, (2) training models for two-, five-, and ten-year rolling windows, (3) compiling results, (4) creating text output, and (5) visualizing results.

If you do not have the People's Daily data, you can run our tests, which estimate a PCI using a simulated data set:

./PCI-China>pytest 

Notes

  • By default, the code runs on the first GPU. If you do not have a GPU, the code can be run on the CPU by changing the GPU setting to -1 (see details below).
  • One of the packages imported by PCI (jieba-fast) requires Visual Studio C++ Build Tools. Please check out jieba-fast's website for details.

Function Usage

The Python scripts and the R script listed below are called from the run_all.sh file. Users can also run them individually to perform the following tasks:

  • proc_pd.py: Process and prepare the raw data from the People's Daily for building the neural network models.
  • pci.py: Train a neural network model to construct the PCI-China for a specified year-quarter, using a specified rolling window length.
  • compile_tuning.py: Compile the results from all models and export them to a .csv file.
  • create_text_output.py: Generate the raw data together with the model's classification result for each article in a specified year-quarter.
  • gen_figures.R: Generate figures.
  • create_plotly.py: Create an interactive Plotly figure.

For pci.py, the --help option prints a description of each argument:

./PCI-China>python pci.py --help
Using TensorFlow backend.
usage: pci.py [-h] [--model MODEL] [--year YEAR] [--month MONTH] [--gpu GPU]
              [--iterator ITERATOR] [--root ROOT] [--temperature TEMPERATURE]
              [--discount DISCOUNT] [--bandwidth BANDWIDTH]

optional arguments:
  -h, --help            show this help message and exit
  --model MODEL         Model name: window_5_years_quarterly,
                        window_10_years_quarterly, window_2_years_quarterly
  --year YEAR           Target year
  --month MONTH         Target month
  --gpu GPU             Which gpu to use
  --iterator ITERATOR   Iterator in simulated annealing
  --root ROOT           Root directory
  --temperature TEMPERATURE
                        Temperature in simulated annealing
  --discount DISCOUNT   Discount factor in simulated annealing
  --bandwidth BANDWIDTH
                        Bandwidth in simulated annealing
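
When several runs are needed, the flags documented above can be assembled programmatically. The following is a minimal sketch and not part of the repository: the helper name build_pci_command is hypothetical, the year and month values are purely illustrative, and only flags and model names that appear in the help text are used. Passing gpu=-1 corresponds to the CPU setting described in the notes.

```python
import sys

# Model names exactly as listed in the --help output above.
MODELS = (
    "window_2_years_quarterly",
    "window_5_years_quarterly",
    "window_10_years_quarterly",
)


def build_pci_command(model, year, month, gpu):
    """Return an argv list for pci.py (hypothetical helper, not in the repo)."""
    if model not in MODELS:
        raise ValueError(f"unknown model: {model!r}")
    if not 1 <= month <= 12:
        raise ValueError(f"month must be in 1..12, got {month}")
    return [
        sys.executable, "pci.py",
        "--model", model,
        "--year", str(year),
        "--month", str(month),
        "--gpu", str(gpu),  # -1 selects the CPU, per the notes above
    ]


if __name__ == "__main__":
    # Print one CPU-only command per rolling window (illustrative values).
    for name in MODELS:
        print(" ".join(build_pci_command(name, 2018, 1, gpu=-1)))
```

Each returned list can be passed to subprocess.run from the PCI-China folder to launch the run; the remaining flags (--iterator, --temperature, and so on) could be appended in the same way.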

Data

The raw data of the People's Daily, which are not provided in this repository, should be placed in the sub-folder PCI-China/Input/pd/. Each file in this sub-folder should contain one year-quarter of data, be named by the respective year-quarter, and be in the .pkl format. For example, the raw data for the first quarter of 2018 should be in the file 2018_Q1.pkl. Below is the list of column names and types of each raw data file:

>>> df1 = pd.read_pickle("./PCI-China/Input/pd/pd_1946_1975.pkl")
>>> df1.dtypes
date     datetime64[ns]
year              int64
month             int64
day               int64
page              int64
title            object
body             object
id                int64
dtype: object

where title and body are the Chinese texts of the title and body of each article.
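
Before running the pipeline on your own data, it may help to check that each raw file follows the naming rule and column layout described above. The sketch below is an illustration under stated assumptions, not code from the repository: raw_file_path and schema_problems are hypothetical helper names, and the expected dtypes are copied from the df1.dtypes listing.

```python
from pathlib import Path

# Column names and dtypes as printed by df1.dtypes above.
EXPECTED_DTYPES = {
    "date": "datetime64[ns]",
    "year": "int64",
    "month": "int64",
    "day": "int64",
    "page": "int64",
    "title": "object",
    "body": "object",
    "id": "int64",
}


def raw_file_path(year, quarter, root="PCI-China/Input/pd"):
    """Path of the raw file for one year-quarter, e.g. 2018_Q1.pkl."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError(f"quarter must be 1-4, got {quarter}")
    return Path(root) / f"{year}_Q{quarter}.pkl"


def schema_problems(dtypes):
    """Compare a {column: dtype string} mapping with the expected layout.

    Returns a list of human-readable problems; an empty list means the
    mapping contains every expected column with the expected dtype.
    """
    problems = []
    for col, expected in EXPECTED_DTYPES.items():
        actual = dtypes.get(col)
        if actual is None:
            problems.append(f"missing column: {col}")
        elif str(actual) != expected:
            problems.append(f"{col}: expected {expected}, got {actual}")
    return problems
```

With pandas, the mapping for a loaded data frame df can be obtained as {c: str(t) for c, t in df.dtypes.items()} and passed to schema_problems.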

The processed data of the People's Daily, which are not provided in this repository, should be stored in the file PCI-China/data/Output/database.db. The file is in SQLite format. The schema of the database is shown in the table below:

import sqlite3
import pandas as pd 

conn = sqlite3.connect("data/output/database.db")
pd.read_sql_query("PRAGMA TABLE_INFO(main)", conn)
    cid  name                           type       notnull  dflt_value  pk
0   0    date                           TIMESTAMP  0        None        0
1   1    id                             INTEGER    0        None        0
2   2    page                           REAL       0        None        0
3   3    title                          TEXT       0        None        0
4   4    body                           TEXT       0        None        0
5   5    strata                         INTEGER    0        None        0
6   6    title_seg                      TEXT       0        None        0
7   7    body_seg                       TEXT       0        None        0
8   8    year                           INTEGER    0        None        0
9   9    quarter                        INTEGER    0        None        0
10  10   month                          INTEGER    0        None        0
11  11   day                            INTEGER    0        None        0
12  12   weekday                        INTEGER    0        None        0
13  13   frontpage                      INTEGER    0        None        0
14  14   page1to3                       INTEGER    0        None        0
15  15   title_len                      INTEGER    0        None        0
16  16   body_len                       INTEGER    0        None        0
17  17   n_articles_that_day            INTEGER    0        None        0
18  18   n_pages_that_day               REAL       0        None        0
19  19   n_frontpage_articles_that_day  INTEGER    0        None        0

where title_int and body_int are the word embeddings (numeric vectors) of the title and body of each article.
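
Once the database exists, the schema above can be queried directly with the standard sqlite3 module. The following is a minimal sketch rather than code from the repository: the function name articles_per_quarter is hypothetical, the table name main is taken from the PRAGMA call above, and it assumes that frontpage is a 0/1 indicator, which the schema does not state. The demonstration uses a throwaway in-memory table with made-up rows.

```python
import sqlite3


def articles_per_quarter(conn):
    """Count all articles and front-page articles for each year-quarter.

    Assumes the frontpage column is a 0/1 indicator (not confirmed above).
    """
    query = """
        SELECT year, quarter,
               COUNT(*) AS n_articles,
               SUM(frontpage) AS n_frontpage
        FROM "main"
        GROUP BY year, quarter
        ORDER BY year, quarter
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    # In-memory stand-in with a subset of the documented columns and
    # made-up rows; replace with sqlite3.connect("data/output/database.db").
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE "main" '
        "(id INTEGER, year INTEGER, quarter INTEGER, frontpage INTEGER)"
    )
    conn.executemany(
        'INSERT INTO "main" VALUES (?, ?, ?, ?)',
        [(1, 2018, 1, 1), (2, 2018, 1, 0), (3, 2018, 2, 0)],
    )
    for row in articles_per_quarter(conn):
        print(row)
    conn.close()
```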

The summary statistics for the processed data can be found in the following .csv file:

https://github.com/PSLmodels/PCI-China/blob/master/PCI-China/figures/Summary%20statistics.csv

Neither the raw data nor the processed data of the People's Daily can be released by the authors. Users who have questions about applying the repository to their own data are welcome to contact the authors at the email addresses listed above.

Citing the PCI-China

Please cite the source of the latest PCI-China by the website: https://policychangeindex.org.

For academic work, please cite the following research paper:
