
AbhilashaRavichander / PrivacyQA_EMNLP

License: MIT
PrivacyQA, a resource to support question-answering over privacy policies.

Projects that are alternatives of or similar to PrivacyQA EMNLP

COCO-LM
[NeurIPS 2021] COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining
Stars: ✭ 109 (+354.17%)
Mutual labels:  natural-language-understanding
Fill-the-GAP
[ACL-WS] 4th place solution to gendered pronoun resolution challenge on Kaggle
Stars: ✭ 13 (-45.83%)
Mutual labels:  natural-language-understanding
viky-ai
Natural Language Processing platform for extracting information from unstructured text.
Stars: ✭ 38 (+58.33%)
Mutual labels:  natural-language-understanding
hyperdome
the safest place to reach out
Stars: ✭ 26 (+8.33%)
Mutual labels:  privacy-enhancing-technologies
MHPC-Natural-Language-Processing-Lectures
This is the second part of the Deep Learning Course for the Master in High-Performance Computing (SISSA/ICTP).
Stars: ✭ 33 (+37.5%)
Mutual labels:  natural-language-understanding
privapi
Detect Sensitive REST API communication using Deep Neural Networks
Stars: ✭ 42 (+75%)
Mutual labels:  privacy-enhancing-technologies
Catalyst
🚀 Catalyst is a C# Natural Language Processing library built for speed. Inspired by spaCy's design, it brings pre-trained models, out-of-the-box support for training word and document embeddings, and flexible entity recognition models.
Stars: ✭ 224 (+833.33%)
Mutual labels:  natural-language-understanding
HElib
HElib is an open-source software library that implements homomorphic encryption. It supports the BGV scheme with bootstrapping and the Approximate Number CKKS scheme. HElib also includes optimizations for efficient homomorphic evaluation, focusing on effective use of ciphertext packing techniques and on the Gentry-Halevi-Smart optimizations.
Stars: ✭ 2,913 (+12037.5%)
Mutual labels:  privacy-enhancing-technologies
OKD-Reading-List
Papers for Open Knowledge Discovery
Stars: ✭ 102 (+325%)
Mutual labels:  natural-language-understanding
bert extension tf
BERT Extension in TensorFlow
Stars: ✭ 29 (+20.83%)
Mutual labels:  natural-language-understanding
FUTURE
A private, free, open-source search engine built on a P2P network
Stars: ✭ 19 (-20.83%)
Mutual labels:  natural-language-understanding
auto-gfqg
Automatic Gap-Fill Question Generation
Stars: ✭ 17 (-29.17%)
Mutual labels:  natural-language-understanding
gpc-optmeowt
Browser extension for opting out from the sale and sharing of personal information per the California Consumer Privacy Act and other privacy laws
Stars: ✭ 75 (+212.5%)
Mutual labels:  privacy-enhancing-technologies
conclave
Query compiler for secure multi-party computation.
Stars: ✭ 86 (+258.33%)
Mutual labels:  privacy-enhancing-technologies
corpusexplorer2.0
Corpus linguistics has never been so easy...
Stars: ✭ 16 (-33.33%)
Mutual labels:  natural-language-understanding
GLUE-bert4keras
GLUE benchmark code based on bert4keras.
Stars: ✭ 59 (+145.83%)
Mutual labels:  natural-language-understanding
mobiletrackers
A repository of telemetry domains and URLs used by mobile location tracking, user profiling, targeted marketing, and aggressive advertising libraries.
Stars: ✭ 118 (+391.67%)
Mutual labels:  privacy-enhancing-technologies
shifting
A privacy-focused list of alternatives to mainstream services to help the competition.
Stars: ✭ 31 (+29.17%)
Mutual labels:  privacy-enhancing-technologies
linguistics problems
Natural language processing in examples and games
Stars: ✭ 23 (-4.17%)
Mutual labels:  natural-language-understanding
Luci
Logical Unity for Communicational Interactivity
Stars: ✭ 25 (+4.17%)
Mutual labels:  natural-language-understanding

Question Answering for Privacy Policies

This repository contains the PrivacyQA dataset described in the EMNLP 2019 paper, Question Answering for Privacy Policies: Combining Computational and Legal Perspectives. PrivacyQA is a corpus consisting of 1750 questions about the contents of privacy policies, paired with expert annotations. The goal of this effort is to kickstart the development of question-answering methods for this domain, to address the (unrealistic) expectation that a large population should be reading many policies per day.

The data is partitioned into train and test sets; the same split was used in the experiments reported in the paper. You can also download all the relevant data from here.

The data is in a tab-separated format with the following fields:

  1. Folder : Physical location of app data and metadata.
  2. DocID : Unique identifier for privacy policy
  3. QueryID : Unique identifier for query
  4. SentID : Unique identifier for sentence
  5. Split : Train or test split
  6. Query : Text field consisting of crowdsourced question against policy
  7. Segment : Sentence from privacy policy
  8. Label : {Relevant, Irrelevant} in the train file; Ann1, Ann2, Ann3, Ann4, Ann5 and Ann6 : {Relevant, Irrelevant, None} in the test file, one column per annotator

None: the annotation should not be considered. Relevant: the segment is relevant to the query. Irrelevant: the segment is irrelevant to the query. In addition, the test file contains a meta-annotation, 'Any_Relevant', indicating whether a segment was considered relevant by any annotator.
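
The sketch below shows one way to load the splits and aggregate the test annotations. The file paths ("data/policy_train_data.csv", "data/policy_test_data.csv") and the exact column headers are assumptions based on the field list above, not confirmed by this README; adjust them to match the files in the repository.

    # Minimal loading sketch. File paths and column names below are assumptions
    # based on the field description above; adjust to match the actual files.
    import pandas as pd

    train = pd.read_csv("data/policy_train_data.csv", sep="\t")  # assumed path
    test = pd.read_csv("data/policy_test_data.csv", sep="\t")    # assumed path

    # Train split: one gold label per (query, segment) pair.
    relevant_train = train[train["Label"] == "Relevant"]

    # Test split: aggregate the six annotator columns, ignoring "None" votes.
    ann_cols = [c for c in test.columns if c.startswith("Ann")]

    def any_relevant(row):
        votes = [row[c] for c in ann_cols if row[c] != "None"]
        return "Relevant" if "Relevant" in votes else "Irrelevant"

    test["Derived_Any_Relevant"] = test.apply(any_relevant, axis=1)
    # This derived column should agree with the provided 'Any_Relevant' meta-annotation.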

Additionally, we include annotations of each user query with the applicable OPP-115 categories. The categories are sourced from the OPP-115 Corpus annotation scheme (Wilson et al., 2016), and the annotations for both train and test splits can be found in the meta-annotations folder. Each column corresponding to an OPP-115 category contains a "1" if the category is considered relevant to the question, as described in the paper, and "0" otherwise (a short reading sketch follows the category list below). A brief description of the OPP-115 categories follows:

  1. First Party Collection/Use: What, why and how information is collected by the service provider
  2. Third Party Sharing/Collection: What, why and how information is shared with or collected by third parties
  3. Data Security: Protection measures for user information
  4. Data Retention: How long user information will be stored
  5. User Choice/Control: Control options available to users
  6. User Access, Edit and Deletion: If/how users can access, edit or delete information
  7. Policy Change: Informing users if policy information has been changed
  8. International and Specific Audiences: Practices pertaining to a specific group of users
  9. Other: General text, contact information or practices not covered by other categories.
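
The sketch below tallies how many questions fall under each category. The meta-annotations file name is an assumption, and the category column headers are taken from the list above; both may differ in the actual files.

    # Sketch of reading the OPP-115 meta-annotations. The file path and the exact
    # category column headers are assumptions; adjust them to the files in the
    # meta-annotations folder.
    import pandas as pd

    categories = [
        "First Party Collection/Use",
        "Third Party Sharing/Collection",
        "Data Security",
        "Data Retention",
        "User Choice/Control",
        "User Access, Edit and Deletion",
        "Policy Change",
        "International and Specific Audiences",
        "Other",
    ]

    meta = pd.read_csv("meta-annotations/train_meta_annotations.csv", sep="\t")  # assumed path

    # Each category column holds 1 if the category applies to the question, else 0,
    # so column sums give the number of questions per category.
    counts = {c: int(meta[c].sum()) for c in categories if c in meta.columns}
    print(counts)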

If you make use of this dataset in your research, we ask that you please cite our paper:

@inproceedings{ravichander-etal-2019-question,
    title = "Question Answering for Privacy Policies: Combining Computational and Legal Perspectives",
    author = "Ravichander, Abhilasha  and
      Black, Alan W  and
      Wilson, Shomir  and
      Norton, Thomas  and
      Sadeh, Norman",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)",
    month = nov,
    year = "2019",
    address = "Hong Kong, China",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/D19-1500",
    doi = "10.18653/v1/D19-1500",
    pages = "4949--4959",
    abstract = "Privacy policies are long and complex documents that are difficult for users to read and understand. Yet, they have legal effects on how user data can be collected, managed and used. Ideally, we would like to empower users to inform themselves about the issues that matter to them, and enable them to selectively explore these issues. We present PrivacyQA, a corpus consisting of 1750 questions about the privacy policies of mobile applications, and over 3500 expert annotations of relevant answers. We observe that a strong neural baseline underperforms human performance by almost 0.3 F1 on PrivacyQA, suggesting considerable room for improvement for future systems. Further, we use this dataset to categorically identify challenges to question answerability, with domain-general implications for any question answering system. The PrivacyQA corpus offers a challenging corpus for question answering, with genuine real world utility.",
}