
snguyenthanh / Better_profanity

Licence: mit
Blazingly fast cleaning swear words (and their leetspeak) in strings

Programming Languages

python

Projects that are alternatives of or similar to Better profanity

Chat Censorship
Data related to investigation of chat client censorship
Stars: ✭ 317 (+419.67%)
Mutual labels:  censorship
Wireguard Manager
Self-hosted Wireguard Installer / Manager for CentOS, Debian, Ubuntu, Arch, Fedora, Redhat, Raspbian
Stars: ✭ 478 (+683.61%)
Mutual labels:  censorship
Goquiet Shadowsocks Docker
A Docker image for Shadowsocks over GoQuiet
Stars: ✭ 21 (-65.57%)
Mutual labels:  censorship
Badwords
A javascript filter for badwords
Stars: ✭ 336 (+450.82%)
Mutual labels:  words
Word forms
Accurately generate all possible forms of an English word e.g "election" --> "elect", "electoral", "electorate" etc.
Stars: ✭ 463 (+659.02%)
Mutual labels:  words
Streisand
Streisand sets up a new server running your choice of WireGuard, OpenConnect, OpenSSH, OpenVPN, Shadowsocks, sslh, Stunnel, or a Tor bridge. It also generates custom instructions for all of these services. At the end of the run you are given an HTML file with instructions that can be shared with friends, family members, and fellow activists.
Stars: ✭ 22,605 (+36957.38%)
Mutual labels:  censorship
similar-english-words
Give me a word and I’ll give you an array of words that differ by a single letter.
Stars: ✭ 25 (-59.02%)
Mutual labels:  words
Filternet Php
A simple utility to check whether the given url/domain is blocked in Iran.
Stars: ✭ 41 (-32.79%)
Mutual labels:  censorship
Berty
Berty is a secure peer-to-peer messaging app that works with or without internet access, cellular data or trust in the network
Stars: ✭ 5,101 (+8262.3%)
Mutual labels:  censorship
Cloak
A censorship circumvention tool to evade detection against state adversaries
Stars: ✭ 942 (+1444.26%)
Mutual labels:  censorship
China Dictatorship
Chinese "Communist" "Dictatorship" "facts". Home to the mega-FAQ, news compilation, restaurant and music recommendations. Heil Xi 卐.
Stars: ✭ 337 (+452.46%)
Mutual labels:  censorship
Corpora
A collection of small corpuses of interesting data for the creation of bots and similar stuff.
Stars: ✭ 4,293 (+6937.7%)
Mutual labels:  words
Nudenet
Neural Nets for Nudity Detection and Censoring
Stars: ✭ 642 (+952.46%)
Mutual labels:  censorship
Gfwlist
The one and only one gfwlist here
Stars: ✭ 19,033 (+31101.64%)
Mutual labels:  censorship
Webolith
Aerolith 2.0 - Aerolith for the web. A word study site - study for Scrabble, Boggle, Words With Frentz, etc.
Stars: ✭ 28 (-54.1%)
Mutual labels:  words
go-pluralize
Pluralize and singularize any word (golang adaptation of https://www.npmjs.com/package/pluralize)
Stars: ✭ 60 (-1.64%)
Mutual labels:  words
Awesome Anti Censorship
curated list of open-source anti-censorship tools
Stars: ✭ 521 (+754.1%)
Mutual labels:  censorship
Lantern
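Lantern official version download, Lantern ("Blue Lantern"), firewall circumvention, proxy, "scientific" internet access, foreign internet, accelerator, "ladder" (circumvention tool), routing, lantern proxy vpn censorship-circumvention censorship gfw accelerator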
Stars: ✭ 10,238 (+16683.61%)
Mutual labels:  censorship
Decliner
Decline Russian words with Decliner
Stars: ✭ 36 (-40.98%)
Mutual labels:  words
Openvpn Install
Set up your own OpenVPN server on Debian, Ubuntu, Fedora, CentOS or Arch Linux.
Stars: ✭ 7,142 (+11608.2%)
Mutual labels:  censorship

better_profanity



Currently there is a performance issue with the latest version (0.7.0). It is recommended to use the last stable version 0.6.1.

Inspired by the package profanity by Ben Friedland, this library is significantly faster than the original because it uses string comparison instead of regex.
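A minimal sketch of the idea, assuming whitespace-separated words and a pre-built set of banned spellings (the name toy_censor is hypothetical and this is not the library's actual code): each word is checked with a set lookup, whose average cost does not grow with the size of the wordlist, rather than by running a regular expression per banned word.

```python
def toy_censor(text, banned, censor_char="*"):
    """Hypothetical sketch: replace each banned word with four censor characters.

    `banned` is assumed to be a set of lowercase spellings; words are split on
    single spaces only, unlike the real library.
    """
    censored = []
    for word in text.split(" "):
        # A set lookup is a hash-table probe, not a scan over every pattern.
        if word.lower() in banned:
            censored.append(censor_char * 4)
        else:
            censored.append(word)
    return " ".join(censored)


print(toy_censor("What the Heck is this darn thing", {"darn", "heck"}))
# What the **** is this **** thing
```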

It supports modified spellings (such as p0rn, h4NDjob, handj0b and b*tCh).

Requirements

This package works with Python 3.4+ and PyPy3.

Installation

$ pip install better_profanity

Unicode characters

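Only Unicode characters from the categories Ll (lowercase letter), Lu (uppercase letter), Mc (spacing combining mark) and Mn (non-spacing mark) are added to the set of characters recognised as part of a word. More on Unicode categories can be found in the Unicode Standard's General Category definitions.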

Not all languages are supported yet; Chinese, for example, is not.
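The category rule can be checked with Python's standard unicodedata module. The sketch below (is_word_char is a hypothetical helper, and it models only the Unicode part of the character set, not the digits and symbols the library also accepts) suggests one likely reason Chinese is not covered: CJK ideographs belong to category Lo (other letter), which is not in the list.

```python
import unicodedata

# Categories named above: lowercase letters, uppercase letters,
# spacing combining marks and non-spacing marks.
WORD_CATEGORIES = {"Ll", "Lu", "Mc", "Mn"}


def is_word_char(ch):
    """Hypothetical helper: True if `ch` falls into one of the categories above."""
    return unicodedata.category(ch) in WORD_CATEGORIES


print(is_word_char("é"))        # True  (category Ll)
print(is_word_char("\u0301"))   # True  (combining acute accent, category Mn)
print(is_word_char("中"))       # False (CJK ideograph, category Lo)
```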

Usage

from better_profanity import profanity

if __name__ == "__main__":
    profanity.load_censor_words()

    text = "You p1ec3 of sHit."
    censored_text = profanity.censor(text)
    print(censored_text)
    # You **** of ****.

All modified spellings of words in profanity_wordlist.txt will be generated. For example, the word handjob would be loaded into:

'handjob', 'handj*b', 'handj0b', 'handj@b', 'h@ndjob', 'h@ndj*b', 'h@ndj0b', 'h@ndj@b',
'h*ndjob', 'h*ndj*b', 'h*ndj0b', 'h*ndj@b', 'h4ndjob', 'h4ndj*b', 'h4ndj0b', 'h4ndj@b'

The full mapping of the library can be found in profanity.py.
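The expansion can be sketched with itertools.product. The two-entry mapping below is an assumption that happens to reproduce the 16 spellings of handjob listed above; the library's real mapping in profanity.py covers more characters, and generate_variants is a hypothetical name.

```python
from itertools import product

# Assumed subset of a character mapping; the real one lives in profanity.py.
CHAR_MAPPING = {
    "a": ("a", "@", "*", "4"),
    "o": ("o", "*", "0", "@"),
}


def generate_variants(word):
    """Hypothetical helper: yield every spelling produced by the mapping."""
    # For each character, the allowed substitutes (the character itself if unmapped).
    choices = [CHAR_MAPPING.get(ch, (ch,)) for ch in word]
    for combo in product(*choices):
        yield "".join(combo)


variants = list(generate_variants("handjob"))
print(len(variants))   # 16
print(variants[:4])    # ['handjob', 'handj*b', 'handj0b', 'handj@b']
```

Because product varies the rightmost position fastest, the output order matches the listing above.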

1. Censor swear words from a text

By default, profanity replaces each swear word with four asterisks (****).

from better_profanity import profanity

if __name__ == "__main__":
    text = "You p1ec3 of sHit."

    censored_text = profanity.censor(text)
    print(censored_text)
    # You **** of ****.

2. Censor doesn't care about word dividers

The function .censor() also hides words separated not only by a space but also by other dividers, such as underscores (_), commas (,) and periods (.). The characters @, $, *, " and ' are the exception: they are not treated as dividers.

from better_profanity import profanity

if __name__ == "__main__":
    text = "...sh1t...hello_cat_fuck,,,,123"

    censored_text = profanity.censor(text)
    print(censored_text)
    # ...****...hello_cat_****,,,,123
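One way to get this behaviour, sketched under the assumption that word characters are ASCII letters, digits and the exceptions @ $ * " ' (the real library recognises a larger set), is to split the text into alternating runs of word and divider characters and compare only the word runs. censor_with_dividers is a hypothetical helper, and banned is assumed to already contain every generated spelling.

```python
import string
from itertools import groupby

# Assumed word characters: letters, digits and the non-divider symbols.
WORD_CHARS = set(string.ascii_letters + string.digits + "@$*\"'")


def censor_with_dividers(text, banned, censor_char="*"):
    """Hypothetical helper: censor banned words whatever divider surrounds them."""
    out = []
    # groupby yields alternating runs of word characters and divider characters.
    for is_word, run in groupby(text, key=lambda ch: ch in WORD_CHARS):
        token = "".join(run)
        if is_word and token.lower() in banned:
            out.append(censor_char * 4)
        else:
            out.append(token)
    return "".join(out)


print(censor_with_dividers("...sh1t...hello_cat_fuck,,,,123", {"sh1t", "fuck"}))
# ...****...hello_cat_****,,,,123
```

Keeping * inside WORD_CHARS is what lets a spelling like b*tCh stay one token instead of being split in two.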

3. Censor swear words with custom character

Each swear word is replaced with four copies of the character passed as the second parameter of .censor().

from better_profanity import profanity

if __name__ == "__main__":
    text = "You p1ec3 of sHit."

    censored_text = profanity.censor(text, '-')
    print(censored_text)
    # You ---- of ----.

4. Check if the string contains any swear words

Function .contains_profanity() returns True if any word in the given string exists in the wordlist.

from better_profanity import profanity

if __name__ == "__main__":
    dirty_text = "That l3sbi4n did a very good H4ndjob."

    print(profanity.contains_profanity(dirty_text))
    # True

5. Censor swear words with a custom wordlist

5.1. Wordlist as a List

Function load_censor_words takes a list of strings as the words to censor. The provided list replaces the default wordlist.

from better_profanity import profanity

if __name__ == "__main__":
    custom_badwords = ['happy', 'jolly', 'merry']
    profanity.load_censor_words(custom_badwords)

    print(profanity.censor("Have a merry day! :)"))
    # Have a **** day! :)

5.2. Wordlist as a file

Function load_censor_words_from_file takes the path of a text file containing one word per line.

from better_profanity import profanity

if __name__ == "__main__":
    profanity.load_censor_words_from_file('/path/to/my/project/my_wordlist.txt')
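For reference, the expected file format is plain text with one word per line. The hypothetical read_wordlist helper below parses such a file with the standard library only; whether better_profanity itself skips blank lines the same way is an assumption.

```python
import os
import tempfile


def read_wordlist(path):
    """Hypothetical helper: read a one-word-per-line file, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


# Write a small example wordlist to a temporary file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write("happy\njolly\n\nmerry\n")
    path = tmp.name

try:
    print(read_wordlist(path))  # ['happy', 'jolly', 'merry']
finally:
    os.remove(path)  # clean up the temporary file
```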

6. Whitelist

Functions load_censor_words and load_censor_words_from_file take a keyword argument, whitelist_words, listing words in the wordlist to ignore.

It is best used when there are only a few words that you would like to ignore in the wordlist.

from better_profanity import profanity

# Use the default wordlist
profanity.load_censor_words(whitelist_words=['happy', 'merry'])

# or with your custom words as a List
custom_badwords = ['happy', 'jolly', 'merry']
profanity.load_censor_words(custom_badwords, whitelist_words=['merry'])

# or with your custom words as a text file
profanity.load_censor_words_from_file('/path/to/my/project/my_wordlist.txt', whitelist_words=['merry'])
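Conceptually, a whitelist removes entries from the wordlist before anything is censored. A minimal sketch of that behaviour, assuming case-insensitive matching (apply_whitelist is a hypothetical helper, not part of the library):

```python
def apply_whitelist(wordlist, whitelist_words):
    """Hypothetical helper: drop whitelisted words from a wordlist (case-insensitive)."""
    allowed = {w.lower() for w in whitelist_words}
    # Keep the original order of the remaining words.
    return [w for w in wordlist if w.lower() not in allowed]


custom_badwords = ["happy", "jolly", "merry"]
print(apply_whitelist(custom_badwords, ["merry"]))  # ['happy', 'jolly']
```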

7. Add more censor words

Function add_censor_words adds the given words to the current wordlist instead of replacing it.

from better_profanity import profanity

if __name__ == "__main__":
    custom_badwords = ['happy', 'jolly', 'merry']
    profanity.add_censor_words(custom_badwords)

    print(profanity.censor("Happy you, fuck!"))
    # **** you, ****!

Limitations

  1. Because the library compares each word character by character, the censor can easily be bypassed by adding extra characters to a word:
profanity.censor('I just have sexx')
# returns 'I just have sexx'

profanity.censor('jerkk off')
# returns 'jerkk off'
  2. Any word in the wordlist that contains non-space separators, such as s & m, cannot be recognised and therefore won't be filtered out. This problem was raised in #5.
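Both limitations follow from exact matching. The sketch below uses a hypothetical set of banned entries and a hypothetical is_banned helper to show that one extra letter produces a string that is not in the set, and that a multi-token entry such as s & m never equals any single token produced by splitting the text.

```python
# Hypothetical wordlist entries, assumed to be already expanded into variants.
BANNED = {"sex", "jerk", "s & m"}


def is_banned(token):
    """Exact, case-insensitive membership test, as a set-based censor would do."""
    return token.lower() in BANNED


print(is_banned("sex"))    # True
print(is_banned("sexx"))   # False: one extra letter defeats exact matching

# "s & m" is in BANNED, but splitting the text yields "s", "&" and "m",
# none of which equals the multi-token entry.
print([is_banned(t) for t in "s & m".split()])  # [False, False, False]
```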

Testing

$ python tests.py

Contributing

Please read CONTRIBUTING.md for details on our code of conduct, and the process for submitting pull requests to us.

License

This project is licensed under the MIT License - see the LICENSE.md file for details

Special thanks to

Acknowledgments
