BlaCkinkGJ / catch-me-if-you-can

Licence: MIT License
plagiarism detector

Why this program was born

When I worked as a teaching assistant for a lecture, some students strongly denied that their homework was plagiarized. This made me very angry.

So I thought that if I built a program that collects the evidence of plagiarism and ran it, the students would accept that they had plagiarized. That is why I created this.

Dependencies

This program is written in Python 3, so Python 3 must be installed. You also have to install the packages below using pip3:

tqdm == 4.40.2
nltk == 3.4.5
datasketch == 1.5.0
matplotlib == 3.1.2
networkx == 2.4

Details

This program detects plagiarism by using the MinHash algorithm.
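
To make the idea concrete, here is a minimal sketch of MinHash similarity estimation. It is not the code of this program: the project lists datasketch as a dependency, while this sketch uses only the standard library so it runs anywhere. The helper names (`shingles`, `minhash_signature`, `estimate_jaccard`), the 3-word shingle size, and the 128 hash functions are all assumptions chosen for illustration.

```python
import hashlib


def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    tokens = text.split()
    if len(tokens) <= k:
        return {" ".join(tokens)} if tokens else set()
    return {" ".join(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def _hash(item, seed):
    """64-bit hash of item under the hash function selected by seed."""
    salt = seed.to_bytes(8, "little")  # a different salt gives a different hash function
    digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(items, num_perm=128):
    """For each seeded hash function, keep the smallest hash over all items."""
    if not items:
        raise ValueError("cannot build a signature for an empty set")
    return [min(_hash(item, seed) for item in items) for seed in range(num_perm)]


def estimate_jaccard(sig_a, sig_b):
    """Estimate Jaccard similarity as the fraction of matching signature slots."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


if __name__ == "__main__":
    # Two hypothetical submissions that differ in a single token.
    first = shingles("int main ( ) { return 0 ; }")
    second = shingles("int main ( ) { return 1 ; }")
    print(estimate_jaccard(minhash_signature(first), minhash_signature(second)))
```

The estimate is approximate: more hash functions give a tighter estimate at the cost of more work per file.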

Usage

You can use this program as shown below (the same text is printed by ./plagiarism.py -h):

usage: plagiarism.py [-h] [-t <template file name>] [-o <output file name>]
                     [-p <working path>] [-r <remove regex pattern>]
                     [-s <summary file name>] [-g <graph weight0.0 ~ 1.0>]

optional arguments:
  -h, --help            show this help message and exit
  -t <template file name>, --template <template file name>
                        set template file
  -o <output file name>, --output <output file name>
                        set output file
  -p <working path>, --path <working path>
                        set compare files path
  -r <remove regex pattern>, --remove <remove regex pattern>
                        set remove patterns(regex) in file
  -s <summary file name>, --summary <summary file name>
                        set summary file
  -g <graph weight(0.0 ~ 1.0)>, --graph <graph weight(0.0 ~ 1.0)>
                        show associativity graph and set weight(0.0 ~ 1.0)
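
For readers who want to see how such an interface can be declared, the following is a rough argparse sketch that mirrors the options in the help text above. It is a reconstruction from that help text, not the actual source of plagiarism.py. The function name `build_parser` and the `type=float` conversion for the graph weight are assumptions.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="plagiarism.py")
    parser.add_argument("-t", "--template", metavar="<template file name>",
                        help="set template file")
    parser.add_argument("-o", "--output", metavar="<output file name>",
                        help="set output file")
    parser.add_argument("-p", "--path", metavar="<working path>",
                        help="set compare files path")
    parser.add_argument("-r", "--remove", metavar="<remove regex pattern>",
                        help="set remove patterns(regex) in file")
    parser.add_argument("-s", "--summary", metavar="<summary file name>",
                        help="set summary file")
    # Assumption: the weight is parsed as a float; the real program may differ.
    parser.add_argument("-g", "--graph", metavar="<graph weight(0.0 ~ 1.0)>",
                        type=float,
                        help="show associativity graph and set weight(0.0 ~ 1.0)")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["-p", "~/dir1/", "-g", "0.5"])
    print(args.path, args.graph)
```

Options that are not passed are left as `None` in this sketch; the defaults of the real program are not documented here.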

Terminology

  • template file: A file that is distributed to everyone in common. For instance, a hello world example always contains the same boilerplate. The template file is used to remove such shared content before comparing.
    • This value should be given as ~/dir1/dir2/template.c.
  • output file: The location where the CSV file with the full comparison results is stored.
    • This value should be given as ~/dir1/dir2/output.csv.
  • working path: The path that contains all the files you want to compare.
    • This value should be given as ~/dir1/.
  • remove regex: A pattern matching the content the user wants to delete. Typically, it is used to strip comments from the source code.
  • summary file: The location where the CSV file with the summary of the comparison results is stored.
    • This value should be given as ~/dir1/dir2/summary.csv.
  • graph weight: The threshold value that determines which targets are drawn in the graph.
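
Two of the terms above, the remove regex and the graph weight, can be illustrated with a short sketch. This is only a guess at how such steps might look, not the code of this program. The names `strip_patterns` and `edges_above_threshold`, the example comment pattern, and the choice of `>=` for the threshold comparison are all assumptions.

```python
import re

# Hypothetical pattern for C-style comments: "// ..." to end of line, or "/* ... */".
# It is deliberately naive and would also match "//" inside a string literal.
C_COMMENT_PATTERN = r"//[^\n]*|/\*.*?\*/"


def strip_patterns(source, pattern):
    """Delete every match of the given regex from the source text."""
    return re.sub(pattern, "", source, flags=re.DOTALL)


def edges_above_threshold(similarities, weight):
    """Keep only the file pairs whose similarity reaches the graph weight."""
    return [(a, b, score) for (a, b), score in similarities.items() if score >= weight]


if __name__ == "__main__":
    code = "int main() { // entry\n  /* block\n comment */ return 0; }"
    print(strip_patterns(code, C_COMMENT_PATTERN))

    # Made-up similarity scores for three hypothetical submissions.
    scores = {("a.c", "b.c"): 0.9, ("a.c", "c.c"): 0.2, ("b.c", "c.c"): 0.5}
    print(edges_above_threshold(scores, 0.5))
```

A higher graph weight leaves fewer edges, so only the most similar pairs remain visible in the graph.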

TODO

- [x] Support a feature of creating the graph
- [ ] Add cosine similarity analysis
- [ ] Add plagiarism detection based on C functions