danielpontello / cnn-captcha-solving

Licence: other
CAPTCHA solving using Convolutional Neural Networks

cnn-captcha-solving

This repository contains our final project for the Computer Engineering degree at the National Telecommunications Institute in Santa Rita do Sapucaí, Brazil, titled "Evidencing CAPTCHA Vulnerabilities using Convolutional Neural Networks".

Block diagram

Abstract

This document describes the development of a Convolutional Neural Network built to assess the reliability of CAPTCHAs, a security mechanism used by many websites. We present a theoretical review of the technologies used and of related scientific work. We also present the experiments, the metrics, and the details of the construction and operation of the neural network. Finally, we present the results of the work.

Directory Structure

This project comprises the following directories:

  • dataset-generator: Contains code for generating the artificial dataset used to train the neural network.
  • neural-network: Contains code used for training and testing the neural network.
  • experiments: Miscellaneous files and scripts used in the project.
  • results: Results collected from the experiments.

Running

To generate an artificial dataset for training the neural network, run the following command in the dataset-generator directory:

$ python fies-generate.py <number_of_samples>

Here <number_of_samples> is the number of sample CAPTCHAs to generate. This command can take a long time to run, depending on the number of samples requested. The images are saved in the dataset/raw folder.

After generating the dataset, the images must be filtered and segmented to be used to train the neural network. To do that, run the following command:

$ python fies-filter.py

The segmented images are saved in subdirectories of the dataset/segmented directory.
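
The filter script itself is not reproduced here, so the following is only a rough sketch of one common way to split a CAPTCHA into characters: vertical projection, which cuts the image wherever a column contains no ink. It uses plain Python lists instead of OpenCV arrays to stay self-contained. The function name segment_columns and the tiny binary image are hypothetical, and the real script may use a different method.

```python
def segment_columns(image):
    """Split a binary image (rows of 0/1, 1 = ink) into character column ranges.

    Returns a list of (start, end) column indices, end exclusive.
    """
    width = len(image[0])
    # A column contains ink if any row has a 1 in it.
    has_ink = [any(row[x] for row in image) for x in range(width)]

    segments = []
    start = None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x                    # a new character begins here
        elif not ink and start is not None:
            segments.append((start, x))  # the character ended at the previous column
            start = None
    if start is not None:                # character touching the right edge
        segments.append((start, width))
    return segments


# Hypothetical 3x8 image holding two "characters" separated by blank columns.
image = [
    [0, 1, 1, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0, 1, 0, 0],
]
print(segment_columns(image))
```

Characters that touch or overlap share ink columns, so this simple rule would return them as a single segment.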

With the dataset ready, we can start training the network. To do that, run the following command in the neural-network folder:

$ python train-network.py

WARNING: This step can consume large amounts of RAM (about 8 GB for 72000 segmented images). Close any unnecessary programs before running it.
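
The figure in the warning above implies a rough per-image memory budget. The sketch below works it out and scales it to other dataset sizes, assuming that memory use grows linearly with the number of images and that 1 GB means 10^9 bytes. Both are assumptions, and the function name estimate_ram_gb is hypothetical.

```python
REFERENCE_IMAGES = 72_000      # figure quoted in the warning above
REFERENCE_RAM_GB = 8.0         # figure quoted in the warning above


def estimate_ram_gb(num_images):
    """Linearly scale the quoted 8 GB / 72000 images figure (an assumption)."""
    return REFERENCE_RAM_GB * num_images / REFERENCE_IMAGES


# Per-image budget implied by the quoted figure, in kilobytes.
per_image_kb = REFERENCE_RAM_GB * 1e9 / REFERENCE_IMAGES / 1e3
print(f"about {per_image_kb:.0f} KB per segmented image")
print(f"about {estimate_ram_gb(36_000):.1f} GB for 36000 images")
```

Treat the result as an order-of-magnitude guide only, since actual usage depends on image size, data type, and how the script loads the data.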

You can uncomment the following lines to enable hardware acceleration on OpenCL-enabled devices (like AMD graphics cards). This can greatly speed up the training process:

import plaidml.keras
plaidml.keras.install_backend()
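
If the same script has to run on machines with and without PlaidML, one option is to guard the import instead of commenting the lines in and out. This is only a sketch: the helper name enable_plaidml is hypothetical, and the two PlaidML calls are the ones shown above. PlaidML expects the backend to be installed before Keras is first imported.

```python
def enable_plaidml():
    """Try to switch Keras to the PlaidML backend; return True on success."""
    try:
        import plaidml.keras
    except ImportError:
        # PlaidML is not installed: keep the default Keras backend.
        return False
    plaidml.keras.install_backend()
    return True


# Call this before the first `import keras` in the script.
if enable_plaidml():
    print("Using PlaidML backend")
else:
    print("PlaidML not available, using the default backend")
```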

Various parameters of the network can be changed by editing this script, as shown below:

num_samples = 2000          # number of samples to use for training
epochs = 1024               # maximum number of training epochs
learning_rate = 1e-3        # learning rate of the network
batch_size = 128            # batch size
validation_split = 0.66     # train/validation split ratio
min_delta = 1e-6            # minimum improvement in validation accuracy that counts as progress
patience = 10               # number of epochs without improvement before training stops
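
The min_delta and patience parameters read like the arguments of an early-stopping rule such as the Keras EarlyStopping callback, although the script is not shown here, so that is an assumption. The pure-Python sketch below illustrates the rule the comments describe. The function name stopping_epoch and the accuracy values are made up for illustration.

```python
def stopping_epoch(val_accuracies, min_delta=1e-6, patience=10):
    """Return the 1-based epoch at which training would stop, or None.

    An epoch counts as an improvement only if it beats the best
    validation accuracy seen so far by more than min_delta.
    """
    best = float("-inf")
    epochs_without_improvement = 0
    for epoch, accuracy in enumerate(val_accuracies, start=1):
        if accuracy - min_delta > best:
            best = accuracy
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None  # every epoch ran without triggering the rule


# Made-up accuracy curve: improves for three epochs, then plateaus.
history = [0.70, 0.80, 0.85] + [0.85] * 12
print(stopping_epoch(history, patience=10))
```

Under a rule like this, the epochs value acts as an upper bound: training ends earlier once the validation accuracy stops improving for patience consecutive epochs.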

The trained model will be saved to the models folder.

Libraries Used

The following libraries were used in this project:

  • NumPy: Scientific computing package for Python
  • OpenCV: Computer vision library
  • Keras: Machine learning library that runs atop TensorFlow
  • TensorFlow: High-performance machine learning library
  • Pillow: Image creation and manipulation library
  • PlaidML: Keras backend, used to enable GPU acceleration on OpenCL-enabled devices
  • Matplotlib: Chart plotting library
  • Memory Profiler: Memory usage profiler for Python

Authors

Advisor

Students
