
JanPalasek / ulozto-captcha-breaker

Licence: MIT license
Deep learning model using Tensorflow that breaks ulozto captcha codes.

Programming language: Python

Projects that are alternatives of or similar to ulozto-captcha-breaker

TikTokBot
Bot that saves videos from Instagram and then posts them to TikTok
Stars: ✭ 21 (-67.69%)
Mutual labels:  captcha, captcha-breaking, captcha-solver
2captcha-go
Golang Module for easy integration with the API of 2captcha captcha solving service to bypass recaptcha, hcaptcha, funcaptcha, geetest and solve any other captchas.
Stars: ✭ 31 (-52.31%)
Mutual labels:  captcha, captcha-breaking
Buster
Captcha solver extension for humans
Stars: ✭ 4,244 (+6429.23%)
Mutual labels:  captcha, captcha-solver
Awesome Web Scraping
List of libraries, tools and APIs for web scraping and data processing.
Stars: ✭ 4,510 (+6838.46%)
Mutual labels:  captcha-breaking, captcha-solver
captcha-breaking-library
Neural network, contour analysis, bitmap vector subtraction CAPTCHA solving library and scripting language with perceptive color space segmentation
Stars: ✭ 76 (+16.92%)
Mutual labels:  captcha, captcha-breaking
Z-Spider
Tips and examples for web crawler development
Stars: ✭ 33 (-49.23%)
Mutual labels:  captcha, captcha-solver
2captcha-python
Python 3 package for easy integration with the API of 2captcha captcha solving service to bypass recaptcha, hcaptcha, funcaptcha, geetest and solve any other captchas.
Stars: ✭ 140 (+115.38%)
Mutual labels:  captcha, captcha-breaking
2captcha-php
PHP package for easy integration with the API of 2captcha captcha solving service to bypass recaptcha, hcaptcha, funcaptcha, geetest and solve any other captchas.
Stars: ✭ 25 (-61.54%)
Mutual labels:  captcha, captcha-breaking
Captcha break
Captcha recognition
Stars: ✭ 2,268 (+3389.23%)
Mutual labels:  captcha, captcha-breaking
Hei.captcha
A cross-platform image captcha generation toolkit for .NET Core
Stars: ✭ 172 (+164.62%)
Mutual labels:  captcha
Rnn ctc
Recurrent Neural Network and Long Short Term Memory (LSTM) with Connectionist Temporal Classification implemented in Theano. Includes a Toy training example.
Stars: ✭ 220 (+238.46%)
Mutual labels:  captcha
Gocaptcha
A captcha library written in golang
Stars: ✭ 154 (+136.92%)
Mutual labels:  captcha
Antiddos System
🛡️⚔️ Protect your web app from DDOS attack or the Dead Ping + CAPTCHA VERIFICATION in one line!
Stars: ✭ 173 (+166.15%)
Mutual labels:  captcha
Tlg joincaptchabot
Telegram Bot to verify if users that join a group, are humans. The Bot send an image captcha for each new user, and kick any of them that can't solve the captcha in a specified time.
Stars: ✭ 226 (+247.69%)
Mutual labels:  captcha
Captcha
Captcha for Laravel 5/6/7/8
Stars: ✭ 1,985 (+2953.85%)
Mutual labels:  captcha
rotate-captcha
Rotate image captcha
Stars: ✭ 50 (-23.08%)
Mutual labels:  captcha
Decryptr
An extensible API for breaking captchas
Stars: ✭ 154 (+136.92%)
Mutual labels:  captcha
Captcha
Captcha image generator server in Go
Stars: ✭ 152 (+133.85%)
Mutual labels:  captcha
cs-wordpress-bouncer
CrowdSec is an open-source cyber security tool. This plugin blocks detected attackers or display them a captcha to check they are not bots.
Stars: ✭ 25 (-61.54%)
Mutual labels:  captcha
12306 Captcha
Deep-learning-based recognition of 12306 captchas
Stars: ✭ 254 (+290.77%)
Mutual labels:  captcha

ulozto-captcha-breaker

The algorithm used will be described in a standalone document.

How to use the pretrained model in your project

Prerequisites

Packages

  • numpy~=1.18.3
  • tflite_runtime~=2.5.0

Install the TensorFlow Lite Runtime build that matches your operating system and instruction set. The available builds are listed here: https://www.tensorflow.org/lite/guide/python.
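
To compare an existing environment against the pins above, something like the following could be used. It is only a sketch: the helper name installed_versions is made up for this example, and it reports versions without enforcing the ~= constraint.

```python
from importlib import metadata

# Pins copied from the package list above; the check itself is only an illustration.
PINNED = {"numpy": "1.18.3", "tflite_runtime": "2.5.0"}


def installed_versions(names):
    """Return {package name: installed version string, or None if not installed}."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, version in installed_versions(PINNED).items():
        print(f"{name}: installed={version}, pinned~={PINNED[name]}")
```

A value of None means the package is not installed in the current environment.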

Model specification

  • Input shape: (batch_size, height, width, 1), where height = 70, width = 175
  • Output shape: (batch_size, number_of_letters, number_of_classes), where number_of_letters = 4 and number_of_classes = 26

Note that the model takes grayscale images as input, so RGB images have to be converted first.
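
The input contract above can be checked before inference with a small helper. This is a minimal sketch, not part of the project: the name to_model_input is hypothetical, integer arrays are assumed to hold 0..255 values, and RGB is reduced to grayscale with the common luma weights 0.299, 0.587, 0.114.

```python
import numpy as np

HEIGHT, WIDTH = 70, 175  # input size from the model specification


def to_model_input(img):
    """Turn one image into a float32 array of shape (1, HEIGHT, WIDTH, 1).

    Accepts (H, W) grayscale or (H, W, 3) RGB data.
    Assumption: integer arrays are 0..255, float arrays are already 0..1.
    """
    img = np.asarray(img)
    if np.issubdtype(img.dtype, np.integer):
        img = img / 255.0  # scale to the 0..1 interval
    img = img.astype(np.float32)
    if img.ndim == 3 and img.shape[2] == 3:
        # weighted sum over the colour axis gives a (H, W) grayscale image
        img = img @ np.array([0.299, 0.587, 0.114], dtype=np.float32)
    if img.shape != (HEIGHT, WIDTH):
        raise ValueError(f"expected a ({HEIGHT}, {WIDTH}) image, got {img.shape}")
    # add the batch axis in front and the channel axis at the end
    return img[np.newaxis, :, :, np.newaxis]
```

Failing early on a wrong size is a design choice of this sketch; resizing instead would change what the model sees, so it is left to the caller.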

Steps

  1. Go to the latest release and download the binary files.

  2. Instantiate the TFLite interpreter. For that you need the TFLite model, which is included in the release binary files.

    • PATH_TO_TFLITE_MODEL is the path to the pretrained TFLite model file
    import tflite_runtime.interpreter as tflite
    interpreter = tflite.Interpreter(model_path=PATH_TO_TFLITE_MODEL)
  3. Normalize the image to the 0..1 interval. Skip this step if it is already normalized.

    import numpy as np
    img = (img / 255).astype(np.float32)
  4. Predict using the following code:

    # convert to grayscale
    r, g, b = img[:, :, 0], img[:, :, 1], img[:, :, 2]
    input = 0.299 * r + 0.587 * g + 0.114 * b
    
    # input now has shape (70, 175)
    # we modify dimensions to match model's input
    input = np.expand_dims(input, 0)
    input = np.expand_dims(input, -1)
    # input is now of shape (batch_size, 70, 175, 1)
    # output will have shape (batch_size, 4, 26)
    
    interpreter.allocate_tensors()
    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()
    interpreter.set_tensor(input_details[0]['index'], input)
    interpreter.invoke()
    
    # predict and get the output
    output = interpreter.get_tensor(output_details[0]['index'])
    # now get labels
    labels_indices = np.argmax(output, axis=2)
    
    available_chars = "abcdefghijklmnopqrstuvwxyz"
    
    def decode(li):
        result = []
        for char in li:
            result.append(available_chars[char])
        return "".join(result)
    
    decoded_label = [decode(x) for x in labels_indices][0]
    • np refers to the numpy module (import numpy as np)
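
The decoding part of step 4 can be tried without a model file. The sketch below runs the same argmax-and-lookup logic on a synthetic array shaped like the model output; decode_batch is a name chosen for this example, and the scores are made up.

```python
import numpy as np

AVAILABLE_CHARS = "abcdefghijklmnopqrstuvwxyz"


def decode_batch(output, chars=AVAILABLE_CHARS):
    """Map scores of shape (batch_size, number_of_letters, number_of_classes) to strings."""
    output = np.asarray(output)
    if output.ndim != 3 or output.shape[2] != len(chars):
        raise ValueError(f"unexpected output shape {output.shape}")
    indices = np.argmax(output, axis=2)  # best class for every letter position
    return ["".join(chars[i] for i in row) for row in indices]


# Synthetic scores standing in for real model output: two captchas, four letters each.
fake_output = np.zeros((2, 4, 26), dtype=np.float32)
for position, letter in enumerate("abfd"):
    fake_output[0, position, AVAILABLE_CHARS.index(letter)] = 1.0
for position, letter in enumerate("zzxy"):
    fake_output[1, position, AVAILABLE_CHARS.index(letter)] = 1.0

decoded = decode_batch(fake_output)
print(decoded)
```

With a real interpreter, the array returned by interpreter.get_tensor(...) would take the place of fake_output.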

How to train your own model

  1. Install the environment. The following script creates a new virtual environment; you can of course use the global environment instead. All scripts in this section are expected to be executed from the repository's root directory.

    git clone https://github.com/JanPalasek/ulozto-captcha-breaker
    cd "ulozto-captcha-breaker"
    
    # create virtual environment
    python -m venv "venv"
    
    source venv/bin/activate # or .\venv\Scripts\activate.ps1 in Windows PowerShell
    python -m pip install --upgrade pip
    python -m pip install --upgrade wheel setuptools pip-tools
    python -m piptools sync
    python -m pip install -e .
  2. Obtain a dataset of captcha images and store it in the directory out/data. Images are expected to be named according to the captcha displayed in the image.

    E.g. a captcha image showing the letters ABFD is expected to be named ABFD.png, abfd.png (if case sensitivity does not matter), or ABFD_{UUID4 CODE}.png (to distinguish different images with the same captcha letters).

    This project can also generate captchas itself with the captcha Python package, via the script bin/simple_captcha_generate.py. You can run it as follows:

    python bin/simple_captcha_generate.py --height=70 --width=175 --available_chars="abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ" --captcha_length=6 --dataset_size=10000

    Some notable parameters are:

    • available_chars - list of characters that will be generated
    • captcha_length - how long generated captcha is going to be
    • dataset_size - how large dataset is going to be generated
    • height - height of generated captcha
    • width - width of generated captcha
  3. Generate annotation files using the bin/captcha_annotate.py script. You can call it, for example, like this:

    python bin/captcha_annotate.py --val_split=0.1 --test_split=0.1 --case_sensitive

    This will shuffle the data and split it into train/validation/test sets according to the following parameters:

    • val_split - fraction of the data used for validation, e.g. 0.1 means 10%
    • test_split - fraction of the data used for testing
    • case_sensitive - switch that makes the created labels case sensitive
      • if this parameter is not passed and the image shows aBcD (and is named accordingly), the resulting label is abcd
      • if it is passed, the resulting label is aBcD

    This script will create annotations.txt, annotations-train.txt, annotations-validation.txt and annotations-test.txt.

  4. Run the training script bin/train.py, for example like this:

    python bin/train.py --available_chars="abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ" --captcha_length=6 

    Notably, the training script logs models after each checkpoint into the logs/train.py-{START TIMESTAMP}-{parameters etc.} directory.
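
The naming convention from step 2 and the split from step 3 can be illustrated in a few lines. This is only a sketch of the idea, not the code of bin/captcha_annotate.py: the function names are hypothetical, and the format of the annotation files is not reproduced because it is not described here.

```python
import random
from pathlib import Path


def label_from_filename(filename, case_sensitive=False):
    """'ABFD.png' or 'ABFD_<uuid>.png' -> 'ABFD' (or 'abfd' when not case sensitive)."""
    label = Path(filename).stem.split("_", 1)[0]
    return label if case_sensitive else label.lower()


def split_dataset(items, val_split=0.1, test_split=0.1, seed=0):
    """Shuffle items and return (train, validation, test) lists."""
    items = list(items)
    random.Random(seed).shuffle(items)  # seeded so the split is reproducible
    n_val = int(len(items) * val_split)
    n_test = int(len(items) * test_split)
    validation = items[:n_val]
    test = items[n_val:n_val + n_test]
    train = items[n_val + n_test:]
    return train, validation, test


# Made-up file names, used only to show the two helpers together.
names = [f"aBcD_{i}.png" for i in range(100)]
train, validation, test = split_dataset(names, val_split=0.1, test_split=0.1)
print(len(train), len(validation), len(test))
print(label_from_filename(names[0]), label_from_filename(names[0], case_sensitive=True))
```

Fixing the seed is a choice made for this sketch so that repeated runs give the same split; whether the project's script does the same is not stated.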
