AgentMaker / Paddle-CLIP

License: Apache-2.0
A PaddlePaddle implementation of OpenAI's CLIP.

Projects that are alternatives to or similar to Paddle-CLIP

clip-container
A containerized REST API around OpenAI's CLIP model.
Stars: ✭ 46 (-9.8%)
Mutual labels:  clip
natural-language-joint-query-search
Search photos on Unsplash based on OpenAI's CLIP model, support search with joint image+text queries and attention visualization.
Stars: ✭ 143 (+180.39%)
Mutual labels:  clip
Baidu Lane Segmentation
4th place solution in Baidu Autonomous Driving Lane Segmentation
Stars: ✭ 19 (-62.75%)
Mutual labels:  paddlepaddle
vue-pic-clip
A simple mobile plugin for cropping and uploading images
Stars: ✭ 30 (-41.18%)
Mutual labels:  clip
unity-clip-shader
Unity shader and scripts for rendering solid clipped geometry
Stars: ✭ 34 (-33.33%)
Mutual labels:  clip
Paddle-PerceptualSimilarity
LPIPS metric on PaddlePaddle. pip install paddle-lpips
Stars: ✭ 22 (-56.86%)
Mutual labels:  paddlepaddle
MoTIS
Mobile(iOS) Text-to-Image search powered by multimodal semantic representation models(e.g., OpenAI's CLIP). Accepted at NAACL 2022.
Stars: ✭ 60 (+17.65%)
Mutual labels:  clip
PaddlePaddle-MTCNN
MTCNN face detection model reimplemented in PaddlePaddle
Stars: ✭ 23 (-54.9%)
Mutual labels:  paddlepaddle
Paddle-Image-Models
A PaddlePaddle version image model zoo.
Stars: ✭ 131 (+156.86%)
Mutual labels:  paddlepaddle
Transformer-MM-Explainability
[ICCV 2021- Oral] Official PyTorch implementation for Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers, a novel method to visualize any Transformer-based network. Including examples for DETR, VQA.
Stars: ✭ 484 (+849.02%)
Mutual labels:  clip
PASSL
PASSL includes self-supervised image algorithms such as SimCLR, MoCo v1/v2, BYOL, CLIP, PixPro, BEiT, and MAE, as well as basic vision algorithms such as Vision Transformer, DEiT, Swin Transformer, CvT, T2T-ViT, MLP-Mixer, XCiT, ConvNeXt, and PVTv2
Stars: ✭ 134 (+162.75%)
Mutual labels:  clip
twitch-downloader
Download Twitch VODs and Clips
Stars: ✭ 37 (-27.45%)
Mutual labels:  clip
clip-italian
CLIP (Contrastive Language–Image Pre-training) for Italian
Stars: ✭ 113 (+121.57%)
Mutual labels:  clip
pushdeer
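Open-source push service that needs no app: on iOS 14+, scan a code and it is ready to use. Also supports Quick Apps, iOS and Mac clients, Android clients, and self-made devices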
Stars: ✭ 2,911 (+5607.84%)
Mutual labels:  clip
PLSC
Paddle Large Scale Classification Tools, supports ArcFace, CosFace, PartialFC, Data Parallel + Model Parallel. Models include ResNet, ViT, DeiT, FaceViT.
Stars: ✭ 113 (+121.57%)
Mutual labels:  paddlepaddle
photo-magician
🎨 provide some common image process apis with canvas
Stars: ✭ 12 (-76.47%)
Mutual labels:  clip
vqgan-clip-app
Local image generation using VQGAN-CLIP or CLIP guided diffusion
Stars: ✭ 94 (+84.31%)
Mutual labels:  clip
fauxClip
Clipboard support for Vim without +clipboard
Stars: ✭ 32 (-37.25%)
Mutual labels:  clip
videoclip
Easily create videoclips with mpv.
Stars: ✭ 49 (-3.92%)
Mutual labels:  clip
video features
Extract video features from raw videos using multiple GPUs. We support RAFT and PWC flow frames as well as S3D, I3D, R(2+1)D, VGGish, CLIP, ResNet features.
Stars: ✭ 225 (+341.18%)
Mutual labels:  clip

Paddle-CLIP

A PaddlePaddle implementation of OpenAI's CLIP. [origin repo]

Install Package

  • Install by pip:
$ pip install paddleclip

Requirements

  • wget
  • ftfy
  • regex
  • paddlepaddle(cpu/gpu)>=2.0.1
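
The version floor above can be checked before importing the package. The following is a minimal sketch, not part of Paddle-CLIP: the helper names `parse_version`, `installed_paddle_version`, and `meets_minimum` are made up for illustration, and it assumes the CPU and GPU builds are installed under the pip names `paddlepaddle` and `paddlepaddle-gpu`.

```python
from importlib import metadata


def parse_version(text):
    """Turn '2.0.1' into (2, 0, 1); suffixes such as 'rc0' are dropped."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if digits == "":
            break
        parts.append(int(digits))
    return tuple(parts)


def installed_paddle_version():
    """Return the installed PaddlePaddle version string, or None if absent."""
    # Assumption: the CPU and GPU builds use these two distribution names.
    for dist in ("paddlepaddle", "paddlepaddle-gpu"):
        try:
            return metadata.version(dist)
        except metadata.PackageNotFoundError:
            continue
    return None


def meets_minimum(installed, minimum="2.0.1"):
    """True when `installed` is at least `minimum`."""
    return parse_version(installed) >= parse_version(minimum)


found = installed_paddle_version()
if found is None:
    print("PaddlePaddle is not installed")
else:
    print(found, "OK" if meets_minimum(found) else "is older than 2.0.1")
```

The comparison is deliberately simple (numeric components only), so pre-release tags are ignored rather than ordered.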

Quick Start

import paddle
from PIL import Image
from clip import tokenize, load_model

# Load the model
model, transforms = load_model('ViT_B_32', pretrained=True)

# Prepare the inputs
image = transforms(Image.open("CLIP.png")).unsqueeze(0)
text = tokenize(["a diagram", "a dog", "a cat"])

# Calculate features and probability
with paddle.no_grad():
    logits_per_image, logits_per_text = model(image, text)
    probs = paddle.nn.functional.softmax(logits_per_image, axis=-1)
    
# Print the result
print(probs.numpy())
# Output:
# [[0.9927937  0.00421065 0.00299568]]
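
Each column of the printed probabilities lines up with one of the prompts passed to `tokenize`, so the prediction is simply the prompt with the largest value. A minimal sketch in plain Python, using the numbers from the result above; `best_label` is a hypothetical helper, not part of the package, and the nested list stands in for what `probs.numpy().tolist()` would give.

```python
labels = ["a diagram", "a dog", "a cat"]
# Probabilities copied from the Quick Start result (one row per image).
probs = [[0.9927937, 0.00421065, 0.00299568]]


def best_label(row, labels):
    """Return (label, probability) for the highest-scoring prompt in one row."""
    index = max(range(len(row)), key=lambda i: row[i])
    return labels[index], row[index]


for row in probs:
    label, prob = best_label(row, labels)
    print(f"{label}: {prob:.2%}")
```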

Zero-Shot Prediction

import paddle
from clip import tokenize, load_model
from paddle.vision.datasets import Cifar100

# Load the model
model, transforms = load_model('ViT_B_32', pretrained=True)

# Load the dataset
cifar100 = Cifar100(mode='test', backend='pil')
classes = [
    'apple', 'aquarium_fish', 'baby', 'bear', 'beaver', 'bed', 'bee', 'beetle', 'bicycle', 'bottle', 
    'bowl', 'boy', 'bridge', 'bus', 'butterfly', 'camel', 'can', 'castle', 'caterpillar', 'cattle', 
    'chair', 'chimpanzee', 'clock', 'cloud', 'cockroach', 'couch', 'crab', 'crocodile', 'cup', 'dinosaur', 
    'dolphin', 'elephant', 'flatfish', 'forest', 'fox', 'girl', 'hamster', 'house', 'kangaroo', 'keyboard', 
    'lamp', 'lawn_mower', 'leopard', 'lion', 'lizard', 'lobster', 'man', 'maple_tree', 'motorcycle', 'mountain', 
    'mouse', 'mushroom', 'oak_tree', 'orange', 'orchid', 'otter', 'palm_tree', 'pear', 'pickup_truck', 'pine_tree', 
    'plain', 'plate', 'poppy', 'porcupine', 'possum', 'rabbit', 'raccoon', 'ray', 'road', 'rocket', 
    'rose', 'sea', 'seal', 'shark', 'shrew', 'skunk', 'skyscraper', 'snail', 'snake', 'spider', 
    'squirrel', 'streetcar', 'sunflower', 'sweet_pepper', 'table', 'tank', 'telephone', 'television', 'tiger', 'tractor', 
    'train', 'trout', 'tulip', 'turtle', 'wardrobe', 'whale', 'willow_tree', 'wolf', 'woman', 'worm'
]

# Prepare the inputs
image, class_id = cifar100[3637]
image_input = transforms(image).unsqueeze(0)
text_inputs = tokenize(["a photo of a %s" % c for c in classes])

# Calculate features
with paddle.no_grad():
    image_features = model.encode_image(image_input)
    text_features = model.encode_text(text_inputs)

# Pick the top 5 most similar labels for the image
image_features /= image_features.norm(axis=-1, keepdim=True)
text_features /= text_features.norm(axis=-1, keepdim=True)
similarity = (100.0 * image_features @ text_features.t())
similarity = paddle.nn.functional.softmax(similarity, axis=-1)
values, indices = similarity[0].topk(5)

# Print the result
for value, index in zip(values.numpy().tolist(), indices.numpy().tolist()):
    print('%s: %.02f%%' % (classes[index], value * 100.))
# Output:
# snake: 65.31%
# turtle: 12.29%
# sweet_pepper: 3.83%
# lizard: 1.88%
# crocodile: 1.75%
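
The ranking step in this example (L2-normalize both feature sets, scale the cosine similarities by 100, apply softmax, take the top k) can be traced without a model. The sketch below uses plain Python and made-up 2-D vectors, so the numbers are illustrative only; `l2_normalize`, `softmax`, and `top_k` are hypothetical helpers, not functions from the package.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def softmax(scores):
    """Convert scores to probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(image_vec, text_vecs, k, scale=100.0):
    """Return the k most similar text indices with their probabilities."""
    image_vec = l2_normalize(image_vec)
    scores = []
    for text_vec in text_vecs:
        text_vec = l2_normalize(text_vec)
        cosine = sum(a * b for a, b in zip(image_vec, text_vec))
        scores.append(scale * cosine)
    probs = softmax(scores)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(i, probs[i]) for i in order[:k]]


# Toy 2-D "features", made up for illustration only.
names = ["snake", "turtle", "apple"]
image = [3.0, 4.0]
texts = [[6.0, 8.0], [4.0, 3.0], [-3.0, -4.0]]
for index, prob in top_k(image, texts, k=2):
    print(f"{names[index]}: {prob:.2%}")
```

Because the features are normalized first, the vector `[6.0, 8.0]` scores the same as `[3.0, 4.0]` would: only direction matters, not length.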

Linear-probe evaluation

import os
import paddle
import numpy as np
from tqdm import tqdm
from paddle.io import DataLoader
from clip import tokenize, load_model
from paddle.vision.datasets import Cifar100
from sklearn.linear_model import LogisticRegression

# Load the model
model, transforms = load_model('ViT_B_32', pretrained=True)

# Load the dataset
train = Cifar100(mode='train', transform=transforms, backend='pil')
test = Cifar100(mode='test', transform=transforms, backend='pil')

# Get features
def get_features(dataset):
    all_features = []
    all_labels = []
    
    with paddle.no_grad():
        for images, labels in tqdm(DataLoader(dataset, batch_size=100)):
            features = model.encode_image(images)
            all_features.append(features)
            all_labels.append(labels)

    return paddle.concat(all_features).numpy(), paddle.concat(all_labels).numpy()

# Calculate the image features
train_features, train_labels = get_features(train)
test_features, test_labels = get_features(test)

# Perform logistic regression
classifier = LogisticRegression(random_state=0, C=0.316, max_iter=1000, verbose=0)
classifier.fit(train_features, train_labels)

# Evaluate using the logistic regression classifier
predictions = classifier.predict(test_features)
# np.float was removed in NumPy 1.24; the built-in float is equivalent here
accuracy = np.mean((test_labels == predictions).astype(float)) * 100.

# Print the result
print(f"Accuracy = {accuracy:.3f}")
# Output:
# Accuracy = 79.900
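
In scikit-learn's `LogisticRegression`, `C` is the inverse of the regularization strength, and the example fixes it at 0.316. When adapting the example to other data, one option is to sweep `C` over a log-spaced grid and keep the value that scores best on a held-out validation split. The sketch below shows only the sweep logic: `log_spaced`, `sweep`, and `toy_score` are hypothetical, and `toy_score` is a made-up stand-in for "fit the classifier with this C and return validation accuracy".

```python
def log_spaced(low_exp, high_exp, steps_per_decade=2):
    """Candidate values 10**e for e from low_exp to high_exp inclusive."""
    count = (high_exp - low_exp) * steps_per_decade
    return [10 ** (low_exp + i / steps_per_decade) for i in range(count + 1)]


def sweep(candidates, score_fn):
    """Return (best_value, best_score); ties keep the earliest candidate."""
    best_value, best_score = None, float("-inf")
    for value in candidates:
        score = score_fn(value)
        if score > best_score:
            best_value, best_score = value, score
    return best_value, best_score


def toy_score(c):
    """Made-up score that peaks near c = 0.3 (illustration only)."""
    return -abs(c - 0.3)


candidates = log_spaced(-3, 3)
best_c, best_score = sweep(candidates, toy_score)
print(f"best C = {best_c:.3f}")
```

In real use, `score_fn` would train on one part of the training features and score on the rest, so the test split stays untouched until the final evaluation.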

Pretrained Models Download

Contact us

Email : [email protected]
QQ Group : 1005109853
