Caption Images

Use AI To Caption Images with an API

Models in this collection

Sorted by popularity (run_count)
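The popularity ordering can be reproduced from the run counts printed with each model in this listing. The following is a minimal sketch, assuming the counts are copied by hand as strings with grouped digits; the names `ENTRIES`, `parse_run_count`, and `sort_by_popularity` are illustrative and not part of any hosting API.

```python
import re

# Sample entries copied from this listing: (model, run count as printed).
# Deliberately out of order so the sort has something to do.
ENTRIES = [
    ("lucataco/moondream2", "8 365 782"),
    ("salesforce/blip", "171 855 594"),
    ("andreasjansson/blip-2", "31 389 210"),
    ("yorickvp/llava-13b", "33 944 168"),
]


def parse_run_count(text: str) -> int:
    """Turn a grouped count such as '171 855 594' or '171,855,594' into an int."""
    # Drop every non-digit: plain spaces, non-breaking spaces, commas.
    digits = re.sub(r"\D", "", text)
    if not digits:
        raise ValueError(f"no digits in {text!r}")
    return int(digits)


def sort_by_popularity(entries):
    """Return (model, run_count) pairs, most-run model first."""
    parsed = [(name, parse_run_count(count)) for name, count in entries]
    return sorted(parsed, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, runs in sort_by_popularity(ENTRIES):
        print(f"{name}: {runs:,}")
```

Stripping all non-digit characters, rather than splitting on a specific separator, keeps the parser indifferent to whether a page groups thousands with spaces, non-breaking spaces, or commas.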

salesforce/blip

Generate image captions

171,855,594 runs

yorickvp/llava-13b

Visual instruction tuning towards large language and vision models with GPT-4 level capabilities

33,944,168 runs

andreasjansson/blip-2

Answers questions about images

31,389,210 runs

lucataco/moondream2

moondream2 is a small vision language model designed to run efficiently on edge devices

8,365,782 runs

pharmapsychotic/clip-interrogator

The CLIP Interrogator is a prompt engineering tool that combines OpenAI's CLIP and Salesforce's BLIP to optimize text prompts to match a given image. Use the resulting prompts with text-to-image models like Stable Diffusion to create cool art!

4,880,085 runs

methexis-inc/img2prompt

Get an approximate text prompt, with style, matching an image. Optimized for Stable Diffusion (CLIP ViT-L/14).

2,660,006 runs

daanelson/minigpt-4

A model which generates text in response to an input image and prompt.

1,846,646 runs

rmokady/clip_prefix_caption

Simple image captioning model using CLIP and GPT-2

1,741,391 runs

zsxkib/blip-3

BLIP-3 / XGen-MM: answers questions about images ({blip3,xgen-mm}-phi3-mini-base-r-v1)

1,335,139 runs

zsxkib/molmo-7b

allenai/Molmo-7B-D-0924: answers questions about images and captions them

1,321,750 runs

lucataco/sdxl-clip-interrogator

CLIP Interrogator for SDXL optimizes text prompts to match a given image

848,742 runs

lucataco/qwen-vl-chat

A multimodal LLM-based AI assistant, which is trained with alignment techniques. Qwen-VL-Chat supports more flexible interaction, such as multi-round question answering, and creative capabilities.

825,746 runs

lucataco/qwen2-vl-7b-instruct

Latest model in the Qwen family for chatting about videos and images

356,851 runs

j-min/clip-caption-reward

Fine-grained Image Captioning with CLIP Reward

296,124 runs

joehoover/instructblip-vicuna13b

An instruction-tuned multi-modal model based on BLIP-2 and Vicuna-13B

257,514 runs

lucataco/florence-2-base

Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks