Caption Images

Use AI to caption images with an API

Models in this collection

Sorted by popularity (run_count)
salesforce/blip

Generate image captions

171,855,594 runs
yorickvp/llava-13b

Visual instruction tuning towards large language and vision models with GPT-4 level capabilities

33,944,168 runs
andreasjansson/blip-2

Answers questions about images

31,389,210 runs
lucataco/moondream2

moondream2 is a small vision language model designed to run efficiently on edge devices

8,365,782 runs
pharmapsychotic/clip-interrogator

The CLIP Interrogator is a prompt engineering tool that combines OpenAI's CLIP and Salesforce's BLIP to optimize text prompts to match a given image. Use the resulting prompts with text-to-image models like Stable Diffusion to create cool art!

4,880,085 runs
methexis-inc/img2prompt

Get an approximate text prompt, with style, matching an image. (Optimized for stable-diffusion (clip ViT-L/14))

2,660,006 runs
daanelson/minigpt-4

A model which generates text in response to an input image and prompt.

1,846,646 runs
rmokady/clip_prefix_caption

Simple image captioning model using CLIP and GPT-2

1,741,391 runs
zsxkib/blip-3

BLIP-3 / XGen-MM: answers questions about images ({blip3,xgen-mm}-phi3-mini-base-r-v1)

1,335,139 runs
zsxkib/molmo-7b

allenai/Molmo-7B-D-0924: answers questions about images and captions them

1,321,750 runs
lucataco/sdxl-clip-interrogator

CLIP Interrogator for SDXL optimizes text prompts to match a given image

848,742 runs
lucataco/qwen-vl-chat

A multimodal LLM-based AI assistant trained with alignment techniques. Qwen-VL-Chat supports more flexible interaction, such as multi-round question answering, as well as creative capabilities.

825,746 runs
lucataco/qwen2-vl-7b-instruct

Latest model in the Qwen family for chatting about videos and images

356,851 runs
j-min/clip-caption-reward

Fine-grained Image Captioning with CLIP Reward

296,124 runs
joehoover/instructblip-vicuna13b

An instruction-tuned multi-modal model based on BLIP-2 and Vicuna-13B

257,514 runs
lucataco/florence-2-base

Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks

132,298 runs
joehoover/mplug-owl

An instruction-tuned multimodal large language model that generates text based on user-provided prompts and images

55,800 runs
lucataco/fuyu-8b

Fuyu-8B is a multi-modal text and image transformer trained by Adept AI

14,637 runs
nohamoamary/image-captioning-with-visual-attention

datasets: Flickr8k

11,308 runs
lucataco/smolvlm-instruct

SmolVLM-Instruct by HuggingFaceTB

8,266 runs
lucataco/llama-3-vision-alpha

Projection module trained to add vision capabilities to Llama 3 using SigLIP

6,798 runs
lucataco/ollama-llama3.2-vision-11b

Ollama Llama 3.2 Vision 11B

3,915 runs
lucataco/ollama-llama3.2-vision-90b

Ollama Llama 3.2 Vision 90B

3,789 runs
zsxkib/idefics3

Idefics3-8B-Llama3: answers questions about images and captions them

2,630 runs
zsxkib/uform-gen

🖼️ Super fast 1.5B Image Captioning/VQA Multimodal LLM (Image-to-Text) 🖋️

2,348 runs
fofr/deprecated-batch-image-captioning

A wrapper model for captioning multiple images using GPT, Claude or Gemini, useful for LoRA training

1,576 runs
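
Calling one of the models above over an API could look roughly like the sketch below. It rests on assumptions this page does not state: that the models are served through Replicate's HTTP predictions endpoint, and that the chosen model accepts an input field named "image" (input names differ between models, so check the model's own schema). The helper name build_caption_request and the token and version placeholders are hypothetical. The sketch only builds the request and does not send it.

```python
import json
import urllib.request

# Assumption: the collection is hosted on Replicate, whose HTTP API
# creates predictions at this endpoint.
PREDICTIONS_URL = "https://api.replicate.com/v1/predictions"


def build_caption_request(
    api_token: str, model_version: str, image_url: str
) -> urllib.request.Request:
    """Build (but do not send) a POST request asking a model to caption an image."""
    payload = {
        # Placeholder: the version ID of the chosen model, taken from its page.
        "version": model_version,
        # Assumption: the model reads the picture from an input named "image".
        "input": {"image": image_url},
    }
    return urllib.request.Request(
        PREDICTIONS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # All three arguments are illustrative placeholders.
    req = build_caption_request(
        "YOUR_API_TOKEN", "MODEL_VERSION_ID", "https://example.com/photo.jpg"
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

To actually run a prediction, the request would be passed to urllib.request.urlopen and the JSON response inspected. Keeping construction separate from sending makes the payload easy to check before any credits are spent.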