Vision models

Chat with images for understanding, captioning & detection via API

Models in the collection

Sorted by popularity (run_count)
yorickvp/llava-13b

Visual instruction tuning towards large language and vision models with GPT-4 level capabilities

33,944,168 runs
openai/gpt-4o-mini

Low latency, low cost version of OpenAI's GPT-4o model

27,637,318 runs
lucataco/moondream2

moondream2 is a small vision language model designed to run efficiently on edge devices

8,365,782 runs
yorickvp/llava-v1.6-mistral-7b

LLaVA v1.6: Large Language and Vision Assistant (Mistral-7B)

4,974,846 runs
anthropic/claude-3.7-sonnet

The most intelligent Claude model and the first hybrid reasoning model on the market (claude-3-7-sonnet-20250219)

3,767,553 runs
yorickvp/llava-v1.6-vicuna-13b

LLaVA v1.6: Large Language and Vision Assistant (Vicuna-13B)

3,752,193 runs
anthropic/claude-4-sonnet

Claude Sonnet 4 is a significant upgrade to 3.7, delivering superior coding and reasoning while responding more precisely to your instructions

2,232,855 runs
daanelson/minigpt-4

A model which generates text in response to an input image and prompt.

1,846,646 runs
yorickvp/llava-v1.6-34b

LLaVA v1.6: Large Language and Vision Assistant (Nous-Hermes-2-34B)

1,770,103 runs
google/gemini-2.5-flash

Google’s hybrid “thinking” AI model optimized for speed and cost-efficiency

1,679,794 runs
openai/gpt-4.1-mini

Fast, affordable version of GPT-4.1

1,556,600 runs
cjwbw/cogvlm

Powerful open-source visual language model

1,498,846 runs
lucataco/qwen-vl-chat

A multimodal LLM-based AI assistant, which is trained with alignment techniques. Qwen-VL-Chat supports more flexible interaction, such as multi-round question answering, and creative capabilities.

825,746 runs
anthropic/claude-3.5-sonnet

Anthropic's most intelligent language model to date, with a 200K token context window and image understanding (claude-3-5-sonnet-20241022)

627,718 runs
openai/gpt-4o

OpenAI's high-intelligence chat model

506,750 runs
lucataco/qwen2-vl-7b-instruct

Latest model in the Qwen family for chatting with videos and images

356,851 runs
cjwbw/internlm-xcomposer

Advanced text-image comprehension and composition based on InternLM

164,428 runs
joehoover/mplug-owl

An instruction-tuned multimodal large language model that generates text based on user-provided prompts and images

55,800 runs
lucataco/bakllava

BakLLaVA-1 is a Mistral 7B base augmented with the LLaVA 1.5 architecture

39,824 runs
lucataco/qwen2.5-omni-7b

Qwen2.5-Omni is an end-to-end multimodal model designed to perceive diverse modalities, including text, images, audio, and video, while simultaneously generating text and natural speech responses in a streaming manner.

31,610 runs
adirik/owlvit-base-patch32

Zero-shot / open vocabulary object detection

24,965 runs
adirik/kosmos-g

Kosmos-G: Generating Images in Context with Multimodal Large Language Models

4,515 runs
lucataco/ollama-llama3.2-vision-11b

Ollama Llama 3.2 Vision 11B

3,915 runs
lucataco/ollama-llama3.2-vision-90b

Ollama Llama 3.2 Vision 90B

3,789 runs
zsxkib/uform-gen

🖼️ Super fast 1.5B Image Captioning/VQA Multimodal LLM (Image-to-Text) 🖋️

2,348 runs