Vision models
Chat with images for understanding, captioning & detection via API
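Each model in this collection can be called with a single API request that pairs an image with a text prompt. The sketch below is a minimal example using the Replicate Python client; the model slug "yorickvp/llava-13b" and its "image"/"prompt" input names are assumptions based on that model's published schema, and any model listed below can be substituted.

import replicate

# Ask a vision-language model a question about an image.
# The model slug and input names are assumptions; check the model's page
# for its exact schema before running.
output = replicate.run(
    "yorickvp/llava-13b",
    input={
        "image": "https://example.com/photo.jpg",  # image URL or file handle
        "prompt": "What is happening in this picture?",
    },
)

# Many language models on Replicate stream tokens; join them into one string.
print("".join(output))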
Models in this collection, sorted by popularity (run count)
Visual instruction tuning towards large language and vision models with GPT-4 level capabilities
Low-latency, low-cost version of OpenAI's GPT-4o model
moondream2 is a small vision language model designed to run efficiently on edge devices
LLaVA v1.6: Large Language and Vision Assistant (Mistral-7B)
The most intelligent Claude model and the first hybrid reasoning model on the market (claude-3-7-sonnet-20250219)
LLaVA v1.6: Large Language and Vision Assistant (Vicuna-13B)
Claude Sonnet 4 is a significant upgrade to 3.7, delivering superior coding and reasoning while responding more precisely to your instructions
A model that generates text in response to an input image and prompt.
LLaVA v1.6: Large Language and Vision Assistant (Nous-Hermes-2-34B)
Google’s hybrid “thinking” AI model optimized for speed and cost-efficiency
Fast, affordable version of GPT-4.1
A powerful open-source visual language model
A multimodal LLM-based AI assistant trained with alignment techniques. Qwen-VL-Chat supports flexible interaction, such as multi-round question answering, as well as creative capabilities.
Anthropic's most intelligent language model to date, with a 200K token context window and image understanding (claude-3-5-sonnet-20241022)
OpenAI's high-intelligence chat model
Latest model in the Qwen family for chatting about video and image inputs
Advanced text-image comprehension and composition based on InternLM
An instruction-tuned multimodal large language model that generates text based on user-provided prompts and images
BakLLaVA-1 is a Mistral 7B base augmented with the LLaVA 1.5 architecture
Qwen2.5-Omni is an end-to-end multimodal model designed to perceive diverse modalities, including text, images, audio, and video, while simultaneously generating text and natural speech responses in a streaming manner.
Zero-shot / open vocabulary object detection (see the detection sketch after this list)
Kosmos-G: Generating Images in Context with Multimodal Large Language Models
Ollama Llama 3.2 Vision 11B
Ollama Llama 3.2 Vision 90B
🖼️ Super fast 1.5B Image Captioning/VQA Multimodal LLM (Image-to-Text) 🖋️
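Open-vocabulary detection models like the one listed above take an image plus free-text labels and return bounding boxes, so no fixed class list is needed. The sketch below is a minimal example under the same Replicate client assumption; the slug "adirik/grounding-dino" and the "image"/"query" input names are assumptions and may differ from the deployed model's schema.

import replicate

# Open-vocabulary detection: describe the objects you want in plain text.
# The model slug and input names below are assumptions; check the model page.
detections = replicate.run(
    "adirik/grounding-dino",
    input={
        "image": "https://example.com/street.jpg",
        "query": "car, bicycle, traffic light",  # comma-separated labels
    },
)

# Output is typically a JSON-like structure with boxes, labels, and scores.
print(detections)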