
Add the new multi-modal model of Mistral AI: pixtral-12b #3535

Open
SuperPat45 opened this issue Sep 12, 2024 · 2 comments
Labels: enhancement (New feature or request), roadmap

Comments

@SuperPat45

Add the new multi-modal model from Mistral AI, pixtral-12b:

https://huggingface.co/mistral-community/pixtral-12b-240910

It includes an image encoder; could it also be added to the image generation API as an alternative to Stable Diffusion?

@SuperPat45 SuperPat45 added the enhancement New feature or request label Sep 12, 2024
@AlexM4H

AlexM4H commented Sep 13, 2024

Since yesterday, vLLM has InternVL2 support. :-)

vllm-project/vllm/releases/tag/v0.6.1

@mudler mudler added the roadmap label Sep 13, 2024
@mudler
Owner

mudler commented Sep 13, 2024

I guess that would already work with llama.cpp GGUF models if/when it gets supported there (see also ggerganov/llama.cpp#9440).

I'd change the focus of this one to be more generic and add support for multimodal models with vLLM. Examples:

https://github.com/vllm-project/vllm/blob/main/examples/offline_inference_pixtral.py
https://github.com/vllm-project/vllm/blob/main/examples/offline_inference_vision_language_multi_image.py
