Run vision-language models locally with MLX-VLM, an open-source tool for startup founders with 4.8k+ GitHub stars.
4,840 stars · 554 forks · Python · Quality 8/10 · Updated 6/1/2026 · 100% free · open source
What it does
MLX-VLM lets startup founders run and fine-tune vision-language models directly on Apple Silicon Macs, so they can build multimodal AI applications without a cloud GPU.
When to use it
• You're building an AI application that needs to understand both images and text
• You need to fine-tune a pre-trained vision-language model for your specific use case
• You prefer to develop and test AI models on your Mac rather than rely on cloud services
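MLX targets Apple Silicon, so it can help to confirm the machine qualifies before installing anything. The following is a minimal standard-library sketch; the function name `is_apple_silicon` is hypothetical, and it assumes Apple Silicon Macs report `Darwin` and `arm64` through Python's `platform` module.

```python
import platform
from typing import Optional


def is_apple_silicon(system: Optional[str] = None, machine: Optional[str] = None) -> bool:
    """Return True when the host looks like an Apple Silicon Mac.

    Hypothetical helper. It assumes Apple Silicon reports
    platform.system() == "Darwin" and platform.machine() == "arm64".
    The optional arguments let you test it without real hardware.
    """
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    return system == "Darwin" and machine == "arm64"


if __name__ == "__main__":
    if is_apple_silicon():
        print("Apple Silicon detected")
    else:
        print("MLX-VLM expects an Apple Silicon Mac")
```

An x86_64 Python interpreter running under Rosetta reports `x86_64` even on an M-series Mac. A `False` result on such a machine usually means you should install a native arm64 Python.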
Quick start
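1. Install MLX-VLM with pip: `pip install mlx-vlm`
2. Optionally, clone the GitHub repository to browse the example code: `git clone https://github.com/Blaizzy/mlx-vlm`
3. Import the library and load a pre-trained model from the Hugging Face Hub: `from mlx_vlm import load, generate; model, processor = load('mlx-community/Qwen2-VL-2B-Instruct-4bit')`
4. Run inference on an image and a text prompt: `generate(model, processor, prompt='Describe this image.', image='path/to/image.jpg', verbose=True)`. For fine-tuning, see the documentation in the repository.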
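The quick-start steps can be combined into one function that also applies the model's chat template. This is a sketch based on the `load`, `generate`, `apply_chat_template`, and `load_config` names shown in the project README at the time of writing. Signatures have changed between mlx-vlm releases, so check them against your installed version. The helper name `describe_image` and the default checkpoint are illustrative assumptions.

```python
from typing import Any

# Assumed default: a 4-bit Qwen2-VL checkpoint from the mlx-community
# organization on the Hugging Face Hub. Any MLX-converted VLM may work instead.
DEFAULT_MODEL = "mlx-community/Qwen2-VL-2B-Instruct-4bit"


def describe_image(image_path: str, prompt: str,
                   model_path: str = DEFAULT_MODEL, max_tokens: int = 100) -> Any:
    """Load a model, wrap the prompt in its chat template, and generate text.

    Hypothetical helper. mlx_vlm is imported lazily, so this module can be
    imported on machines where MLX is not installed.
    """
    from mlx_vlm import generate, load
    from mlx_vlm.prompt_utils import apply_chat_template
    from mlx_vlm.utils import load_config

    # load() downloads the weights and processor, or reuses a cached copy.
    model, processor = load(model_path)
    config = load_config(model_path)

    # Add the model-specific image placeholder tokens around the prompt.
    formatted = apply_chat_template(processor, config, prompt, num_images=1)

    # Keyword arguments avoid relying on positional order, which has
    # differed between releases.
    result = generate(model, processor, prompt=formatted, image=image_path,
                      max_tokens=max_tokens, verbose=False)
    # Depending on the release, the result is a plain string or an
    # object carrying the text in a .text attribute.
    return getattr(result, "text", result)
```

On an Apple Silicon Mac with mlx-vlm installed, you would call `print(describe_image("path/to/image.jpg", "Describe this image."))`.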
Ready-to-paste prompt
from mlx_vlm import load, generate; model, processor = load('mlx-community/Qwen2-VL-2B-Instruct-4bit'); generate(model, processor, prompt='Describe this image.', image='path/to/image.jpg', verbose=True)
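The same inference can also run through the command-line entry point `python -m mlx_vlm.generate` described in the project README. The following sketch builds the argument list for `subprocess.run`. The helper names are hypothetical. The flags `--model`, `--image`, `--prompt`, and `--max-tokens` are taken from the README at the time of writing, so confirm them with `python -m mlx_vlm.generate --help` on your installed version.

```python
import subprocess
import sys
from typing import List


def build_generate_command(model: str, image: str, prompt: str,
                           max_tokens: int = 100) -> List[str]:
    """Assemble argv for the mlx_vlm.generate CLI (hypothetical helper)."""
    return [
        sys.executable, "-m", "mlx_vlm.generate",  # same interpreter as the caller
        "--model", model,
        "--image", image,
        "--prompt", prompt,
        "--max-tokens", str(max_tokens),
    ]


def run_generate(model: str, image: str, prompt: str, max_tokens: int = 100) -> str:
    """Run the CLI and return its stdout. Requires mlx-vlm on Apple Silicon."""
    completed = subprocess.run(
        build_generate_command(model, image, prompt, max_tokens),
        capture_output=True, text=True, check=True,  # raise if the CLI fails
    )
    return completed.stdout
```

Passing a list instead of a shell string means prompts that contain spaces or quotes need no escaping.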
Topics
apple-silicon
florence2
idefics
llava
llm
local-ai
mlx
molmo
paligemma
pixtral
vision-framework
vision-language-model
vision-transformer
What's inside: free to inspect
No purchase needed
Read the entire source before you build, unlike paid marketplaces that hide it behind a buy button.