1 of 40 open-source models (filtered).
Vision-language variant of Yi 34B. Image-text reasoning via an MLP adapter on a CLIP encoder. Useful for bilingual EN/中 multimodal workloads where the major Western vision-language models underperform on Chinese text in images.