
Alibaba unveils new open-source AI that creates images with accurate text



Alibaba has launched an open-source AI image generator, Qwen-Image, that excels at rendering text correctly. Released under the Apache 2.0 license, the model supports complex layouts, bilingual content and commercial use.


In short

  • Qwen-Image renders complex, multilingual text accurately in images
  • Open source and free for commercial and non-commercial use
  • Outperforms rivals at rendering Chinese and layout-specific text

Alibaba has released a new open-source image generation model, called Qwen-Image, which sets itself apart by accurately rendering complex and multilingual text within images, a task where many other AI tools still struggle. Developed by Alibaba’s Qwen team, Qwen-Image is designed to handle everything from handwritten poetry and bilingual posters to e-commerce product labels and class diagrams while maintaining high-quality, readable text. The model supports both alphabetic scripts, such as English, and logographic ones, such as Chinese, which makes it particularly useful in multilingual contexts.


Users can try Qwen-Image through the Qwen Chat website by switching to the “Image Generation” mode. The model has also been released under the Apache 2.0 license, which means that businesses and developers can use, modify and distribute it, even for commercial purposes, as long as they include proper attribution.

Qwen-Image’s training data includes billions of image-text pairs drawn from natural scenes, human portraits, artistic posters and synthetically generated text data. Interestingly, all the synthetic data used for training was produced in-house by Alibaba, and no AI-generated images from other models were included. This approach helped the model learn to handle rare or complex characters, especially in Chinese.

The model was trained in stages, beginning with simply captioned images and gradually moving to more complex layouts and dense multilingual text. According to Alibaba, this curriculum-style training helped Qwen-Image perform better across a variety of formats.

Under the hood, Qwen-Image combines three main components:

-Qwen2.5-VL, a multimodal language model for understanding context

-A VAE encoder/decoder, tuned for high-resolution layouts

-MMDiT, a diffusion model with a special encoding system for spatial alignment

These elements work together to produce images that are not only visually appealing, but also accurate in terms of text placement and formatting.

Alibaba claims that Qwen-Image has been tested against several industry benchmarks for text clarity, layout precision and prompt handling. On AI Arena, a public leaderboard that uses human evaluations to rank AI image models, Qwen-Image reportedly holds third place and is the highest-ranked open-source model.

– Ends
