Meet ReCo: An AI Extension for Diffusion Models to Enable Region Control

Large text-to-image models, Stable Diffusion in particular, have taken over the machine learning field in recent months. They show strong generation performance in a variety of settings and produce images we would not have thought possible before.

Text-to-image models try to generate realistic images from text prompts that describe what the image should look like. For example, if you ask for "Homer Simpson walking on the moon," you will likely get a good-looking picture with the right details. The huge success of generative models in recent years is mainly due to the large-scale datasets and models used to train them.

As impressive as they are, diffusion models can still be considered early-stage, as they lack some properties that need to be addressed in the coming years.

First, a text prompt alone offers limited control over the generated image. It is especially difficult to specify exactly what should appear at a given position in the image. If you want to draw a specific object at a specific location, such as a donut in the top-left corner, existing models can struggle to do so.

Second, when text prompts are long and complex, existing models overlook specific details and fall back on what they learned during training. Combined, these two issues make it difficult to control specific regions of the images generated by existing models.

Today, to get the image you want, you have to try a large number of prompt variations and pick the result closest to what you had in mind. You may have heard this process called "prompt engineering." It is time-consuming, and there is no guarantee it will produce the image you want.

So we know text-to-image models have a control problem. But we are not here just to talk about problems, are we? Let me introduce ReCo, which extends existing text-to-image models to produce precisely region-controlled output images.

Region-controlled text-to-image generation is closely related to the layout-to-image problem. Layout-to-image models take labeled bounding boxes as input and generate the corresponding image. However, despite their good region-control results, their limited label vocabulary makes it difficult for them to handle free-form text input.

Instead of following the layout-to-image approach, which treats text and object layouts separately, ReCo combines these two input conditions and models them jointly. The authors call this task "region-controlled text-to-image" generation. This way, the text and region input conditions are seamlessly integrated.

ReCo is an extension for existing text-to-image models. It allows pre-trained models to understand spatial coordinate inputs. The core idea is to introduce an extra set of input position tokens that indicate spatial locations. These position tokens are defined by dividing the image into evenly sized regions, so each coordinate can be mapped to the region closest to it.
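To make the quantization idea concrete, here is a minimal sketch of how a pixel coordinate could be mapped to one of a fixed number of equal-sized bins, and a bounding box turned into four position tokens. The bin count of 1000, the `<bin_k>` token spelling, and the `(x1, y1, x2, y2)` corner order are assumptions for illustration, not confirmed details of the ReCo implementation.

```python
def coord_to_bin(value: float, size: int, n_bins: int = 1000) -> int:
    """Map a pixel coordinate to the index of the equal-sized bin it falls in.

    n_bins=1000 is an assumed default, chosen for illustration only.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    # Normalize to [0, 1] and clamp so coordinates on the far edge stay in range.
    normalized = min(max(value / size, 0.0), 1.0)
    # Each bin covers 1/n_bins of the axis; the bin containing the point is also
    # the bin whose center is closest to it.
    return min(int(normalized * n_bins), n_bins - 1)


def box_to_position_tokens(box, width: int, height: int, n_bins: int = 1000):
    """Turn a pixel box (x1, y1, x2, y2) into four hypothetical position tokens."""
    x1, y1, x2, y2 = box
    bins = [
        coord_to_bin(x1, width, n_bins),
        coord_to_bin(y1, height, n_bins),
        coord_to_bin(x2, width, n_bins),
        coord_to_bin(y2, height, n_bins),
    ]
    # "<bin_k>" is a made-up token spelling used only for this sketch.
    return [f"<bin_{b}>" for b in bins]


if __name__ == "__main__":
    # A box covering the top-left quarter-width corner of a 512x512 image.
    print(box_to_position_tokens((0, 0, 128, 128), 512, 512))
```

Using floor-then-clamp rather than rounding keeps every coordinate, including one exactly on the right or bottom edge, inside a valid bin index.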

ReCo's position tokens allow a precise, open-ended text description to be attached to any region of the image, creating a useful new text input interface with region control.

Check out the paper. All credit for this research goes to the researchers on this project.

Ekrem Çetinkaya received his B.Sc. in 2018 and M.Sc. in 2019 from Ozyegin University, Istanbul, Türkiye. He wrote his M.Sc. thesis on image denoising using deep convolutional networks. He is currently pursuing a Ph.D. at the University of Klagenfurt, Austria, and working as a researcher on the ATHENA project. His research interests include deep learning, computer vision, and multimedia networking.
