
Nvidia shrinks AI image generation method to size of a WhatsApp message


Nvidia researchers have developed a new AI image generation technique that could enable highly customized text-to-image models at a fraction of the usual storage requirements.

According to a paper published on arXiv, the proposed method, called “Perfusion,” enables adding new visual concepts to an existing model using only 100KB of parameters per concept.

As the paper’s authors describe, Perfusion works by “making small updates to the internal representations of a text-to-image model.”

More specifically, it makes carefully calculated changes to the parts of the model that connect the text descriptions to the generated visual features. Applying minor, parameterized edits to the cross-attention layers allows Perfusion to modify how text inputs get translated into images.

In other words, Perfusion doesn’t retrain a text-to-image model from scratch. Instead, it slightly adjusts the mathematical transformations that turn words into pictures, which lets it customize the model to produce new visual concepts without the compute cost of full retraining.
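The paper frames these edits as rank-one updates to the model’s cross-attention key and value projections (with an additional “key-locking” mechanism not shown here). The snippet below is a minimal, hypothetical PyTorch sketch of that rank-1 idea rather than Nvidia’s released code; the class name and exact parameterization are assumptions for illustration.

```python
import torch
import torch.nn as nn

class RankOneEditedProjection(nn.Module):
    """Wraps a frozen cross-attention projection (e.g. the key or value
    matrix) and adds a small learned rank-1 update for one new concept.
    Only u and v are trained, so the per-concept state stays tiny."""

    def __init__(self, base_linear: nn.Linear):
        super().__init__()
        self.base = base_linear
        for p in self.base.parameters():
            p.requires_grad = False  # the pretrained model stays frozen

        out_dim, in_dim = base_linear.weight.shape
        # Rank-1 edit: delta_W = outer(u, v), a few thousand numbers per layer.
        self.u = nn.Parameter(torch.zeros(out_dim))
        self.v = nn.Parameter(torch.zeros(in_dim))

    def forward(self, text_embeddings: torch.Tensor) -> torch.Tensor:
        # Frozen projection plus the learned low-rank correction.
        delta = torch.outer(self.u, self.v)  # (out_dim, in_dim)
        return self.base(text_embeddings) + text_embeddings @ delta.T
```

Because only u and v are optimized per layer, the trainable state for a new concept is a handful of vectors rather than a full copy of the model.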

The Perfusion method needs only 100KB per concept.

Perfusion achieved these results with two to five orders of magnitude fewer parameters than competing techniques.

While other methods may require hundreds of megabytes to gigabytes of storage per concept, Perfusion needs only 100KB – comparable to a small image, text, or WhatsApp message.
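That figure is plausible from simple arithmetic. The sketch below is a back-of-envelope estimate under assumed numbers (a Stable-Diffusion-style UNet with 16 cross-attention blocks, 768-dimensional text embeddings, rank-1 edits to the key and value projections, and fp16 storage); the paper’s exact breakdown may differ.

```python
# Back-of-envelope size estimate; all constants here are illustrative
# assumptions, not figures taken from the paper.
TEXT_DIM = 768                                        # assumed text-embedding size
LAYER_OUT_DIMS = [320] * 6 + [640] * 6 + [1280] * 4   # assumed channel widths
BYTES_PER_PARAM = 2                                   # fp16 storage

# One (u, v) pair per projection, for both the key and value projections.
params = sum(2 * (TEXT_DIM + d) for d in LAYER_OUT_DIMS)
print(f"{params:,} parameters ≈ {params * BYTES_PER_PARAM / 1024:.0f} KB")
# Prints about 46,000 parameters (~90 KB), in the same ballpark as the reported 100KB.
```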

This dramatic reduction could make deploying highly customized AI art models more feasible.

According to co-author Gal Chechik,

“Perfusion not only leads to more accurate personalization at a fraction of the model size, but it also enables the use of more complex prompts and the combination of individually-learned concepts at inference time.”

The method allowed creative image generation, like a “teddy bear sailing in a teapot,” using personalized concepts of “teddy bear” and “teapot” learned separately.
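Perfusion’s own combination step at inference time is more sophisticated than a plain sum, but the underlying idea of applying several tiny, independently learned edits to one frozen backbone can be sketched as follows; the dictionary format and toy shapes are illustrative assumptions.

```python
import torch

def apply_concepts(base_weight: torch.Tensor, concepts: list) -> torch.Tensor:
    """Add several independently learned rank-1 edits onto one frozen
    projection matrix. Each concept is a dict holding the u and v vectors
    learned for this layer (the format is an illustrative assumption)."""
    edited = base_weight.clone()
    for concept in concepts:
        edited = edited + torch.outer(concept["u"], concept["v"])
    return edited

# Toy usage: a 320x768 key projection plus two separately learned concepts,
# e.g. for a prompt like "a teddy bear sailing in a teapot".
W = torch.randn(320, 768)
teddy_bear = {"u": torch.randn(320), "v": torch.randn(768)}
teapot = {"u": torch.randn(320), "v": torch.randn(768)}
W_personalized = apply_concepts(W, [teddy_bear, teapot])
print(W_personalized.shape)  # torch.Size([320, 768])
```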

Possibilities of Efficient Personalization

Perfusion’s ability to personalize AI models using just 100KB per concept opens up a range of potential applications.

This method paves the way for individuals to easily tailor text-to-image models with new objects, scenes, or styles without expensive retraining. Because each concept adds only a 100KB parameter update, models customized with this technique could also run on consumer devices, enabling on-device image creation.

One of the most striking aspects of this technique is the potential it offers for sharing and collaboration around AI models. Users could share their personalized concepts as small add-on files, circumventing the need to share cumbersome model checkpoints.
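Under the rank-1 assumption used in the earlier sketches, such an add-on file would hold nothing but the tiny per-layer update vectors; the file name, key names, and shapes below are hypothetical.

```python
import torch

# Serialize only the learned per-layer update vectors (u, v) for one concept;
# the multi-gigabyte frozen base model is never written out. Layer count and
# widths reuse the earlier illustrative assumptions.
concept_state = {
    f"layer_{i}.{proj}.{name}": torch.randn(dim).half()
    for i, out_dim in enumerate([320] * 6 + [640] * 6 + [1280] * 4)
    for proj in ("to_k", "to_v")
    for name, dim in (("u", out_dim), ("v", 768))
}

torch.save(concept_state, "teddy_bear_concept.pt")  # hypothetical file name
# The resulting file is on the order of 100KB: small enough to pass around
# directly instead of exchanging full model checkpoints.
```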

In terms of distribution, models that are tailored to particular organizations could be more easily disseminated or deployed at the edge. As the practice of text-to-image generation continues to become more mainstream, the ability to achieve such significant size reductions without sacrificing functionality will be paramount.

It’s important to note, however, that Perfusion provides model personalization rather than full generative capability: a pretrained text-to-image model is still required as the base.

Limitations and Release

While promising, the technique does have some limitations. The authors note that certain choices made during training can cause a concept to over-generalize, and more research is needed to seamlessly combine multiple personalized concepts within a single image.

The authors state that code for Perfusion will be made available on their project page, indicating an intention to release the method publicly, likely pending peer review and formal publication. Specifics on availability remain unclear, however, since the work has so far appeared only on arXiv, a platform where researchers can post papers before formal peer review and publication in journals or conferences.

While Perfusion’s code is not yet accessible, the authors’ stated plan implies that this efficient, personalized AI system could find its way into the hands of developers, industries, and creators in due course.

As AI art platforms like Midjourney, DALL-E 2, and Stable Diffusion gain steam, techniques that allow greater user control could prove critical for real-world deployment. With clever efficiency improvements like Perfusion, Nvidia appears determined to retain its edge in a rapidly evolving landscape.

Source: Cryptoslate
