Tuning StyleGAN Locally to Faithfully Encode Any Input Image
Distinguishing synthetic images from real ones has become increasingly difficult since the introduction of Generative Adversarial Networks (GANs). Although the images produced by this framework are often indistinguishable from real ones, the user has little control over the specific outcome. Most relevant to our work is the StyleGAN family of generators, which can produce, for example, realistic faces from random input vectors.
UNMET NEED
StyleGAN2 has been demonstrated to be a powerful image generation engine that supports semantic editing. However, to manipulate a real-world image, one first needs to retrieve a latent representation in StyleGAN’s latent space that decodes to an image as close as possible to the desired one. For many real-world images, no latent representation reproduces the image closely enough, which necessitates tuning the generator network itself.
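The inversion step described above can be pictured as an optimization over the latent vector. The sketch below is a toy illustration only: a small fixed matrix W stands in for the generator (an assumption; StyleGAN2 itself is a deep nonlinear network), and the names generate, invert, and recon_error are hypothetical. It shows that when the target lies outside what the generator can produce, even the best latent leaves a residual error.

```python
# Toy illustration only: a fixed linear map W stands in for the generator.
# All function names are hypothetical and not part of any StyleGAN API.

def generate(W, w):
    """Decode latent w into an 'image' (here: a plain matrix-vector product)."""
    return [sum(a * b for a, b in zip(row, w)) for row in W]

def recon_error(W, w, target):
    """Squared reconstruction error between the target and the decoded latent."""
    return sum((t - o) ** 2 for t, o in zip(target, generate(W, w)))

def invert(W, target, steps=500, lr=0.1):
    """Gradient descent on the latent w to minimise ||target - W w||^2."""
    w = [0.0] * len(W[0])
    for _ in range(steps):
        residual = [t - o for t, o in zip(target, generate(W, w))]
        # Gradient with respect to w is -2 * W^T residual.
        grad = [-2.0 * sum(W[i][j] * residual[i] for i in range(len(W)))
                for j in range(len(w))]
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

# A 3-pixel "image" space reached from a 2-dimensional latent space.
W = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
in_range = [0.3, -0.7, 0.0]      # lies in the generator's range
out_of_range = [0.3, -0.7, 0.5]  # third pixel can never be produced

err_in = recon_error(W, invert(W, in_range), in_range)
err_out = recon_error(W, invert(W, out_of_range), out_of_range)
print(err_in, err_out)  # err_in is essentially zero; err_out stays near 0.25
```

In this toy setting the third output coordinate is always zero, so no latent can reproduce the second target. That remaining error is what tuning the generator is meant to remove.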
OUR SOLUTION
Our solution is a per-image optimization method that tunes a StyleGAN2 generator by applying a local edit to its weights. This yields almost perfect inversion while still allowing image editing, because the rest of the mapping between input latent representations and output images is kept relatively intact. The method is based on one-shot training of a set of shallow update networks (a.k.a. Gradient Modification Modules) that modify the layers of the generator. After training the Gradient Modification Modules, a modified generator is obtained by a single application of these networks to the original parameters, and the generator's previous editing capabilities are maintained. Our experiments show a sizable gap in performance over the current state of the art in this very active domain.
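The idea of learning a small module that turns a gradient into a one-time weight update can be sketched on a toy problem. This is not the actual method: a linear map W stands in for the generator, and a single learned scalar stands in for a shallow update network, since the architecture of the Gradient Modification Modules is not described here. All names (recon_grad, apply_update, train_scale, w_inv) are hypothetical, and the learning rate is chosen for this toy problem only.

```python
# Toy analogue only: linear "generator" W, one learned scalar as the "update module".
# Names are hypothetical; this is not the StyleGAN2 or Gradient Modification Module code.

def generate(W, w):
    return [sum(a * b for a, b in zip(row, w)) for row in W]

def recon_loss(W, w, target):
    return sum((t - o) ** 2 for t, o in zip(target, generate(W, w)))

def recon_grad(W, w, target):
    """Gradient of ||target - W w||^2 with respect to W: -2 * residual * w^T."""
    out = generate(W, w)
    return [[-2.0 * (t - o) * wj for wj in w] for t, o in zip(target, out)]

def apply_update(W, grad, scale):
    """Single application of the update module to the original weights."""
    return [[a - scale * g for a, g in zip(rw, rg)] for rw, rg in zip(W, grad)]

def train_scale(W, w, target, steps=100, lr=0.2, eps=1e-6):
    """Train the scalar so that ONE update of W reconstructs the target."""
    grad = recon_grad(W, w, target)  # computed once, on the original weights
    scale = 0.0
    for _ in range(steps):
        # Central finite difference of the post-update loss w.r.t. the scalar.
        lp = recon_loss(apply_update(W, grad, scale + eps), w, target)
        lm = recon_loss(apply_update(W, grad, scale - eps), w, target)
        scale -= lr * (lp - lm) / (2 * eps)
    return scale, grad

W = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
target = [1.0, 0.0, 0.5]   # third pixel is unreachable with the original W
w_inv = [1.0, 0.0]         # best latent under W, assumed given by a prior inversion

scale, grad = train_scale(W, w_inv, target)
W_tuned = apply_update(W, grad, scale)

other = [0.0, 1.0]         # a different latent, to check that the edit is local
print(recon_loss(W, w_inv, target), recon_loss(W_tuned, w_inv, target))
print(generate(W, other), generate(W_tuned, other))
```

Because the gradient at w_inv is a rank-one matrix aligned with w_inv, latents orthogonal to it decode exactly as before. That is the toy counterpart of keeping the rest of the mapping relatively intact while the target image is reconstructed.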
As can be seen in Fig. 1 below, our method generates almost identical reconstructions. The first and third rows show the reconstruction of difficult examples: our method produces near-identical reconstructions, whereas all other methods struggle to achieve inversion of good quality. The second row shows the reconstruction of a relatively easy example. Although all methods produce a meaningful reconstruction, only our method truly preserves identity and properly reconstructs fine details (such as gaze, eye color, and dimples).
APPLICATIONS
• Fix shut eyes in photos, reducing the need for retakes.
• Add smiles to photos.
• Add variety to commercials, for instance by automatically changing the color of the featured object.
• Virtual plastic surgery.
• Aging and cartooning.
INTELLECTUAL PROPERTY
Provisional patent application