Content GANeration

Super resolution  

Remember the typical scene in a crime series when they have a blurry image of a suspect and ask their technology expert to “zoom in and enhance”?

Although those scenes are nowhere near technically accurate, some techniques do take low-resolution images as input and upscale them to higher-resolution ones. Super-resolution is one of them. For a long time the idea was thought to be science fiction, since the data processing inequality states that post-processing data cannot add any information that was not already there. However, with the advent of neural networks and GANs, you can add information that these networks learned from training on large numbers of examples. This makes it possible to plausibly reconstruct details such as facial features.

Super-resolution has many interesting real-world applications that are only starting to be explored. Examples include reducing the file sizes of images and videos, acting as a preprocessing step for AI applications such as deepfakes, and acting as a post-processing step in fields such as medicine and cosmology. It can also simply enhance your favorite old movies and pictures.

Figure 1. Example of upscaling a blurry image.

Although the idea is not new, the field of super-resolution has been revived by the advent of GANs and has made significant progress in only a couple of years. Moreover, a big advantage of this field is that a virtually unlimited amount of training data is available. You can simply downscale high-resolution images and use the resulting low/high-resolution pairs as training data. There are also many publicly available datasets, such as DIV2K (https://data.vision.ee.ethz.ch/cvl/DIV2K/).
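To make the data-generation idea concrete, here is a minimal Python sketch that turns a high-resolution image into a (low-resolution input, high-resolution target) training pair. It assumes grayscale images stored as nested lists and uses simple block averaging as the degradation. The function names are hypothetical. Real pipelines more commonly use bicubic downsampling, sometimes with added blur or noise.

```python
def box_downscale(image, factor):
    """Downscale a 2D grayscale image (list of rows) by an integer factor.

    Each output pixel is the mean of a factor x factor block of input pixels.
    Rows and columns that do not fill a complete block are dropped.
    """
    height = len(image) // factor
    width = len(image[0]) // factor
    low_res = []
    for i in range(height):
        row = []
        for j in range(width):
            block = [
                image[i * factor + di][j * factor + dj]
                for di in range(factor)
                for dj in range(factor)
            ]
            row.append(sum(block) / len(block))
        low_res.append(row)
    return low_res


def make_training_pair(high_res, factor=2):
    """Return an (input, target) pair for a super-resolution model (hypothetical helper)."""
    # Crop the target so its size is an exact multiple of the scale factor.
    h = len(high_res) // factor * factor
    w = len(high_res[0]) // factor * factor
    target = [row[:w] for row in high_res[:h]]
    return box_downscale(target, factor), target


hr = [
    [10, 20, 30, 40],
    [30, 40, 50, 60],
    [0, 0, 100, 100],
    [0, 0, 100, 100],
]
lr, target = make_training_pair(hr, factor=2)
print(lr)  # [[25.0, 45.0], [0.0, 100.0]]
```

Because the degradation is computed rather than collected, every high-resolution image yields a training pair for free. The catch is that a model trained on one synthetic degradation may not generalize to the blur and noise found in real low-quality photos.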

Challenges

While there is growing interest, super resolution still faces major challenges in developing effective algorithms. These challenges are summarized below:

  • There is still an ongoing debate about which approach is better: general super-resolution or object-specific super-resolution. General super-resolution upscales the whole image, regardless of the objects it contains. Object-specific super-resolution first detects certain objects in an image, such as faces, and then upscales those objects with dedicated architectures. An example of the former is CAR [1]. An example of the latter is DFDNet [2], a state-of-the-art face upscaling technique.
  • As with GAN evaluation in general, it is hard to measure and compare the quality of different super-resolution techniques. Several measures have been introduced, but there is as yet no consensus on which measure best captures the strengths and limitations of models and should be used for fair model comparison.
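Peak signal-to-noise ratio (PSNR) is one of the most commonly reported of these measures. The sketch below is plain Python. It assumes grayscale images as nested lists and an 8-bit value range (`max_value=255`). PSNR only measures pixel-wise error against the ground truth. As a result, a GAN that invents sharp but slightly misplaced detail can score worse than a blurry output. This is one reason PSNR alone is not considered sufficient.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized 2D images."""
    squared_errors = [
        (r - e) ** 2
        for ref_row, est_row in zip(reference, estimate)
        for r, e in zip(ref_row, est_row)
    ]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf  # identical images: no error at all
    return 10 * math.log10(max_value ** 2 / mse)


ref = [[0, 0], [255, 255]]
est = [[0, 0], [255, 245]]
print(psnr(ref, est))  # higher means closer to the reference
```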

Example topics

Background super resolution for portrait images

A recently published paper called DFDNet [2] achieved state-of-the-art results on upscaling human faces. However, it only upscales the faces themselves and leaves the surroundings as they are. In this thesis, you would investigate the possibility of also upscaling the background, either with a separate network or by incorporating it into the DFDNet architecture. This would also open the door to video upscaling. Currently, upscaling only a person's face while the background stays blurry produces clearly visible artifacts.
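One naive baseline for the combined pipeline is to alpha-blend the two outputs with a soft face mask. This assumes that a separate network has already upscaled the whole frame and that DFDNet has upscaled the face to the same size. The sketch below is hypothetical: the function name and the mask are illustrative and not part of DFDNet. A feathered mask hides the seam. It does not fix a mismatch in style or sharpness between the two networks, and that mismatch is what the thesis would investigate.

```python
def blend(face_sr, background_sr, mask):
    """Alpha-composite an upscaled face onto an upscaled background (hypothetical helper).

    All three inputs are equally sized 2D grayscale images (lists of rows).
    mask values lie in [0, 1]: 1 inside the face region, 0 outside, with
    intermediate values along the border to soften the transition.
    """
    return [
        [m * f + (1 - m) * b for f, b, m in zip(f_row, b_row, m_row)]
        for f_row, b_row, m_row in zip(face_sr, background_sr, mask)
    ]


face = [[200, 200, 200]]
background = [[40, 40, 40]]
feathered_mask = [[1.0, 0.5, 0.0]]  # face on the left, fading into background
print(blend(face, background, feathered_mask))  # [[200.0, 120.0, 40.0]]
```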

Goal

Research and create a machine learning algorithm that can upscale the resolution of an image and fill in the details realistically. Technologies that can be used are Python, TensorFlow, Keras and, more generally, the Python data science and machine learning stack.

Image colorization

Have you always wondered what the old photo albums of your family would look like in color? Interested in bringing the past to life? Then this might be a subject for you.

Image colorization is the process of converting a grayscale image to a color one while filling in the colors as realistically as possible. The idea is not new: people have been hand-coloring photos for decades, and computer-aided, reference-based techniques appeared in the early 2000s. However, there has been tremendous progress in the last 5 years through a diverse range of deep learning architectures. These range from early brute-force networks [3] to more recent custom-designed Generative Adversarial Networks [4].

Figure 2. Image colorization example.

Challenges

While there is growing interest, image colorization still faces major challenges in developing effective algorithms. These challenges are summarized below:

  • As with GAN evaluation in general, it is hard to measure and compare the quality of different image colorization techniques. The goal is not to recreate the exact colors of the original image, which is nearly impossible from the grayscale values alone (see Figure 3). Instead, the goal is to colorize the image as realistically as possible based on the objects and textures it contains. This makes evaluation a non-trivial task.

Figure 3. Comparison between Color Image and Gray Image. [5]

  • Only a handful of research papers have been published on image colorization. For example, DeOldify [6], one of the state-of-the-art image colorization techniques, has open-source code available but no accompanying paper. Although this brings an extra challenge to the table, it also offers the opportunity to make a valuable contribution to an emerging research field.
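A few lines of Python show why exact color recovery is impossible. The sketch assumes the common ITU-R BT.601 luma weights for converting RGB to gray. Under these weights, a saturated red and a medium green collapse to the same 8-bit gray value. A colorizer therefore has no way of telling them apart from the gray image alone.

```python
def to_gray(rgb):
    """Convert an 8-bit (R, G, B) triple to an 8-bit gray value using BT.601 luma weights."""
    r, g, b = rgb
    return round(0.299 * r + 0.587 * g + 0.114 * b)


red = (255, 0, 0)
green = (0, 130, 0)
print(to_gray(red), to_gray(green))  # 76 76
```

Many more colors share each gray level: roughly 16.7 million 8-bit RGB colors map onto only 256 gray values. The colorizer must therefore rely on context, such as recognizing grass or sky, rather than on the pixel value itself.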

Example topics

Video colorization

What if these colorization techniques could be applied to videos? Colorization research has focused almost exclusively on still images. Video colorization currently mostly consists of applying image colorization to the individual frames of the video. There are many opportunities to improve the state of the art in video colorization. For example, models could take the temporal component into account when coloring the frames, or tackle challenges specific to old videos, such as the flickering effect.
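To see why independent per-frame colorization flickers, and what the crudest temporal fix looks like, consider the sketch below. It assumes that an image colorizer has already predicted chroma values per pixel for every frame, for example the a/b channels of the Lab color space. It then smooths them with an exponential moving average. This is a hypothetical post-processing baseline, not a published method. It reduces flicker on static content but smears colors across moving objects and scene cuts, which is exactly why temporally aware models are worth researching.

```python
def smooth_chroma(frames, alpha=0.5):
    """Exponential moving average of per-frame chroma predictions.

    frames: list of frames, each a flat list of chroma values (one per pixel),
            as produced by an image colorizer run on every frame independently.
    alpha:  weight of the current frame; 1.0 disables smoothing.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed, previous = [], None
    for frame in frames:
        if previous is None:
            current = list(frame)
        else:
            current = [alpha * c + (1 - alpha) * p for c, p in zip(frame, previous)]
        smoothed.append(current)
        previous = current
    return smoothed


def flicker(frames):
    """Mean absolute chroma change between consecutive frames."""
    diffs = [abs(a - b) for prev, cur in zip(frames, frames[1:]) for a, b in zip(prev, cur)]
    return sum(diffs) / len(diffs) if diffs else 0.0


# A static pixel whose predicted chroma jumps between two values from frame to frame.
raw = [[10.0], [20.0], [10.0], [20.0]]
stable = smooth_chroma(raw, alpha=0.5)
print(flicker(raw), flicker(stable))  # 10.0 3.75
```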

Goal

Research and create a machine learning algorithm that can colorize videos realistically. It should improve on the current practice of colorizing individual frames independently by taking the temporal component into account. Technologies that can be used are Python, TensorFlow, Keras and, more generally, the Python data science and machine learning stack.

Garment transfer

Ever wondered how you would look in a certain t-shirt or pair of shoes without having to try them on? That is the problem garment transfer tries to solve. Given an image of a person and a piece of clothing as input, the goal is to produce a photo-realistic picture of that person wearing that piece of clothing.

Garment transfer was long confined to science fiction and only recently became feasible with the advent of GANs. Since then, it has grown into a popular research subtopic and has seen a lot of progress, as can be seen in the figure below.

Figure 4. Garment transfer example. [7]

Garment transfer comes in a variety of flavours with slight variations in the inputs. Examples include a single image of the clothing item to be transferred, a collection of such images, or an image of another person wearing the clothes. In general, however, the problem can be divided into two subproblems. First, the algorithm should learn to separate a person's body (pose, shape, skin color) from their clothing. Second, it should generate new images of the person wearing a new clothing item. The outputs also come in different forms. They range from a single image to a full 3D clothing transfer [8], from which images from different viewpoints and in different poses can be generated.
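The two-subproblem decomposition can be expressed as an interface. The sketch below is purely structural. `Body`, `transfer_garment` and the toy stand-ins are hypothetical names, the "images" are dicts rather than pixels, and in a real system both callables would be learned networks. The point it illustrates is that the generator only sees a clothing-agnostic body representation plus the new garment. The old clothing therefore cannot leak into the output.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Body:
    """Clothing-agnostic description of a person (hypothetical fields)."""
    pose: Any
    shape: Any
    skin_color: Any


def transfer_garment(
    person_image: Any,
    garment_image: Any,
    extract_body: Callable[[Any], Body],
    render: Callable[[Body, Any], Any],
) -> Any:
    # Subproblem 1: separate the person's body from the clothing they wear.
    body = extract_body(person_image)
    # Subproblem 2: generate the same body wearing the new clothing item.
    return render(body, garment_image)


# Toy stand-ins for the two learned networks, operating on dicts instead of pixels.
def toy_extract_body(person):
    return Body(person["pose"], person["shape"], person["skin_color"])


def toy_render(body, garment):
    return {"pose": body.pose, "shape": body.shape,
            "skin_color": body.skin_color, "clothing": garment}


person = {"pose": "standing", "shape": "tall", "skin_color": "light", "clothing": "red shirt"}
result = transfer_garment(person, "blue t-shirt", toy_extract_body, toy_render)
print(result["clothing"])  # blue t-shirt
```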

Challenges

While there is growing interest, garment transfer still faces major challenges in developing effective algorithms. These challenges are summarized below:

  • Suitable datasets for training and evaluation are hard to obtain. As of today, there is no easy way to obtain a dataset for training garment transfer models. An ideal dataset would contain multiple images of a particular clothing item from different viewpoints, pictures of different people wearing those clothing items, and pictures of the same people wearing different clothing items.
  • Transferring clothing with complex patterns is much harder than transferring simple clothing such as a plain white t-shirt. With complex patterns, it is hard to both match the target person's body shape and keep the styling intact.
  • The diversity of problem statements for garment transfer makes it hard to see the forest for the trees. Because of the slight variations in both input and output, it is hard to unify the advances in the field and obtain a clear view of the current state of the art.

Example topics

Garment transfer survey paper

Garment transfer research is still in its infancy, and there is no consensus on how to approach the problem, so it can be hard to see the forest for the trees. Summarizing and organizing the different approaches and their advances, and analysing and comparing their advantages and drawbacks, can add a lot of value to the field. Such a survey would lower the barrier for new researchers to enter the field and help current researchers make connections between existing approaches.

Goal

Research, analyse and summarize the current state of the art in garment transfer techniques.

Conditional GANs

In a well-trained GAN, the generator is able to generate new, photo-realistic examples of the type of images that the network was trained on. However, it is hard to control what kind of image the GAN generates. You simply get a random image from the same distribution as the training set.

Take, for example, NVIDIA's StyleGAN architecture [9], which is behind the well-known website thispersondoesnotexist.com that generates photo-realistic faces of people who don't exist.

Figure 5. Examples from thispersondoesnotexist.com.

Once StyleGAN is fully trained, it is easy to ask it for a new realistic-looking face. However, there is no way to ask it for, say, an image of a middle-aged Asian man with long hair, other than to keep generating images until you get a face with the desired properties.
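The brute-force approach described above amounts to rejection sampling. Suppose a fraction p of generated faces has the desired properties. The number of samples needed then follows a geometric distribution with mean 1/p, so rare combinations of attributes quickly become expensive. In the sketch below, `toy_generator` is a hypothetical stand-in for a trained generator followed by an attribute classifier, and the latent size of 8 is arbitrary.

```python
import random


def sample_until(generate, matches, latent_dim=8, seed=0, max_tries=10_000):
    """Naive attribute control: keep sampling latent vectors until the output matches.

    generate: maps a latent vector to an output (here a dict of attributes).
    matches:  predicate standing in for an attribute classifier.
    Returns the first matching output and the number of samples it took.
    """
    rng = random.Random(seed)
    for attempt in range(1, max_tries + 1):
        z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
        output = generate(z)
        if matches(output):
            return output, attempt
    raise RuntimeError(f"no matching sample in {max_tries} tries")


def toy_generator(z):
    """Hypothetical stand-in: derives face attributes directly from the latent vector."""
    return {"long_hair": z[0] > 1.0, "age": 40 + 15 * z[1]}


def wanted(face):
    return face["long_hair"] and 35 <= face["age"] <= 55


face, tries = sample_until(toy_generator, wanted)
print(face, "after", tries, "samples")
```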

This problem significantly reduces the usability of GANs in real-world applications.

There have already been various approaches to this problem, the most popular being conditional GANs and controllable generation. Conditional GANs receive an additional input during training: the label of the class the image belongs to. At generation time, the same label input is used to choose which class to generate. Controllable generation happens after training and consists of adjusting the latent feature vector in an attempt to control the features of the output image.
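Both ideas can be sketched on plain latent vectors. The sketch assumes the simplest conditioning scheme: concatenating a one-hot class label to the noise vector. The discriminator typically receives the label as well, and other schemes exist. For controllable generation, it assumes that a feature is represented by a linear direction in latent space. Here this is a hypothetical "smile" direction. In practice, such a direction would have to be found first, for example with a linear classifier trained on latent vectors.

```python
def one_hot(label, num_classes):
    """Encode an integer class label as a one-hot vector."""
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec


def conditional_input(z, label, num_classes):
    """Conditional GAN generator input: noise vector followed by a one-hot label."""
    return list(z) + one_hot(label, num_classes)


def move_along(z, direction, alpha):
    """Controllable generation: shift latent z by alpha along a feature direction."""
    return [zi + alpha * di for zi, di in zip(z, direction)]


z = [0.3, -1.2, 0.7]
print(conditional_input(z, label=2, num_classes=4))
# [0.3, -1.2, 0.7, 0.0, 0.0, 1.0, 0.0]

smile = [0.0, 1.0, 0.0]  # hypothetical "smile" direction in Z-space
print(move_along(z, smile, alpha=0.5))  # more smile; larger alpha, stronger edit
```

The first function shows why conditional GANs need labels at training time: the label is part of the input. The second shows why controllable generation does not: it only edits a latent vector of an already trained generator.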

Challenges

While there is growing interest, conditional GANs and controllable generation still face major challenges in developing effective algorithms. These challenges are summarized below:

  • When training conditional GANs with labels, you first need a labeled dataset, which is a major drawback for a technique that could otherwise be fully unsupervised. Furthermore, it isn't clear how to handle continuous features or how to efficiently control multiple features at a time.
  • With controllable generation, it is challenging to find a direction in the latent space that affects only a single feature of the generated image (e.g. hair color). In most cases, the features in the Z-space are entangled with each other, which makes it difficult to have granular control over the features of the generated image.

Example topics

Z-space feature disentanglement

With controllable generation, you try to tweak the latent feature vector of the generator so that the output changes in the desired direction. However, when different features are highly correlated in the dataset used to train your GAN, it becomes difficult to control specific features without also modifying the ones correlated with them. For example, if you add a beard to a picture of a woman, this will likely also change other facial features, such as the nose and jawline, to look more masculine. This is undesirable if you only want to edit a single feature. Moreover, even features that are not correlated in the training set can be hard to control independently, since without special attention the learned Z-space tends to become entangled.
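One simple mitigation assumes that features correspond to linear directions in Z-space. Under that assumption, you can remove from the edit direction its component along the feature you want to protect, which is a single Gram-Schmidt step. In the sketch below, both directions are hypothetical. Moving a latent vector along the orthogonalized "beard" direction leaves its projection onto the "masculinity" direction unchanged. This only addresses linear correlations between known directions. It is no substitute for learning a disentangled Z-space, which is the actual goal of this topic.

```python
def dot(a, b):
    """Dot product of two equally long vectors."""
    return sum(x * y for x, y in zip(a, b))


def orthogonalize(direction, protected):
    """Remove the component of `direction` that lies along `protected`."""
    scale = dot(direction, protected) / dot(protected, protected)
    return [d - scale * p for d, p in zip(direction, protected)]


beard = [0.8, 0.6, 0.0]        # hypothetical "beard" direction, partly masculine
masculinity = [1.0, 0.0, 0.0]  # hypothetical "masculinity" direction
beard_only = orthogonalize(beard, masculinity)
print(beard_only)  # [0.0, 0.6, 0.0]

z = [0.1, -0.4, 0.9]
z_edited = [zi + 2.0 * di for zi, di in zip(z, beard_only)]
# Under the linear assumption, the masculinity score (projection) is unchanged by the edit.
print(dot(z, masculinity), dot(z_edited, masculinity))
```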

Goal

Research and create a GAN that has a disentangled Z-space in a particular subdomain such as medical imaging. The goal is to be able to influence single, relevant features of medical images, such as the size of a tumor. Technologies that can be used are Python, TensorFlow, Keras and, more generally, the Python data science and machine learning stack.