An introduction to Convolutional Neural Networks for image upscaling using Sub-pixel CNNs

Convolutional Neural Networks

CNNs are preferred for image processing largely because of how few parameters they need. Consider an input of 1920x1080: that is over 2 million pixels, usually with three color channels each. In a fully connected deep neural network, processing this would take an enormous number of weights, and the output wouldn’t be as good, since the data would have to be flattened into a one-dimensional array, which loses some of the spatial information in the original image.
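To put rough numbers on this, here is a small sketch that counts the weights of a single fully connected layer and a single convolutional layer on a 1920x1080 RGB input. The layer sizes (64 units, 64 filters of 5x5) are assumptions chosen only for illustration.

```python
# Rough parameter count: one fully connected layer vs. one convolutional layer
# on a 1920x1080 RGB input. Layer sizes are illustrative assumptions.

height, width, channels = 1080, 1920, 3
input_values = height * width * channels  # every pixel value, flattened

# Fully connected: every input value connects to every unit, plus one bias per unit.
dense_units = 64  # assumed layer width
dense_params = input_values * dense_units + dense_units

# Convolutional: 64 filters of 5x5 shared across all positions, plus one bias per filter.
filters, kernel_size = 64, 5  # assumed filter count and size
conv_params = kernel_size * kernel_size * channels * filters + filters

print(f"input values:     {input_values:,}")
print(f"dense parameters: {dense_params:,}")
print(f"conv parameters:  {conv_params:,}")
```

The convolutional layer needs only a few thousand weights because the same small kernel is reused at every position of the image, while the fully connected layer needs hundreds of millions.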

How it works

CNN layers Ref: https://www.frontiersin.org/articles/10.3389/fpsyg.2017.01745/full

Convolutional Layer

Movement of the Kernel Ref: https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53

Convolutional neural networks use an array of filters to highlight features of the input image, and these features are what enable categorization.

Application of the Kernel in the Convolutional layer Ref: https://towardsdatascience.com/how-convolutional-neural-network-works-cdb58d992363

The first layers capture smaller and simpler features, like edges. As you combine multiple layers and filters, the network can start detecting more complex features.

Each kernel detects a specific feature from the input image. A feature map is produced by neurons that all share the same kernel, and the kernel’s values change during training as the loss function is minimized. The result of each convolution is passed through a ReLU activation function before the feature maps are combined by the next layer, and backpropagation updates the values in the filter matrices.
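As a minimal sketch of what one kernel does, the snippet below slides a hand-picked vertical-edge kernel over a tiny image and applies ReLU to the result. The helper names `conv2d` and `relu` and the kernel values are assumptions for illustration; in a real CNN the kernel values are learned, and a framework such as Keras performs this operation for you.

```python
import numpy as np

def conv2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map.

    Like most deep learning frameworks, this does not flip the kernel.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            feature_map[i, j] = np.sum(patch * kernel)
    return feature_map

def relu(x):
    """Keep positive responses and set negative ones to zero."""
    return np.maximum(x, 0.0)

# Tiny test image: dark left half, bright right half.
image = np.zeros((5, 6))
image[:, 3:] = 1.0

# Hand-picked vertical-edge kernel (an assumption; trained kernels are learned).
vertical_edge = np.array([[-1.0, 0.0, 1.0],
                          [-1.0, 0.0, 1.0],
                          [-1.0, 0.0, 1.0]])

feature_map = relu(conv2d(image, vertical_edge))
print(feature_map)
```

The feature map is non-zero only in the columns where the kernel window straddles the dark-to-bright boundary, which is exactly the "edge" this kernel responds to.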

Pooling Layer

The Convolutional Layer and the Pooling Layer can be daisy-chained, allowing the network to capture lower-level features in more complex images. Increasing the number of layers has drawbacks, however, since the network then needs much more computational power.
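The text here does not say which pooling operation is meant, so the sketch below assumes 2x2 max pooling, a very common choice. The function name `max_pool2d` is hypothetical.

```python
import numpy as np

def max_pool2d(feature_map, size=2):
    """Downsample a 2D feature map by taking the maximum of each size x size block."""
    h, w = feature_map.shape
    # Drop trailing rows/columns that do not fill a complete block.
    h_trim, w_trim = h - h % size, w - w % size
    trimmed = feature_map[:h_trim, :w_trim]
    blocks = trimmed.reshape(h_trim // size, size, w_trim // size, size)
    return blocks.max(axis=(1, 3))

feature_map = np.array([[1, 3, 2, 0],
                        [4, 2, 1, 1],
                        [0, 0, 5, 6],
                        [1, 2, 7, 8]], dtype=float)

pooled = max_pool2d(feature_map)
print(pooled)
```

Each 2x2 block is reduced to its strongest response, so the 4x4 map becomes a 2x2 map and the next convolutional layer has a quarter of the values to process.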

Fully Connected Layer

The Upscaling Problem

Ref: https://pmrpressrelease.com/wp-content/uploads/2018/12/4K-Display-Resolution.jpg

Image upscaling is a problem that many companies have to deal with, especially in a world where people need more space to store their files than ever before. If you think about all the pictures taken daily by smartphone users, and the demand for storage from those users, you can start to picture the importance of managing those files efficiently. One efficient way to deal with this is to store lower-resolution images and upscale them whenever the user needs to access a file, making it much lighter to store multiple sets of images in a database.

The upscaling process consists of taking a small, low-resolution image and turning it into a large, high-resolution one. The new image will have many more pixels than the original, so you need an algorithm to predict the color of each new pixel. There are many such algorithms, and most traditional ones use interpolation to estimate each new pixel from the known pixels around it.

Grid Interpolation Ref: https://haifengl.github.io/images/grid-interpolation2d.png
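To make the idea of interpolation concrete, here is a minimal sketch of bilinear upscaling for a grayscale image. Bilinear is used here instead of bicubic only because it is shorter to write; the function name `upscale_bilinear` and the choice to map the corner pixels of the input exactly onto the corners of the output are assumptions.

```python
import numpy as np

def upscale_bilinear(image, factor):
    """Upscale a 2D grayscale image by an integer factor with bilinear interpolation."""
    h, w = image.shape
    new_h, new_w = h * factor, w * factor
    # Position of every output row/column expressed in input coordinates.
    ys = np.linspace(0, h - 1, new_h)
    xs = np.linspace(0, w - 1, new_w)
    # The two nearest input rows/columns for each output position.
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    # Fractional distance from the first neighbor, used as the blending weight.
    wy = (ys - y0)[:, None]
    wx = (xs - x0)[None, :]
    top = image[y0][:, x0] * (1 - wx) + image[y0][:, x1] * wx
    bottom = image[y1][:, x0] * (1 - wx) + image[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy

low_res = np.array([[0.0, 1.0],
                    [1.0, 0.0]])
high_res = upscale_bilinear(low_res, 2)
print(high_res)
```

Every new pixel is just a weighted average of its four nearest known neighbors, which is why interpolated images tend to look smooth but blurry.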

Even though these algorithms may work in some cases, the image upscaling problem demands a lot more precision than what the traditional algorithms are able to deliver. Because of that, researchers started looking at Convolutional Neural Networks and deep learning to solve this problem in a much more precise way.

Ref: https://www.researchgate.net/publication/320104055_nImage_Super-Resolution_Using_a_Dilated_Convolutional_Neural_Network

When we compare the output of bicubic interpolation with the output of the Convolutional Neural Network (CNN) algorithms, we can clearly see that the CNNs deliver much higher precision than the bicubic approach.

Applying CNN to the Upscaling Problem

To start, we need to find and import a dataset of images to train our model. For this article, we will use a small image dataset from Berkeley, 2011 (http://www.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/BSR/BSR_bsds500.tgz), and the Keras API from TensorFlow.

After importing the images, we will have to separate the training and the validation datasets and also define what image output size and upscaling factor will be used.
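A minimal sketch of this step, using only the standard library, might look like the following. The 80/20 split, the seed, the 300-pixel output size, the file names, and the helper name `split_dataset` are all assumptions for illustration; only the upscaling factor of 3 comes from this article.

```python
import random

def split_dataset(paths, validation_fraction=0.2, seed=42):
    """Shuffle file paths reproducibly and split them into (training, validation)."""
    shuffled = sorted(paths)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * validation_fraction)
    return shuffled[n_val:], shuffled[:n_val]

upscale_factor = 3                         # factor used in this article
crop_size = 300                            # assumed high-resolution output size
input_size = crop_size // upscale_factor   # low-resolution input size

# Placeholder file names standing in for the real dataset.
image_paths = [f"image_{i:03d}.jpg" for i in range(10)]
train_paths, val_paths = split_dataset(image_paths)

print(len(train_paths), len(val_paths), input_size)
```

Choosing an output size that is a multiple of the upscaling factor keeps the low-resolution input size a whole number.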

After importing the image dataset, we rescale the pixel values of each color channel from the 0 to 255 range to the 0 to 1 range and convert the images from the RGB to the YUV color space. This makes training easier and produces results that look better to human viewers, since the model trains only on the Y (luminance) value of each image, and human vision is more sensitive to luminance than to color. After that, we separate the training and validation datasets and reduce the size of the images for training.
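The pre-processing described above can be sketched in NumPy as follows. This is not the code from our repository: the function names are hypothetical, the random array stands in for a real photo, the luminance weights are the standard BT.601 ones, and shrinking by block averaging is an assumption about how the size reduction could be done.

```python
import numpy as np

# Standard BT.601 luminance weights for R, G and B.
Y_WEIGHTS = np.array([0.299, 0.587, 0.114])

def rgb_to_y(image_uint8):
    """Scale an (H, W, 3) uint8 RGB image to [0, 1] and return its Y (luminance) channel."""
    rgb = image_uint8.astype(np.float64) / 255.0
    return rgb @ Y_WEIGHTS

def downscale(channel, factor):
    """Shrink a 2D channel by averaging each factor x factor block (assumed method)."""
    h, w = channel.shape
    trimmed = channel[:h - h % factor, :w - w % factor]
    blocks = trimmed.reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))

# Random pixels standing in for a real 300x300 training image.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(300, 300, 3), dtype=np.uint8)

target = rgb_to_y(image)            # high-resolution Y channel the model should predict
model_input = downscale(target, 3)  # low-resolution Y channel the model receives

print(target.shape, model_input.shape)
```

The pair (`model_input`, `target`) is what one training example looks like for an upscaling factor of 3: a small Y channel and the full-size Y channel it came from.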

Here’s an example of a training and validation image after pre-processing. Note that for this example we are using an upscaling factor of 3.

Original Image in YUV and the Same Image with Reduced Resolution

Now that our images are ready for training, we can write our model using Keras and start the training process. Our CNN will consist of 4 main layers and a final conversion layer. We will use “tanh” as our activation function, MSE (mean squared error) as our loss function, and Adam as our optimization algorithm. The maximum number of epochs will be set to 100, since the dataset is fairly small.

CNN layers:

  1. Convolutional layer with 64 filters of size 5x5, used for feature identification;
  2. Convolutional layer with 32 filters of size 3x3, used for finer feature identification;
  3. Another convolutional layer with 32 filters of size 3x3, used for finer feature identification;
  4. Sub-pixel convolution layer, where each new predicted pixel is placed according to the output image’s size;
  5. Conversion layer, where the resulting matrix is converted into a two-dimensional image matrix.
Network Layers Ref: https://arxiv.org/abs/1609.05158
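The rearrangement behind the last two steps is often called a pixel shuffle or depth-to-space operation. Below is a minimal NumPy sketch of it for a single output channel, assuming the sub-pixel layer produces r*r values per low-resolution pixel for an upscaling factor r. The function name `depth_to_space` mirrors the TensorFlow operation of the same name, but this is a standalone illustration and not the code from our model.

```python
import numpy as np

def depth_to_space(features, r):
    """Rearrange an (H, W, r*r) array into an (H*r, W*r) image (pixel shuffle)."""
    h, w, c = features.shape
    if c != r * r:
        raise ValueError("expected r*r channels per low-resolution pixel")
    # Split the channels of each pixel into an r x r block of sub-pixels...
    blocks = features.reshape(h, w, r, r)
    # ...then interleave the block rows/columns with the spatial rows/columns.
    return blocks.transpose(0, 2, 1, 3).reshape(h * r, w * r)

# One low-resolution pixel with 9 predicted values becomes a 3x3 patch.
single_pixel = np.arange(9, dtype=float).reshape(1, 1, 9)
print(depth_to_space(single_pixel, 3))

# A 100x100 feature map with 9 channels becomes a 300x300 image.
features = np.zeros((100, 100, 9))
upscaled = depth_to_space(features, 3)
print(upscaled.shape)
```

Because all convolutions run at the low resolution and the enlargement only happens in this final rearrangement, the network stays cheap to evaluate compared with upscaling first and convolving afterwards.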

After training our model, we will have to apply it to the testing images and convert the output images back into the RGB colorspace before analysing the results. In our code, we defined a function called upscale_image that will do just that for us.
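The conversion back to RGB can be sketched as below. This is not the upscale_image function from our repository: the helper names are hypothetical, the YUV matrix is built from the standard BT.601 definitions, the color channels are enlarged by simple pixel repetition as a stand-in for a smoother method, and the "model output" is faked by repeating the low-resolution Y channel.

```python
import numpy as np

# BT.601 definitions: Y is a weighted sum, U and V are scaled color differences.
Y_WEIGHTS = np.array([0.299, 0.587, 0.114])
RGB_TO_YUV = np.stack([
    Y_WEIGHTS,
    0.492 * (np.array([0.0, 0.0, 1.0]) - Y_WEIGHTS),  # U = 0.492 * (B - Y)
    0.877 * (np.array([1.0, 0.0, 0.0]) - Y_WEIGHTS),  # V = 0.877 * (R - Y)
])
YUV_TO_RGB = np.linalg.inv(RGB_TO_YUV)

def rgb_to_yuv(rgb):
    """Convert an (H, W, 3) RGB array in [0, 1] to YUV."""
    return rgb @ RGB_TO_YUV.T

def yuv_to_rgb(yuv):
    """Convert an (H, W, 3) YUV array back to RGB, clipped to [0, 1]."""
    return np.clip(yuv @ YUV_TO_RGB.T, 0.0, 1.0)

def enlarge(channel, factor):
    """Enlarge a 2D channel by repeating pixels (stand-in for bicubic resizing)."""
    return np.repeat(np.repeat(channel, factor, axis=0), factor, axis=1)

def rebuild_rgb(y_upscaled, low_res_yuv, factor):
    """Combine an upscaled Y channel with enlarged U and V channels and convert to RGB."""
    u = enlarge(low_res_yuv[..., 1], factor)
    v = enlarge(low_res_yuv[..., 2], factor)
    return yuv_to_rgb(np.stack([y_upscaled, u, v], axis=-1))

# Uniform-color 2x2 image standing in for a low-resolution test image.
low_res_rgb = np.full((2, 2, 3), [0.2, 0.5, 0.8])
low_res_yuv = rgb_to_yuv(low_res_rgb)

# Placeholder for the model's prediction of the high-resolution Y channel.
y_upscaled = enlarge(low_res_yuv[..., 0], 3)

result = rebuild_rgb(y_upscaled, low_res_yuv, 3)
print(result.shape)
```

Only the Y channel goes through the network; the two color channels can be enlarged with a cheap conventional method because the eye is less sensitive to their fine detail.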

Finally, the program will output each upscaled image, and we will be able to compare the results with the bicubic upscaling and the original ground truth image.
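Besides comparing the images by eye, one simple way to put a number on the comparison is to reuse the MSE from training and measure how far each candidate is from the ground truth. The sketch below uses tiny made-up arrays, not results from our model, and the function name `mse` is hypothetical.

```python
import numpy as np

def mse(image_a, image_b):
    """Mean squared error between two images of the same shape (lower is better)."""
    diff = np.asarray(image_a, dtype=np.float64) - np.asarray(image_b, dtype=np.float64)
    return float(np.mean(diff ** 2))

# Made-up 2x2 "images" for illustration only.
ground_truth = np.array([[0.0, 1.0],
                         [1.0, 0.0]])
candidate_a = np.array([[0.1, 0.9],
                        [0.9, 0.1]])   # close to the ground truth
candidate_b = np.array([[0.5, 0.5],
                        [0.5, 0.5]])   # blurred to a flat gray

error_a = mse(ground_truth, candidate_a)
error_b = mse(ground_truth, candidate_b)
print(error_a, error_b)
```

In practice you would pass the ground truth image together with the bicubic result and then with the CNN result, and the lower error indicates the closer reconstruction.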

Original Image, Lower Resolution and Upscaled Image

The complete code as well as more details can be found on our GitHub repository: https://github.com/guipleite/Image-Upscaling-CNN

Final thoughts and conclusion

Even though this is a simple and efficient model, it does not produce the most detailed reconstructions that a CNN model could produce for image upscaling. For those types of applications that need a more precise and powerful model, there are other algorithms out there that might be worth considering (Note that these algorithms use other types of layers beyond just Convolution).

Here is a list of more complex neural networks worth considering if you plan on implementing a super-resolution neural network for image upscaling:

Computer Engineering Student https://github.com/guipleite
