Neural Style Transfer Basics

🧑‍💻 ML | Neural Network | CNN

Neural Style Transfer (NST)

It is a technique for applying a “filter” (technically, the “Style”) to your “original” image (technically, the “Content” image). It loosely falls under the umbrella of GenAI.

The goal of NST is to preserve the content of the Content Image while also applying the style of the Style Reference Image!

To understand NST better, we first need to understand what a CNN is and what it learns.

Please learn here

↪️ Back to NST!

What each layer in a ConvNet learns is the main anchor we bring back to NST!

Unlike a typical neural network, in NST we don’t learn the parameters of the network by minimizing a loss function. Instead, we compute a loss between the Content, Style and Generated Images and use it to update the Generated Image itself.

😎 Cool right?

Generating Procedure

To generate an image with the content of the Content Image in the style of the Style Image, we need cost functions that answer two questions:

How much different is Generated Image from Content Image?

How much different is Generated Image from Style Image?

Content Cost Function :

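How different the Generated Image is from the Content Image. Here $F_{ij}^C$ and $F_{ij}^G$ are the activations of channel $i$ at position $j$ in a chosen ConvNet layer, for the Content and Generated Images.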

$$J_{\text{content}}(C, G) = \frac{1}{2} \sum_{i,j} \left(F_{ij}^G - F_{ij}^C\right)^2$$
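As a minimal sketch of the content cost above, assuming the feature maps have already been extracted from some ConvNet layer and flattened to shape (channels, positions). The toy arrays `F_C` and `F_G` are made-up values for illustration, not real activations.

```python
import numpy as np

def content_cost(feat_generated, feat_content):
    """J_content = 1/2 * sum_ij (F^G_ij - F^C_ij)^2 for one layer's features."""
    diff = feat_generated - feat_content
    return 0.5 * np.sum(diff ** 2)

# Hypothetical toy features: 2 channels (rows) x 3 positions (columns).
F_C = np.array([[1.0, 2.0, 3.0],
                [0.0, 1.0, 0.0]])
F_G = np.array([[1.0, 2.0, 1.0],
                [0.0, 0.0, 0.0]])

print(content_cost(F_G, F_C))  # 0.5 * (2^2 + 1^2) = 2.5
print(content_cost(F_C, F_C))  # identical features give 0.0
```

The cost is zero only when the Generated Image produces exactly the same activations as the Content Image at that layer.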

Style Loss for a Single Layer :

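We define “Style” as the correlation between activations across channels. In the formula, $G^G$ and $G^S$ are the Gram matrices of the Generated and Style Images at layer $l$, $N_l$ is the number of channels in that layer, and $M_l$ is its number of spatial positions (height × width).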

$$J_{style} = \frac{1}{4N_l^2M_l^2} \sum_{i,j} \left(G_{ij}^G - G_{ij}^S\right)^2$$
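A small sketch of this single-layer style loss, assuming features are already flattened to shape ($N_l$ channels, $M_l$ positions). It uses the Gram matrix that is defined later in this post; the helper names `gram` and `style_layer_loss` and the toy arrays are my own, chosen only for illustration.

```python
import numpy as np

def gram(features):
    # features: (N_l channels, M_l positions); entry (i, j) = sum_k F_ik * F_jk
    return features @ features.T

def style_layer_loss(feat_generated, feat_style):
    """J_style = 1 / (4 N_l^2 M_l^2) * sum_ij (G^G_ij - G^S_ij)^2."""
    n_l, m_l = feat_generated.shape
    diff = gram(feat_generated) - gram(feat_style)
    return np.sum(diff ** 2) / (4.0 * n_l**2 * m_l**2)

# Hypothetical toy features: 2 channels x 2 positions.
F_S = np.array([[1.0, 0.0],
                [0.0, 1.0]])
F_G = np.array([[1.0, 1.0],
                [0.0, 1.0]])

print(style_layer_loss(F_G, F_S))          # 3 / 64 = 0.046875
# Reordering the positions leaves the Gram matrix, and so the loss, unchanged.
print(style_layer_loss(F_S[:, ::-1], F_S)) # 0.0
```

The second call hints at why this works as a "style" measure: the loss compares which channels fire together, not where in the image they fire.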

Correlation tells you which of the “high-level texture” components tend to occur, or not occur, together in a part of the image.

So, the degree of correlation gives us one way of measuring style.

Across channels, we look at correlation like this: if a part of the image has a Specific Texture, to what degree does Another Texture also exist there?

We can assume,

Specific Texture as "Vertical Lines", "Cross Lines", etc

Another Texture as "Cross Lines", etc

Correlated : Means whenever a part of the image has the Specific Texture, that part will probably have Another Texture too

Uncorrelated : Means whenever a part of the image has the Specific Texture, it probably won’t have Another Texture

To measure this, we need the Gram Matrix! Entry $G_{ij}^l$ sums, over all positions $k$, the product of the activations of channels $i$ and $j$ in layer $l$.

Gram Matrix :

$$G_{ij}^l = \sum_{k} F_{ik}^l F_{jk}^l$$
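To make the correlated and uncorrelated cases concrete, here is a rough sketch of the Gram matrix on a hand-made activation volume. The three channels (`vertical`, `cross`, `dots`) are hypothetical detectors invented for this example, and the function assumes a (channels, height, width) layout.

```python
import numpy as np

def gram_matrix(feature_map):
    """feature_map: (channels, height, width) -> (channels, channels) Gram matrix."""
    c, h, w = feature_map.shape
    F = feature_map.reshape(c, h * w)  # flatten so index k runs over positions
    return F @ F.T                     # G_ij = sum_k F_ik * F_jk

# Hypothetical 3-channel activations over a 2x2 patch.
vertical = np.array([[1.0, 0.0], [1.0, 0.0]])  # fires in the left column
cross    = np.array([[1.0, 0.0], [1.0, 0.0]])  # fires in the same places
dots     = np.array([[0.0, 1.0], [0.0, 1.0]])  # fires only where the others don't

G = gram_matrix(np.stack([vertical, cross, dots]))
print(G)
# [[2. 2. 0.]
#  [2. 2. 0.]
#  [0. 0. 2.]]
```

Here `G[0, 1]` is large because `vertical` and `cross` are active at the same positions (correlated), while `G[0, 2]` is zero because `vertical` and `dots` never fire together (uncorrelated). Note that this is an unnormalized, uncentered measure rather than a statistical correlation coefficient.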

Total Loss :

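Now we compute the Total Loss, optimize the Generated Image against it, and get our desired stylized image. The weights $\alpha$ and $\beta$ control how much the content and the style each matter.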

$$J_{\text{total}}(C, S, G) = \alpha J_{\text{content}}(C, G) + \beta J_{\text{style}}(S, G)$$
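The optimization step can be sketched with plain gradient descent. This is a deliberately simplified toy, not a full NST pipeline: it assumes a single layer for both content and style, and it updates the generated *features* directly instead of backpropagating through a ConvNet to the pixels. The random feature arrays and the values of `alpha`, `beta` and `lr` are arbitrary choices for illustration.

```python
import numpy as np

def gram(F):
    return F @ F.T

def content_cost(F_G, F_C):
    return 0.5 * np.sum((F_G - F_C) ** 2)

def style_cost(F_G, F_S):
    n, m = F_G.shape
    return np.sum((gram(F_G) - gram(F_S)) ** 2) / (4.0 * n**2 * m**2)

def total_cost(F_G, F_C, F_S, alpha, beta):
    return alpha * content_cost(F_G, F_C) + beta * style_cost(F_G, F_S)

def total_grad(F_G, F_C, F_S, alpha, beta):
    # Analytic gradients of the two costs with respect to F_G.
    n, m = F_G.shape
    d_content = F_G - F_C
    d_style = (gram(F_G) - gram(F_S)) @ F_G / (n**2 * m**2)
    return alpha * d_content + beta * d_style

# Toy stand-ins for layer activations: 3 channels x 16 positions.
rng = np.random.default_rng(0)
F_C = rng.normal(size=(3, 16))   # "content" features
F_S = rng.normal(size=(3, 16))   # "style" features
F_G = rng.normal(size=(3, 16))   # generated features, started from noise

alpha, beta, lr = 1.0, 10.0, 0.1
start = total_cost(F_G, F_C, F_S, alpha, beta)
for _ in range(200):
    # No network weights are updated here, only the generated features.
    F_G = F_G - lr * total_grad(F_G, F_C, F_S, alpha, beta)
end = total_cost(F_G, F_C, F_S, alpha, beta)

print(start, end)  # the total cost should drop over the run
```

In a real setup the features would come from a pretrained ConvNet, and an autodiff framework would carry the same gradient back to the pixels of the Generated Image.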

We get a Generated Image with the content of the Content Image and the style of the Style Image.

Thanks for reading