Effective FFN in Transformer
🧑‍💻 Tech | AI | LLM
Researching an Optimized Transformer Using Dimensionality Reduction
So, generally in an LLM, the Feed Forward Network (FFN) has more parameters than the Attention block: with the common choice of d_ff = 4 × d_model, the FFN holds roughly twice the weights of the attention projections.
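For intuition, here's a rough back-of-the-envelope count for one GPT-style layer (the d_model and d_ff values below are just illustrative assumptions, not from any specific model):

```python
# Rough per-layer parameter counts for a GPT-style Transformer block.
# d_model = 4096 with d_ff = 4 * d_model is a common (assumed) config.
d_model = 4096
d_ff = 4 * d_model

# Multi-head attention: Q, K, V and output projections, each d_model x d_model.
attn_params = 4 * d_model * d_model

# FFN: up-projection and down-projection (biases ignored for simplicity).
ffn_params = 2 * d_model * d_ff

print(f"attention: {attn_params / 1e6:.1f}M params")  # ~67.1M
print(f"ffn:       {ffn_params / 1e6:.1f}M params")   # ~134.2M
```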
So the idea I'm researching: before feeding the output of Multi-Head Attention into the Feed Forward Network, apply a dimensionality reduction operation (something like PCA or t-SNE) to keep only the most informative features, and pass that reduced representation as the input to the FFN. Applying the same reduction before the Attention block might also be worth exploring.
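Here is a minimal PyTorch sketch of the idea, not a definitive implementation. The names (`ReducedFFNBlock`, `d_reduced`) are hypothetical, and a learned linear projection stands in for PCA, since classical PCA (and especially t-SNE, which is non-parametric) isn't a differentiable layer you can train end-to-end:

```python
import torch
import torch.nn as nn

class ReducedFFNBlock(nn.Module):
    """Hypothetical Transformer block: the attention output is projected
    down to d_reduced before entering the FFN. A learned nn.Linear acts
    as a differentiable stand-in for PCA-style dimensionality reduction."""

    def __init__(self, d_model=512, d_reduced=128, d_ff=2048, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.reduce = nn.Linear(d_model, d_reduced)  # dimensionality reduction
        self.ffn = nn.Sequential(
            nn.Linear(d_reduced, d_ff),
            nn.GELU(),
            nn.Linear(d_ff, d_model),  # project back up for the residual add
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        a, _ = self.attn(x, x, x)                     # self-attention
        x = self.norm1(x + a)                         # residual + norm
        x = self.norm2(x + self.ffn(self.reduce(x)))  # reduce, then FFN
        return x

x = torch.randn(2, 16, 512)        # (batch, seq_len, d_model)
print(ReducedFFNBlock()(x).shape)  # torch.Size([2, 16, 512])
```

With these assumed sizes, the FFN's first layer shrinks from 512 × 2048 ≈ 1.05M weights to 128 × 2048 ≈ 0.26M, which is where the parameter savings would come from.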
I'm trying out various variations of this and will keep researching it.
Thanks for reading!