Effective FFN in Transformer
🧑‍💻 Tech | AI | LLM
Researching an Optimized Transformer Using Dimensionality Reduction
So, generally in an LLM, the Feed Forward Network (FFN) has more parameters than the Attention block: with the common choice of d_ff = 4 × d_model, the FFN holds roughly twice the weights of the attention projections.
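For intuition, here's a rough back-of-the-envelope count for one GPT-style layer (the d_model and d_ff values below are just illustrative assumptions, not from any specific model):

```python
# Rough per-layer parameter counts for a GPT-style Transformer block.
# d_model = 4096 with d_ff = 4 * d_model is a common (assumed) config.
d_model = 4096
d_ff = 4 * d_model

# Multi-head attention: Q, K, V and output projections, each d_model x d_model.
attn_params = 4 * d_model * d_model

# FFN: up-projection and down-projection (biases ignored for simplicity).
ffn_params = 2 * d_model * d_ff

print(f"attention: {attn_params / 1e6:.1f}M params")  # ~67.1M
print(f"ffn:       {ffn_params / 1e6:.1f}M params")   # ~134.2M
```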
So the idea I'm researching: before feeding the output of Multi-Head Attention into the Feed Forward Network, apply a dimensionality reduction operation (something like PCA or t-SNE) to keep only the most informative features, and pass that reduced representation as the input to the FFN. Applying the same reduction before the Attention block might also be worth exploring.
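Here is a minimal PyTorch sketch of the idea, not a definitive implementation. The names (`ReducedFFNBlock`, `d_reduced`) are hypothetical, and a learned linear projection stands in for PCA, since classical PCA (and especially t-SNE, which is non-parametric) isn't a differentiable layer you can train end-to-end:

```python
import torch
import torch.nn as nn

class ReducedFFNBlock(nn.Module):
    """Hypothetical Transformer block: the attention output is projected
    down to d_reduced before entering the FFN. A learned nn.Linear acts
    as a differentiable stand-in for PCA-style dimensionality reduction."""

    def __init__(self, d_model=512, d_reduced=128, d_ff=2048, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.reduce = nn.Linear(d_model, d_reduced)  # dimensionality reduction
        self.ffn = nn.Sequential(
            nn.Linear(d_reduced, d_ff),
            nn.GELU(),
            nn.Linear(d_ff, d_model),  # project back up for the residual add
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        a, _ = self.attn(x, x, x)                     # self-attention
        x = self.norm1(x + a)                         # residual + norm
        x = self.norm2(x + self.ffn(self.reduce(x)))  # reduce, then FFN
        return x

x = torch.randn(2, 16, 512)        # (batch, seq_len, d_model)
print(ReducedFFNBlock()(x).shape)  # torch.Size([2, 16, 512])
```

With these assumed sizes, the FFN's first layer shrinks from 512 × 2048 ≈ 1.05M weights to 128 × 2048 ≈ 0.26M, which is where the parameter savings would come from.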
I'm trying out various variations of this and will keep researching it.
Thanks for reading!