Scaled Dot-Product Attention


Scaled dot-product attention is the attention mechanism at the core of transformer-based models. For each position in the sequence, the model computes the dot product between a query vector and the key vectors of every position, scales the results by the square root of the key dimension (which stabilizes gradients when the dimension is large), and applies a softmax to obtain attention weights. The output is the weighted sum of the value vectors, so each position attends most strongly to the positions whose keys best match its query.
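The computation described above can be sketched in a few lines of NumPy. This is a minimal single-head illustration without masking or batching; the function name and the shapes chosen here are illustrative, not a reference implementation:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q, K: (seq_len, d_k); V: (seq_len, d_v). Returns (seq_len, d_v)."""
    d_k = Q.shape[-1]
    # Dot product of each query with every key, scaled by sqrt(d_k)
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax over the key axis (subtract the max for numerical stability)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output row is a weighted sum of the value vectors
    return weights @ V

rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))
K = rng.standard_normal((4, 8))
V = rng.standard_normal((4, 8))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (4, 8)
```

Because the softmax weights for each query sum to one, every output vector lies in the convex hull of the value vectors.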