The overall architecture of our proposed approach → retrieve-then-augment

Mix the sequence embeddings output by the t-th layer of the transformer used for sequence encoding
Doing so aims at the following effects:
we operate on more abstract sequence representations (instead of item embeddings as in the original mixup)
the model can further adjust its parameters according to the incorporated external information
interpolating the hidden states at an intermediate layer enriches the sequential semantics.
In some prior work: decoding from an interpolation of two hidden vectors → generates a new sentence with the mixed meaning of the two original sentences.
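
A minimal PyTorch sketch of this hidden-state mixup, assuming a stack of standard transformer encoder layers; the names `MixupSequenceEncoder`, `mixup_hidden_states`, and `mix_layer` are hypothetical illustrations, not from the paper:

```python
import torch
import torch.nn as nn

def mixup_hidden_states(h_target, h_retrieved, alpha=0.2):
    # Sample the mixing coefficient lambda ~ Beta(alpha, alpha), as in mixup.
    lam = torch.distributions.Beta(alpha, alpha).sample()
    # Convex combination of the two (batch, seq_len, dim) hidden-state tensors.
    return lam * h_target + (1.0 - lam) * h_retrieved

class MixupSequenceEncoder(nn.Module):
    def __init__(self, dim=64, n_layers=4, n_heads=4, mix_layer=1):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(dim, n_heads, batch_first=True)
            for _ in range(n_layers)
        )
        self.mix_layer = mix_layer  # the t-th layer whose outputs get mixed

    def forward(self, x, x_retrieved):
        for i, layer in enumerate(self.layers):
            x = layer(x)
            if i <= self.mix_layer:
                x_retrieved = layer(x_retrieved)
            if i == self.mix_layer:
                # Inject the retrieved sequence's semantics by interpolating
                # intermediate hidden states rather than raw item embeddings.
                x = mixup_hidden_states(x, x_retrieved)
        return x

# Usage: mix a target user sequence with a retrieved similar sequence.
enc = MixupSequenceEncoder()
target = torch.randn(8, 20, 64)     # embedded target sequence
retrieved = torch.randn(8, 20, 64)  # embedded retrieved sequence
out = enc(target, retrieved)        # (8, 20, 64), enriched representation
```

After the mixing layer, only the interpolated states flow through the remaining layers, so the later layers can adjust to the incorporated external information, matching the second effect above.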