Abstract: In image caption generation, incorporating image captions as features in the model input has proven to be an effective method. However, in the field of video captioning, the input ...