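Criteria:

Code must be available (and not too difficult).

At least 30 stars on Papers with Code.

Avoid, as much as possible, papers that require doing the pretraining ourselves (CNN-RNN architecture).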

Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

https://paperswithcode.com/paper/show-attend-and-tell-neural-image-caption

DenseCap: Fully Convolutional Localization Networks for Dense Captioning

https://arxiv.org/abs/1511.07571

Attention on Attention for Image Captioning
