🔧 Filters
Try: author name · keyword phrase · DOI
Articles
The intersection of computer vision and natural language processing has led to the rapid development of vision-language models capable of performing complex multimodal tasks such as image captioning,
Hybrid ModelsVision-Language TasksCNN-TransformerImage CaptioningVQADeep Learning