Search: "VQA" — Emerging Science Research

A Benchmark Study of Hybrid CNN-Transformer Architectures in Vision-Language Tasks

Articles

The intersection of computer vision and natural language processing has led to the rapid development of vision-language models capable of performing complex multimodal tasks such as image captioning,

Hybrid Models, Vision-Language Tasks, CNN-Transformer, Image Captioning, VQA, Deep Learning

Jun 2025, pp. 36-49