CATEGORY | MULTIMODAL
VISUAL UNDERSTANDING | Visual understanding of provided image
OCR | OCR recognizes text elements in the image
COMPARISON | Commonality
VISUAL UNDERSTANDING
YOUR INPUT
YOUR COMMAND: What is known about the structure in the upper part of the picture?
OCR
YOUR INPUT
YOUR COMMAND
COMPARISON
YOUR INPUT
YOUR COMMAND