The following leaderboard presents the evaluation results of various Multimodal Large Language Models (MLLMs) across different data modalities in the MapTab path planning task. Performance is measured using three key metrics: Exact Match Accuracy (EMA), Partial Match Accuracy (PMA), and Difficulty-aware Score (DS). Models were tested using varying combinations of map, edge, and vertex data, as detailed below:
- **Map-only**: only map data used
- **Edge-only**: only edge data used
- **Map+Edge**: map and edge data combined
- **Map+Edge+Vertex**: map, edge, and vertex data combined
- **Map+Vertex2**: map and merged vertex data (Vertex2_tab)
For clarity, comparisons involving Edge_tab + Vertex_tab were omitted, as they yielded results similar to those of the Map-only and Edge_tab-only groups without adding new insights. The best-performing results within the open-source and closed-source model groups are highlighted in bold.
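The leaderboard reports the metrics without their formulas. As a rough illustration only, the sketch below scores a batch of predicted routes against reference routes under assumed definitions: EMA as the percentage of exact route matches, PMA as the average fraction of reference waypoints recovered, and DS as difficulty credit summed only over exact matches. The function names and the weighting scheme are hypothetical, not the benchmark's exact formulas.

```python
from typing import List, Sequence, Tuple


def exact_match(pred: Sequence[str], gold: Sequence[str]) -> bool:
    """Predicted route matches the reference route waypoint-for-waypoint."""
    return list(pred) == list(gold)


def partial_match(pred: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of reference waypoints recovered by the prediction (order-agnostic)."""
    if not gold:
        return 0.0
    return len(set(pred) & set(gold)) / len(gold)


def score_runs(
    preds: List[Sequence[str]],
    golds: List[Sequence[str]],
    difficulties: List[int],
) -> Tuple[float, float, int]:
    """Return (EMA %, PMA %, DS) over a batch; DS credits difficulty on exact matches only."""
    n = len(golds)
    ema = 100.0 * sum(exact_match(p, g) for p, g in zip(preds, golds)) / n
    pma = 100.0 * sum(partial_match(p, g) for p, g in zip(preds, golds)) / n
    ds = sum(d for p, g, d in zip(preds, golds, difficulties) if exact_match(p, g))
    return round(ema, 2), round(pma, 2), ds
```

Under these assumed definitions, a route that shares waypoints with the reference but deviates in order or coverage still earns partial credit through PMA, while DS rewards solving harder (e.g. longer) routes exactly.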
| Model | Type | Map-only EMA | PMA | DS | Edge-only EMA | PMA | DS | Map+Edge EMA | PMA | DS | Map+Edge+Vertex EMA | PMA | DS | Map+Vertex2 EMA | PMA | DS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| **Scenario: MetroMap** | ||||||||||||||||
| **Open-source Models** | ||||||||||||||||
| Qwen3-VL-8B-Instruct | Instruct | 2.75 | 17.58 | 67 | 25.69 | 46.44 | 1018 | 21.25 | 41.30 | 785 | 19.31 | 39.31 | 728 | 4.69 | 21.87 | 137 |
| Qwen3-VL-8B-Thinking | Thinking | 5.12 | 20.99 | 132 | 31.69 | 49.76 | 1276 | 38.00 | 57.06 | 1669 | 23.75 | 41.69 | 948 | 6.38 | 22.93 | 194 |
| Qwen3-VL-2B-Instruct | Instruct | 0.94 | 15.14 | 26 | 9.88 | 27.61 | 371 | 6.63 | 23.85 | 232 | 7.00 | 26.91 | 289 | 2.00 | 17.82 | 58 |
| Qwen2.5-VL-7B-Instruct | Instruct | 0.94 | 15.02 | 21 | 14.00 | 31.20 | 535 | 11.69 | 28.32 | 441 | 7.94 | 20.77 | 318 | 3.38 | 18.09 | 101 |
| Phi-3.5-Vision-Instruct-4B | Instruct | 0.06 | 10.40 | 1 | 10.87 | 27.92 | 402 | 6.63 | 22.14 | 208 | 2.75 | 12.27 | 99 | 0.81 | 12.94 | 13 |
| Phi-4-Multimodal-Instruct-6B | Instruct | 0.00 | 9.75 | 0 | 2.13 | 12.52 | 66 | 2.13 | 11.78 | 85 | 1.75 | 9.51 | 52 | 0.44 | 9.02 | 7 |
| InternVL3-8B-Instruct | Instruct | 0.13 | 13.98 | 2 | 10.50 | 29.57 | 414 | 12.81 | 31.83 | 488 | 9.00 | 24.73 | 377 | 1.75 | 17.00 | 68 |
| Qwen3-VL-30B-A3B-Instruct | Instruct | 3.31 | 19.26 | 102 | 23.69 | 44.33 | 961 | 22.56 | 43.58 | 914 | 19.00 | 40.03 | 724 | 6.75 | 26.22 | 218 |
| Qwen3-VL-32B-Instruct | Instruct | 6.31 | 22.23 | 181 | 31.87 | 54.45 | 1270 | 32.12 | 54.54 | 1339 | 28.50 | 50.06 | 1181 | 6.56 | 24.43 | 187 |
| Qwen3-VL-32B-Thinking | Thinking | 13.31 | 29.43 | 437 | 31.81 | 54.94 | 1276 | 44.12 | 62.77 | 2078 | 26.56 | 51.48 | 1060 | 9.19 | 28.89 | 278 |
| **Closed-source Models** | ||||||||||||||||
| GPT-4o | Instruct | 6.63 | 25.61 | 205 | 42.38 | 64.07 | 2112 | 40.69 | 62.40 | 1944 | 35.63 | 55.51 | 1630 | 11.31 | 31.11 | 398 |
| GPT-4.1 | Instruct | 7.94 | 25.52 | 235 | 48.56 | 67.07 | 2523 | 46.81 | 65.18 | 2413 | 41.81 | 62.88 | 2038 | 14.06 | 35.98 | 515 |
| Gemini-3-Flash-Preview | Instruct | 37.06 | 57.15 | 2046 | 74.75 | 84.99 | 5345 | 73.06 | 83.37 | 5171 | 69.19 | 76.14 | 4765 | 53.87 | 65.84 | 3294 |
| Doubao-Seed-1-6-251015-w/o | No-Thinking | 8.13 | 24.60 | 233 | 46.94 | 66.98 | 2394 | 48.06 | 66.95 | 2533 | 40.56 | 62.11 | 2088 | 13.81 | 35.61 | 494 |
| Doubao-Seed-1-6-251015-Thinking | Thinking | 12.06 | 30.49 | 461 | 74.38 | 86.23 | 4996 | 74.00 | 85.68 | 4964 | 76.06 | 83.41 | 5029 | 22.03 | 42.48 | 984 |
| Qwen-VL-Plus-w/o | No-Thinking | 4.81 | 21.83 | 133 | 36.88 | 58.69 | 1643 | 38.25 | 58.59 | 1706 | 31.62 | 52.92 | 1355 | 6.94 | 27.69 | 229 |
| Qwen-VL-Plus-Thinking | Thinking | 10.75 | 29.11 | 349 | 61.50 | 76.62 | 3576 | 62.19 | 76.42 | 3648 | 45.75 | 64.46 | 2318 | 16.38 | 37.44 | 582 |
| **Scenario: TravelMap** | ||||||||||||||||
| **Open-source Models** | ||||||||||||||||
| Qwen3-VL-8B-Instruct | Instruct | 19.29 | 42.50 | 1190 | 44.05 | 61.66 | 3051 | 43.33 | 61.39 | 3008 | 34.52 | 55.56 | 2330 | 15.65 | 40.97 | 869 |
| Qwen3-VL-8B-Thinking | Thinking | 22.62 | 45.94 | 1345 | 74.17 | 82.41 | 5319 | 82.68 | 88.54 | 6268 | 33.15 | 55.60 | 2088 | 12.74 | 38.10 | 705 |
| Qwen3-VL-2B-Instruct | Instruct | 8.45 | 34.30 | 500 | 11.25 | 32.35 | 763 | 19.17 | 45.68 | 1210 | 12.14 | 40.47 | 787 | 3.15 | 30.69 | 164 |
| Qwen2.5-VL-7B-Instruct | Instruct | 7.68 | 30.48 | 431 | 21.07 | 38.15 | 1322 | 24.82 | 42.02 | 1508 | 15.60 | 37.47 | 902 | 4.70 | 28.84 | 235 |
| Phi-3.5-Vision-Instruct-4B | Instruct | 0.12 | 20.00 | 8 | 12.20 | 34.81 | 778 | 9.82 | 31.87 | 620 | 4.46 | 23.21 | 263 | 1.31 | 22.68 | 81 |
| Phi-4-Multimodal-Instruct-6B | Instruct | 0.42 | 19.26 | 21 | 7.20 | 17.63 | 479 | 5.30 | 15.93 | 318 | 1.73 | 9.36 | 115 | 1.43 | 18.96 | 63 |
| InternVL3-8B-Instruct | Instruct | 6.61 | 29.21 | 309 | 29.58 | 49.69 | 1821 | 29.40 | 50.16 | 1865 | 13.57 | 36.78 | 933 | 2.50 | 24.28 | 136 |
| Qwen3-VL-30B-A3B-Instruct | Instruct | 17.86 | 44.15 | 1098 | 50.95 | 65.36 | 3458 | 53.75 | 67.71 | 3747 | 38.45 | 58.02 | 2738 | 9.70 | 37.93 | 578 |
| Qwen3-VL-32B-Instruct | Instruct | 36.90 | 57.44 | 2431 | 64.52 | 76.16 | 4704 | 68.39 | 78.99 | 5184 | 52.56 | 69.18 | 3770 | 21.67 | 47.34 | 1299 |
| Qwen3-VL-32B-Thinking | Thinking | 39.17 | 58.84 | 2650 | 69.76 | 79.60 | 5149 | 91.79 | 94.55 | 7287 | 42.32 | 62.99 | 2931 | 19.94 | 46.73 | 1201 |
| **Closed-source Models** | ||||||||||||||||
| GPT-4o | Instruct | 16.85 | 40.98 | 930 | 65.06 | 75.84 | 4651 | 62.74 | 74.11 | 4467 | 46.07 | 63.07 | 3069 | 12.08 | 38.07 | 675 |
| GPT-4.1 | Instruct | 20.30 | 43.24 | 1226 | 74.82 | 82.98 | 5571 | 70.89 | 79.84 | 5211 | 54.70 | 69.59 | 3917 | 15.06 | 40.67 | 862 |
| Gemini-3-Flash-Preview | Instruct | 60.00 | 73.20 | 4469 | 98.27 | 98.38 | 8190 | 94.40 | 94.87 | 7757 | 78.51 | 82.40 | 6459 | 43.51 | 60.11 | 3250 |
| Doubao-Seed-1-6-251015-w/o | No-Thinking | 33.04 | 54.15 | 2193 | 73.51 | 82.16 | 5425 | 76.85 | 84.04 | 5812 | 56.25 | 71.46 | 4031 | 25.48 | 49.54 | 1610 |
| Doubao-Seed-1-6-251015-Thinking | Thinking | 38.45 | 58.46 | 2735 | 98.39 | 98.87 | 8178 | 97.86 | 98.47 | 8127 | 83.15 | 89.08 | 6672 | 25.30 | 48.90 | 1678 |
| Qwen-VL-Plus-w/o | No-Thinking | 30.60 | 52.64 | 1935 | 64.23 | 76.45 | 4656 | 69.64 | 79.78 | 5133 | 53.99 | 70.07 | 3842 | 22.92 | 47.65 | 1417 |
| Qwen-VL-Plus-Thinking | Thinking | 38.27 | 58.94 | 2539 | 64.35 | 76.53 | 4670 | 94.23 | 96.04 | 7570 | 56.19 | 70.84 | 4042 | 23.21 | 47.18 | 1481 |
This leaderboard summarizes the performance of various Multimodal Large Language Models (MLLMs) on QA tasks across the MetroMap and TravelMap scenarios. Input modalities are represented as:
- **M** for Map
- **E** for Edge_tab
- **V** for Vertex_tab
In the MetroMap scenario, the Vertex_tab paired with the Map input excludes the Line column, minimizing redundant table details so that the evaluation stays focused on map-table coordination.
Tasks are categorized into three distinct types:
- Global Perception-based Reasoning Tasks (GP)
- Local Perception-based Reasoning Tasks (LP)
- Spatial Relationship Judgment Tasks (SR)
Bold values in the table indicate the best performance for open-source and closed-source models, respectively.
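To make the modality settings concrete, the sketch below assembles the text-plus-image input for one QA query, including the MetroMap rule above of dropping the Line column whenever Vertex_tab accompanies the map. All names (`build_inputs`, `prepare_vertex_tab`) and the table serialization are illustrative assumptions, not the benchmark's actual harness.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def prepare_vertex_tab(rows: List[Dict], scenario: str, with_map: bool) -> List[Dict]:
    """Drop the Line column when Vertex_tab accompanies the map image in MetroMap."""
    if scenario == "MetroMap" and with_map:
        return [{k: v for k, v in row.items() if k != "Line"} for row in rows]
    return [dict(row) for row in rows]


def build_inputs(
    question: str,
    modalities: str,                       # e.g. "M", "E", "M+V"
    map_image: Optional[str] = None,
    edge_tab: str = "",
    vertex_rows: Sequence[Dict] = (),
    scenario: str = "MetroMap",
) -> Tuple[str, Optional[str]]:
    """Assemble the text prompt and (optionally) the map image for one query."""
    parts = []
    if "E" in modalities:
        parts.append("Edge table:\n" + edge_tab)
    if "V" in modalities:
        rows = prepare_vertex_tab(list(vertex_rows), scenario, with_map="M" in modalities)
        header = " | ".join(rows[0].keys()) if rows else ""
        body = "\n".join(" | ".join(str(v) for v in r.values()) for r in rows)
        parts.append("Vertex table:\n" + header + "\n" + body)
    parts.append("Question: " + question)
    # The map image is attached only when M is part of the modality setting.
    image = map_image if "M" in modalities else None
    return "\n\n".join(parts), image
```

For example, a call with `modalities="M+V"` attaches the map image and a Line-free vertex table, while `modalities="E"` sends only the edge table as text.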
| Model | Type | Map (M) GP | LP | SR | Edge (E) GP | LP | SR | Vertex (V) GP | LP | SR | Map+Vertex (M+V) GP | LP | SR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| **Scenario: MetroMap** | |||||||||||||
| **Open-source Models** | |||||||||||||
| Qwen3-VL-8B-Instruct | Instruct | 55.00 | 17.50 | 73.12 | 22.50 | 100.00 | 7.50 | 57.50 | 51.88 | 86.88 | 0.63 | 22.50 | 38.75 |
| Qwen3-VL-8B-Thinking | Thinking | 53.12 | 28.12 | 51.25 | 56.87 | 99.38 | 56.87 | 79.37 | 77.50 | 98.12 | 7.50 | 9.38 | 35.63 |
| Qwen3-VL-2B-Instruct | Instruct | 8.13 | 5.00 | 63.12 | 3.75 | 87.50 | 3.12 | 26.25 | 11.25 | 64.38 | 0.00 | 8.13 | 26.87 |
| Qwen2.5-VL-7B-Instruct | Instruct | 48.75 | 15.62 | 66.25 | 15.62 | 100.00 | 10.00 | 44.37 | 60.62 | 87.50 | 1.25 | 25.62 | 33.12 |
| Phi-3.5-Vision-Instruct-4B | Instruct | 58.13 | 18.75 | 78.12 | 22.50 | 100.00 | 21.88 | 53.75 | 66.87 | 98.12 | 0.63 | 21.25 | 40.00 |
| Phi-4-Multimodal-Instruct-6B | Instruct | 60.62 | 40.62 | 80.00 | 41.25 | 93.13 | 42.50 | 58.75 | 92.50 | 99.38 | 5.63 | 35.63 | 49.38 |
| InternVL3-8B-Instruct | Instruct | 35.00 | 20.62 | 65.00 | 7.50 | 83.13 | 3.12 | 33.75 | 10.00 | 70.63 | 0.00 | 12.50 | 28.12 |
| Qwen3-VL-30B-A3B-Instruct | Instruct | 20.62 | 16.25 | 60.62 | 0.63 | 68.75 | 1.25 | 0.00 | 1.25 | 68.75 | 0.00 | 11.25 | 18.12 |
| Qwen3-VL-32B-Instruct | Instruct | 5.00 | 0.00 | 38.12 | 10.62 | 78.12 | 3.12 | 16.25 | 4.37 | 73.12 | 0.00 | 7.50 | 23.75 |
| Qwen3-VL-32B-Thinking | Thinking | 26.87 | 19.38 | 50.00 | 5.00 | 99.38 | 5.00 | 21.25 | 25.00 | 85.62 | 0.00 | 6.88 | 60.62 |
| **Closed-source Models** | |||||||||||||
| GPT-4o | Instruct | 62.50 | 13.75 | 78.75 | 31.87 | 100.00 | 28.12 | 55.63 | 78.12 | 100.00 | 3.75 | 29.38 | 45.00 |
| GPT-4.1 | Instruct | 61.88 | 26.87 | 76.25 | 50.62 | 99.38 | 38.12 | 64.38 | 83.75 | 100.00 | 3.12 | 25.62 | 49.38 |
| Gemini-3-Flash-Preview | Instruct | 59.38 | 82.50 | 93.13 | 91.25 | 98.12 | 75.62 | 88.75 | 94.37 | 100.00 | 48.13 | 80.00 | 94.37 |
| Doubao-Seed-1-6-251015-w/o_Thinking | No-Thinking | 55.63 | 20.62 | 76.25 | 41.88 | 100.00 | 59.38 | 58.13 | 86.25 | 99.38 | 3.75 | 49.38 | 50.62 |
| Doubao-Seed-1-6-251015-Thinking | Thinking | 54.37 | 40.62 | 77.50 | 72.50 | 100.00 | 69.37 | 96.25 | 98.75 | 100.00 | 27.50 | 50.00 | 53.12 |
| Qwen-VL-Plus-w/o_Thinking | No-Thinking | 60.00 | 21.88 | 78.75 | 40.62 | 100.00 | 40.00 | 64.38 | 77.50 | 97.50 | 1.25 | 25.62 | 40.62 |
| Qwen-VL-Plus-Thinking | Thinking | 57.50 | 45.00 | 81.87 | 68.75 | 100.00 | 71.25 | 90.62 | 95.63 | 100.00 | 13.75 | 46.25 | 55.63 |
| **Scenario: TravelMap** | |||||||||||||
| **Open-source Models** | |||||||||||||
| Qwen3-VL-8B-Instruct | Instruct | 7.14 | 60.12 | 52.98 | 17.86 | 99.40 | 45.24 | 38.69 | 50.60 | 61.31 | 75.60 | 70.24 | 14.29 |
| Qwen3-VL-8B-Thinking | Thinking | 39.29 | 70.83 | 52.38 | 87.50 | 100.00 | 39.88 | 100.00 | 100.00 | 100.00 | 63.10 | 69.05 | 13.69 |
| Qwen3-VL-2B-Instruct | Instruct | 12.50 | 58.93 | 9.52 | 6.00 | 94.64 | 64.88 | 1.19 | 46.43 | 38.10 | 64.29 | 67.86 | 4.17 |
| Qwen2.5-VL-7B-Instruct | Instruct | 4.76 | 65.48 | 54.17 | 38.10 | 99.40 | 47.62 | 17.26 | 87.50 | 79.76 | 33.93 | 68.45 | 16.07 |
| Phi-3.5-Vision-Instruct-4B | Instruct | 13.10 | 48.21 | 59.52 | 39.88 | 99.40 | 41.07 | 50.00 | 75.60 | 77.38 | 72.02 | 74.40 | 10.71 |
| Phi-4-Multimodal-Instruct-6B | Instruct | 44.05 | 70.24 | 58.93 | 78.57 | 98.81 | 33.93 | 96.43 | 98.21 | 100.00 | 73.21 | 69.05 | 17.86 |
| InternVL3-8B-Instruct | Instruct | 12.50 | 59.52 | 35.71 | 8.93 | 97.62 | 49.40 | 10.71 | 50.00 | 51.79 | 70.83 | 67.86 | 4.76 |
| Qwen3-VL-30B-A3B-Instruct | Instruct | 5.95 | 50.60 | 6.55 | 7.74 | 86.31 | 52.38 | 8.93 | 44.64 | 45.24 | 54.76 | 35.12 | 2.98 |
| Qwen3-VL-32B-Instruct | Instruct | 0.00 | 42.26 | 14.88 | 11.31 | 63.69 | 38.31 | 18.45 | 45.83 | 47.62 | 27.98 | 63.69 | 5.95 |
| Qwen3-VL-32B-Thinking | Thinking | 8.33 | 60.12 | 23.21 | 1.19 | 97.02 | 44.64 | 10.12 | 38.69 | 56.55 | 69.05 | 66.67 | 5.95 |
| **Closed-source Models** | |||||||||||||
| GPT-4o | Instruct | 11.31 | 63.69 | 49.40 | 47.02 | 100.00 | 36.31 | 53.57 | 99.40 | 67.26 | 71.43 | 73.21 | 11.31 |
| GPT-4.1 | Instruct | 3.57 | 69.64 | 55.95 | 47.02 | 100.00 | 42.86 | 66.07 | 100.00 | 69.64 | 73.21 | 77.38 | 17.86 |
| Gemini-3-Flash-Preview | Instruct | 45.83 | 85.12 | 77.98 | 97.62 | 99.40 | 86.31 | 100.00 | 99.40 | 100.00 | 85.71 | 81.55 | 26.19 |
| Doubao-Seed-1-6-251015-w/o_Thinking | No-Thinking | 22.62 | 58.93 | 50.00 | 63.10 | 98.81 | 51.79 | 54.76 | 100.00 | 98.81 | 66.07 | 76.79 | 25.60 |
| Doubao-Seed-1-6-251015-Thinking | Thinking | 24.40 | 71.43 | 55.95 | 95.83 | 84.52 | 71.43 | 97.62 | 100.00 | 100.00 | 78.57 | 72.02 | 22.02 |
| Qwen-VL-Plus-w/o_Thinking | No-Thinking | 19.64 | 72.02 | 52.98 | 48.81 | 100.00 | 35.71 | 57.74 | 98.81 | 82.74 | 54.17 | 70.24 | 19.64 |
| Qwen-VL-Plus-Thinking | Thinking | 24.40 | 67.86 | 62.50 | 98.81 | 100.00 | 34.52 | 100.00 | 100.00 | 100.00 | 69.05 | 72.62 | 24.40 |
@article{shang2026maptab,
title={MapTab: Can MLLMs Master Constrained Route Planning?},
author={Shang, Ziqiao and Ge, Lingyue and Chen, Yang and Tian, Shi-Yu and Huang, Zhenyu and Fu, Wenbo and Li, Yu-Feng and Guo, Lan-Zhe},
journal={arXiv preprint arXiv:2602.18600},
year={2026}
}