# Large Action Model Benchmark (LAMB)
The LAMB leaderboard compares how different grounder/planner combinations perform across GUI agent benchmarks. The first table reports accuracy (%) on grounding and offline agent benchmarks; the second reports end-to-end task success rates (SR, %) on live environments.
| Grounder | Planner | ScreenSpot-Desktop | ScreenSpot-Mobile | ScreenSpot-Web | Mind2Web (Elem Acc) | AndroidControl |
|---|---|---|---|---|---|---|
| CogAgent | GPT-4o | 65.85 | 54.6 | 59.25 | 47.43 | 51.4 |
| SeeClick | GPT-4o | 70.4 | 51.6 | 35.05 | 32.93 | 52.8 |
| ActIO/UGround | GPT-4o | 85.15 | 80.35 | 78.8 | 46.79 | 62.4 |
| GPT-4 | - | 23.55 | 16 | 9 | 42.27 | 55.0 |
| ActIO/UGround | - | 73.05 | 71.55 | 75.4 | - | - |
| Fuyu (8B) | - | 18.3 | 21.15 | 19.15 | - | - |
| Model / Agent | Mind2Web-Live SR | AndroidWorld SR |
|---|---|---|
| GPT-4 + ActIO/UGround | 31.8 | 31.0 |
| GPT-4o + ActIO/UGround | 31.7 | 32.8 |
| GPT-4 | 23.1 | 30.6 |
| GPT-4o | 22.1 | - |