Large Action Model Benchmark (LAMB)

The LAMB leaderboard compares grounding models (grounders), optionally paired with a planner, across GUI agent benchmarks: grounding accuracy on the three ScreenSpot splits (desktop, mobile, web) and downstream agent performance on Mind2Web, AndroidControl, Mind2Web-Live, and AndroidWorld.

| Grounder | Planner | ScreenSpot-Desktop | ScreenSpot-Mobile | ScreenSpot-Web | Mind2Web (Elem Acc) | AndroidControl |
|---|---|---|---|---|---|---|
| CogAgent | GPT-4o | 65.85 | 54.6 | 59.25 | 47.43 | 51.4 |
| SeeClick | GPT-4o | 70.4 | 51.6 | 35.05 | 32.93 | 52.8 |
| ActIO/UGround | GPT-4o | 85.15 | 80.35 | 78.8 | 46.79 | 62.4 |
| GPT-4 | - | 23.55 | 16 | 9 | 42.27 | 55.0 |
| ActIO/UGround | - | 73.05 | 71.55 | 75.4 | - | - |
| Fuyu (8B) | - | 18.3 | 21.15 | 19.15 | - | - |
End-to-end success rate (SR) on live agent benchmarks:

| Model / Agent | Mind2Web-Live SR | AndroidWorld SR |
|---|---|---|
| GPT-4 + ActIO/UGround | 31.8 | 31.0 |
| GPT-4o + ActIO/UGround | 31.7 | 32.8 |
| GPT-4 | 23.1 | 30.6 |
| GPT-4o | 22.1 | - |
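For readers who want to slice the leaderboard themselves, the sketch below ranks grounders by their mean accuracy across the three ScreenSpot splits, using the numbers from the grounding table above. The list layout and field order are illustrative, not an official LAMB data format.

```python
# Rank grounders by mean ScreenSpot accuracy (desktop, mobile, web).
# Numbers are copied from the LAMB grounding table; the schema is ad hoc.
leaderboard = [
    # (grounder, planner, desktop, mobile, web)
    ("CogAgent",      "GPT-4o", 65.85, 54.60, 59.25),
    ("SeeClick",      "GPT-4o", 70.40, 51.60, 35.05),
    ("ActIO/UGround", "GPT-4o", 85.15, 80.35, 78.80),
    ("GPT-4",         None,     23.55, 16.00,  9.00),
    ("ActIO/UGround", None,     73.05, 71.55, 75.40),
    ("Fuyu (8B)",     None,     18.30, 21.15, 19.15),
]

def mean_screenspot(row):
    """Average of the three ScreenSpot splits in a leaderboard row."""
    return sum(row[2:]) / 3

ranked = sorted(leaderboard, key=mean_screenspot, reverse=True)
for row in ranked:
    planner = row[1] or "-"
    print(f"{row[0]:<15} {planner:<8} {mean_screenspot(row):6.2f}")
```

Under this averaging, ActIO/UGround with a GPT-4o planner leads, consistent with it having the best score on each individual ScreenSpot split.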