# Large Action Model Benchmark (LAMB)
The LAMB leaderboard compares how different grounder/planner combinations perform across GUI agent benchmarks. The first table reports accuracy (%) on grounding and offline agent benchmarks; the second reports end-to-end task success rates (SR, %) on live environments.
| Grounder | Planner | ScreenSpot-Desktop | ScreenSpot-Mobile | ScreenSpot-Web | Mind2Web (Elem Acc) | AndroidControl |
|---|---|---|---|---|---|---|
| CogAgent | GPT-4o | 65.85 | 54.6 | 59.25 | 47.43 | 51.4 |
| SeeClick | GPT-4o | 70.4 | 51.6 | 35.05 | 32.93 | 52.8 |
| ActIO/UGround | GPT-4o | 85.15 | 80.35 | 78.8 | 46.79 | 62.4 |
| GPT-4 | - | 23.55 | 16 | 9 | 42.27 | 55.0 |
| ActIO/UGround | - | 73.05 | 71.55 | 75.4 | - | - |
| Fuyu (8B) | - | 18.3 | 21.15 | 19.15 | - | - |
| Model / Agent | Mind2Web-Live SR | AndroidWorld SR |
|---|---|---|
| GPT-4 + ActIO/UGround | 31.8 | 31.0 |
| GPT-4o + ActIO/UGround | 31.7 | 32.8 |
| GPT-4 | 23.1 | 30.6 |
| GPT-4o | 22.1 | - |