ScummBench — Briefing

STOP — do not screenshot this page.

You are an agent. This whole briefing is already parsed JSON on the page. Read it with one eval:

window.__agentBrief

Every route, API, state field, event, and strategy rule is in that object. Screenshotting costs image tokens and garbles code and JSON. Fallback (before __agentBrief populates): JSON.parse(document.getElementById("agent-brief").textContent). From another page: fetch("/briefing") and extract #agent-brief.
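
The read-then-fallback order described above could be wrapped in one helper. This is a minimal sketch, not part of the page's API: the function name readAgentBrief and its win/doc parameters are hypothetical and exist only so the code can also run outside a browser. It assumes the #agent-brief element may not hold valid JSON yet, in which case it returns null.

```javascript
// Returns the briefing object, preferring the pre-parsed global.
// `win` and `doc` are hypothetical parameters; in the page they default to the real globals.
function readAgentBrief(win = globalThis, doc = globalThis.document) {
  if (win && win.__agentBrief) return win.__agentBrief;
  const node = doc && doc.getElementById("agent-brief");
  if (!node) return null; // neither source is available
  try {
    return JSON.parse(node.textContent);
  } catch (err) {
    return null; // element exists but does not contain valid JSON (assumed possible)
  }
}
```

A null result means neither source was usable; retrying once shortly afterwards is usually cheaper than falling back to a screenshot.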

Quick start

1. Open a game: either the pre-baked /game?game=monkey1-demo (Monkey Island 1 demo) or /game to upload your own ScummVM game folder.
2. Wait for __scummActionsReady() to return true.
3. Read state: __scummRead()
4. Act: __scummDoSentence({verb, objectA})
5. Observe: __scummEventsSince(cursor)
6. Repeat steps 3-5.

For local dev you can also pre-stage any other game folder via scripts/add-game.sh and load it with /game?game=<id>.

Read API

`__scummRead()`	Full state snapshot (room, ego, objects, verbs, inventory, actors, dialogChoices)
`__scummEventsSince(cursor)`	Returns `{events[], cursor}`. Pass cursor back next call for incremental reads.
`__scummActionsReady()`	True when WASM is loaded and actions work
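
The cursor handoff of __scummEventsSince can be kept in a closure so that each call returns only new events. This is a sketch under assumptions: makeEventReader and its api parameter are hypothetical names, and the value to pass as the very first cursor is not stated in this briefing, so the sketch simply forwards whatever initial value it is given (undefined by default).

```javascript
// Wraps __scummEventsSince so the caller never handles the cursor directly.
// `api` is any object exposing the function (for example `window` in the page).
function makeEventReader(api, initialCursor) {
  let cursor = initialCursor; // ASSUMPTION: the first-call cursor value is not documented here
  return function readNew() {
    const res = api.__scummEventsSince(cursor);
    cursor = res.cursor; // pass this back on the next call for an incremental read
    return res.events;
  };
}
```

Usage in the page would look like `const readNew = makeEventReader(window);` followed by `readNew()` after each action.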

Record API — state changes over time

Poll the snapshot at a configurable interval and buffer structural diffs between ticks. Useful for catching transient changes that don't emit events — e.g. an NPC walking, or an object animating after a trigger (step on the wood, the bird flies away).

Use __scummRecordSummary() to read. It returns net-change-per-path across the whole window and drops oscillating paths (SCUMM animates by flipping state between 0 and 1 each tick — dozens of noise rows per second). __scummRecordRead() returns the per-tick log; use it only for forensic replay.

`__scummRecordStart({intervalMs?, clear?})`	Start polling. Default interval 200ms, min 50ms. Clears prior buffer unless `clear:false`.
`__scummRecordStop()`	Stop polling. Entries remain readable.
`__scummRecordSummary({includeAnimation?})`	Preferred. Returns `{windowMs, ticksRecorded, changes, filteredAnimationPaths}`. Each change is `{path, from, to, ticks, oscillated}`. Oscillating paths are dropped unless `includeAnimation:true`.
`__scummRecordRead(sinceIndex?)`	Per-tick log (verbose). Returns `{startedAt, entries, nextIndex, total, running}`. Each entry is `{dt, diff: [{path, from, to, op?}]}` where `dt` is ms offset from `startedAt`.
`__scummRecordStatus()`	`{running, intervalMs, entries, startedAt}`
`__scummRecordClear()`	Drop all buffered entries.
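
A fixed recording window built from the calls in the table above might look like the following. It is a sketch only: recordWindow and its api parameter are hypothetical, and it assumes __scummRecordSummary accepts an options object on every call.

```javascript
// Records for `windowMs` milliseconds, then resolves with the summary.
// `api` is any object exposing the recorder functions (for example `window`).
function recordWindow(api, windowMs, intervalMs = 200) {
  return new Promise((resolve) => {
    // The documented minimum interval is 50ms, so clamp to it.
    api.__scummRecordStart({ intervalMs: Math.max(50, intervalMs) });
    setTimeout(() => {
      api.__scummRecordStop(); // entries remain readable after stopping
      resolve(api.__scummRecordSummary({ includeAnimation: false }));
    }, windowMs);
  });
}
```

Because the wait is a plain setTimeout, the promise always settles after windowMs regardless of what the game does.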

Paths are JSON-pointer-style arrays. For the id-keyed top-level arrays (roomObjects, inventory, verbs, dialogChoices, actors) items are matched by id, so segments are {id: N} — e.g. ["roomObjects", {id: 10}, "box", "x"]. op is "add" or "remove" on membership changes in the per-tick log.
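
To look up the current value behind such a path, the segments can be walked against a snapshot from __scummRead(). The helper below is an illustrative sketch; resolvePath is a hypothetical name and not part of the page's API.

```javascript
// Follows a recorder path through a snapshot.
// Returns undefined if any step along the way is missing.
function resolvePath(snapshot, path) {
  let node = snapshot;
  for (const seg of path) {
    if (node == null) return undefined;
    if (seg !== null && typeof seg === "object") {
      // {id: N} segment: pick the array item whose id matches
      node = Array.isArray(node) ? node.find((item) => item.id === seg.id) : undefined;
    } else {
      node = node[seg]; // plain string segment: ordinary property access
    }
  }
  return node;
}
```

For example, `resolvePath(__scummRead(), ["roomObjects", {id: 10}, "box", "x"])` reads the x coordinate of object 10's box, if that object is in the room.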

Transient messages and spatial motion are never treated as animation. High-signal paths bypass the oscillation filter and include a seenValues array of every distinct value the path held during the window. These are:

Top-level scalars: msgText, haveMsg, talkingActor, inputLocked, inCutscene, room
Spatial sub-paths: actors.{id}.pos, actors.{id}.room, actors.{id}.walking, and the ego.pos / ego.room / ego.walking equivalents

So an NPC that zigzags across the room reports its full pos.x / pos.y trajectory in seenValues, and a message that flashed is reported in full rather than lost as "null → null". Object-level state flips (fog, candles, flames) are NOT on this list — they stay filtered as animation noise.
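
One way to pull just the high-signal rows out of a summary is to keep the changes that carry a seenValues array. The helper names below are hypothetical, and the exact shape of each seenValues entry is not specified in this briefing, so the sketch passes the array through untouched.

```javascript
// Renders a recorder path such as ["actors", {id: 3}, "pos", "x"] as "actors.{3}.pos.x".
function pathToString(path) {
  return path
    .map((seg) => (seg !== null && typeof seg === "object" ? `{${seg.id}}` : String(seg)))
    .join(".");
}

// Keeps only the changes that include a seenValues array (the high-signal paths).
function highSignalChanges(summary) {
  return summary.changes
    .filter((c) => Array.isArray(c.seenValues))
    .map((c) => ({ path: pathToString(c.path), from: c.from, to: c.to, seenValues: c.seenValues }));
}
```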

Action API

`__scummDoSentence({verb, objectA, objectB?})`	Preferred. Atomic verb+object, auto-walks ego.
`__scummSelectDialog(index)`	Pick dialog choice (0-indexed into `dialogChoices[]`)
`__scummSkipMessage()`	Dismiss current dialog text
`__scummWalkTo(x, y)`	Walk ego to room coordinates
`__scummClickAt(x, y)`	Last resort. Click at room coordinates.
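
Before sending a sentence it can help to check the preconditions this briefing states elsewhere: actions ready, input not locked, no open dialog. The wrapper below is a sketch with hypothetical names (trySentence, api); it makes no assumption about what __scummDoSentence returns and reports only whether the call was issued.

```javascript
// Sends a sentence only when the documented preconditions hold.
// `api` is any object exposing the page functions (for example `window`).
function trySentence(api, sentence) {
  if (!api.__scummActionsReady()) return { sent: false, reason: "actions not ready" };
  const s = api.__scummRead();
  if (s.inputLocked) return { sent: false, reason: "input locked" };
  if (s.dialogChoices && s.dialogChoices.length > 0) {
    return { sent: false, reason: "dialog open" }; // only dialog calls are allowed here
  }
  api.__scummDoSentence(sentence);
  return { sent: true };
}
```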

Key state fields

room, roomObjects[] — current room ID and objects with {id, name, box, state, untouchable}
ego.pos.{x,y}, ego.walking — player position and movement
verbs[] — available verbs with {id, name, kind} (kind: 0=action, 2=dialog)
dialogChoices[] — active dialog options (subset of verbs with kind==2)
inventory[] — items with {id, name}
actors[] — NPCs in room with {id, name, pos}
haveMsg — 0=no text, 255=text active, 1=ending. Read msgText for content.
inputLocked — true during cutscenes, don't send actions
camera.x — viewport scroll offset

Key events

egoArrived — ego finished walking
roomEntered — room transition completed
messageStateChanged — dialog text appeared/cleared (has text, talkingActor)
dialogChoicesChanged — dialog options updated
inputLockChanged — cutscene started/ended

Events are coarse, not exhaustive. The stream covers the major engine-level transitions above and nothing else. It does not emit for: NPC movement along a path, transient flavor messages that auto-dismiss (for example, the text a game shows when you step on something), object animation frames, or per-object state flips. For exact observation of those, use the recorder (__scummRecordSummary).
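
A batch of events can be reduced to the few facts an agent usually needs. This sketch rests on an assumption the briefing does not confirm: that each event carries a type field holding one of the event names listed above. The text and talkingActor fields on messageStateChanged are taken from the list; digestEvents is a hypothetical name.

```javascript
// Reduces a batch of events to a small digest.
// ASSUMPTION: the event name is stored in `ev.type`; adjust if the real field differs.
function digestEvents(events) {
  const digest = { messages: [], roomEntered: false, egoArrived: false, choicesChanged: false, lockChanges: 0 };
  for (const ev of events) {
    switch (ev.type) {
      case "messageStateChanged":
        // A cleared message has no text, so only keep events that carry some.
        if (ev.text) digest.messages.push({ actor: ev.talkingActor, text: ev.text });
        break;
      case "roomEntered":
        digest.roomEntered = true;
        break;
      case "egoArrived":
        digest.egoArrived = true;
        break;
      case "dialogChoicesChanged":
        digest.choicesChanged = true;
        break;
      case "inputLockChanged":
        digest.lockChanges += 1;
        break;
      default:
        break; // unknown event kinds are ignored
    }
  }
  return digest;
}
```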

Patterns

Look at / use an object

const s = __scummRead();
const verb = s.verbs.find(v => v.name.toLowerCase().includes("look"));
const obj = s.roomObjects.find(o => o.name === "poster");
// find() returns undefined when nothing matches, so guard before acting.
if (verb && obj) __scummDoSentence({ verb: verb.id, objectA: obj.id });

Conversation

// 1. Talk to NPC: __scummDoSentence({ verb: talkId, objectA: npcId })
// 2. Poll __scummRead() until dialogChoices.length > 0
// 3. Pick: __scummSelectDialog(0)
// 4. Advance text: __scummSkipMessage() when haveMsg > 0
// 5. Repeat until dialogChoices empty and haveMsg === 0
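
Steps 3 to 5 above could be driven by a bounded loop like the one below. It is a sketch, not the page's own helper: runConversation, pickChoice, maxSteps and stepMs are hypothetical names, and it assumes the conversation has already started (a message or choices are visible) when it is called. While inputLocked is set with nothing to do, it simply waits.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Advances an open conversation until no choices and no message remain.
// `pickChoice` receives dialogChoices and returns the 0-based index to select.
async function runConversation(api, pickChoice = () => 0, { maxSteps = 50, stepMs = 250 } = {}) {
  for (let i = 0; i < maxSteps; i++) {
    const s = api.__scummRead();
    if (s.dialogChoices.length > 0) {
      api.__scummSelectDialog(pickChoice(s.dialogChoices));
    } else if (s.haveMsg > 0) {
      api.__scummSkipMessage();
    } else if (!s.inputLocked) {
      return { done: true, steps: i }; // nothing left to pick or dismiss
    }
    await sleep(stepMs); // give the engine time to react before re-reading
  }
  return { done: false, steps: maxSteps }; // hard bound reached, never loops forever
}
```

The maxSteps bound keeps the tool call from hanging if the dialog never closes.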

Room navigation

const s = __scummRead();
const door = s.roomObjects.find(o => o.name === "door");
const walk = s.verbs.find(v => v.name.toLowerCase().includes("walk"));
// find() returns undefined when nothing matches, so guard before acting.
if (door && walk) __scummDoSentence({ verb: walk.id, objectA: door.id });
// Wait for the roomEntered event, then re-read state.

Strategy

Use __scummRead() as your primary orientation tool — inspect room objects, actors, verbs, and inventory to understand where you are and what you can do. Do not take screenshots for routine orientation.
Use __scummEventsSince(cursor) to efficiently catch up on what happened after an action — dialog text, room changes, cutscene starts/ends. This is far cheaper than re-reading full state or taking screenshots.
Use screenshots only as a fallback when the API state is ambiguous (e.g. spatial layout unclear, need to visually identify something the state doesn't describe). Screenshots are expensive in tokens.
Plan before acting: read the full state, identify available objects and NPCs, form a goal, then execute. Don't wander blindly.
Collect everything you can. The majority of puzzle solutions involve using inventory items — on objects, on other items, or giving them to NPCs. Pick up anything not nailed down.
Build a mental map of room connections as you explore. Track which exits lead where.
Talk to NPCs to gather information — adventure games progress through conversation and item use.
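
The room map mentioned above can be kept as a small graph of observed exits. The class below is an illustrative sketch; RoomMap and its method names are hypothetical, and it only knows about exits the agent has actually used.

```javascript
// Tracks which exit object in which room led to which destination room.
class RoomMap {
  constructor() {
    this.exits = new Map(); // room id -> Map(exit object id -> destination room id)
  }

  // Call after a roomEntered event with the room you left and the exit you used.
  recordExit(fromRoom, exitObjectId, toRoom) {
    if (!this.exits.has(fromRoom)) this.exits.set(fromRoom, new Map());
    this.exits.get(fromRoom).set(exitObjectId, toRoom);
  }

  // Breadth-first search: returns the exit object ids to use in order,
  // [] if already there, or null if no known route exists.
  route(fromRoom, toRoom) {
    if (fromRoom === toRoom) return [];
    const cameFrom = new Map([[fromRoom, null]]);
    const queue = [fromRoom];
    while (queue.length > 0) {
      const room = queue.shift();
      for (const [exitId, dest] of this.exits.get(room) || []) {
        if (cameFrom.has(dest)) continue; // already reached by a shorter or equal route
        cameFrom.set(dest, { room, exitId });
        if (dest === toRoom) {
          const path = [];
          for (let step = cameFrom.get(dest); step; step = cameFrom.get(step.room)) {
            path.unshift(step.exitId);
          }
          return path;
        }
        queue.push(dest);
      }
    }
    return null;
  }
}
```

Exits are stored one way only, so a return trip has to be recorded separately once it has been walked.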

Rules

Check __scummActionsReady() before first action.
Check inputLocked before each action.
While dialogChoices is non-empty the conversation is open — only __scummSelectDialog and __scummSkipMessage are allowed. doSentence, walkTo, clickAt, and clickObject will be rejected until a choice is picked.
Prefer doSentence over clickAt.
Use events (not polling state) to detect action results.
Avoid repeating failed actions.
Never write an unbounded polling loop. If you wait on a condition (bird.pos.x > 280 && !inputLocked, haveMsg === 0, etc.) and it never holds, the Promise hangs the entire tool call. Always include a hard timeout (Date.now() - start > 8000) — or skip polling entirely and use a fixed setTimeout + __scummRecordSummary(), which captures everything that happened in the window without waiting on a specific state.
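
A bounded wait along the lines of the last rule might be written as below. It is a minimal sketch; waitFor and its option names are hypothetical, with the 8000ms default taken from the example in the rule.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Polls `predicate` until it returns true or the timeout elapses.
// Resolves to true on success and false on timeout; it never hangs.
async function waitFor(predicate, { timeoutMs = 8000, intervalMs = 100 } = {}) {
  const start = Date.now();
  while (Date.now() - start <= timeoutMs) {
    if (predicate()) return true;
    await sleep(intervalMs);
  }
  return false; // hard timeout reached
}
```

In the page this could be used as `await waitFor(() => __scummRead().haveMsg === 0)`, with the caller deciding what to do when the result is false.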

Machine-readable brief

Same data as JSON in #agent-brief below.
