AGENT BRIEFING
ScummBench
Harness for ScummVM games. Play the Monkey Island 1 demo or upload your own game, then drive it through the window.__scumm* API.
Quick start
1. Open a game: either the pre-baked /game?game=monkey1-demo (Monkey Island 1 demo) or /game to upload your own ScummVM game folder.
2. Wait for __scummActionsReady() to return true.
3. Read state: __scummRead()
4. Act: __scummDoSentence({verb, objectA})
5. Observe: __scummEventsSince(cursor)
6. Repeat 3-5.
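One iteration of the read, act, observe loop might look roughly like the sketch below. It is illustrative only: `runStep`, the `chooseAction` callback, the `settleMs` pause, and the starting cursor of 0 are assumptions made for this example, not part of the harness. `api` defaults to `globalThis`, which is `window` in the browser.

```javascript
// Sketch of one quick-start iteration. Names other than the __scumm* calls
// are hypothetical helpers invented for this example.
async function runStep(chooseAction, cursor = 0, api = globalThis, settleMs = 500) {
  // Do nothing until WASM is loaded and actions work.
  if (!api.__scummActionsReady()) {
    return { acted: false, reason: "not-ready", events: [], cursor };
  }
  const state = api.__scummRead();
  // Actions should not be sent while input is locked (cutscenes).
  if (state.inputLocked) {
    return { acted: false, reason: "input-locked", events: [], cursor };
  }
  // Hypothetical callback: returns {verb, objectA} or null when there is nothing to do.
  const sentence = chooseAction(state);
  if (!sentence) {
    return { acted: false, reason: "no-action", events: [], cursor };
  }
  api.__scummDoSentence(sentence);
  // Fixed, bounded pause so the engine can emit events; never an open-ended wait.
  await new Promise((resolve) => setTimeout(resolve, settleMs));
  const result = api.__scummEventsSince(cursor);
  return { acted: true, reason: "ok", events: result.events, cursor: result.cursor };
}
```

The returned `cursor` would be passed into the next call so each iteration only sees new events.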
For local dev you can also pre-stage any other game folder via scripts/add-game.sh and load it with /game?game=<id>.
Read API
__scummRead() | Full state snapshot (room, ego, objects, verbs, inventory, actors, dialogChoices) |
__scummEventsSince(cursor) | Returns {events[], cursor}. Pass cursor back next call for incremental reads. |
__scummActionsReady() | True when WASM is loaded and actions work |
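Incremental reads with `__scummEventsSince` can be wrapped in a small closure that remembers the cursor between calls. This is a minimal sketch: `createEventReader` is a hypothetical helper, and the starting cursor of 0 is an assumption (the brief does not state the initial value).

```javascript
// Sketch: keep the cursor in a closure so each call returns only new events.
function createEventReader(api = globalThis, startCursor = 0) {
  let cursor = startCursor; // assumption: 0 means "from the beginning"
  return function readNewEvents() {
    const result = api.__scummEventsSince(cursor);
    cursor = result.cursor; // pass this back on the next call
    return result.events;
  };
}
```

Calling the returned function after each action yields just the events emitted since the previous call.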
Record API — state changes over time
Poll the snapshot at a configurable interval and buffer structural diffs between ticks. Useful for catching transient changes that don't emit events — e.g. an NPC walking, or an object animating after a trigger (step on the wood, the bird flies away).
Use __scummRecordSummary() to read. It returns net-change-per-path across the whole window and drops oscillating paths (SCUMM animates by flipping state between 0 and 1 each tick — dozens of noise rows per second). __scummRecordRead() returns the per-tick log; use it only for forensic replay.
__scummRecordStart({intervalMs?, clear?}) | Start polling. Default interval 200ms, min 50ms. Clears prior buffer unless clear:false. |
__scummRecordStop() | Stop polling. Entries remain readable. |
__scummRecordSummary({includeAnimation?}) | Preferred. Returns {windowMs, ticksRecorded, changes, filteredAnimationPaths}. Each change is {path, from, to, ticks, oscillated}. Oscillating paths are dropped unless includeAnimation:true. |
__scummRecordRead(sinceIndex?) | Per-tick log (verbose). Returns {startedAt, entries, nextIndex, total, running}. Each entry is {dt, diff: [{path, from, to, op?}]} where dt is ms offset from startedAt. |
__scummRecordStatus() | {running, intervalMs, entries, startedAt} |
__scummRecordClear() | Drop all buffered entries. |
Paths are JSON-pointer-style arrays. For the id-keyed top-level arrays (roomObjects, inventory, verbs, dialogChoices, actors) items are matched by id, so segments are {id: N} — e.g. ["roomObjects", {id: 10}, "box", "x"]. op is "add" or "remove" on membership changes in the per-tick log.
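For logging, a path array can be flattened into a readable string. The helpers below are a sketch; `formatPath` and `describeChange` are hypothetical names, and the `{id:N}` rendering is just one possible notation.

```javascript
// Sketch: render a recorder path such as ["roomObjects", {id: 10}, "box", "x"].
function formatPath(path) {
  return path
    .map((segment) =>
      typeof segment === "object" && segment !== null
        ? `{id:${segment.id}}` // id-keyed array item
        : String(segment)      // plain property name
    )
    .join(".");
}

// Sketch: one-line description of a {path, from, to} change entry.
function describeChange(change) {
  return `${formatPath(change.path)}: ${JSON.stringify(change.from)} -> ${JSON.stringify(change.to)}`;
}

// formatPath(["roomObjects", { id: 10 }, "box", "x"]) returns "roomObjects.{id:10}.box.x"
```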
Transient messages and spatial motion are never treated as animation. High-signal paths bypass the oscillation filter and include a seenValues array of every distinct value the path held during the window. These are:
- Top-level scalars: msgText, haveMsg, talkingActor, inputLocked, inCutscene, room
- Spatial sub-paths: actors.{id}.pos, actors.{id}.room, actors.{id}.walking, and the ego.pos / ego.room / ego.walking equivalents
So an NPC that zigzags across the room reports its full pos.x / pos.y trajectory in seenValues, and a message that flashed is reported in full rather than lost as "null → null". Object-level state flips (fog, candles, flames) are NOT on this list — they stay filtered as animation noise.
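A fixed recording window can be sketched as follows: start the recorder, wait a bounded time, stop, then read the summary. `recordWindow` and its default of 2000 ms are assumptions for illustration; the `seenValues` check relies on the statement above that only high-signal paths carry that array.

```javascript
// Sketch: record for a fixed window and split out the high-signal changes.
async function recordWindow(windowMs = 2000, api = globalThis) {
  api.__scummRecordStart({ intervalMs: 200 }); // prior buffer is cleared by default
  try {
    // Bounded wait: the window always ends, whatever the game does.
    await new Promise((resolve) => setTimeout(resolve, windowMs));
  } finally {
    api.__scummRecordStop(); // entries remain readable after stopping
  }
  const summary = api.__scummRecordSummary();
  // High-signal paths (messages, positions) carry a seenValues array.
  const highSignal = summary.changes.filter((c) => Array.isArray(c.seenValues));
  return { summary, highSignal };
}
```

A typical use would be to trigger an action, call `recordWindow`, and then inspect `highSignal` for flashed messages or NPC trajectories.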
Action API
__scummDoSentence({verb, objectA, objectB?}) | Preferred. Atomic verb+object, auto-walks ego. |
__scummSelectDialog(index) | Pick dialog choice (0-indexed into dialogChoices[]) |
__scummSkipMessage() | Dismiss current dialog text |
__scummWalkTo(x, y) | Walk ego to room coordinates |
__scummClickAt(x, y) | Last resort. Click at room coordinates. |
Key state fields
- room, roomObjects[]: current room ID and objects with {id, name, box, state, untouchable}
- ego.pos.{x,y}, ego.walking: player position and movement
- verbs[]: available verbs with {id, name, kind} (kind: 0=action, 2=dialog)
- dialogChoices[]: active dialog options (subset of verbs with kind==2)
- inventory[]: items with {id, name}
- actors[]: NPCs in room with {id, name, pos}
- haveMsg: 0=no text, 255=text active, 1=ending. Read msgText for content.
- inputLocked: true during cutscenes; don't send actions
- camera.x: viewport scroll offset
Key events
- egoArrived: ego finished walking
- roomEntered: room transition completed
- messageStateChanged: dialog text appeared/cleared (has text, talkingActor)
- dialogChoicesChanged: dialog options updated
- inputLockChanged: cutscene started/ended
Events are coarse, not exhaustive. The stream covers the major engine-level transitions above and nothing else. It does not emit for: NPC movement along a path, transient flavor messages that auto-dismiss (the flavor text a game shows when you step on something), object animation frames, or per-object state flips. For exact observation of those, use the recorder (__scummRecordSummary).
Patterns
Look at / use an object
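const s = __scummRead();
// Match the verb loosely (e.g. "Look at") and the object by exact name.
const verb = s.verbs.find(v => v.name.toLowerCase().includes("look"));
const obj = s.roomObjects.find(o => o.name === "poster");
// find() returns undefined on a miss, so only act when both lookups succeeded.
if (verb && obj) __scummDoSentence({ verb: verb.id, objectA: obj.id });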
Conversation
// 1. Talk to NPC: __scummDoSentence({ verb: talkId, objectA: npcId })
// 2. Poll __scummRead() until dialogChoices.length > 0
// 3. Pick: __scummSelectDialog(0)
// 4. Advance text: __scummSkipMessage() when haveMsg > 0
// 5. Repeat until dialogChoices empty and haveMsg === 0
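Steps 2 to 5 above could be sketched as a bounded loop like the one below. `runConversation`, the `pickChoice` callback, and the `maxSteps`, `pauseMs`, and `idleReads` options are hypothetical names chosen for this example. It assumes the conversation has already been started with `__scummDoSentence`, and it treats several consecutive reads with no choices and no message as the end of the conversation, which is a heuristic rather than documented behavior.

```javascript
// Sketch: drive an open conversation with a hard cap on iterations.
async function runConversation(pickChoice, api = globalThis, options = {}) {
  const { maxSteps = 50, pauseMs = 250, idleReads = 3 } = options;
  const picked = [];     // indexes passed to __scummSelectDialog
  const transcript = []; // message texts seen before they were skipped
  let idle = 0;
  for (let step = 0; step < maxSteps; step++) {
    const state = api.__scummRead();
    if (state.dialogChoices.length > 0) {
      idle = 0;
      const index = pickChoice(state.dialogChoices); // hypothetical: returns a 0-based index
      api.__scummSelectDialog(index);
      picked.push(index);
    } else if (state.haveMsg > 0) {
      idle = 0;
      // Keep the text before dismissing it; skip consecutive duplicates.
      if (state.msgText && transcript[transcript.length - 1] !== state.msgText) {
        transcript.push(state.msgText);
      }
      api.__scummSkipMessage();
    } else if (++idle >= idleReads) {
      return { finished: true, picked, transcript };
    }
    await new Promise((resolve) => setTimeout(resolve, pauseMs));
  }
  // Step cap reached: report it instead of looping forever.
  return { finished: false, picked, transcript };
}
```

Because the loop is capped by `maxSteps`, a conversation that never closes returns `finished: false` rather than hanging the tool call.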
Room navigation
const s = __scummRead();
const door = s.roomObjects.find(o => o.name === "door");
const walk = s.verbs.find(v => v.name.toLowerCase().includes("walk"));
// Guard against a missing door or verb before sending the sentence.
if (door && walk) __scummDoSentence({ verb: walk.id, objectA: door.id });
// Wait for roomEntered event, then re-read state.
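Waiting for `roomEntered` can be sketched as a bounded poll of the event stream. Note the assumptions: `waitForEvent` is a hypothetical helper, the brief does not say which field of an event object carries its name, so `type` is a guess to verify against real output, and the 8000 ms default simply mirrors the timeout suggested in the rules.

```javascript
// Sketch: poll __scummEventsSince until a named event shows up or time runs out.
async function waitForEvent(type, cursor = 0, api = globalThis, options = {}) {
  const { timeoutMs = 8000, intervalMs = 100 } = options;
  const start = Date.now();
  let next = cursor;
  do {
    const result = api.__scummEventsSince(next);
    next = result.cursor;
    // Assumption: each event object exposes its name as `type`.
    const hit = result.events.find((e) => e.type === type);
    if (hit) return { event: hit, cursor: next };
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  } while (Date.now() - start < timeoutMs);
  // Hard timeout reached: the caller decides what to do next.
  return { event: null, cursor: next };
}
```

After `waitForEvent("roomEntered", cursor)` resolves with a non-null `event`, re-read the state with `__scummRead()`; on `null`, treat the walk as failed rather than retrying blindly.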
Strategy
- Use __scummRead() as your primary orientation tool: inspect room objects, actors, verbs, and inventory to understand where you are and what you can do. Do not take screenshots for routine orientation.
- Use __scummEventsSince(cursor) to efficiently catch up on what happened after an action: dialog text, room changes, cutscene starts/ends. This is far cheaper than re-reading full state or taking screenshots.
- Use screenshots only as a fallback when the API state is ambiguous (e.g. spatial layout unclear, need to visually identify something the state doesn't describe). Screenshots are expensive in tokens.
- Plan before acting: read the full state, identify available objects and NPCs, form a goal, then execute. Don't wander blindly.
- Collect everything you can. The majority of puzzle solutions involve using inventory items — on objects, on other items, or giving them to NPCs. Pick up anything not nailed down.
- Build a mental map of room connections as you explore. Track which exits lead where.
- Talk to NPCs to gather information — adventure games progress through conversation and item use.
Rules
- Check __scummActionsReady() before first action.
- Check inputLocked before each action.
- While dialogChoices is non-empty the conversation is open: only __scummSelectDialog and __scummSkipMessage are allowed. doSentence, walkTo, clickAt, and clickObject will be rejected until a choice is picked.
- Prefer doSentence over clickAt.
- Use events (not polling state) to detect action results.
- Avoid repeating failed actions.
- Never write an unbounded polling loop. If you wait on a condition (bird.pos.x > 280 && !inputLocked, haveMsg === 0, etc.) and it never holds, the Promise hangs the entire tool call. Always include a hard timeout (Date.now() - start > 8000), or skip polling entirely and use a fixed setTimeout + __scummRecordSummary(), which captures everything that happened in the window without waiting on a specific state.
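The timeout and precondition rules above can be sketched as two small helpers. Both names (`waitFor`, `blockedReason`) are hypothetical, and the defaults are illustrative: 8000 ms mirrors the timeout mentioned in the rules, while the 100 ms poll interval is an arbitrary choice.

```javascript
// Sketch: wait on a condition with a hard timeout so the Promise always settles.
async function waitFor(predicate, options = {}) {
  const { timeoutMs = 8000, intervalMs = 100 } = options;
  const start = Date.now();
  while (Date.now() - start <= timeoutMs) {
    if (predicate()) return true; // condition held
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return false; // timed out: the caller should change approach, not retry blindly
}

// Sketch: explain why a sentence, walk, or click should not be sent right now.
function blockedReason(state) {
  if (state.inputLocked) return "input-locked";
  if (state.dialogChoices && state.dialogChoices.length > 0) return "dialog-open";
  return null; // nothing in the snapshot blocks an action
}
```

A call such as `await waitFor(() => __scummRead().haveMsg === 0)` resolves to `false` after the timeout instead of hanging, so the caller can fall back to a recorder window.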
Machine-readable brief
Same data as JSON in #agent-brief below.