opencli-browser
已认证opencli-browser
The first reader of this CLI is an agent, not a human. Every subcommand returns a structured envelope that tells you exactly what matched, how confident the match is, and what to do if it didn't. Lean on those envelopes — do not guess.
This skill is for driving a live browser to accomplish an agent task. If you are building a reusable adapter under ~/.opencli/clis/<site>/ use opencli-adapter-author instead.
Prerequisites
opencli doctor
Until doctor is green, nothing else will work. Typical failures: Chrome not running, extension not installed, debug port blocked by 1Password / other extensions. The doctor output tells you which.
Lease lifecycle
opencli browser *commands keep an owned tab lease alive between calls. Owned leases share a dedicated automation container and are released withopencli browser closeor when the idle timeout expires.opencli browser bindbinds abound:*workspace to the Chrome tab you already have open. Use this for logged-in pages, SSO flows, or pages you manually positioned before handing control to the agent.--focus(orOPENCLI_WINDOW_FOCUSED=1) opens the automation container in the foreground. Use it when you want to watch the page live.--live(orOPENCLI_LIVE=1) is mainly for browser-backed adapter commands such asopencli xiaohongshu note .... It keeps the adapter's automation lease open after the command returns so you can inspect the final page state.
Bind Tab
opencli browser bind --domain example.com
opencli browser --workspace bound:default state
opencli browser --workspace bound:default click "Search"
opencli browser --workspace bound:default network
opencli browser unbind
Binding uses a separate bound:* workspace. It never owns the user window, never closes the user tab, and fails closed if the tab is closed or becomes non-debuggable. Re-run bind when you switch to a different real tab.
Use --domain <host> and --path-prefix <path> to avoid binding the wrong tab:
opencli browser bind --workspace bound:gmail --domain mail.google.com --path-prefix /mail
opencli browser --workspace bound:gmail state
Navigation is blocked by default on bound workspaces because it can destroy the logged-in/positioned state you wanted to preserve. browser open and browser back require --allow-navigate-bound; tab mutation (tab new, tab select, tab close) is blocked for bound workspaces. Use a normal browser:* automation workspace when you want OpenCLI to own tab lifecycle.
opencli browser sessions returns idleMsRemaining: null for bound workspaces. That means there is no OpenCLI idle-close timer; the binding lasts until unbind, tab close, window close, or daemon restart.
Mental model
- Selector-first target contract. Every interaction command (
click,type,select,get text/value/attributes) takes one<target>, which is either a numeric ref fromstate/findor a CSS selector. Use--nth <n>to disambiguate multiple CSS matches. - Every envelope reports
matches_nandmatch_level.match_levelisexact,stable, orreidentified— the CLI already rescued moderate DOM drift for you, but the level tells you how confident to be. - Compact output first, full payload on demand.
stateis a budget-aware snapshot;get html --as jsonsupports--depth/--children-max/--text-max;networkreturns shape previews and you re-fetch a single body with--detail <key>. If you emit a giant payload you are burning context you did not need. - Structured errors are machine-readable. On failure the CLI emits
{error: {code, message, hint?, candidates?}}. Branch oncode, not on message strings.
Critical rules
- Always inspect before you act. Run
stateorfindfirst. Never hard-code a ref or selector from memory across sessions — indices are per-snapshot. - Prefer numeric ref over CSS once you have it. Numeric refs survive mild DOM shifts because the CLI fingerprints each tagged element. A CSS selector written by hand will break the first time the site re-renders.
- Read
match_levelafter every write.exact= all good.stable= the element is the same but some soft attrs drifted — your action still applied.reidentified= the original ref was gone and the CLI found a unique replacement; double-check you hit the right element. - Use the
compoundfield for form controls. Do not regex-guess a date format, do notstatetwice to get the full<select>options list. The compound envelope has the format string, full option list up to 50,options_totalfor overflow, andaccept/multiplefor<input type=file>. - Verify writes that matter. After
type <target> <text>, runget value <target>. Afterselect, runget value. Autocomplete widgets, React controlled inputs, and masked fields all silently eat characters. The CLI cannot detect this for you. state→ action →stateafter a page change. Navigations, form submits, and SPA route changes invalidate refs. Take a fresh snapshot. Do not reuse refs from before the transition.- Chain with
&&. A chained sequence runs in one shell so refs acquired by the first command stay live for the second. Separate shell invocations lose the session context you just set up. evalis read-only. Wrap the JS in an IIFE and return JSON. If you need to change the page, use the structuredclick/type/select/keyscommands instead — they produce structured output and fingerprints,evaldoes not.- Prefer
networkto screen-scraping. If a page you care about fetches its data from a JSON API, the API is almost always more reliable than scraping the rendered DOM. Capture once, inspect the shape, then--detail <key>the body you need.
Target contract (<target> for click / type / select / get text|value|attributes)
<target> ::= <numeric-ref> | <css-selector>
- Numeric ref — the
[N]index fromstateorfind. Cheap, resilient to soft DOM drift. - CSS selector — anything
querySelectorAllaccepts. Must be unambiguous on write ops, or pair with--nth <n>.
Envelope on success
{ "clicked": true, "target": "3", "matches_n": 1, "match_level": "exact" }
{ "value": "kalevin@example.com", "matches_n": 1, "match_level": "stable" }
match_level
| level | meaning | you should |
|---|---|---|
exact | Fingerprint agreed on tag + strong IDs with at most one soft drift | Proceed. |
stable | Tag + strong IDs still agree, soft signals (aria-label, role, text) drifted | Proceed, but if what you typed/clicked matters, re-check with get value or state. |
reidentified | Original ref was gone; a unique live element matched the fingerprint and was re-tagged with the old ref | Double-check you hit the right element before chaining more writes. |
Structured error codes
Branch on these, not on the human message:
| code | meaning |
|---|---|
not_found | Numeric ref is no longer in the DOM. Re-state. |
stale_ref | Ref exists but the element at that ref changed identity. Re-state. |
invalid_selector | CSS was rejected by querySelectorAll. Fix the selector. |
selector_not_found | CSS matches 0 elements. Try find with a looser selector. |
selector_ambiguous | CSS matches >1 and no --nth. Add --nth or narrow the selector. |
selector_nth_out_of_range | --nth beyond match count. |
option_not_found | select couldn't find an option matching that label/value. Error envelope includes available: string[] of the real option labels. |
not_a_select | select was called on a non-<select> element. |
Error envelope always includes error.code and error.message. Target errors (selector_not_found, selector_ambiguous, etc.) often add error.candidates: string[] with suggested selectors. option_not_found adds error.available: string[] instead.
Command reference
Inspect
| command | purpose |
|---|---|
browser state | Snapshot: text tree with [N] refs, scroll hints, hidden-interactive hints, compounds (N): sidecar for date/select/file refs. |
browser state --source ax | Opt-in accessibility-tree snapshot. Use when custom controls, portals, or same-origin iframes are hard to identify in normal state. AX refs can recover stale React re-renders by role/name/nth. |
browser state --compare-sources | Metrics-only DOM vs AX comparison for deciding whether AX should become default. It prints counts and sizes, not page text, so it is safer to share for validation. |
browser find --css <sel> [--limit N] [--text-max N] | Run a CSS query and return one entry per match with {nth, ref, tag, role, text, attrs, visible, compound?}. Allocates refs for matches the prior snapshot didn’t tag. Cheap alternative to state when you already know the selector. |
browser frames | List cross-origin iframe targets. Pass the index to --frame on eval. |
browser screenshot [path] | Viewport PNG. No path → base64 to stdout. Prefer state when you just need structure. |
Get (read-only)
| command | returns | |
|---|---|---|
browser get title | plain text | |
browser get url | plain text | |
browser get text <target> [--nth N] | {value, matches_n, match_level} | |
browser get value <target> [--nth N] | {value, matches_n, match_level} | |
browser get attributes <target> [--nth N] | {value: {attr: val, ...}, matches_n, match_level} | |
| `browser get html [--selector <css>] [--as html\ | json] [--depth N] [--children-max N] [--text-max N] [--max N]` | Raw HTML, or structured tree. JSON tree nodes have {tag, attrs, text, children[], compound?}. Truncation reported via truncated: {depth?, children_dropped?, text_truncated?}. |
Interact
| command | notes |
|---|---|
browser click <target> [--nth N] | Returns {clicked, target, matches_n, match_level}. |
browser type <target> <text> [--nth N] | Clicks first, then types. Returns {typed, text, target, matches_n, match_level, autocomplete}. autocomplete: true means a combobox/datalist popup appeared after typing — you almost always need keys Enter or a follow-up click on the suggestion to commit the value. |
browser fill <target> <text> [--nth N] | Exact replacement for input, textarea, and contenteditable targets. Returns {filled, verified, text, actual, matches_n, match_level}. Use this when you need raw text set and verified, not keyboard/autocomplete behavior. Pipeline form supports { fill: { ref, text, submit: true } }. |
browser select <target> <option> [--nth N] | Matches option by label first, then value. Use compound from find/state to see exactly what labels are available. |
browser keys <key> | Enter, Escape, Tab, Control+a, etc. Runs against the focused element. |
browser scroll <direction> [--amount px] | up / down. Default amount 500. |
Wait
browser wait selector "<css>" [--timeout ms] # wait until the selector matches
browser wait text "<substring>" [--timeout ms] # wait until the text appears
browser wait time <seconds> # hard sleep, last resort
Default timeout 10000 ms. SPA routes, login redirects, and lazy-loaded lists need wait before state/get.
Extract
web read --url <url>— One-shot Markdown reader for arbitrary pages. It expands relevant same-origin iframes by default, so old iframe-shell sites work better than with a top-document-only scrape. Use--frames all-same-originwhen completeness matters more than Markdown noise. For AJAX shell pages useopencli web read --url <url> --wait-for "<selector>" --wait-until networkidle --diagnose; diagnostics show frame URLs, empty containers, and API‑like XHRs. If the value you need is table/API data, switch tobrowser networkor a dedicated adapter instead of relying on Markdown.browser eval <js> [--frame N]— Run an expression in the page (or in a cross-origin frame via--frame). Wrap in an IIFE and return JSON. Read‑only: nodocument.forms[0].submit(), no clicks, no navigations. If the result is a string, stdout is the raw string; otherwise it's JSON.browser extract [--selector <css>] [--chunk-size N] [--start N]— Markdown extraction of long‑form content with a continuation cursor. Returns{url, title, selector, total_chars, chunk_size, start, end, next_start_char, content}. Loop onnext_start_charuntil it isnull. Auto‑scopes to<main>/<article>/<body>if you don't pass--selector.
Network
browser network # shape preview + cache key list
browser network --detail <key> # full body for one cached entry
browser network --filter "field1,field2" # keep only entries whose body shape contains BOTH fields as path segments
browser network --all # include static resources (usually noise)
browser network --raw # full bodies inline — large; use sparingly
browser network --ttl <ms> # cache TTL (default 24h)
List entries look like {key, method, status, url, ct, size, shape, body_truncated?}. Detail envelope is {key, url, method, status, ct, size, shape, body, body_truncated?, body_full_size?, body_truncation_reason}. Cache lives at ~/.opencli/cache/browser-network/ so you can re‑inspect without re‑triggering the request.
Default output keeps JSON/XML/plain‑text and JS‑like API responses, then drops obvious static assets and telemetry by URL. If an expected endpoint is missing, run browser network --all once and check whether an unusual content type or URL filter hid it.
Pitfalls
- Do not submit forms via
eval "document.forms[0].submit()"— modern sites intercept with JS handlers and silently drop the call. Eitherclickthe submit button via its ref, or (if you know the GET URL) justopenit directly. - Do not reuse refs across a page transition.
waitfor the new state, then re‑state. Old refs will either 404 or (worse)reidentifyonto a similarly‑shaped element on the new page. match_level: reidentifiedis a warning, not an error. The action went through, but if you are chaining 5 more writes that all depend on that being the right element, verify with aget textorget valuebefore continuing.- Budget‑aware commands silently cap.
get html --as jsonwith default budgets will returntruncated: {...}. If your downstream logic needs the whole subtree, raise--depth/--children-maxor tighten the selector. autocomplete: trueon atyperesponse is not an error. It means a suggestion popup is open and your value isn’t committed yet. Typicallykeys Enterto accept the first suggestion, orclickthe one you want.network --filteris AND‑semantics on path segments.--filter "title,score"keeps entries whose body shape contains bothtitleandscoreas path segments, at any depth. It is not a regex.- Screenshots are for humans, not for agents. Use
state+findunless the page is genuinely visual (captcha, charts). Screenshots burn tokens and rarely add signal an agent can act on.
Troubleshooting
| symptom | fix |
|---|---|
opencli doctor red: "Browser not connected" | Start Chrome with --remote-debugging-port=9222, or install the extension from the Chrome Web Store. |
attach failed: chrome-extension://... | Disable 1Password / other CDP‑hungry extensions temporarily. |
selector_not_found right after state | Page mutated. wait selector "..." then retry. |
stale_ref across every command | You are reusing refs from a prior page. Re‑state. |
click succeeds but nothing happens | The element is probably a decorative wrapper stealing clicks from the real target. find --css "..." with a narrower selector and retry on the inner element. |
type appears to finish but value is wrong | Autocomplete, masked input, or React controlled re‑render. Verify with get value. Add keys Enter or re‑type. |
Giant get html output | Pass --selector + --as json --depth 3 --children-max 20 --text-max 200. |
| Network cache seems stale | Bump --ttl down, or let it expire. The cache lives at ~/.opencli/cache/browser-network/. |
See also
opencli-adapter-author— turning what you just figured out into a reusable~/.opencli/clis/<site>/<command>.js.opencli-autofix— when an existing adapter breaks, this skill walks you through--trace retain-on-failureevidence and filing a fix.
中文使用指南
快速开始
opencli doctor检查 Chrome 是否已连接。opencli browser open <url>打开新标签并返回targetId。opencli browser state获取页面快照,得到[N]refs 与compound信息。- 使用
opencli browser click <ref>、type <ref> <text>、select <ref> <option>完成交互。 - 需要等待时使用
opencli browser wait selector "<css>"或wait text "<文字>"。
常用命令
| 命令 | 说明 | |||
|---|---|---|---|---|
browser open <url> | 打开页面 | |||
browser state | 快照,返回 refs | |||
browser find --css <selector> | 按 CSS 查询 | |||
browser click <target> | 点击 | |||
browser type <target> <text> | 输入 | |||
browser fill <target> <text> | 直接填充 | |||
browser select <target> <option> | 选择下拉 | |||
browser wait selector "<css>" | 等待元素出现 | |||
browser get text <target> | 读取文本 | |||
browser network | 查看捕获的 API 请求 | |||
browser extract | 分块读取长文章 | |||
| `browser tab list | new | select | close` | 管理标签页 |
browser bind / unbind | 绑定/解除已登录用户标签,保持登录状态 |
错误处理
match_level: reidentified→ 先get text/value再继续。selector_not_found→ 页面已变,使用wait selector后重试。option_not_found→ 查看error.available中的真实选项。
环境变量(保持英文键名)
OPENCLI_DAEMON_PORT=19825
OPENCLI_PROFILE=…
OPENCLI_WINDOW_FOCUSED=0
OPENCLI_LIVE=0
OPENCLI_BROWSER_REUSE=none
…
示例:登录并获取粉丝数
opencli browser open "https://example.com/login"
opencli browser state # 假设 [4] email, [5] password, [6] 登录按钮
opencli browser type 4 "my@example.com"
opencli browser type 5 "mypwd"
opencli browser click 6
opencli browser wait selector ".user-avatar" --timeout 15000
opencli browser state
opencli browser get text 12 # 12 为粉丝数元素的 ref
💡 安装方法
下载 ZIP 解压到 skills/ 目录即可使用