
Hermes Autonomy Stage 3 — Implementation Plan

Generated: 2026-05-03T17:58:30+00:00

Project: Hermes Autonomy Metaproject

Goal

Implement the Stage 3 overnight workbench so Hermes can turn evening-sweep candidates into bounded overnight jobs, execute safe work unattended, and surface reviewable outputs in the next morning brief.

Stage 3 is not just another digest. It is the first production loop where Hermes does useful work without Daniel prompting each task:

  1. Evening sweep discovers or proposes work.
  2. Queue writer converts eligible work into durable jobs.
  3. Worker executes up to 10 jobs/night, at most 3 per project.
  4. Software-development jobs open PRs as the primary review artifact.
  5. Morning brief reports what ran, what opened PRs, what failed, and what needs Daniel review.

Current baseline

Already present:

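- evening_sweep_context.py emits overnight_queue_candidates

- evening_sweep_context.py currently uses conservative next-action heuristics

- morning_brief_context.py emits goal-aware morning brief JSON

- morning_brief_context.py currently lacks an overnight_workbench section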

- evening-sweep at 22:33 ET, delivered to Telegram General

- morning-brief at 07:57 ET, delivered to Telegram General

- Stage 3 design at the Hermes Artifacts web app

Missing:

Implementation shape

Use local JSONL as the queue substrate and local JSON result files as the execution ledger.

~/.hermes/state/overnight_workbench/
  queue.jsonl
  claims/
    <job_id>.lock
  runs/
    YYYY-MM-DD/
      <job_id>/
        job.json
        result.json
        prompt.md
        stdout.log
        stderr.log
        summary.md
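
A minimal sketch of how the scripts might address this layout, assuming the root directory shown above. The helper names `run_dir` and `claim_path` are hypothetical and not part of the plan.

```python
from pathlib import Path

# Root directory taken from the layout above.
WORKBENCH_ROOT = Path.home() / ".hermes" / "state" / "overnight_workbench"


def run_dir(root: Path, run_date: str, job_id: str) -> Path:
    """Return runs/<run_date>/<job_id>/ under root, creating it if needed."""
    path = root / "runs" / run_date / job_id
    path.mkdir(parents=True, exist_ok=True)
    return path


def claim_path(root: Path, job_id: str) -> Path:
    """Return the lock-file path claims/<job_id>.lock (does not create it)."""
    return root / "claims" / f"{job_id}.lock"
```

Passing `root` explicitly, rather than reading `WORKBENCH_ROOT` inside each helper, lets tests point the helpers at a temporary directory.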

Scripts:

~/.hermes/scripts/overnight_workbench_queue.py
~/.hermes/scripts/overnight_workbench_worker.py
~/.hermes/scripts/overnight_workbench_context.py
~/.hermes/scripts/tests/test_overnight_workbench_queue.py
~/.hermes/scripts/tests/test_overnight_workbench_worker.py
~/.hermes/scripts/tests/test_overnight_workbench_context.py

Systemd user units:

~/.config/systemd/user/hermes-overnight-workbench.service
~/.config/systemd/user/hermes-overnight-workbench.timer

Data model

Queue job

One JSON object per line in queue.jsonl:

{
  "job_id": "2026-05-03_overnight_<short_hash>",
  "created_at": "2026-05-03T22:45:00-04:00",
  "run_date_et": "2026-05-03",
  "created_by": "evening-sweep",
  "project_id": "notion-page-id",
  "project_name": "Project name",
  "project_url": "https://www.notion.so/...",
  "goal_id": "optional-goal-id",
  "goal_name": "optional goal name",
  "job_type": "software-development",
  "safety_class": "pr_only",
  "input": {
    "description": "Written feature/problem description",
    "acceptance_criteria": [],
    "constraints": [],
    "target_repo_hint": null
  },
  "limits": {
    "max_runtime_minutes": 60,
    "max_tool_calls": 30
  },
  "allowed_side_effects": [
    "local_files",
    "git_branch",
    "git_push",
    "github_pr_open",
    "github_pr_update",
    "github_repo_create",
    "notion_log_append"
  ],
  "forbidden_side_effects": [
    "merge",
    "deploy",
    "publish_outside_pr",
    "send_external",
    "money",
    "delete"
  ],
  "status": "pending"
}
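
One way to derive a stable job_id and append jobs to queue.jsonl is sketched below. The hashing scheme (SHA-256 over a normalized description, truncated to 8 hex characters) and the helper names are assumptions for illustration. The plan only fixes the `<run_date>_overnight_<short_hash>` shape.

```python
import hashlib
import json
from pathlib import Path


def action_hash(description: str) -> str:
    """Hash a description after lowercasing and collapsing whitespace (assumed scheme)."""
    normalized = " ".join(description.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def make_job_id(run_date_et: str, project_id: str, description: str, job_type: str) -> str:
    """Build a deterministic job_id of the form <run_date>_overnight_<short_hash>."""
    key = "|".join([run_date_et, project_id, action_hash(description), job_type])
    short = hashlib.sha256(key.encode("utf-8")).hexdigest()[:8]
    return f"{run_date_et}_overnight_{short}"


def append_job(queue_path: Path, job: dict) -> None:
    """Append one job as a single JSON line."""
    queue_path.parent.mkdir(parents=True, exist_ok=True)
    with queue_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(job, sort_keys=True) + "\n")


def read_jobs(queue_path: Path) -> list:
    """Read all jobs from the queue, skipping blank lines."""
    if not queue_path.exists():
        return []
    with queue_path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Because the id depends only on the job's content, re-running the queue writer on the same candidates yields the same ids, which keeps deduplication simple.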

Result

result.json in each run directory:

{
  "job_id": "...",
  "status": "succeeded|failed|skipped|timeout|unsafe",
  "started_at": "...",
  "finished_at": "...",
  "summary": "Human-readable one paragraph summary",
  "review_needed": true,
  "outputs": {
    "pull_requests": [
      {
        "repo": "dndodson/argo",
        "url": "https://github.com/dndodson/argo/pull/123",
        "title": "feat: ...",
        "draft": true,
        "ci_status": "pending|success|failure|unknown"
      }
    ],
    "artifacts": [
      {
        "title": "...",
        "url": "http://srv1343021.tail8eb3a8.ts.net/apps/hermes-artifacts/..."
      }
    ]
  },
  "error": null
}
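
The worker could validate result.json along these lines before trusting it. This is a sketch under assumptions: the required-key set mirrors the example above, and `failure_result` and `load_result` are hypothetical helpers. Using a string for `error` on failure is also an assumption, since the example only shows `null`.

```python
import json
from pathlib import Path

ALLOWED_STATUSES = {"succeeded", "failed", "skipped", "timeout", "unsafe"}
REQUIRED_KEYS = {"job_id", "status", "started_at", "finished_at",
                 "summary", "review_needed", "outputs", "error"}


def failure_result(job_id: str, reason: str) -> dict:
    """Synthesize a failed result in the shape shown above."""
    return {
        "job_id": job_id, "status": "failed",
        "started_at": None, "finished_at": None,
        "summary": reason, "review_needed": True,
        "outputs": {"pull_requests": [], "artifacts": []},
        "error": reason,
    }


def load_result(run_dir: Path, job_id: str) -> dict:
    """Return the parsed result.json, or a synthesized failure if it is unusable."""
    try:
        result = json.loads((run_dir / "result.json").read_text(encoding="utf-8"))
    except (OSError, ValueError) as exc:  # missing file, bad encoding, or bad JSON
        return failure_result(job_id, f"result.json missing or unreadable: {exc}")
    if not isinstance(result, dict):
        return failure_result(job_id, "result.json is not a JSON object")
    missing = sorted(REQUIRED_KEYS - result.keys())
    if missing:
        return failure_result(job_id, f"result.json missing keys: {missing}")
    status = result["status"]
    if not isinstance(status, str) or status not in ALLOWED_STATUSES:
        return failure_result(job_id, f"unknown status: {status!r}")
    return result
```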

Queue writer

Implement overnight_workbench_queue.py with modes:

python ~/.hermes/scripts/overnight_workbench_queue.py --dry-run --pretty
python ~/.hermes/scripts/overnight_workbench_queue.py --append
python ~/.hermes/scripts/overnight_workbench_queue.py --append --from-evening-sweep
python ~/.hermes/scripts/overnight_workbench_queue.py --enqueue-json /path/to/job.json

Responsibilities:

  1. Import evening_sweep_context.build_payload().
  2. Read overnight_queue_candidates.
  3. Map each candidate to a job type:

     - explicit software-development if the candidate is a feature/problem/build request

     - research, draft, audit, review, or documentation otherwise

  4. Reject unsafe jobs before enqueue.
  5. Deduplicate by (run_date_et, project_id, action_hash, job_type).
  6. Enforce static caps at queue time:

     - max 3 jobs/project/night

     - max 10 jobs/night globally

  7. Emit [SILENT] if there is nothing to queue in cron-like mode.

Important: the queue writer should be deterministic. It may use heuristics, but it should not ask an LLM to decide whether to enqueue. The evening sweep LLM can suggest candidates; the queue writer enforces policy.
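
The deterministic policy described above (safety rejection, deduplication, and the two caps) can be sketched as a pure function. The `action_hash` field on each job and the helper names are assumptions for illustration. The sketch also assumes every job passed in shares one run_date_et.

```python
from collections import Counter

MAX_PER_PROJECT = 3
MAX_GLOBAL = 10
# Mirrors forbidden_side_effects from the queue job example.
FORBIDDEN = {"merge", "deploy", "publish_outside_pr", "send_external", "money", "delete"}


def dedup_key(job: dict) -> tuple:
    """Key used to detect duplicate jobs."""
    return (job["run_date_et"], job["project_id"], job["action_hash"], job["job_type"])


def is_safe(job: dict) -> bool:
    """Reject any job that asks for a forbidden side effect."""
    return not (set(job.get("allowed_side_effects", [])) & FORBIDDEN)


def select_jobs(candidates: list, existing: list) -> list:
    """Return the candidates to enqueue, in input order.

    `existing` holds jobs already queued for the same night; they count
    toward both caps and toward deduplication.
    """
    seen = {dedup_key(job) for job in existing}
    per_project = Counter(job["project_id"] for job in existing)
    total = len(existing)
    accepted = []
    for job in candidates:
        if total >= MAX_GLOBAL:
            break  # global cap reached; nothing more can be queued tonight
        key = dedup_key(job)
        if key in seen or not is_safe(job):
            continue
        if per_project[job["project_id"]] >= MAX_PER_PROJECT:
            continue
        seen.add(key)
        per_project[job["project_id"]] += 1
        total += 1
        accepted.append(job)
    return accepted
```

Since the function takes plain lists and returns a list, the same inputs always produce the same output, which makes the policy straightforward to unit test.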

Worker

Implement overnight_workbench_worker.py with modes:

python ~/.hermes/scripts/overnight_workbench_worker.py --dry-run
python ~/.hermes/scripts/overnight_workbench_worker.py --once
python ~/.hermes/scripts/overnight_workbench_worker.py --run-date 2026-05-03
python ~/.hermes/scripts/overnight_workbench_worker.py --job-id <job_id>

Responsibilities:

  1. Read pending jobs from queue.jsonl.
  2. Select jobs for the current ET run date.
  3. Re-enforce caps:

     - max 3 jobs/project/night

     - max 10 jobs/night globally

  4. Claim each job atomically with a lock file under claims/.
  5. Write job.json and prompt.md into a run directory.
  6. Execute the job in a detached sibling process or user systemd scope.
  7. Capture stdout/stderr.
  8. Require result.json; missing result means failure.
  9. Append a concise Notion log only if enabled by job policy.
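
The atomic claim step can be sketched with an exclusive file create, which fails if the lock file already exists. The function name `try_claim` and the lock-file contents are assumptions. The plan only specifies a lock file per job under claims/.

```python
import os
from pathlib import Path


def try_claim(claims_dir: Path, job_id: str) -> bool:
    """Claim a job by creating claims/<job_id>.lock; return False if already claimed."""
    claims_dir.mkdir(parents=True, exist_ok=True)
    lock = claims_dir / f"{job_id}.lock"
    try:
        # O_CREAT | O_EXCL makes creation fail if the file already exists,
        # so only one process can win the claim.
        fd = os.open(lock, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write(f"pid={os.getpid()}\n")  # assumed content, useful for debugging
    return True
```

A second worker that races for the same job gets `False` and moves on to the next pending job.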

Worker execution strategy

For v1, use hermes chat -q one-shot workers instead of trying to embed all implementation logic in the scheduler.

Example command shape:

systemd-run --user --scope \
  --unit hermes-overnight-job-<job_id> \
  hermes chat \
    --provider openai-codex \
    --model gpt-5.5 \
    --toolsets terminal,file,web,session_search,skills \
    -q "<self-contained job prompt>"
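
From Python, the worker might build that command as an argument list and capture the logs as sketched below. The helper names and the returned dictionary shape are assumptions. The flags are copied from the command shape above.

```python
import subprocess
from pathlib import Path


def build_command(job_id: str, prompt: str) -> list:
    """Build the argv list for the command shape above (no shell quoting needed)."""
    return [
        "systemd-run", "--user", "--scope",
        "--unit", f"hermes-overnight-job-{job_id}",
        "hermes", "chat",
        "--provider", "openai-codex",
        "--model", "gpt-5.5",
        "--toolsets", "terminal,file,web,session_search,skills",
        "-q", prompt,
    ]


def run_job(argv: list, run_dir: Path, timeout_s: float) -> dict:
    """Run argv, writing stdout.log and stderr.log into run_dir."""
    with open(run_dir / "stdout.log", "wb") as out, open(run_dir / "stderr.log", "wb") as err:
        try:
            proc = subprocess.run(argv, stdout=out, stderr=err,
                                  timeout=timeout_s, check=False)
        except subprocess.TimeoutExpired:
            return {"timed_out": True, "returncode": None}
    return {"timed_out": False, "returncode": proc.returncode}
```

The timeout would presumably come from `limits.max_runtime_minutes * 60`. On timeout, `subprocess.run` kills only the direct child, so a real worker may also need to stop the scope unit to clean up any remaining processes.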

The prompt should be self-contained and include:

Software-development job prompt contract

For software-development jobs, the worker prompt should require:

  1. Determine relevant repo:

     - use target_repo_hint if provided

     - otherwise search local repos and GitHub org/user repos

     - if no relevant repo exists, create a new private repo under dndodson

  2. Inspect repo instructions:

     - AGENTS.md

     - project README

     - existing tests and local quality gates

  3. Implement bounded change.
  4. Run local tests/lints targeted to the change.
  5. Commit changes.
  6. Push branch.
  7. Open one or more GitHub PRs.
  8. Write result.json with PR URLs and validation notes.

The prompt must explicitly forbid merge, deploy, public publish outside PR, money/trading, credential changes, and destructive operations.
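
A prompt.md body for a software-development job could be rendered from the queue job roughly as follows. The exact wording and the `build_prompt` helper are assumptions. Only the forbidden categories and the final result.json requirement come from the contract above.

```python
# Forbidden categories from the contract above, phrased for the prompt.
FORBIDDEN_ACTIONS = [
    "merging pull requests",
    "deploying",
    "publishing publicly outside a PR",
    "money or trading actions",
    "credential changes",
    "destructive operations",
]


def build_prompt(job: dict) -> str:
    """Render a self-contained prompt body for a software-development job."""
    spec = job["input"]
    hint = spec.get("target_repo_hint") or "none; search for a relevant repo"
    lines = [
        f"Job: {job['job_id']} ({job['job_type']})",
        f"Project: {job['project_name']}",
        "",
        "Description:",
        spec["description"],
        "",
        f"Target repo hint: {hint}",
        "",
        "Acceptance criteria:",
        *[f"- {item}" for item in spec.get("acceptance_criteria", [])],
        "",
        "Constraints:",
        *[f"- {item}" for item in spec.get("constraints", [])],
        "",
        "You MUST NOT do any of the following:",
        *[f"- {item}" for item in FORBIDDEN_ACTIONS],
        "",
        "Finish by writing result.json with PR URLs and validation notes.",
    ]
    return "\n".join(lines) + "\n"
```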

Result context for morning brief

Implement overnight_workbench_context.py:

python ~/.hermes/scripts/overnight_workbench_context.py --pretty
python ~/.hermes/scripts/overnight_workbench_context.py --run-date 2026-05-03

Output JSON shape:

{
  "kind": "overnight_workbench",
  "run_date_et": "2026-05-03",
  "totals": {
    "queued": 5,
    "claimed": 5,
    "succeeded": 3,
    "failed": 1,
    "skipped": 1,
    "prs_opened": 4,
    "review_needed": 4
  },
  "results": [
    {
      "project_name": "...",
      "job_type": "software-development",
      "status": "succeeded",
      "summary": "...",
      "pull_requests": ["https://github.com/.../pull/123"],
      "artifacts": [],
      "review_needed": true
    }
  ],
  "failures": []
}
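
The totals could be aggregated from queue entries and per-job results as sketched below. Several details are assumptions: the `build_payload` signature, folding `timeout` and `unsafe` into the `failed` count, listing artifact URLs in each row, and the shape of each `failures` entry. The sketch also assumes `results` has one entry per claimed job, with the worker synthesizing a failed result when result.json is missing.

```python
def build_payload(run_date_et: str, jobs: list, results: dict) -> dict:
    """jobs: queue entries for the run date; results: job_id -> result.json dict."""
    totals = {"queued": len(jobs), "claimed": 0, "succeeded": 0, "failed": 0,
              "skipped": 0, "prs_opened": 0, "review_needed": 0}
    rows, failures = [], []
    for job in jobs:
        result = results.get(job["job_id"])
        if result is None:
            continue  # queued but never claimed
        totals["claimed"] += 1
        status = result["status"]
        # Assumption: timeout and unsafe are counted under "failed".
        bucket = status if status in ("succeeded", "skipped") else "failed"
        totals[bucket] += 1
        outputs = result.get("outputs") or {}
        prs = [pr["url"] for pr in outputs.get("pull_requests", [])]
        artifacts = [item["url"] for item in outputs.get("artifacts", [])]
        review = bool(result.get("review_needed"))
        totals["prs_opened"] += len(prs)
        totals["review_needed"] += int(review)
        rows.append({
            "project_name": job.get("project_name"),
            "job_type": job.get("job_type"),
            "status": status,
            "summary": result.get("summary"),
            "pull_requests": prs,
            "artifacts": artifacts,
            "review_needed": review,
        })
        if bucket == "failed":
            failures.append({"job_id": job["job_id"], "status": status,
                             "error": result.get("error")})
    return {"kind": "overnight_workbench", "run_date_et": run_date_et,
            "totals": totals, "results": rows, "failures": failures}
```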

Then modify morning_brief_context.py to include:

from overnight_workbench_context import build_payload as build_workbench_payload
...
"overnight_workbench": build_workbench_payload(...)

Keep the morning-brief script JSON-only. The cron prompt formats the Telegram summary.

Evening sweep integration

Modify evening_sweep_context.py in two phases:

Phase 1: consume existing candidates

Leave overnight_queue_candidates mostly as-is and let the queue writer convert them into jobs.

Phase 2: enrich candidate metadata

Add fields:

{
  "candidate_reason": "safe_verb=implement|explicit_context=overnight-ok",
  "suggested_job_type": "software-development",
  "safety_class": "pr_only",
  "target_repo_hint": "dndodson/argo",
  "acceptance_criteria": []
}

Also fetch Contexts in fetch_active_projects() so overnight-ok can become an explicit opt-in tag rather than only a future hook.

Scheduling

Prefer user systemd for the worker because this is execution infrastructure with logs, not a user-facing digest.

Service

~/.config/systemd/user/hermes-overnight-workbench.service

[Unit]
Description=Run Hermes overnight workbench
After=network-online.target
Wants=network-online.target

[Service]
Type=oneshot
WorkingDirectory=/home/dndodson
ExecStart=/home/dndodson/.hermes/hermes-agent/venv/bin/python3 /home/dndodson/.hermes/scripts/overnight_workbench_worker.py --once

Timer

~/.config/systemd/user/hermes-overnight-workbench.timer

[Unit]
Description=Run Hermes overnight workbench nightly

[Timer]
OnCalendar=*-*-* 23:15:00 America/New_York
Persistent=false
Unit=hermes-overnight-workbench.service

[Install]
WantedBy=timers.target

Verification:

systemd-analyze calendar '*-*-* 23:15:00 America/New_York'
systemctl --user daemon-reload
systemctl --user enable hermes-overnight-workbench.timer
systemctl --user start hermes-overnight-workbench.timer
systemctl --user start hermes-overnight-workbench.service
systemctl --user status --no-pager --full hermes-overnight-workbench.service
journalctl --user -u hermes-overnight-workbench.service -n 50 --no-pager

Tests

Use deterministic unit tests for policy and file behavior.

Queue tests

Worker tests

Context tests

Rollout plan

PR 1 — Queue + context plumbing

Implement:

No worker execution yet. This PR should prove the data path from evening candidates to morning summary.

PR 2 — Dry-run worker

Implement:

No real Hermes subprocess execution yet.

PR 3 — Real worker execution for non-code job types

Implement:

Run with small hand-authored test jobs first.

PR 4 — Software-development job type

Implement:

This is where the workbench becomes materially useful for Argo and Hermes self-improvement.

PR 5 — Enable nightly timer

Implement:

Operational guardrails

First implementation command sequence

Start with PR 1:

cd /home/dndodson/.hermes/scripts
# create tests first
# implement queue/context scripts
python -m pytest tests/test_overnight_workbench_queue.py tests/test_overnight_workbench_context.py -q
python overnight_workbench_queue.py --dry-run --pretty
python overnight_workbench_context.py --pretty
python morning_brief_context.py --pretty

Then open a PR against the relevant Hermes/config repository if this code is tracked there. If these scripts remain local-only, publish the implementation artifact and Notion log until we decide where operational scripts should live long-term.

Open implementation decision

The only significant architecture decision left before PR 1 is where to keep the code under version control:

  1. Keep scripts in ~/.hermes/scripts and create a private dndodson/hermes-ops repo for backup/review.
  2. Move scripts into a tracked Hermes Agent fork/profile repo if they should travel with Hermes itself.
  3. Keep them local for v1, then migrate after the pilot.

Recommendation: create a private dndodson/hermes-ops repo for operational automation scripts. It matches the artifact/project nature of Stage 3, makes PR review natural, and avoids mixing Daniel-specific automation into upstream Hermes Agent code.