Hermes Autonomy Stage 3 — Implementation Plan

Generated: 2026-05-03T17:58:30+00:00

Project: Hermes Autonomy Metaproject

Goal

Implement the Stage 3 overnight workbench so Hermes can turn evening-sweep candidates into bounded overnight jobs, execute safe work unattended, and surface reviewable outputs in the next morning brief.

Stage 3 is not just another digest. It is the first production loop where Hermes does useful work without Daniel prompting each task:

Evening sweep discovers or proposes work.
Queue writer converts eligible work into durable jobs.
Worker executes up to 10 jobs/night, at most 3 per project.
Software-development jobs open PRs as the primary review artifact.
Morning brief reports what ran, what opened PRs, what failed, and what needs Daniel review.

Current baseline

Already present:

~/.hermes/scripts/evening_sweep_context.py

- emits overnight_queue_candidates

- currently uses conservative next-action heuristics

~/.hermes/scripts/morning_brief_context.py

- emits goal-aware morning brief JSON

- currently lacks an overnight_workbench section

Existing cron jobs:

- evening-sweep at 22:33 ET, delivered to Telegram General

- morning-brief at 07:57 ET, delivered to Telegram General

Existing web artifact:

- Stage 3 design at the Hermes Artifacts web app

Missing:

queue file/writer
worker
result summarizer
morning brief integration
user systemd worker timer
tests and dry-run fixtures

Implementation shape

Use local JSONL as the queue substrate and local JSON result files as the execution ledger.

~/.hermes/state/overnight_workbench/
  queue.jsonl
  claims/
    <job_id>.lock
  runs/
    YYYY-MM-DD/
      <job_id>/
        job.json
        result.json
        prompt.md
        stdout.log
        stderr.log
        summary.md

Scripts:

~/.hermes/scripts/overnight_workbench_queue.py
~/.hermes/scripts/overnight_workbench_worker.py
~/.hermes/scripts/overnight_workbench_context.py
~/.hermes/scripts/tests/test_overnight_workbench_queue.py
~/.hermes/scripts/tests/test_overnight_workbench_worker.py
~/.hermes/scripts/tests/test_overnight_workbench_context.py

Systemd user units:

~/.config/systemd/user/hermes-overnight-workbench.service
~/.config/systemd/user/hermes-overnight-workbench.timer

Data model

Queue job

One JSON object per line in queue.jsonl:

{
  "job_id": "2026-05-03_overnight_<short_hash>",
  "created_at": "2026-05-03T22:45:00-04:00",
  "run_date_et": "2026-05-03",
  "created_by": "evening-sweep",
  "project_id": "notion-page-id",
  "project_name": "Project name",
  "project_url": "https://www.notion.so/...",
  "goal_id": "optional-goal-id",
  "goal_name": "optional goal name",
  "job_type": "software-development",
  "safety_class": "pr_only",
  "input": {
    "description": "Written feature/problem description",
    "acceptance_criteria": [],
    "constraints": [],
    "target_repo_hint": null
  },
  "limits": {
    "max_runtime_minutes": 60,
    "max_tool_calls": 30
  },
  "allowed_side_effects": [
    "local_files",
    "git_branch",
    "git_push",
    "github_pr_open",
    "github_pr_update",
    "github_repo_create",
    "notion_log_append"
  ],
  "forbidden_side_effects": [
    "merge",
    "deploy",
    "publish_outside_pr",
    "send_external",
    "money",
    "delete"
  ],
  "status": "pending"
}
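A minimal sketch of how a job line might be built and appended, assuming the short hash in job_id is derived from the dedupe fields. The helper names (make_job_id, append_job, read_jobs) and the choice of SHA-256 truncated to 8 hex characters are illustrative assumptions, not decisions made in this plan.

```python
import hashlib
import json
from pathlib import Path


def make_job_id(run_date_et: str, project_id: str, action: str, job_type: str) -> str:
    """Build a stable job_id; identical inputs always give the same id."""
    key = "\x1f".join([run_date_et, project_id, action.strip().lower(), job_type])
    short_hash = hashlib.sha256(key.encode("utf-8")).hexdigest()[:8]  # assumed length
    return f"{run_date_et}_overnight_{short_hash}"


def append_job(queue_path: Path, job: dict) -> None:
    """Append one job as a single JSON line (json.dumps escapes embedded newlines)."""
    queue_path.parent.mkdir(parents=True, exist_ok=True)
    line = json.dumps(job, sort_keys=True, ensure_ascii=False)
    with queue_path.open("a", encoding="utf-8") as fh:
        fh.write(line + "\n")


def read_jobs(queue_path: Path) -> list[dict]:
    """Read all jobs, skipping blank lines; a missing queue file means no jobs."""
    if not queue_path.exists():
        return []
    with queue_path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Deriving the id from the same fields used for deduplication would make re-running the queue writer on the same evening idempotent, which is one possible reason to prefer a content hash over a random id.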

Result

result.json in each run directory:

{
  "job_id": "...",
  "status": "succeeded|failed|skipped|timeout|unsafe",
  "started_at": "...",
  "finished_at": "...",
  "summary": "Human-readable one paragraph summary",
  "review_needed": true,
  "outputs": {
    "pull_requests": [
      {
        "repo": "dndodson/argo",
        "url": "https://github.com/dndodson/argo/pull/123",
        "title": "feat: ...",
        "draft": true,
        "ci_status": "pending|success|failure|unknown"
      }
    ],
    "artifacts": [
      {
        "title": "...",
        "url": "http://srv1343021.tail8eb3a8.ts.net/apps/hermes-artifacts/..."
      }
    ]
  },
  "error": null
}
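The result contract can be read defensively so that a missing or malformed result.json becomes a recorded failure rather than a crash. The sketch below is one way to do that; load_result and KNOWN_STATUSES are hypothetical names, and the accepted status set is an assumption that tolerates both "unsafe" and "skipped_unsafe".

```python
import json
from pathlib import Path

# Assumed set of accepted statuses; both spellings of the unsafe status are tolerated.
KNOWN_STATUSES = {"succeeded", "failed", "skipped", "timeout", "unsafe", "skipped_unsafe"}


def load_result(run_dir: Path, job_id: str) -> dict:
    """Return the job's result, or a synthetic failure if it is missing or invalid."""
    path = run_dir / "result.json"
    try:
        result = json.loads(path.read_text(encoding="utf-8"))
        if not isinstance(result, dict) or result.get("status") not in KNOWN_STATUSES:
            raise ValueError("result.json has no recognized status")
    except (OSError, ValueError) as exc:  # json.JSONDecodeError is a ValueError
        return {
            "job_id": job_id,
            "status": "failed",
            "summary": "Worker did not produce a valid result.json.",
            "review_needed": True,
            "outputs": {"pull_requests": [], "artifacts": []},
            "error": f"{type(exc).__name__}: {exc}",
        }
    # Fill in an empty outputs section so callers can index it safely.
    result.setdefault("outputs", {"pull_requests": [], "artifacts": []})
    return result
```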

Queue writer

Implement overnight_workbench_queue.py with modes:

python ~/.hermes/scripts/overnight_workbench_queue.py --dry-run --pretty
python ~/.hermes/scripts/overnight_workbench_queue.py --append
python ~/.hermes/scripts/overnight_workbench_queue.py --append --from-evening-sweep
python ~/.hermes/scripts/overnight_workbench_queue.py --enqueue-json /path/to/job.json

Responsibilities:

Import evening_sweep_context.build_payload().
Read overnight_queue_candidates.
Map each candidate to a job type:

- explicit software-development if the candidate is a feature/problem/build request

- research, draft, audit, review, or documentation otherwise

Reject unsafe jobs before enqueue.
Deduplicate by (run_date_et, project_id, action_hash, job_type).
Enforce static caps at queue time:

- max 3 jobs/project/night

- max 10 jobs/night globally

In cron-like mode, emit [SILENT] when there is nothing to queue.

Important: the queue writer should be deterministic. It may use heuristics, but it should not ask an LLM to decide whether to enqueue. The evening sweep LLM can suggest candidates; the queue writer enforces policy.
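The deduplication and cap rules above can be expressed as a pure function over already-queued jobs and new candidates. This is a sketch under assumptions: select_jobs and dedupe_key are hypothetical names, and each job is assumed to carry a precomputed action_hash field, which the job schema in this plan does not define.

```python
from collections import Counter

MAX_PER_PROJECT = 3   # jobs per project per night, from the plan
MAX_PER_NIGHT = 10    # jobs per night globally, from the plan


def dedupe_key(job: dict) -> tuple:
    # "action_hash" is assumed to be precomputed on each job.
    return (job["run_date_et"], job["project_id"], job["action_hash"], job["job_type"])


def select_jobs(existing: list[dict], candidates: list[dict]) -> list[dict]:
    """Return the candidates to enqueue, in input order, honoring dedupe and caps."""
    seen = {dedupe_key(j) for j in existing}
    per_project = Counter((j["run_date_et"], j["project_id"]) for j in existing)
    per_night = Counter(j["run_date_et"] for j in existing)
    accepted = []
    for job in candidates:
        key = dedupe_key(job)
        night = job["run_date_et"]
        project = (night, job["project_id"])
        if key in seen:
            continue  # duplicate of a queued or already accepted job
        if per_night[night] >= MAX_PER_NIGHT or per_project[project] >= MAX_PER_PROJECT:
            continue  # a cap is already reached
        seen.add(key)
        per_project[project] += 1
        per_night[night] += 1
        accepted.append(job)
    return accepted
```

Because the function depends only on its inputs and processes candidates in order, the same candidates always produce the same queue, which keeps the writer deterministic and easy to unit test.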

Worker

Implement overnight_workbench_worker.py with modes:

python ~/.hermes/scripts/overnight_workbench_worker.py --dry-run
python ~/.hermes/scripts/overnight_workbench_worker.py --once
python ~/.hermes/scripts/overnight_workbench_worker.py --run-date 2026-05-03
python ~/.hermes/scripts/overnight_workbench_worker.py --job-id <job_id>

Responsibilities:

Read pending jobs from queue.jsonl.
Select jobs for the current ET run date.
Re-enforce caps:

- max 3 jobs/project/night

- max 10 jobs/night globally

Claim each job atomically with a lock file under claims/.
Write job.json and prompt.md into a run directory.
Execute the job in a detached sibling process or user systemd scope.
Capture stdout/stderr.
Require result.json; missing result means failure.
Append a concise Notion log only if enabled by job policy.
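Atomic claiming with a lock file can rely on exclusive file creation, which the operating system guarantees succeeds for at most one caller. The following is a minimal sketch; try_claim, is_stale, and the two-hour stale threshold are assumptions for illustration, not values specified in this plan.

```python
import json
import os
import time
from pathlib import Path
from typing import Optional

STALE_AFTER_SECONDS = 2 * 60 * 60  # assumed threshold, not specified in the plan


def try_claim(claims_dir: Path, job_id: str) -> bool:
    """Claim a job by creating <job_id>.lock; only one caller can succeed."""
    claims_dir.mkdir(parents=True, exist_ok=True)
    lock_path = claims_dir / f"{job_id}.lock"
    try:
        # O_CREAT | O_EXCL fails if the file already exists, so creation is atomic.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump({"pid": os.getpid(), "claimed_at": time.time()}, fh)
    return True


def is_stale(claims_dir: Path, job_id: str, now: Optional[float] = None) -> bool:
    """Report whether an existing claim is older than the stale threshold."""
    lock_path = claims_dir / f"{job_id}.lock"
    try:
        age = (time.time() if now is None else now) - lock_path.stat().st_mtime
    except FileNotFoundError:
        return False  # no claim exists, so nothing is stale
    return age > STALE_AFTER_SECONDS
```

A worker would call try_claim before creating the run directory and skip the job when it returns False. What to do with a stale claim (reclaim, or mark the job failed) is left open here.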

Worker execution strategy

For v1, use hermes chat -q one-shot workers instead of trying to embed all implementation logic in the scheduler.

Example command shape:

systemd-run --user --scope \
  --unit hermes-overnight-job-<job_id> \
  hermes chat \
    --provider openai-codex \
    --model gpt-5.5 \
    --toolsets terminal,file,web,session_search,skills \
    -q "<self-contained job prompt>"
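From the Python worker, the command shape above might be launched as an argument list with a timeout taken from the job's limits. This is a sketch, not the worker's actual implementation: build_command and run_job are hypothetical helpers, and the hermes flags are copied from the command shape above rather than verified against the CLI.

```python
import subprocess


def build_command(job_id: str, prompt: str) -> list[str]:
    """Build the argv for one job; passing a list avoids shell quoting issues."""
    return [
        "systemd-run", "--user", "--scope",
        "--unit", f"hermes-overnight-job-{job_id}",
        "hermes", "chat",
        "--provider", "openai-codex",
        "--model", "gpt-5.5",
        "--toolsets", "terminal,file,web,session_search,skills",
        "-q", prompt,
    ]


def run_job(job: dict, prompt: str, stdout_path: str, stderr_path: str) -> str:
    """Run one job and return "finished" or "timeout"; result.json decides success."""
    timeout_s = job["limits"]["max_runtime_minutes"] * 60
    with open(stdout_path, "wb") as out, open(stderr_path, "wb") as err:
        try:
            subprocess.run(
                build_command(job["job_id"], prompt),
                stdout=out, stderr=err, timeout=timeout_s, check=False,
            )
        except subprocess.TimeoutExpired:
            return "timeout"
    return "finished"
```

On timeout, subprocess.run kills only the direct child process. Cleaning up any other processes left in the systemd scope is not handled in this sketch.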

The prompt should be self-contained and include:

project context
job input description
safety envelope
explicit allowed/forbidden side effects
output contract requiring result.json
PR policy: open PRs encouraged/expected for software-development, never merge
artifact-link rule: web/Notion links, never raw file paths in final handoff
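One way to assemble such a prompt is to render it directly from the queue job, so the safety envelope in the prompt always matches the job record. The sketch below assumes the job fields from the data model; build_prompt is a hypothetical helper and the section wording is illustrative only.

```python
def build_prompt(job: dict, result_path: str) -> str:
    """Render a self-contained prompt from a queue job (wording is illustrative)."""
    spec = job["input"]

    def bullets(items):
        return "\n".join(f"- {item}" for item in items) or "- (none)"

    sections = [
        f"# Overnight job {job['job_id']}",
        f"Project: {job['project_name']} ({job['project_url']})",
        f"Job type: {job['job_type']} | Safety class: {job['safety_class']}",
        "## Task\n" + spec["description"],
        "## Acceptance criteria\n" + bullets(spec.get("acceptance_criteria", [])),
        "## Constraints\n" + bullets(spec.get("constraints", [])),
        "## Allowed side effects\n" + bullets(job["allowed_side_effects"]),
        "## Forbidden side effects (never do these)\n"
        + bullets(job["forbidden_side_effects"]),
        "## Output contract\n"
        f"Before exiting, write {result_path} as JSON with the keys: "
        "job_id, status, started_at, finished_at, summary, review_needed, outputs, error.",
        "## Handoff\nLink outputs by web or Notion URL, never by raw file path.",
    ]
    if job["job_type"] == "software-development":
        # PR policy applies only to software-development jobs.
        sections.append("## PR policy\nOpen a pull request for your changes. Never merge it.")
    return "\n\n".join(sections) + "\n"
```

The worker could write the returned string to prompt.md in the run directory before execution, which keeps an exact record of what the one-shot worker was asked to do.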

Software-development job prompt contract

For software-development jobs, the worker prompt should require:

Determine relevant repo:

- use target_repo_hint if provided

- otherwise search local repos and GitHub org/user repos

- if no relevant repo exists, create a new private repo under dndodson

Inspect repo instructions:

- AGENTS.md

- project README

- existing tests and local quality gates

Implement bounded change.
Run local tests/lints targeted to the change.
Commit changes.
Push branch.
Open one or more GitHub PRs.
Write result.json with PR URLs and validation notes.

The prompt must explicitly forbid merging, deploying, publishing anything publicly outside a PR, money or trading actions, credential changes, and destructive operations.

Result context for morning brief

Implement overnight_workbench_context.py:

python ~/.hermes/scripts/overnight_workbench_context.py --pretty
python ~/.hermes/scripts/overnight_workbench_context.py --run-date 2026-05-03

Output JSON shape:

{
  "kind": "overnight_workbench",
  "run_date_et": "2026-05-03",
  "totals": {
    "queued": 5,
    "claimed": 5,
    "succeeded": 3,
    "failed": 1,
    "skipped": 1,
    "prs_opened": 4,
    "review_needed": 4
  },
  "results": [
    {
      "project_name": "...",
      "job_type": "software-development",
      "status": "succeeded",
      "summary": "...",
      "pull_requests": ["https://github.com/.../pull/123"],
      "artifacts": [],
      "review_needed": true
    }
  ],
  "failures": []
}
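A possible way to compute this payload from queue jobs and loaded results is sketched below. Several choices are assumptions rather than decisions in this plan: summarize is a hypothetical name, "claimed" is taken to be the number of results found, timeouts are counted under "failed", and unsafe jobs are counted under "skipped".

```python
from collections import Counter


def summarize(run_date_et: str, jobs: list[dict], results: list[dict]) -> dict:
    """Build the overnight_workbench payload from queue jobs and loaded results."""
    by_id = {j["job_id"]: j for j in jobs}
    statuses = Counter(r.get("status", "failed") for r in results)
    rows, failures, pr_count = [], [], 0
    for r in results:
        job = by_id.get(r.get("job_id"), {})
        outputs = r.get("outputs") or {}
        pr_urls = [pr["url"] for pr in outputs.get("pull_requests", []) if pr.get("url")]
        pr_count += len(pr_urls)
        row = {
            "project_name": job.get("project_name"),
            "job_type": job.get("job_type"),
            "status": r.get("status", "failed"),
            "summary": r.get("summary", ""),
            "pull_requests": pr_urls,
            "artifacts": outputs.get("artifacts", []),
            "review_needed": bool(r.get("review_needed")),
        }
        rows.append(row)
        if row["status"] in ("failed", "timeout"):
            failures.append({"job_id": r.get("job_id"), "error": r.get("error")})
    return {
        "kind": "overnight_workbench",
        "run_date_et": run_date_et,
        "totals": {
            "queued": len(jobs),
            "claimed": len(results),  # assumption: one result per claimed job
            "succeeded": statuses["succeeded"],
            "failed": statuses["failed"] + statuses["timeout"],
            "skipped": statuses["skipped"] + statuses["unsafe"] + statuses["skipped_unsafe"],
            "prs_opened": pr_count,
            "review_needed": sum(1 for row in rows if row["review_needed"]),
        },
        "results": rows,
        "failures": failures,
    }
```

With empty inputs the function still returns the full shape with zeroed totals, which lets the morning brief include the section even when nothing ran.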

Then modify morning_brief_context.py to include:

from overnight_workbench_context import build_payload as build_workbench_payload
...
"overnight_workbench": build_workbench_payload(...)

Keep the morning-brief script JSON-only. The cron prompt formats the Telegram summary.

Evening sweep integration

Modify evening_sweep_context.py in two phases:

Phase 1: consume existing candidates

Leave overnight_queue_candidates mostly as-is and let the queue writer convert them into jobs.

Phase 2: enrich candidate metadata

Add fields:

{
  "candidate_reason": "safe_verb=implement|explicit_context=overnight-ok",
  "suggested_job_type": "software-development",
  "safety_class": "pr_only",
  "target_repo_hint": "dndodson/argo",
  "acceptance_criteria": []
}

Also fetch Contexts in fetch_active_projects() so overnight-ok can become an explicit opt-in tag rather than only a future hook.
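Once candidates carry these fields, the queue writer could map them onto job fields without re-deriving the type. The sketch below assumes the enriched field names shown above; candidate_to_job_fields is a hypothetical helper, and the candidate's "description" key is an assumption because the candidate schema is not defined in this plan.

```python
from typing import Optional

KNOWN_JOB_TYPES = {
    "software-development", "research", "draft", "audit", "review", "documentation",
}


def candidate_to_job_fields(candidate: dict) -> Optional[dict]:
    """Map an enriched candidate to job fields, or None if the type is not recognized."""
    job_type = candidate.get("suggested_job_type")
    if job_type not in KNOWN_JOB_TYPES:
        return None  # leave the decision to the queue writer's own heuristics
    safety_class = candidate.get("safety_class")
    if job_type == "software-development":
        safety_class = "pr_only"  # software-development candidates map to pr_only
    return {
        "job_type": job_type,
        "safety_class": safety_class,
        "input": {
            "description": candidate.get("description", ""),  # hypothetical key
            "acceptance_criteria": list(candidate.get("acceptance_criteria") or []),
            "constraints": [],
            "target_repo_hint": candidate.get("target_repo_hint"),
        },
    }
```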

Scheduling

Prefer user systemd for the worker because this is execution infrastructure with logs, not a user-facing digest.

Service

~/.config/systemd/user/hermes-overnight-workbench.service

[Unit]
Description=Run Hermes overnight workbench
After=network-online.target
Wants=network-online.target

[Service]
Type=oneshot
WorkingDirectory=/home/dndodson
ExecStart=/home/dndodson/.hermes/hermes-agent/venv/bin/python3 /home/dndodson/.hermes/scripts/overnight_workbench_worker.py --once

Timer

~/.config/systemd/user/hermes-overnight-workbench.timer

[Unit]
Description=Run Hermes overnight workbench nightly

[Timer]
OnCalendar=*-*-* 23:15:00 America/New_York
Persistent=false
Unit=hermes-overnight-workbench.service

[Install]
WantedBy=timers.target

Verification:

systemd-analyze calendar '*-*-* 23:15:00 America/New_York'
systemctl --user daemon-reload
systemctl --user enable hermes-overnight-workbench.timer
systemctl --user start hermes-overnight-workbench.timer
systemctl --user start hermes-overnight-workbench.service
systemctl --user status --no-pager --full hermes-overnight-workbench.service
journalctl --user -u hermes-overnight-workbench.service -n 50 --no-pager

Tests

Use deterministic unit tests for policy and file behavior.

Queue tests

dry-run emits jobs without writing queue
append writes one JSON object per line
dedupe prevents duplicate jobs for the same date/project/action/job type
max 3 jobs/project/night enforced
max 10 jobs/night enforced
unsafe markers rejected
software-development candidate maps to pr_only

Worker tests

claims are atomic
stale claims are detected
dry-run produces no external side effects
unsafe job writes skipped_unsafe
missing result writes failure
software-development prompt contains PR-required policy and merge-forbidden policy

Context tests

summarizes successes/failures/skips
extracts PR URLs
handles missing/malformed result files gracefully
morning brief includes overnight_workbench even when empty

Rollout plan

PR 1 — Queue + context plumbing

Implement:

overnight_workbench_queue.py
overnight_workbench_context.py
morning brief integration
tests

No worker execution yet. This PR should prove the data path from evening candidates to morning summary.

PR 2 — Dry-run worker

Implement:

worker selection/caps/claims
dry-run execution
synthetic result writing
systemd unit templates, not enabled by default
tests

No real Hermes subprocess execution yet.

PR 3 — Real worker execution for non-code job types

Implement:

hermes chat -q execution
result contract
research/draft/audit/review/documentation prompts
Notion log append behind explicit flag

Run with small hand-authored test jobs first.

PR 4 — Software-development job type

Implement:

repo resolution
private repo creation fallback
PR-opening contract
local quality gate prompt requirements
result parser for PR URLs

This is where the workbench becomes materially useful for Argo and Hermes self-improvement.

PR 5 — Enable nightly timer

Implement:

install/enable systemd timer
manual dry-run verification
one-night monitored pilot
morning brief confirmation

Operational guardrails

No merges.
No production deploys or service restarts.
No trading/money actions.
No credential changes.
No destructive file or data operations outside explicit workspace/repo boundaries.
Every output must be readable from a GitHub PR, web artifact, or Notion link.
Fail closed: if policy is ambiguous, mark job skipped_unsafe with a reason.
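The fail-closed rule can be sketched as a check that accepts a job only when its side-effect policy is fully understood. check_policy is a hypothetical helper; the two sets are taken from the queue job example in this plan, and treating them as the complete vocabulary is an assumption.

```python
# Assumed to be the complete side-effect vocabulary, based on the queue job example.
FORBIDDEN = {"merge", "deploy", "publish_outside_pr", "send_external", "money", "delete"}
KNOWN_ALLOWED = {
    "local_files", "git_branch", "git_push", "github_pr_open",
    "github_pr_update", "github_repo_create", "notion_log_append",
}


def check_policy(job: dict) -> tuple[str, str]:
    """Return ("ok", "") or ("skipped_unsafe", reason); anything unclear is unsafe."""
    allowed = job.get("allowed_side_effects")
    forbidden = job.get("forbidden_side_effects")
    if not isinstance(allowed, list) or not isinstance(forbidden, list):
        return "skipped_unsafe", "side-effect lists missing or malformed"
    if not all(isinstance(item, str) for item in allowed + forbidden):
        return "skipped_unsafe", "side-effect entries must be strings"
    unknown = set(allowed) - KNOWN_ALLOWED
    if unknown:
        # Also catches forbidden effects listed as allowed, since the sets are disjoint.
        return "skipped_unsafe", f"unrecognized allowed side effects: {sorted(unknown)}"
    missing = FORBIDDEN - set(forbidden)
    if missing:
        return "skipped_unsafe", f"job does not forbid: {sorted(missing)}"
    if job.get("job_type") == "software-development" and job.get("safety_class") != "pr_only":
        return "skipped_unsafe", "software-development jobs must use safety_class pr_only"
    return "ok", ""
```

The same check could run in both the queue writer and the worker, so a job edited by hand between enqueue and execution is still validated before it runs.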

First implementation command sequence

Start with PR 1:

cd /home/dndodson/.hermes/scripts
# create tests first
# implement queue/context scripts
python -m pytest tests/test_overnight_workbench_queue.py tests/test_overnight_workbench_context.py -q
python overnight_workbench_queue.py --dry-run --pretty
python overnight_workbench_context.py --pretty
python morning_brief_context.py --pretty

Then open a PR against the relevant Hermes/config repository if this code is tracked there. If these scripts remain local-only, publish the implementation artifact and Notion log until we decide where operational scripts should live long-term.

Open implementation decision

The only significant architecture decision left before PR 1 is where to keep the code under version control:

Keep scripts in ~/.hermes/scripts and create a private dndodson/hermes-ops repo for backup/review.
Move scripts into a tracked Hermes Agent fork/profile repo if they should travel with Hermes itself.
Keep them local for v1, then migrate after the pilot.

Recommendation: create a private dndodson/hermes-ops repo for operational automation scripts. It matches the artifact/project nature of Stage 3, makes PR review natural, and avoids mixing Daniel-specific automation into upstream Hermes Agent code.