Hermes Autonomy Stage 3 — Implementation Plan
Generated: 2026-05-03T17:58:30+00:00
Project: Hermes Autonomy Metaproject
Goal
Implement the Stage 3 overnight workbench so Hermes can turn evening-sweep candidates into bounded overnight jobs, execute safe work unattended, and surface reviewable outputs in the next morning brief.
Stage 3 is not just another digest. It is the first production loop where Hermes does useful work without Daniel prompting each task:
- Evening sweep discovers or proposes work.
- Queue writer converts eligible work into durable jobs.
- Worker executes up to 10 jobs/night, at most 3 per project.
- Software-development jobs open PRs as the primary review artifact.
- Morning brief reports what ran, what opened PRs, what failed, and what needs Daniel review.
Current baseline
Already present:
- ~/.hermes/scripts/evening_sweep_context.py
  - emits overnight_queue_candidates
  - currently uses conservative next-action heuristics
- ~/.hermes/scripts/morning_brief_context.py
  - emits goal-aware morning brief JSON
  - currently lacks an overnight_workbench section
- Existing cron jobs:
  - evening-sweep at 22:33 ET, delivered to Telegram General
  - morning-brief at 07:57 ET, delivered to Telegram General
- Existing web artifact:
  - Stage 3 design at the Hermes Artifacts web app
Missing:
- queue file/writer
- worker
- result summarizer
- morning brief integration
- user systemd worker timer
- tests and dry-run fixtures
Implementation shape
Use local JSONL as the queue substrate and local JSON result files as the execution ledger.
~/.hermes/state/overnight_workbench/
  queue.jsonl
  claims/
    <job_id>.lock
  runs/
    YYYY-MM-DD/
      <job_id>/
        job.json
        result.json
        prompt.md
        stdout.log
        stderr.log
        summary.md
Scripts:
~/.hermes/scripts/overnight_workbench_queue.py
~/.hermes/scripts/overnight_workbench_worker.py
~/.hermes/scripts/overnight_workbench_context.py
~/.hermes/scripts/tests/test_overnight_workbench_queue.py
~/.hermes/scripts/tests/test_overnight_workbench_worker.py
~/.hermes/scripts/tests/test_overnight_workbench_context.py
Systemd user units:
~/.config/systemd/user/hermes-overnight-workbench.service
~/.config/systemd/user/hermes-overnight-workbench.timer
Data model
Queue job
One JSON object per line in queue.jsonl (shown pretty-printed here):
{
  "job_id": "2026-05-03_overnight_<short_hash>",
  "created_at": "2026-05-03T22:45:00-04:00",
  "run_date_et": "2026-05-03",
  "created_by": "evening-sweep",
  "project_id": "notion-page-id",
  "project_name": "Project name",
  "project_url": "https://www.notion.so/...",
  "goal_id": "optional-goal-id",
  "goal_name": "optional goal name",
  "job_type": "software-development",
  "safety_class": "pr_only",
  "input": {
    "description": "Written feature/problem description",
    "acceptance_criteria": [],
    "constraints": [],
    "target_repo_hint": null
  },
  "limits": {
    "max_runtime_minutes": 60,
    "max_tool_calls": 30
  },
  "allowed_side_effects": [
    "local_files",
    "git_branch",
    "git_push",
    "github_pr_open",
    "github_pr_update",
    "github_repo_create",
    "notion_log_append"
  ],
  "forbidden_side_effects": [
    "merge",
    "deploy",
    "publish_outside_pr",
    "send_external",
    "money",
    "delete"
  ],
  "status": "pending"
}
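The job_id above ends in `<short_hash>`, but the plan does not say what gets hashed. A minimal sketch, assuming the hash covers the same fields the queue writer deduplicates on, so re-running the writer for the same night yields the same id. `make_job_id`, the whitespace/case normalization, and the 10-character hash length are illustrative choices, not part of the plan.

```python
import hashlib
import json


def make_job_id(run_date_et: str, project_id: str, action_text: str, job_type: str) -> str:
    """Return a deterministic id shaped like '<run_date_et>_overnight_<short_hash>'."""
    # Collapse whitespace and lowercase so differences in spacing or letter
    # case alone do not produce a second job for the same night.
    normalized_action = " ".join(action_text.split()).lower()
    # A JSON list keeps field boundaries unambiguous before hashing.
    key = json.dumps([run_date_et, project_id, normalized_action, job_type])
    short_hash = hashlib.sha256(key.encode("utf-8")).hexdigest()[:10]
    return f"{run_date_et}_overnight_{short_hash}"


print(make_job_id("2026-05-03", "notion-page-id", "Add retry to sync", "research"))
```

Because the id is derived rather than random, a duplicate candidate naturally collides with the existing queue entry.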
Result
result.json in each run directory:
{
  "job_id": "...",
  "status": "succeeded|failed|skipped|timeout|unsafe",
  "started_at": "...",
  "finished_at": "...",
  "summary": "Human-readable one paragraph summary",
  "review_needed": true,
  "outputs": {
    "pull_requests": [
      {
        "repo": "dndodson/argo",
        "url": "https://github.com/dndodson/argo/pull/123",
        "title": "feat: ...",
        "draft": true,
        "ci_status": "pending|success|failure|unknown"
      }
    ],
    "artifacts": [
      {
        "title": "...",
        "url": "http://srv1343021.tail8eb3a8.ts.net/apps/hermes-artifacts/..."
      }
    ]
  },
  "error": null
}
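The worker and the context script both need to read result.json defensively, since a missing result means failure. A minimal sketch, assuming a malformed file or an unrecognized status should also count as failure; `load_result` and `failure_result` are hypothetical helper names. The status set accepts both `unsafe` (from the schema above) and `skipped_unsafe` (used by the worker tests and guardrails).

```python
import json
from pathlib import Path

# "unsafe" comes from the result schema; "skipped_unsafe" from the worker
# tests and guardrails. This sketch accepts both spellings.
VALID_STATUSES = {"succeeded", "failed", "skipped", "timeout", "unsafe", "skipped_unsafe"}


def failure_result(job_id: str, reason: str) -> dict:
    """Synthetic result in the same shape as result.json."""
    return {
        "job_id": job_id,
        "status": "failed",
        "summary": reason,
        "review_needed": True,
        "outputs": {"pull_requests": [], "artifacts": []},
        "error": reason,
    }


def load_result(run_dir: Path, job_id: str) -> dict:
    """Read run_dir/result.json; a missing or malformed file becomes a failure."""
    try:
        data = json.loads((run_dir / "result.json").read_text(encoding="utf-8"))
    except FileNotFoundError:
        return failure_result(job_id, "result.json missing")
    except (OSError, ValueError) as exc:  # JSONDecodeError is a ValueError
        return failure_result(job_id, f"result.json unreadable: {exc}")
    status = data.get("status") if isinstance(data, dict) else None
    if not isinstance(status, str) or status not in VALID_STATUSES:
        return failure_result(job_id, "result.json has no valid status")
    return data
```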
Queue writer
Implement overnight_workbench_queue.py with modes:
python ~/.hermes/scripts/overnight_workbench_queue.py --dry-run --pretty
python ~/.hermes/scripts/overnight_workbench_queue.py --append
python ~/.hermes/scripts/overnight_workbench_queue.py --append --from-evening-sweep
python ~/.hermes/scripts/overnight_workbench_queue.py --enqueue-json /path/to/job.json
Responsibilities:
- Import evening_sweep_context.build_payload().
- Read overnight_queue_candidates.
- Map each candidate to a job type:
  - software-development if the candidate is an explicit feature/problem/build request
  - research, draft, audit, review, or documentation otherwise
- Reject unsafe jobs before enqueue.
- Deduplicate by (run_date_et, project_id, action_hash, job_type).
- Enforce static caps at queue time:
  - max 3 jobs/project/night
  - max 10 jobs/night globally
- Emit [SILENT] if there is nothing to queue in cron-like mode.
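Deduplication and the two caps can be enforced in one deterministic pass. A minimal sketch, assuming jobs already carry an `action_hash` field (named in the dedupe key above but not in the job example) and that jobs already in queue.jsonl count toward the night's caps; `apply_caps` is a hypothetical name.

```python
from collections import Counter

MAX_PER_PROJECT = 3  # max jobs/project/night
MAX_PER_NIGHT = 10   # max jobs/night globally


def dedupe_key(job: dict) -> tuple:
    # "action_hash" is assumed to be computed by the queue writer from the
    # candidate's action text; it is not in the job example above.
    return (job["run_date_et"], job["project_id"], job["action_hash"], job["job_type"])


def apply_caps(existing: list, candidates: list) -> tuple:
    """Split candidates into (accepted, [(job, reason), ...]), counting jobs already queued."""
    seen = {dedupe_key(j) for j in existing}
    per_project = Counter((j["run_date_et"], j["project_id"]) for j in existing)
    per_night = Counter(j["run_date_et"] for j in existing)
    accepted, rejected = [], []
    for job in candidates:
        night, project = job["run_date_et"], (job["run_date_et"], job["project_id"])
        if dedupe_key(job) in seen:
            rejected.append((job, "duplicate"))
        elif per_night[night] >= MAX_PER_NIGHT:
            rejected.append((job, "nightly cap"))
        elif per_project[project] >= MAX_PER_PROJECT:
            rejected.append((job, "project cap"))
        else:
            accepted.append(job)
            seen.add(dedupe_key(job))
            per_project[project] += 1
            per_night[night] += 1
    return accepted, rejected
```

The same function could be reused by the worker when it re-enforces caps over the night's pending jobs.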
Important: the queue writer should be deterministic. It may use heuristics, but it should not ask an LLM to decide whether to enqueue. The evening sweep LLM can suggest candidates; the queue writer enforces policy.
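Job-type mapping and unsafe-job rejection can both stay keyword-based, so the same candidate always gets the same decision. A minimal sketch; the keyword lists and the fallback to `research` are hypothetical, and `suggested_job_type` is the Phase 2 field described under "Evening sweep integration".

```python
import re
from typing import Optional, Tuple

# Hypothetical keyword lists; the plan only says the writer may use heuristics.
UNSAFE_MARKERS = ("merge", "deploy", "publish", "payment", "trade", "credential", "delete")
SOFTWARE_DEV_WORDS = ("implement", "build", "fix", "feature", "bug")
NON_CODE_TYPES = ("research", "draft", "audit", "review", "documentation")


def has_word(text: str, words) -> bool:
    """Case-insensitive whole-word match against any of the given words."""
    return any(re.search(rf"\b{re.escape(w)}\b", text, re.IGNORECASE) for w in words)


def classify(candidate: dict) -> Tuple[Optional[str], str]:
    """Return (job_type, reason); job_type is None when the candidate is rejected."""
    text = candidate.get("description", "")
    if has_word(text, UNSAFE_MARKERS):
        return None, "unsafe marker in description"  # fail closed before enqueue
    suggested = candidate.get("suggested_job_type")
    if suggested == "software-development" or has_word(text, SOFTWARE_DEV_WORDS):
        return "software-development", "feature/problem/build request"
    if suggested in NON_CODE_TYPES:
        return suggested, "suggested_job_type from evening sweep"
    return "research", "default non-code job type"
```

Checking unsafe markers first means a candidate like "Deploy the new build" is rejected even though it also looks like a build request.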
Worker
Implement overnight_workbench_worker.py with modes:
python ~/.hermes/scripts/overnight_workbench_worker.py --dry-run
python ~/.hermes/scripts/overnight_workbench_worker.py --once
python ~/.hermes/scripts/overnight_workbench_worker.py --run-date 2026-05-03
python ~/.hermes/scripts/overnight_workbench_worker.py --job-id <job_id>
Responsibilities:
- Read pending jobs from queue.jsonl.
- Select jobs for the current ET run date.
- Re-enforce caps:
  - max 3 jobs/project/night
  - max 10 jobs/night globally
- Claim each job atomically with a lock file under claims/.
- Write job.json and prompt.md into a run directory.
- Execute the job in a detached sibling process or user systemd scope.
- Capture stdout/stderr.
- Require result.json; a missing result means failure.
- Append a concise Notion log only if enabled by job policy.
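Claiming "atomically with a lock file" maps naturally onto `os.open` with `O_CREAT | O_EXCL`, which fails if the file already exists on a local filesystem. A minimal sketch of the claim plus the stale-claim check from the worker tests; the two-hour stale threshold and the lock-file contents are hypothetical, and what to do with a stale claim is left open.

```python
import json
import os
import time
from pathlib import Path
from typing import Optional

# Hypothetical threshold: twice the 60-minute max_runtime_minutes from the job example.
STALE_AFTER_SECONDS = 2 * 60 * 60


def try_claim(claims_dir: Path, job_id: str) -> bool:
    """Create claims/<job_id>.lock atomically; False means another worker holds it."""
    claims_dir.mkdir(parents=True, exist_ok=True)
    try:
        # O_CREAT | O_EXCL fails if the file already exists, so exactly one caller wins.
        fd = os.open(claims_dir / f"{job_id}.lock", os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump({"pid": os.getpid(), "claimed_at": time.time()}, fh)
    return True


def is_stale(claims_dir: Path, job_id: str, now: Optional[float] = None) -> bool:
    """True if the lock exists and its mtime is older than STALE_AFTER_SECONDS."""
    lock = claims_dir / f"{job_id}.lock"
    try:
        mtime = lock.stat().st_mtime
    except FileNotFoundError:
        return False
    current = time.time() if now is None else now
    return current - mtime > STALE_AFTER_SECONDS
```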
Worker execution strategy
For v1, use hermes chat -q one-shot workers instead of trying to embed all implementation logic in the scheduler.
Example command shape:
systemd-run --user --scope \
--unit hermes-overnight-job-<job_id> \
hermes chat \
--provider openai-codex \
--model gpt-5.5 \
--toolsets terminal,file,web,session_search,skills \
-q "<self-contained job prompt>"
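From Python, the same command shape is easiest to pass as an argv list, so the multi-line prompt needs no shell quoting. A minimal sketch that mirrors the flags above verbatim, captures output to the run directory's stdout.log/stderr.log, and enforces max_runtime_minutes with `subprocess.run(timeout=...)`; `build_command` and `run_job` are hypothetical names.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def build_command(job_id: str, prompt: str) -> List[str]:
    """The command shape above as an argv list, so the prompt needs no shell quoting."""
    return [
        "systemd-run", "--user", "--scope",
        "--unit", f"hermes-overnight-job-{job_id}",
        "hermes", "chat",
        "--provider", "openai-codex",
        "--model", "gpt-5.5",
        "--toolsets", "terminal,file,web,session_search,skills",
        "-q", prompt,
    ]


def run_job(run_dir: Path, job_id: str, prompt: str, max_runtime_minutes: int) -> Optional[int]:
    """Run one job with stdout/stderr captured to the run directory.

    Returns the exit code, or None if max_runtime_minutes elapsed.
    """
    with open(run_dir / "stdout.log", "wb") as out, open(run_dir / "stderr.log", "wb") as err:
        try:
            proc = subprocess.run(
                build_command(job_id, prompt),
                stdout=out,
                stderr=err,
                cwd=run_dir,
                timeout=max_runtime_minutes * 60,
            )
        except subprocess.TimeoutExpired:
            # subprocess.run kills only its direct child here; stopping the whole
            # scope unit (e.g. via systemctl --user stop) is left as an open detail.
            return None
    return proc.returncode
```

A None or non-zero return would still be followed by the result.json check, since the result file is the contract, not the exit code.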
The prompt should be self-contained and include:
- project context
- job input description
- safety envelope
- explicit allowed/forbidden side effects
- output contract requiring result.json
- PR policy: open PRs encouraged/expected for software-development, never merge
- artifact-link rule: web/Notion links, never raw file paths in final handoff
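The prompt contents listed above can be rendered straight from a queue job, which also makes the worker test "prompt contains PR-required policy and merge-forbidden policy" a simple string check. A minimal sketch; the section headings and sentence wording are illustrative, not a fixed template from the plan.

```python
import json


def render_prompt(job: dict, result_path: str) -> str:
    """Assemble a self-contained prompt from a queue job; the wording is illustrative."""
    inp = job["input"]
    pr_policy = (
        ["Opening GitHub PRs is expected; they are the primary review artifact."]
        if job["job_type"] == "software-development"
        else []
    )
    lines = [
        f"# Overnight job {job['job_id']} ({job['job_type']})",
        f"Project: {job['project_name']} ({job['project_url']})",
        "",
        "## Task",
        inp["description"],
        "Acceptance criteria:",
        *[f"- {c}" for c in inp.get("acceptance_criteria", [])],
        "Constraints:",
        *[f"- {c}" for c in inp.get("constraints", [])],
        "",
        "## Safety envelope",
        f"Limits: {json.dumps(job['limits'])}",
        "Allowed side effects: " + ", ".join(job["allowed_side_effects"]),
        "Forbidden side effects: " + ", ".join(job["forbidden_side_effects"]),
        "Never merge a pull request.",
        *pr_policy,
        "",
        "## Output contract",
        f"Write {result_path} using the result schema; a missing file counts as failure.",
        "In the final handoff, link to web/Notion/GitHub pages, never raw file paths.",
    ]
    return "\n".join(lines)
```

Writing the rendered text to prompt.md before execution keeps the exact instructions reviewable the next morning.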
Software-development job prompt contract
For software-development jobs, the worker prompt should require:
- Determine relevant repo:
  - use target_repo_hint if provided
  - otherwise search local repos and GitHub org/user repos
  - if no relevant repo exists, create a new private repo under dndodson
- Inspect repo instructions:
  - AGENTS.md
  - project README
  - existing tests and local quality gates
- Implement bounded change.
- Run local tests/lints targeted to the change.
- Commit changes.
- Push branch.
- Open one or more GitHub PRs.
- Write result.json with PR URLs and validation notes.
The prompt must explicitly forbid merging, deploying, publishing outside a PR, money/trading actions, credential changes, and destructive operations.
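Since PR URLs are the primary review artifact, the worker and context script should only trust URLs that actually look like GitHub pull requests. A minimal sketch of a "result parser for PR URLs", assuming the `outputs.pull_requests[].url` layout from the result schema; `extract_pr_urls` is a hypothetical name.

```python
import re
from typing import List

# https://github.com/<owner>/<repo>/pull/<number>, optional trailing slash
PR_URL_RE = re.compile(r"^https://github\.com/[\w.-]+/[\w.-]+/pull/\d+/?$")


def extract_pr_urls(result: dict) -> List[str]:
    """Return well-formed PR URLs from result["outputs"]["pull_requests"], skipping the rest."""
    outputs = result.get("outputs")
    if not isinstance(outputs, dict):
        return []
    prs = outputs.get("pull_requests")
    if not isinstance(prs, list):
        return []
    urls = []
    for pr in prs:
        url = pr.get("url") if isinstance(pr, dict) else None
        if isinstance(url, str) and PR_URL_RE.match(url):
            urls.append(url)
    return urls
```

One option is to flag a succeeded software-development job that yields an empty list for Daniel's review, since the contract expects at least one PR.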
Result context for morning brief
Implement overnight_workbench_context.py:
python ~/.hermes/scripts/overnight_workbench_context.py --pretty
python ~/.hermes/scripts/overnight_workbench_context.py --run-date 2026-05-03
Output JSON shape:
{
  "kind": "overnight_workbench",
  "run_date_et": "2026-05-03",
  "totals": {
    "queued": 5,
    "claimed": 5,
    "succeeded": 3,
    "failed": 1,
    "skipped": 1,
    "prs_opened": 4,
    "review_needed": 4
  },
  "results": [
    {
      "project_name": "...",
      "job_type": "software-development",
      "status": "succeeded",
      "summary": "...",
      "pull_requests": ["https://github.com/.../pull/123"],
      "artifacts": [],
      "review_needed": true
    }
  ],
  "failures": []
}
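The context script only needs to walk the state directory laid out under "Implementation shape". A minimal sketch of `build_payload`, assuming `queued` counts queue.jsonl lines for the run date, `claimed` counts run directories, and statuses fold into the three outcome totals as shown in `STATUS_BUCKET` (that mapping is an assumption). Missing or malformed files are tolerated, per the context tests.

```python
import json
from pathlib import Path
from typing import Optional

# Assumed mapping from result statuses to the three outcome totals.
STATUS_BUCKET = {
    "succeeded": "succeeded", "failed": "failed", "timeout": "failed",
    "skipped": "skipped", "unsafe": "skipped", "skipped_unsafe": "skipped",
}
TOTAL_KEYS = ("queued", "claimed", "succeeded", "failed", "skipped", "prs_opened", "review_needed")


def read_json(path: Path) -> Optional[dict]:
    """Return a dict from path, or None if missing, unreadable, or not an object."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return None
    return data if isinstance(data, dict) else None


def build_payload(state_dir: Path, run_date_et: str) -> dict:
    totals = dict.fromkeys(TOTAL_KEYS, 0)
    results, failures = [], []
    queue = state_dir / "queue.jsonl"
    if queue.is_file():
        for line in queue.read_text(encoding="utf-8").splitlines():
            try:
                job = json.loads(line)
            except ValueError:
                continue  # a corrupt queue line should not break the brief
            if isinstance(job, dict) and job.get("run_date_et") == run_date_et:
                totals["queued"] += 1
    run_root = state_dir / "runs" / run_date_et
    run_dirs = sorted(p for p in run_root.iterdir() if p.is_dir()) if run_root.is_dir() else []
    for run_dir in run_dirs:
        totals["claimed"] += 1
        job = read_json(run_dir / "job.json") or {}
        result = read_json(run_dir / "result.json")
        if result is None:
            totals["failed"] += 1
            failures.append({"job_id": run_dir.name, "error": "missing or malformed result.json"})
            continue
        status = result.get("status")
        bucket = STATUS_BUCKET.get(status, "failed") if isinstance(status, str) else "failed"
        totals[bucket] += 1
        outputs = result.get("outputs") if isinstance(result.get("outputs"), dict) else {}
        raw_prs = outputs.get("pull_requests") if isinstance(outputs.get("pull_requests"), list) else []
        prs = [pr["url"] for pr in raw_prs if isinstance(pr, dict) and "url" in pr]
        review_needed = bool(result.get("review_needed"))
        totals["prs_opened"] += len(prs)
        totals["review_needed"] += int(review_needed)
        results.append({
            "project_name": job.get("project_name"),
            "job_type": job.get("job_type"),
            "status": status,
            "summary": result.get("summary", ""),
            "pull_requests": prs,
            "artifacts": outputs.get("artifacts") or [],
            "review_needed": review_needed,
        })
        if bucket == "failed":
            failures.append({"job_id": run_dir.name, "error": result.get("error")})
    return {"kind": "overnight_workbench", "run_date_et": run_date_et,
            "totals": totals, "results": results, "failures": failures}
```

An empty or absent run directory still yields a well-formed payload with zero totals, which keeps the morning brief section present on quiet nights.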
Then modify morning_brief_context.py to include:
from overnight_workbench_context import build_payload as build_workbench_payload
...
"overnight_workbench": build_workbench_payload(...)
Keep the morning-brief script JSON-only. The cron prompt formats the Telegram summary.
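Because the morning brief must still render if the workbench state is broken, the call that adds the overnight_workbench key could be wrapped so any error becomes an empty section. A minimal sketch; `safe_section` is a hypothetical helper, and logging goes to stderr so stdout stays JSON-only.

```python
import logging
from typing import Callable


def safe_section(builder: Callable[[], dict], kind: str) -> dict:
    """Call a section builder; on any error return an empty section instead of failing."""
    try:
        return builder()
    except Exception as exc:  # deliberate catch-all: the brief must still render
        logging.getLogger(__name__).warning("%s section failed: %s", kind, exc)
        return {"kind": kind, "error": str(exc), "totals": {}, "results": [], "failures": []}
```

In morning_brief_context.py this would wrap the existing `build_workbench_payload` call in a lambda, passing `"overnight_workbench"` as the kind.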
Evening sweep integration
Modify evening_sweep_context.py in two phases:
Phase 1: consume existing candidates
Leave overnight_queue_candidates mostly as-is; the queue writer converts them into jobs.
Phase 2: enrich candidate metadata
Add fields:
{
  "candidate_reason": "safe_verb=implement|explicit_context=overnight-ok",
  "suggested_job_type": "software-development",
  "safety_class": "pr_only",
  "target_repo_hint": "dndodson/argo",
  "acceptance_criteria": []
}
Also fetch Contexts in fetch_active_projects() so overnight-ok can become an explicit opt-in tag rather than only a future hook.
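The Phase 2 fields could be derived per candidate from its next action and the project's Contexts. A minimal sketch, assuming the candidate carries a hypothetical `next_action` string and that `contexts` is the list fetched in fetch_active_projects(); the verb table is illustrative, and only `pr_only` is set as a safety class because it is the only class the plan defines.

```python
from typing import Optional

# Hypothetical verb -> job type table; the plan names the job types, not the verbs.
VERB_TO_JOB_TYPE = {
    "implement": "software-development",
    "fix": "software-development",
    "research": "research",
    "draft": "draft",
    "audit": "audit",
    "review": "review",
    "document": "documentation",
}


def enrich_candidate(candidate: dict, contexts: list) -> dict:
    """Add the Phase 2 fields to one candidate; 'next_action' is an assumed input field."""
    words = candidate.get("next_action", "").split()
    verb = words[0].lower() if words else ""
    reasons = []
    if verb in VERB_TO_JOB_TYPE:
        reasons.append(f"safe_verb={verb}")
    if "overnight-ok" in contexts:
        reasons.append("explicit_context=overnight-ok")
    job_type: Optional[str] = VERB_TO_JOB_TYPE.get(verb)
    return {
        **candidate,
        "candidate_reason": "|".join(reasons),
        "suggested_job_type": job_type,
        # Only pr_only is defined in the plan; other classes are left unset.
        "safety_class": "pr_only" if job_type == "software-development" else None,
        "target_repo_hint": candidate.get("target_repo_hint"),
        "acceptance_criteria": list(candidate.get("acceptance_criteria") or []),
    }
```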
Scheduling
Prefer user systemd for the worker because this is execution infrastructure with logs, not a user-facing digest.
Service
~/.config/systemd/user/hermes-overnight-workbench.service
[Unit]
Description=Run Hermes overnight workbench
After=network-online.target
Wants=network-online.target
[Service]
Type=oneshot
WorkingDirectory=/home/dndodson
ExecStart=/home/dndodson/.hermes/hermes-agent/venv/bin/python3 /home/dndodson/.hermes/scripts/overnight_workbench_worker.py --once
Timer
~/.config/systemd/user/hermes-overnight-workbench.timer
[Unit]
Description=Run Hermes overnight workbench nightly
[Timer]
OnCalendar=*-*-* 23:15:00 America/New_York
Persistent=false
Unit=hermes-overnight-workbench.service
[Install]
WantedBy=timers.target
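The timer fires at 23:15 ET, so the ET calendar date at worker start is the night being processed. A minimal sketch of computing that run date with the standard-library `zoneinfo` module (Python 3.9+, needs the system tz database); `run_date_et` is a hypothetical helper name.

```python
from datetime import datetime, timezone
from typing import Optional
from zoneinfo import ZoneInfo

ET = ZoneInfo("America/New_York")


def run_date_et(now: Optional[datetime] = None) -> str:
    """Return the ET calendar date (YYYY-MM-DD) for an aware datetime, default now."""
    current = now if now is not None else datetime.now(timezone.utc)
    return current.astimezone(ET).date().isoformat()
```

Computing the date once at startup and passing it into job selection keeps a run that crosses midnight on the same run date, instead of drifting to the next day partway through.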
Verification:
systemd-analyze calendar '*-*-* 23:15:00 America/New_York'
systemctl --user daemon-reload
systemctl --user enable hermes-overnight-workbench.timer
systemctl --user start hermes-overnight-workbench.timer
systemctl --user start hermes-overnight-workbench.service
systemctl --user status --no-pager --full hermes-overnight-workbench.service
journalctl --user -u hermes-overnight-workbench.service -n 50 --no-pager
Tests
Use deterministic unit tests for policy and file behavior.
Queue tests
- dry-run emits jobs without writing queue
- append writes one JSON object per line
- dedupe prevents duplicate jobs for the same project/action/date
- max 3 jobs/project/night enforced
- max 10 jobs/night enforced
- unsafe markers rejected
- software-development candidate maps to pr_only
Worker tests
- claims are atomic
- stale claims are detected
- dry-run writes no external side effects
- unsafe job writes skipped_unsafe
- missing result writes failure
- software-development prompt contains PR-required policy and merge-forbidden policy
Context tests
- summarizes successes/failures/skips
- extracts PR URLs
- handles missing/malformed result files gracefully
- morning brief includes overnight_workbench even when empty
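The first two queue tests can be written without any network or Notion access. A minimal pytest-style sketch; `append_jobs` is a hypothetical stand-in for the queue writer's append path, and `tempfile` is used instead of pytest's `tmp_path` fixture only so the sketch also runs as a plain script.

```python
import json
import tempfile
from pathlib import Path

JOBS = [{"job_id": "a", "project_id": "p1"}, {"job_id": "b", "project_id": "p2"}]


def append_jobs(queue_path: Path, jobs: list, dry_run: bool) -> list:
    """Hypothetical stand-in for the queue writer: serialize jobs, append unless dry-run."""
    lines = [json.dumps(job, sort_keys=True) for job in jobs]
    if not dry_run:
        with open(queue_path, "a", encoding="utf-8") as fh:
            fh.writelines(line + "\n" for line in lines)
    return lines


def test_dry_run_does_not_write():
    with tempfile.TemporaryDirectory() as tmp:
        queue = Path(tmp) / "queue.jsonl"
        append_jobs(queue, JOBS, dry_run=True)
        assert not queue.exists()


def test_append_writes_one_object_per_line():
    with tempfile.TemporaryDirectory() as tmp:
        queue = Path(tmp) / "queue.jsonl"
        append_jobs(queue, JOBS, dry_run=False)
        rows = [json.loads(line) for line in queue.read_text(encoding="utf-8").splitlines()]
        assert rows == JOBS
```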
Rollout plan
PR 1 — Queue + context plumbing
Implement:
- overnight_workbench_queue.py
- overnight_workbench_context.py
- morning brief integration
- tests
No worker execution yet. This PR should prove the data path from evening candidates to morning summary.
PR 2 — Dry-run worker
Implement:
- worker selection/caps/claims
- dry-run execution
- synthetic result writing
- systemd unit templates, not enabled by default
- tests
No real Hermes subprocess execution yet.
PR 3 — Real worker execution for non-code job types
Implement:
- hermes chat -q execution
- result contract
- research/draft/audit/review/documentation prompts
- Notion log append behind explicit flag
Run with small hand-authored test jobs first.
PR 4 — Software-development job type
Implement:
- repo resolution
- private repo creation fallback
- PR-opening contract
- local quality gate prompt requirements
- result parser for PR URLs
This is where the workbench becomes materially useful for Argo and Hermes self-improvement.
PR 5 — Enable nightly timer
Implement:
- install/enable systemd timer
- manual dry-run verification
- one-night monitored pilot
- morning brief confirmation
Operational guardrails
- No merges.
- No production deploys or service restarts.
- No trading/money actions.
- No credential changes.
- No destructive file or data operations outside explicit workspace/repo boundaries.
- Every output must be readable from a GitHub PR, web artifact, or Notion link.
- Fail closed: if policy is ambiguous, mark the job skipped_unsafe with a reason.
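The fail-closed rule can be applied mechanically to each job's side-effect lists before execution. A minimal sketch, assuming the side-effect names from the queue job example are the complete vocabulary; guardrails without a side-effect name (such as credential changes) still rely on the prompt. `check_envelope` and `skipped_unsafe_result` are hypothetical names.

```python
from typing import Tuple

# Copied from the queue job example; anything outside KNOWN_ALLOWED is "unknown".
KNOWN_ALLOWED = {"local_files", "git_branch", "git_push", "github_pr_open",
                 "github_pr_update", "github_repo_create", "notion_log_append"}
ALWAYS_FORBIDDEN = {"merge", "deploy", "publish_outside_pr", "send_external", "money", "delete"}


def check_envelope(job: dict) -> Tuple[bool, str]:
    """Fail closed: any ambiguity in the side-effect lists returns (False, reason)."""
    allowed = set(job.get("allowed_side_effects") or [])
    forbidden = set(job.get("forbidden_side_effects") or [])
    if not allowed:
        return False, "no allowed_side_effects declared"
    if allowed & ALWAYS_FORBIDDEN:
        return False, f"forbidden side effect allowed: {sorted(allowed & ALWAYS_FORBIDDEN)}"
    if allowed - KNOWN_ALLOWED:
        return False, f"unknown side effects: {sorted(allowed - KNOWN_ALLOWED)}"
    if ALWAYS_FORBIDDEN - forbidden:
        return False, f"forbidden list incomplete: {sorted(ALWAYS_FORBIDDEN - forbidden)}"
    return True, "ok"


def skipped_unsafe_result(job: dict, reason: str) -> dict:
    """Result written instead of running the job."""
    return {
        "job_id": job.get("job_id"),
        "status": "skipped_unsafe",
        "summary": f"Skipped before execution: {reason}",
        "review_needed": True,
        "outputs": {"pull_requests": [], "artifacts": []},
        "error": reason,
    }
```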
First implementation command sequence
Start with PR 1:
cd /home/dndodson/.hermes/scripts
# create tests first
# implement queue/context scripts
python -m pytest tests/test_overnight_workbench_queue.py tests/test_overnight_workbench_context.py -q
python overnight_workbench_queue.py --dry-run --pretty
python overnight_workbench_context.py --pretty
python morning_brief_context.py --pretty
Then open a PR against the relevant Hermes/config repository if this code is tracked there. If these scripts remain local-only, publish the implementation artifact and Notion log until we decide where operational scripts should live long-term.
Open implementation decision
The only significant architecture decision left before PR 1 is where to keep the code under version control:
- Keep scripts in ~/.hermes/scripts and create a private dndodson/hermes-ops repo for backup/review.
- Move scripts into a tracked Hermes Agent fork/profile repo if they should travel with Hermes itself.
- Keep them local for v1, then migrate after the pilot.
Recommendation: create a private dndodson/hermes-ops repo for operational automation scripts. It matches the artifact/project nature of Stage 3, makes PR review natural, and avoids mixing Daniel-specific automation into upstream Hermes Agent code.