# Hermes Autonomy Stage 3 — Implementation Plan

Generated: 2026-05-03T17:58:30+00:00
Project: Hermes Autonomy Metaproject

## Goal

Implement the Stage 3 overnight workbench so Hermes can turn evening-sweep candidates into bounded overnight jobs, execute safe work unattended, and surface reviewable outputs in the next morning brief.

Stage 3 is not just another digest. It is the first production loop where Hermes does useful work without Daniel prompting each task:

1. Evening sweep discovers or proposes work.
2. Queue writer converts eligible work into durable jobs.
3. Worker executes up to 10 jobs/night, at most 3 per project.
4. Software-development jobs open PRs as the primary review artifact.
5. Morning brief reports what ran, what opened PRs, what failed, and what needs Daniel review.

## Current baseline

Already present:

- `~/.hermes/scripts/evening_sweep_context.py`
  - emits `overnight_queue_candidates`
  - currently uses conservative next-action heuristics
- `~/.hermes/scripts/morning_brief_context.py`
  - emits goal-aware morning brief JSON
  - currently lacks an `overnight_workbench` section
- Existing cron jobs:
  - `evening-sweep` at 22:33 ET, delivered to Telegram General
  - `morning-brief` at 07:57 ET, delivered to Telegram General
- Existing web artifact:
  - Stage 3 design at the Hermes Artifacts web app

Missing:

- queue file/writer
- worker
- result summarizer
- morning brief integration
- user systemd worker timer
- tests and dry-run fixtures

## Implementation shape

Use local JSONL as the queue substrate and local JSON result files as the execution ledger.

```text
~/.hermes/state/overnight_workbench/
  queue.jsonl
  claims/
    <job_id>.lock
  runs/
    YYYY-MM-DD/
      <job_id>/
        job.json
        result.json
        prompt.md
        stdout.log
        stderr.log
        summary.md
```

Scripts:

```text
~/.hermes/scripts/overnight_workbench_queue.py
~/.hermes/scripts/overnight_workbench_worker.py
~/.hermes/scripts/overnight_workbench_context.py
~/.hermes/scripts/tests/test_overnight_workbench_queue.py
~/.hermes/scripts/tests/test_overnight_workbench_worker.py
~/.hermes/scripts/tests/test_overnight_workbench_context.py
```

Systemd user units:

```text
~/.config/systemd/user/hermes-overnight-workbench.service
~/.config/systemd/user/hermes-overnight-workbench.timer
```

## Data model

### Queue job

One JSON object per line in `queue.jsonl`:

```json
{
  "job_id": "2026-05-03_overnight_<short_hash>",
  "created_at": "2026-05-03T22:45:00-04:00",
  "run_date_et": "2026-05-03",
  "created_by": "evening-sweep",
  "project_id": "notion-page-id",
  "project_name": "Project name",
  "project_url": "https://www.notion.so/...",
  "goal_id": "optional-goal-id",
  "goal_name": "optional goal name",
  "job_type": "software-development",
  "safety_class": "pr_only",
  "input": {
    "description": "Written feature/problem description",
    "acceptance_criteria": [],
    "constraints": [],
    "target_repo_hint": null
  },
  "limits": {
    "max_runtime_minutes": 60,
    "max_tool_calls": 30
  },
  "allowed_side_effects": [
    "local_files",
    "git_branch",
    "git_push",
    "github_pr_open",
    "github_pr_update",
    "github_repo_create",
    "notion_log_append"
  ],
  "forbidden_side_effects": [
    "merge",
    "deploy",
    "publish_outside_pr",
    "send_external",
    "money",
    "delete"
  ],
  "status": "pending"
}
```
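
The schema leaves `<short_hash>` unspecified, and the queue writer's dedupe key refers to an `action_hash` that is not a field of the job. One possible derivation is sketched below, assuming the hash is taken over the project, job type, and a whitespace- and case-normalized description. The function names and the 10-character length are illustrative assumptions, not settled design.

```python
import hashlib


def action_hash(project_id: str, job_type: str, description: str) -> str:
    """Stable hash of the action, independent of whitespace and letter case."""
    normalized = " ".join(description.lower().split())
    # \x1f (unit separator) keeps field boundaries unambiguous.
    payload = "\x1f".join([project_id, job_type, normalized])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def make_job_id(run_date_et: str, project_id: str, job_type: str, description: str) -> str:
    """Build an id shaped like `2026-05-03_overnight_<short_hash>`."""
    # Hypothetical choice: the first 10 hex characters of the action hash.
    short = action_hash(project_id, job_type, description)[:10]
    return f"{run_date_et}_overnight_{short}"
```

Deriving the id from the same hash used for deduplication means re-running the queue writer on the same evening sweep yields the same `job_id`, which keeps lock files and run directories stable across retries.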

### Result

`result.json` in each run directory:

```json
{
  "job_id": "...",
  "status": "succeeded|failed|skipped|skipped_unsafe|timeout",
  "started_at": "...",
  "finished_at": "...",
  "summary": "Human-readable one paragraph summary",
  "review_needed": true,
  "outputs": {
    "pull_requests": [
      {
        "repo": "dndodson/argo",
        "url": "https://github.com/dndodson/argo/pull/123",
        "title": "feat: ...",
        "draft": true,
        "ci_status": "pending|success|failure|unknown"
      }
    ],
    "artifacts": [
      {
        "title": "...",
        "url": "http://srv1343021.tail8eb3a8.ts.net/apps/hermes-artifacts/..."
      }
    ]
  },
  "error": null
}
```
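
Because the worker treats a missing `result.json` as a failure, a small loader that always returns a well-formed result keeps downstream code simple. The sketch below is one way to do that. The helper names are hypothetical, and the status set uses `skipped_unsafe`, the name used by the worker tests and the operational guardrails; treat that vocabulary as an assumption.

```python
import json
from pathlib import Path

# Assumed status vocabulary; `skipped_unsafe` follows the worker tests and guardrails.
VALID_STATUSES = {"succeeded", "failed", "skipped", "skipped_unsafe", "timeout"}


def failure_result(job_id: str, reason: str) -> dict:
    """Synthesize a failed result so every run has a readable outcome."""
    return {
        "job_id": job_id,
        "status": "failed",
        "summary": reason,
        "review_needed": True,
        "outputs": {"pull_requests": [], "artifacts": []},
        "error": reason,
    }


def load_result(run_dir: Path, job_id: str) -> dict:
    """Read result.json, degrading to a failed result if it is missing or invalid."""
    path = run_dir / "result.json"
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except FileNotFoundError:
        return failure_result(job_id, "result.json missing")
    except (json.JSONDecodeError, UnicodeDecodeError) as exc:
        return failure_result(job_id, f"result.json unreadable: {exc}")
    if not isinstance(data, dict) or data.get("status") not in VALID_STATUSES:
        return failure_result(job_id, "result.json has an unknown or missing status")
    return data
```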

## Queue writer

Implement `overnight_workbench_queue.py` with modes:

```bash
python ~/.hermes/scripts/overnight_workbench_queue.py --dry-run --pretty
python ~/.hermes/scripts/overnight_workbench_queue.py --append
python ~/.hermes/scripts/overnight_workbench_queue.py --append --from-evening-sweep
python ~/.hermes/scripts/overnight_workbench_queue.py --enqueue-json /path/to/job.json
```

Responsibilities:

1. Import `evening_sweep_context.build_payload()`.
2. Read `overnight_queue_candidates`.
3. Map each candidate to a job type:
   - explicit `software-development` if the candidate is a feature/problem/build request
   - `research`, `draft`, `audit`, `review`, or `documentation` otherwise
4. Reject unsafe jobs before enqueue.
5. Deduplicate by `(run_date_et, project_id, action_hash, job_type)`.
6. Enforce static caps at queue time:
   - max 3 jobs/project/night
   - max 10 jobs/night globally
7. In cron-like mode, emit `[SILENT]` when there is nothing to queue.

Important: the queue writer should be deterministic. It may use heuristics, but it should not ask an LLM to decide whether to enqueue. The evening sweep LLM can suggest candidates; the queue writer enforces policy.
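
The dedupe and cap rules are simple enough to express as a pure function, which also makes them easy to unit test. A minimal sketch follows, assuming each job dict carries an `action_hash` field (the queue-job schema does not define one) and that candidates are accepted in input order. The function and constant names are hypothetical.

```python
from collections import Counter

MAX_PER_PROJECT = 3   # jobs per project per night
MAX_PER_NIGHT = 10    # jobs per night across all projects


def select_jobs(candidates: list[dict], existing: list[dict]) -> list[dict]:
    """Return the candidates that may be appended, in input order.

    `existing` holds jobs already in queue.jsonl. Every dict needs
    run_date_et, project_id, action_hash (assumed field), and job_type.
    """
    def key(job: dict) -> tuple:
        return (job["run_date_et"], job["project_id"], job["action_hash"], job["job_type"])

    seen = {key(job) for job in existing}
    per_project = Counter((j["run_date_et"], j["project_id"]) for j in existing)
    per_night = Counter(j["run_date_et"] for j in existing)

    accepted = []
    for job in candidates:
        night = job["run_date_et"]
        project = (night, job["project_id"])
        if key(job) in seen:
            continue  # duplicate of a queued or already accepted job
        if per_project[project] >= MAX_PER_PROJECT or per_night[night] >= MAX_PER_NIGHT:
            continue  # a cap has been reached
        seen.add(key(job))
        per_project[project] += 1
        per_night[night] += 1
        accepted.append(job)
    return accepted
```

Counting jobs already present in the queue, not only the current batch, keeps the caps intact when the writer runs more than once in an evening.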

## Worker

Implement `overnight_workbench_worker.py` with modes:

```bash
python ~/.hermes/scripts/overnight_workbench_worker.py --dry-run
python ~/.hermes/scripts/overnight_workbench_worker.py --once
python ~/.hermes/scripts/overnight_workbench_worker.py --run-date 2026-05-03
python ~/.hermes/scripts/overnight_workbench_worker.py --job-id <job_id>
```

Responsibilities:

1. Read pending jobs from `queue.jsonl`.
2. Select jobs for the current ET run date.
3. Re-enforce caps:
   - max 3 jobs/project/night
   - max 10 jobs/night globally
4. Claim each job atomically with a lock file under `claims/`.
5. Write `job.json` and `prompt.md` into a run directory.
6. Execute the job in a detached sibling process or user systemd scope.
7. Capture stdout/stderr.
8. Require `result.json`; missing result means failure.
9. Append a concise Notion log only if enabled by job policy.
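
For step 4, exclusive file creation is a common way to get an atomic claim on a local filesystem. The sketch below assumes one lock file per job under `claims/`. The function name and the lock-file contents are illustrative.

```python
import json
import os
import time
from pathlib import Path


def try_claim(claims_dir: Path, job_id: str) -> bool:
    """Return True if this process created the lock file, False if it already existed."""
    claims_dir.mkdir(parents=True, exist_ok=True)
    lock_path = claims_dir / f"{job_id}.lock"
    try:
        # O_CREAT | O_EXCL fails with FileExistsError if the file is already there,
        # so only one caller can win the claim.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        # Assumed contents: enough detail to diagnose a stale claim later.
        json.dump({"pid": os.getpid(), "claimed_at": time.time()}, handle)
    return True
```

A worker would call `try_claim()` before creating the run directory and skip the job when it returns `False`.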

### Worker execution strategy

For v1, use `hermes chat -q` one-shot workers instead of trying to embed all implementation logic in the scheduler.

Example command shape:

```bash
systemd-run --user --scope \
  --unit hermes-overnight-job-<job_id> \
  hermes chat \
    --provider openai-codex \
    --model gpt-5.5 \
    --toolsets terminal,file,web,session_search,skills \
    -q "<self-contained job prompt>"
```

The prompt should be self-contained and include:

- project context
- job input description
- safety envelope
- explicit allowed/forbidden side effects
- output contract requiring `result.json`
- PR policy: opening PRs is expected for software-development jobs; merging is never allowed
- artifact-link rule: web/Notion links, never raw file paths in final handoff
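
As a rough illustration of how the worker might assemble the prompt and the command shape shown earlier, the sketch below renders the listed sections from a queue job and builds an argument list for `subprocess`. The function names, section headings, and exact wording are assumptions. The command flags are copied from the example command in this plan.

```python
def render_prompt(job: dict, project_context: str, result_path: str) -> str:
    """Assemble a self-contained prompt from a queue job dict."""
    lines = [
        "# Project context",
        project_context,
        "",
        "# Job",
        job["input"]["description"],
        "",
        "# Safety envelope",
        f"Safety class: {job['safety_class']}",
        "Allowed side effects: " + ", ".join(job["allowed_side_effects"]),
        "Forbidden side effects: " + ", ".join(job["forbidden_side_effects"]),
        "",
        "# Output contract",
        f"Write {result_path} before exiting; a missing result is treated as failure.",
        "Link outputs as web or Notion URLs, never raw file paths.",
    ]
    if job["job_type"] == "software-development":
        lines.append("Open a pull request for review. Never merge.")
    return "\n".join(lines)


def build_command(job_id: str, prompt: str) -> list[str]:
    """Argument list mirroring the example command; pass it to subprocess.run()."""
    return [
        "systemd-run", "--user", "--scope",
        "--unit", f"hermes-overnight-job-{job_id}",
        "hermes", "chat",
        "--provider", "openai-codex",
        "--model", "gpt-5.5",
        "--toolsets", "terminal,file,web,session_search,skills",
        "-q", prompt,
    ]
```

Passing the command as a list rather than a shell string means the prompt needs no quoting, even when it contains newlines or quote characters.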

### Software-development job prompt contract

For `software-development` jobs, the worker prompt should require:

1. Determine relevant repo:
   - use `target_repo_hint` if provided
   - otherwise search local repos and GitHub org/user repos
   - if no relevant repo exists, create a new private repo under `dndodson`
2. Inspect repo instructions:
   - `AGENTS.md`
   - project README
   - existing tests and local quality gates
3. Implement bounded change.
4. Run local tests/lints targeted to the change.
5. Commit changes.
6. Push branch.
7. Open one or more GitHub PRs.
8. Write `result.json` with PR URLs and validation notes.

The prompt must explicitly forbid merge, deploy, public publish outside PR, money/trading, credential changes, and destructive operations.

## Result context for morning brief

Implement `overnight_workbench_context.py`:

```bash
python ~/.hermes/scripts/overnight_workbench_context.py --pretty
python ~/.hermes/scripts/overnight_workbench_context.py --run-date 2026-05-03
```

Output JSON shape:

```json
{
  "kind": "overnight_workbench",
  "run_date_et": "2026-05-03",
  "totals": {
    "queued": 5,
    "claimed": 5,
    "succeeded": 3,
    "failed": 1,
    "skipped": 1,
    "prs_opened": 4,
    "review_needed": 4
  },
  "results": [
    {
      "project_name": "...",
      "job_type": "software-development",
      "status": "succeeded",
      "summary": "...",
      "pull_requests": ["https://github.com/.../pull/123"],
      "artifacts": [],
      "review_needed": true
    }
  ],
  "failures": []
}
```

Then modify `morning_brief_context.py` to include:

```python
from overnight_workbench_context import build_payload as build_workbench_payload
...
"overnight_workbench": build_workbench_payload(...)
```

Keep the morning-brief script JSON-only. The cron prompt formats the Telegram summary.
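
A minimal sketch of the summarizer follows, assuming it walks `runs/<run_date_et>/<job_id>/` and reads `job.json` and `result.json` from each run directory. The `build_payload` signature, the `_read_json` helper, and the mapping of statuses to totals (anything other than succeeded or skipped counts as failed) are assumptions. The `queued` and `claimed` totals are omitted because they would come from the queue and claims files rather than from run directories.

```python
import json
from pathlib import Path


def _read_json(path: Path):
    """Return parsed JSON, or None if the file is missing or malformed."""
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return None


def build_payload(runs_root: Path, run_date_et: str) -> dict:
    totals = {"succeeded": 0, "failed": 0, "skipped": 0, "prs_opened": 0, "review_needed": 0}
    results, failures = [], []
    day_dir = runs_root / run_date_et
    job_dirs = sorted(p for p in day_dir.iterdir() if p.is_dir()) if day_dir.is_dir() else []
    for job_dir in job_dirs:
        job = _read_json(job_dir / "job.json")
        job = job if isinstance(job, dict) else {}
        result = _read_json(job_dir / "result.json")
        if not isinstance(result, dict) or "status" not in result:
            totals["failed"] += 1
            failures.append({"job_id": job_dir.name, "error": "missing or malformed result.json"})
            continue
        status = result["status"]
        if status == "succeeded":
            bucket = "succeeded"
        elif status in ("skipped", "skipped_unsafe"):
            bucket = "skipped"
        else:
            bucket = "failed"  # assumed: timeouts and unknown statuses count as failures
        totals[bucket] += 1
        outputs = result.get("outputs")
        outputs = outputs if isinstance(outputs, dict) else {}
        prs = outputs.get("pull_requests")
        prs = prs if isinstance(prs, list) else []
        pr_urls = [pr["url"] for pr in prs if isinstance(pr, dict) and "url" in pr]
        artifacts = outputs.get("artifacts")
        review_needed = bool(result.get("review_needed"))
        totals["prs_opened"] += len(pr_urls)
        totals["review_needed"] += int(review_needed)
        results.append({
            "project_name": job.get("project_name"),
            "job_type": job.get("job_type"),
            "status": status,
            "summary": result.get("summary", ""),
            "pull_requests": pr_urls,
            "artifacts": artifacts if isinstance(artifacts, list) else [],
            "review_needed": review_needed,
        })
        if bucket == "failed":
            failures.append({"job_id": job_dir.name, "error": result.get("error")})
    return {"kind": "overnight_workbench", "run_date_et": run_date_et,
            "totals": totals, "results": results, "failures": failures}
```

With no run directory for the date, the function returns empty `results` and `failures`, so the morning brief can always include the section.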

## Evening sweep integration

Modify `evening_sweep_context.py` in two phases:

### Phase 1: consume existing candidates

Leave `overnight_queue_candidates` mostly as-is; the queue writer converts the candidates into jobs.

### Phase 2: enrich candidate metadata

Add fields:

```json
{
  "candidate_reason": "safe_verb=implement|explicit_context=overnight-ok",
  "suggested_job_type": "software-development",
  "safety_class": "pr_only",
  "target_repo_hint": "dndodson/argo",
  "acceptance_criteria": []
}
```

Also fetch `Contexts` in `fetch_active_projects()` so `overnight-ok` can become an explicit opt-in tag rather than only a future hook.

## Scheduling

Prefer user systemd for the worker because this is execution infrastructure with logs, not a user-facing digest.

### Service

`~/.config/systemd/user/hermes-overnight-workbench.service`

```ini
[Unit]
Description=Run Hermes overnight workbench
After=network-online.target
Wants=network-online.target

[Service]
Type=oneshot
WorkingDirectory=/home/dndodson
ExecStart=/home/dndodson/.hermes/hermes-agent/venv/bin/python3 /home/dndodson/.hermes/scripts/overnight_workbench_worker.py --once
```

### Timer

`~/.config/systemd/user/hermes-overnight-workbench.timer`

```ini
[Unit]
Description=Run Hermes overnight workbench nightly

[Timer]
OnCalendar=*-*-* 23:15:00 America/New_York
Persistent=false
Unit=hermes-overnight-workbench.service

[Install]
WantedBy=timers.target
```

Verification:

```bash
systemd-analyze calendar '*-*-* 23:15:00 America/New_York'
systemctl --user daemon-reload
systemctl --user enable hermes-overnight-workbench.timer
systemctl --user start hermes-overnight-workbench.timer
systemctl --user start hermes-overnight-workbench.service
systemctl --user status --no-pager --full hermes-overnight-workbench.service
journalctl --user -u hermes-overnight-workbench.service -n 50 --no-pager
```

## Tests

Use deterministic unit tests for policy and file behavior.

### Queue tests

- dry-run emits jobs without writing queue
- append writes one JSON object per line
- dedupe rejects a second job with the same project, action, and date
- max 3 jobs/project/night enforced
- max 10 jobs/night enforced
- unsafe markers rejected
- software-development candidate maps to `pr_only`

### Worker tests

- claims are atomic
- stale claims are detected
- dry-run produces no external side effects
- unsafe job is recorded with status `skipped_unsafe`
- missing result writes failure
- software-development prompt contains PR-required policy and merge-forbidden policy
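
The plan does not define what makes a claim stale. One workable definition, offered only as an assumption, is a lock file older than the job's `max_runtime_minutes` plus a grace period, with no `result.json` in the run directory. The function name and the 10-minute grace period below are hypothetical.

```python
import time
from pathlib import Path
from typing import Optional


def is_stale_claim(
    lock_path: Path,
    run_dir: Path,
    max_runtime_minutes: int,
    now: Optional[float] = None,
    grace_minutes: int = 10,
) -> bool:
    """True when a claim has outlived its runtime budget without producing a result."""
    if (run_dir / "result.json").exists():
        return False  # the job finished, so the claim is complete rather than stale
    now = time.time() if now is None else now
    age_seconds = now - lock_path.stat().st_mtime
    return age_seconds > (max_runtime_minutes + grace_minutes) * 60
```

Accepting `now` as a parameter lets a unit test pin the clock instead of sleeping.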

### Context tests

- summarizes successes/failures/skips
- extracts PR URLs
- handles missing/malformed result files gracefully
- morning brief includes `overnight_workbench` even when empty

## Rollout plan

### PR 1 — Queue + context plumbing

Implement:

- `overnight_workbench_queue.py`
- `overnight_workbench_context.py`
- morning brief integration
- tests

No worker execution yet. This PR should prove the data path from evening candidates to morning summary.

### PR 2 — Dry-run worker

Implement:

- worker selection/caps/claims
- dry-run execution
- synthetic result writing
- systemd unit templates, not enabled by default
- tests

No real Hermes subprocess execution yet.

### PR 3 — Real worker execution for non-code job types

Implement:

- `hermes chat -q` execution
- result contract
- research/draft/audit/review/documentation prompts
- Notion log append behind explicit flag

Run with small hand-authored test jobs first.

### PR 4 — Software-development job type

Implement:

- repo resolution
- private repo creation fallback
- PR-opening contract
- local quality gate prompt requirements
- result parser for PR URLs

This is where the workbench becomes materially useful for Argo and Hermes self-improvement.

### PR 5 — Enable nightly timer

Implement:

- install/enable systemd timer
- manual dry-run verification
- one-night monitored pilot
- morning brief confirmation

## Operational guardrails

- No merges.
- No production deploys or service restarts.
- No trading/money actions.
- No credential changes.
- No destructive file or data operations outside explicit workspace/repo boundaries.
- Every output must be readable from a GitHub PR, web artifact, or Notion link.
- Fail closed: if policy is ambiguous, mark job `skipped_unsafe` with a reason.

## First implementation command sequence

Start with PR 1:

```bash
cd /home/dndodson/.hermes/scripts
# create tests first
# implement queue/context scripts
python -m pytest tests/test_overnight_workbench_queue.py tests/test_overnight_workbench_context.py -q
python overnight_workbench_queue.py --dry-run --pretty
python overnight_workbench_context.py --pretty
python morning_brief_context.py --pretty
```

Then open a PR against the relevant Hermes/config repository if this code is tracked there. If these scripts remain local-only, publish the implementation artifact and Notion log until we decide where operational scripts should live long-term.

## Open implementation decision

The only significant architecture decision left before PR 1 is where to keep the code under version control:

1. Keep scripts in `~/.hermes/scripts` and create a private `dndodson/hermes-ops` repo for backup/review.
2. Move scripts into a tracked Hermes Agent fork/profile repo if they should travel with Hermes itself.
3. Keep them local for v1, then migrate after the pilot.

Recommendation: create a private `dndodson/hermes-ops` repo for operational automation scripts. It matches the artifact/project nature of Stage 3, makes PR review natural, and avoids mixing Daniel-specific automation into upstream Hermes Agent code.