# Hermes Autonomy Stage 3 — Overnight Workbench Design

Generated: 2026-05-03T17:27:03+00:00
Project: Hermes Autonomy Metaproject

## Decision summary

Use a **local JSONL workbench queue as the Stage 3 v1 substrate**, fed by the existing evening sweep and reported by the existing morning brief. Keep Notion as the strategic project ledger, but do not make Notion the job queue yet.

Why: a JSONL file is easy to keep append-only, lockable, inspectable, idempotent, and testable, and it is independent of Notion schema churn. Notion remains where Daniel reviews project state; the queue is an execution artifact.

## Stage 3 objective

By morning, Daniel should see not just what happened yesterday, but what Hermes safely attempted overnight and what is ready for review. The loop is:

1. Evening sweep identifies conservative `overnight-ok` candidates.
2. A queue writer records bounded jobs with explicit safety class and max runtime.
3. An overnight worker executes only allowlisted job types.
4. Results are written to local artifacts and, when useful, appended to the relevant Notion project log.
5. Morning brief includes an `overnight_workbench` section with completed, skipped, failed, and review-needed work.

## v1 architecture

```text
Notion AI Delegated Projects
          │
          │ read by evening_sweep_context.py
          ▼
~/.hermes/state/overnight_workbench/queue.jsonl
          │
          │ consumed by overnight_workbench_worker.py
          ▼
~/.hermes/state/overnight_workbench/runs/YYYY-MM-DD/<job_id>/
          │
          ├── result.json
          ├── transcript.md / summary.md
          └── artifacts...
          │
          ▼
morning_brief_context.py includes overnight_workbench summary
```

### Files to add

- `~/.hermes/scripts/overnight_workbench_queue.py`
  - Builds queue entries from `evening_sweep_context.build_payload()` candidates.
  - Writes append-only JSONL records.
  - Dedupes by `(project_id, action_hash, local_date)`.

- `~/.hermes/scripts/overnight_workbench_worker.py`
  - Claims pending jobs with a lock file.
  - Executes allowlisted job types.
  - Writes `result.json` and concise summaries.
  - May push branches and open or update GitHub PRs for Daniel review.
  - Does **not** merge, deploy, publish outside GitHub PRs, send external messages, or spend money.

- `~/.hermes/scripts/overnight_workbench_context.py`
  - Reads recent run results for the morning brief.
  - Emits compact JSON only.

- Modify `morning_brief_context.py`
  - Add `overnight_workbench` section.

- Modify `evening_sweep_context.py`
  - Optionally include `candidate_reason` and `safety_class` for each candidate.
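
The dedupe-then-append behavior of the queue writer could look roughly like the sketch below. It assumes each entry already carries the schema's `action_hash` field and that `local_date` is the date portion of `created_at`; the function names are hypothetical, and locking is left out for brevity.

```python
import json
from pathlib import Path


def dedupe_key(entry: dict) -> tuple:
    """Return (project_id, action_hash, local_date) for one queue entry."""
    # Assumption: created_at is ISO 8601 with a local offset, so its first
    # ten characters are the local calendar date.
    return (entry["project_id"], entry["action_hash"], entry["created_at"][:10])


def append_if_new(queue_path: Path, entry: dict) -> bool:
    """Append entry as one JSON line unless an equivalent entry already exists."""
    seen = set()
    if queue_path.exists():
        with queue_path.open("r", encoding="utf-8") as fh:
            for line in fh:
                if line.strip():
                    seen.add(dedupe_key(json.loads(line)))
    if dedupe_key(entry) in seen:
        return False  # same project, same action, same local day
    queue_path.parent.mkdir(parents=True, exist_ok=True)
    # Existing lines are never rewritten; new records only go at the end.
    with queue_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry, sort_keys=True) + "\n")
    return True
```

Calling `append_if_new` twice with the same entry returns `True` and then `False`, which is what makes a repeated evening sweep idempotent.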

## Queue entry schema

```json
{
  "job_id": "2026-05-04T023300Z_<short_hash>",
  "created_at": "2026-05-03T22:33:00-04:00",
  "created_by": "evening-sweep",
  "project_id": "notion-page-id",
  "project_name": "Human readable project name",
  "project_url": "https://www.notion.so/...",
  "goal_id": "optional-active-goal-page-id",
  "goal_name": "optional active goal name",
  "next_action": "The exact action to work on",
  "action_hash": "sha256(next_action)",
  "job_type": "research|draft|audit|review|documentation|software-development",
  "safety_class": "read_only|draft_only|local_code_only|pr_only",
  "max_runtime_minutes": 60,
  "max_tool_calls": 30,
  "allowed_side_effects": ["local_files", "git_branch", "git_push", "github_pr_open", "github_pr_update", "github_repo_create", "notion_log_append"],
  "forbidden_side_effects": ["merge", "deploy", "publish_outside_pr", "send_external", "money", "delete"],
  "status": "pending|claimed|succeeded|failed|skipped",
  "claim": {
    "claimed_at": null,
    "worker_pid": null,
    "session_id": null
  },
  "result_path": null
}
```
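
As an illustration of how an entry in this shape might be built, here is a minimal sketch. The function name, the 8-character short hash, and deriving the `job_id` timestamp from `created_at` converted to UTC are all assumptions; optional fields (`project_url`, `goal_id`, `goal_name`, and the side-effect lists) are omitted to keep it short.

```python
import hashlib
from datetime import datetime, timezone


def build_queue_entry(project_id: str, project_name: str, next_action: str,
                      job_type: str, safety_class: str,
                      created_at: datetime) -> dict:
    """Build a pending queue entry; created_at must be timezone-aware."""
    if created_at.tzinfo is None:
        raise ValueError("created_at must carry a UTC offset")
    digest = hashlib.sha256(next_action.encode("utf-8")).hexdigest()
    # Assumption: the job_id timestamp is the creation time expressed in UTC.
    stamp = created_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H%M%SZ")
    return {
        "job_id": f"{stamp}_{digest[:8]}",  # short-hash length is an assumption
        "created_at": created_at.isoformat(),
        "created_by": "evening-sweep",
        "project_id": project_id,
        "project_name": project_name,
        "next_action": next_action,
        "action_hash": digest,
        "job_type": job_type,
        "safety_class": safety_class,
        "max_runtime_minutes": 60,
        "max_tool_calls": 30,
        "status": "pending",
        "claim": {"claimed_at": None, "worker_pid": None, "session_id": None},
        "result_path": None,
    }
```

Hashing the exact `next_action` text means any rewording produces a new `action_hash`; whether to normalize whitespace or case before hashing is left open here.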

## Safety envelope

### Allowed unattended work

- Read-only research and summarization.
- Draft documents, implementation plans, PR descriptions, test plans, and code-review notes.
- Local-only code exploration and non-mutating tests.
- Creating branches, committing changes, pushing branches, and opening or updating GitHub PRs for Daniel review. Open PRs are the main review mechanism for overnight work.
- Appending concise Notion logs that say what was attempted and where artifacts live.

### Requires Daniel review before action

- Posting to external audiences.
- Sending email or Telegram messages beyond configured cron delivery.
- Any production deploy, service restart, migration, or config change.
- Any trading, brokerage, payment, purchase, subscription, or money movement.
- Any destructive operation or irreversible data edit.

### Forbidden unattended

- Merges.
- Production deploys.
- Public publishing outside the normal GitHub PR review path.
- Financial/trading actions.
- Credential changes.
- Deleting user/project data.
- Running commands outside explicit repo/workspace boundaries.
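
A worker-side gate for this envelope might be sketched as follows. The side-effect names and job types are taken from the queue entry schema; the function name and the reason strings are hypothetical, and treating an unknown side effect as unsafe is an assumption.

```python
ALLOWED_SIDE_EFFECTS = frozenset({
    "local_files", "git_branch", "git_push", "github_pr_open",
    "github_pr_update", "github_repo_create", "notion_log_append",
})
FORBIDDEN_SIDE_EFFECTS = frozenset({
    "merge", "deploy", "publish_outside_pr", "send_external", "money", "delete",
})
ALLOWED_JOB_TYPES = frozenset({
    "research", "draft", "audit", "review", "documentation",
    "software-development",
})


def check_job_safety(job: dict) -> tuple[bool, str]:
    """Return (is_safe, reason); anything not allowlisted is refused."""
    if job.get("job_type") not in ALLOWED_JOB_TYPES:
        return False, "job_type_not_allowlisted"
    requested = set(job.get("allowed_side_effects", []))
    forbidden = requested & FORBIDDEN_SIDE_EFFECTS
    if forbidden:
        return False, "forbidden_side_effect:" + ",".join(sorted(forbidden))
    unknown = requested - ALLOWED_SIDE_EFFECTS
    if unknown:
        # Assumption: fail closed on side effects this worker does not know.
        return False, "unknown_side_effect:" + ",".join(sorted(unknown))
    return True, "ok"
```

A job that fails this check would be recorded as skipped rather than executed, with the reason string kept alongside the result for the morning brief.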

## Worker policy

- Run at most three jobs per project per night.
- Global cap: 10 jobs/night in v1.
- Per-job cap: 60 minutes and 30 tool calls.
- Prefer `hermes chat -q` with a self-contained prompt and narrow toolsets.
- Use `systemd-run --user --scope` or a sibling detached process so gateway restarts do not kill workers.
- Treat nonzero exit, timeout, or missing `result.json` as failed.
- If a job was claimed but no result was written, mark it `failed_stale_claim` on the next sweep; for reporting it counts as a `failed` job.
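
The caps and the failure rule above can be sketched in a few lines. This assumes pending jobs arrive in priority order and that the worker knows each job's exit code and run directory; the function names are hypothetical.

```python
from collections import Counter
from pathlib import Path
from typing import Optional

MAX_JOBS_PER_PROJECT = 3
MAX_JOBS_PER_NIGHT = 10


def select_jobs(pending: list) -> list:
    """Pick jobs in order, honoring the per-project and global caps."""
    per_project = Counter()
    selected = []
    for job in pending:
        if len(selected) >= MAX_JOBS_PER_NIGHT:
            break
        if per_project[job["project_id"]] >= MAX_JOBS_PER_PROJECT:
            continue  # this project already has its quota for tonight
        per_project[job["project_id"]] += 1
        selected.append(job)
    return selected


def classify_outcome(exit_code: Optional[int], timed_out: bool,
                     run_dir: Path) -> str:
    """Nonzero exit, timeout, or a missing result.json all count as failed."""
    if timed_out or exit_code != 0:
        return "failed"
    if not (run_dir / "result.json").is_file():
        return "failed"
    return "succeeded"
```

Passing `exit_code=None` (for example, when the process never reported one) is also classified as `failed`, which keeps the rule conservative.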

## Initial job types

1. `research`
   - Inputs: question / next action.
   - Outputs: `summary.md`, sources, suggested next action.

2. `draft`
   - Inputs: project objective and requested deliverable.
   - Outputs: draft markdown artifact; optional Notion log pointer.

3. `audit`
   - Inputs: repo/path/system component.
   - Outputs: findings and safe recommendations; may open a PR if the resulting change is bounded, testable, and reviewable.

4. `review`
   - Inputs: existing artifact or PR URL.
   - Outputs: review notes; no comments posted externally.

5. `documentation`
   - Inputs: context and desired doc.
   - Outputs: local markdown draft.

6. `software-development`
   - Inputs: a written description of the feature to build or problem to solve, plus any known constraints, acceptance criteria, relevant project/goal context, and optional target repository hints.
   - Repository resolution: if a relevant repository already exists, work there; if none exists, create a new private GitHub repository under `dndodson` unless Daniel explicitly specifies otherwise.
   - Outputs: one or more GitHub PRs to the relevant repository/repositories. Each PR should be bounded, reviewable, and include objective, scope, tests run, risk notes, and explicit review request.
   - Completion condition: the job is considered successful when the PR(s) are open and linked in the overnight result / Notion log; merging remains Daniel-gated.

## Integration with current scripts

### evening_sweep_context.py

The current script already emits `overnight_queue_candidates`. Improve the candidates with:

- Context tags fetched from Notion (currently noted as a future hook).
- `candidate_reason` such as `safe_verb=research` or `explicit_context=overnight-ok`.
- `blocked_reason` for candidates rejected by unsafe markers.

### morning_brief_context.py

Add:

```json
"overnight_workbench": {
  "window": {"start": "...", "end": "..."},
  "totals": {"queued": 2, "succeeded": 1, "failed": 0, "skipped": 1},
  "results": [
    {
      "project_name": "...",
      "status": "succeeded",
      "summary": "...",
      "artifact_path": "...",
      "review_needed": true
    }
  ]
}
```
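
One way the context script might assemble `totals` and `results` from a night's run directory is sketched below. It assumes each `result.json` carries `status`, `project_name`, `summary`, and `review_needed` keys, counts each run directory as one queued job, and folds sub-statuses such as `skipped_unsafe` into their base bucket. None of these conventions are fixed by this design yet, and the `window` field is left out.

```python
import json
from pathlib import Path


def summarize_runs(day_dir: Path) -> dict:
    """Summarize runs/YYYY-MM-DD/<job_id>/result.json files for the brief."""
    totals = {"queued": 0, "succeeded": 0, "failed": 0, "skipped": 0}
    results = []
    job_dirs = sorted(p for p in day_dir.iterdir() if p.is_dir()) if day_dir.is_dir() else []
    for job_dir in job_dirs:
        try:
            result = json.loads((job_dir / "result.json").read_text(encoding="utf-8"))
            if not isinstance(result, dict):
                raise ValueError("result.json is not a JSON object")
        except (OSError, ValueError):
            # A missing or unreadable result is treated as a failed job.
            result = {"status": "failed", "summary": "missing or unreadable result.json"}
        status = str(result.get("status", "failed"))
        bucket = next((b for b in ("succeeded", "skipped") if status.startswith(b)), "failed")
        totals["queued"] += 1
        totals[bucket] += 1
        results.append({
            "project_name": result.get("project_name", ""),
            "status": status,
            "summary": result.get("summary", ""),
            "artifact_path": str(job_dir),
            # Assumption: default to review-needed when the result is silent.
            "review_needed": bool(result.get("review_needed", True)),
        })
    return {"totals": totals, "results": results}
```

A nonexistent `day_dir` yields zeroed totals and an empty `results` list, so the morning brief can always include the section.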

## Cron/systemd schedule

Prefer a user systemd timer for the worker, because it is an execution process rather than a user-facing digest.

- Queue writer: integrated into evening sweep, or separate at 22:45 ET.
- Worker: 23:15 ET with `Persistent=false` initially.
- Morning brief: keep existing 07:57 ET cron and read workbench results.

## v1 acceptance criteria

- Evening sweep produces at least one dry-run queue entry from a synthetic safe project/action.
- Worker can run in `--dry-run` and `--once` modes.
- Worker refuses unsafe jobs with a clear `skipped_unsafe` result.
- A successful dry-run result appears in morning brief JSON.
- No external side effects occur in dry-run.
- Notion project log receives only a pointer/summary when explicit `notion_log_append` is enabled.

## PR policy for overnight work

Opening PRs is explicitly allowed and encouraged when a PR is the cleanest review artifact. The overnight worker should prefer PRs over loose local patches for code changes, because GitHub PRs provide Daniel's normal review surface: diffs, CI, and a comment thread.

For `software-development` jobs, PRs are the expected output, not an optional bonus. The worker should translate a written feature/problem description into one or more reviewable PRs. If there is no existing relevant repository, the worker may create a new private `dndodson` GitHub repository and open the initial PR there.

Rules:

- PRs should be draft or clearly labeled as overnight/AI-generated when appropriate.
- PR descriptions must include objective, scope, tests run, risk notes, and explicit review request.
- The worker may push branches and update PRs, but must not merge them.
- The worker may create a new private GitHub repository when no relevant repository exists, but should not make it public without explicit Daniel approval.
- If tests fail, the PR may still be opened, but only as a draft, and only if the failure is clearly documented and the PR is still useful for review.
- Production-impacting changes still require Daniel review before merge/deploy.
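
The required PR description fields could be enforced before a PR is opened. The sketch below is one hypothetical way to do that; the section headings, function name, and markdown layout are assumptions rather than a settled template.

```python
REQUIRED_SECTIONS = ("Objective", "Scope", "Tests run", "Risk notes", "Review request")


def render_pr_body(sections: dict) -> str:
    """Render a PR description, refusing to proceed if a section is empty."""
    missing = [name for name in REQUIRED_SECTIONS
               if not str(sections.get(name, "")).strip()]
    if missing:
        raise ValueError("missing PR sections: " + ", ".join(missing))
    parts = [f"## {name}\n\n{str(sections[name]).strip()}"
             for name in REQUIRED_SECTIONS]
    return "\n\n".join(parts) + "\n"
```

Raising on a missing section lets the worker mark the job as failed instead of opening a PR that Daniel cannot review properly.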

## Recommended next implementation slice

1. Add `overnight_workbench_queue.py` with dry-run and append modes.
2. Add `overnight_workbench_context.py` that summarizes queue/run files.
3. Modify `morning_brief_context.py` to include empty-or-real workbench results.
4. Run dry-run against current evening sweep candidates.
5. Only after the JSON plumbing is proven, add the worker.

## Open decisions for Daniel

1. Approve JSONL as v1 queue substrate, with Notion remaining the project ledger.
2. Confirm the worker caps as currently set: at most 3 jobs per project per night, with a global cap of 10 jobs/night.
3. Confirm whether v1 PRs should default to draft PRs, or normal PRs with an `overnight` / `ai-generated` label.
4. Confirm whether Notion log append is allowed unattended for successful overnight jobs, or only for failures/review-needed summaries.
