Tune Proxmox dispatch and infrastructure package notes

Carlo Costanzo
2026-07-01 17:03:39 -04:00
parent 91806cdeaa
commit a10a3be8a4
3 changed files with 84 additions and 71 deletions
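
The reworked "Proxmox Updates Joanna Dispatch" automation in this commit keys its behavior off trigger IDs: a `detected` state trigger that only logs, and an `overnight` time trigger that dispatches Joanna at 02:15 if updates remain. The following is a minimal sketch of that layout with indentation restored. It is abbreviated and not the full automation: the `variables` step and the `structured_request` payload are omitted, the `summary` and `message` strings are shortened, and the indentation is an assumption based on standard Home Assistant automation structure.

```yaml
# Sketch only: abbreviated from the proxmox.yaml change; indentation is assumed.
automation:
  - alias: "Proxmox Updates Joanna Dispatch"
    id: proxmox_updates_joanna_dispatch
    mode: restart
    trigger:
      # Fires when either host starts reporting pending updates.
      - platform: state
        id: detected
        entity_id:
          - binary_sensor.node_proxmox1_updates_packages
          - binary_sensor.node_proxmox02_updates_packages
        to: "on"
      # Fires every night at the patch window.
      - platform: time
        id: overnight
        at: "02:15:00"
    condition:
      # Both triggers proceed only while at least one host has pending updates.
      - condition: template
        value_template: >-
          {{ is_state('binary_sensor.node_proxmox1_updates_packages', 'on') or
             is_state('binary_sensor.node_proxmox02_updates_packages', 'on') }}
    action:
      # State-triggered runs wait a minute for telemetry to settle, then re-check.
      - choose:
          - conditions:
              - condition: trigger
                id: detected
            sequence:
              - delay: "00:01:00"
              - condition: template
                value_template: >-
                  {{ is_state('binary_sensor.node_proxmox1_updates_packages', 'on') or
                     is_state('binary_sensor.node_proxmox02_updates_packages', 'on') }}
      # Only the overnight trigger dispatches Joanna; detection just logs.
      - choose:
          - conditions:
              - condition: trigger
                id: overnight
            sequence:
              - service: script.joanna_dispatch
                data:
                  source: "home_assistant_automation.proxmox_updates_joanna_dispatch"
                  summary: "Overnight Proxmox patch window reached. Patch hosts one at a time."
        default:
          - service: script.send_to_logbook
            data:
              topic: "PROXMOX"
              message: "Proxmox updates detected. Joanna patch dispatch scheduled for 02:15 if updates remain."
```

Routing on `condition: trigger` keeps both paths in one automation, so the shared top-level condition and the update-summary variables are defined once. Because a time trigger has no source entity, the commit's `structured_request` falls back to a placeholder through `trigger.entity_id | default('time.proxmox_updates_overnight_window', true)`.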
+3 -3
@@ -45,14 +45,14 @@ Live collection of plug-and-play Home Assistant packages. Each YAML file in this
| [garadget.yaml](garadget.yaml) | MQTT-based garage door control plus arrival helpers, entry prompts, wind checks, nighttime reminders, and camera context. | `cover.large_garage_door`, `cover.small_garage_door`, `group.garage_doors`, `script.open_large_garage_door_if_ready` |
| [august.yaml](august.yaml) | Front-door August smart lock with Alexa Show camera pop-up when unlocked. | `lock.front_door`, media_player actions for front doorbell camera |
| [holiday.yaml](holiday.yaml) | REST-driven US holiday + flag sensors plus the inspectable exterior lighting mode. | `sensor.holiday`, `sensor.flag`, `sensor.holiday_lighting_mode`, `sensor.holiday_lighting_scene`, JSON feed at `config/www/json_data/holidays.json` |
| [lightning.yaml](lightning.yaml) | Blitzortung lightning counter monitoring with snoozeable push actions. | `sensor.blitzortung_lightning_counter`, `input_boolean.snooze_lightning`, notify engine actions |
| [lightning.yaml](lightning.yaml) | Blitzortung lightning counter monitoring with snoozeable push actions and a 60-minute duplicate-trigger cooldown. | `sensor.blitzortung_lightning_counter`, `input_boolean.snooze_lightning`, notify engine actions |
| [logbook_activity_feed.yaml](logbook_activity_feed.yaml) | Dummy `sensor.activity_feed` + helper to write clean Activity entries (Issue #1550). | `sensor.activity_feed`, `script.send_to_logbook` |
| [mariadb_monitoring.yaml](mariadb_monitoring.yaml) | MariaDB health sensors and Lovelace dashboard snippet for recorder stats. | `sensor.mariadb_status`, `sensor.database_size` |
| [llmvision.yaml](llmvision.yaml) | Vision-backed garage-can and front-door package checks with rate-limited, downscaled OpenAI calls for package detection; see the [package reminder video](https://youtu.be/nAhCezFetvI). | `input_button.llmvision_*`, `binary_sensor.front_door_packages_present`, `llmvision.stream_analyzer` |
| [docker_infrastructure.yaml](docker_infrastructure.yaml) | Docker host patching telemetry, container/stack Repairs automation, retired Portainer repair cleanup, 20-minute Joanna escalation for persistent container outages using stable configured monitor membership, and weekly scheduled prune actions across docker_10/14/17/69; the dedicated codex_appliance VM is monitored through BearClaw status telemetry. | `sensor.docker_*_apt_status`, `binary_sensor.*_stack_status`, `sensor.docker_stacks_down_count`, `repairs.create`, `repairs.remove`, `script.joanna_dispatch` |
| [proxmox.yaml](proxmox.yaml) | Proxmox update detection with HA notifications, Repairs + Joanna upgrade orchestration, kernel-refresh handoff hints, runtime and disk pressure monitoring, plus nightly Frigate reboot. | `binary_sensor.node_proxmox*_updates_packages`, `sensor.node_proxmox*_total_updates`, `persistent_notification.create`, `script.joanna_dispatch`, `binary_sensor.proxmox*_runtime_healthy`, `sensor.proxmox*_disk_used_percentage`, `button.qemu_docker2_101_reboot` |
| [proxmox.yaml](proxmox.yaml) | Proxmox update detection with Repairs, 02:15 Joanna patch orchestration, final per-host HA success notifications, kernel-refresh handoff hints, runtime and disk pressure monitoring, plus nightly Frigate reboot. | `binary_sensor.node_proxmox*_updates_packages`, `sensor.node_proxmox*_total_updates`, `persistent_notification.create`, `script.joanna_dispatch`, `binary_sensor.proxmox*_runtime_healthy`, `sensor.proxmox*_disk_used_percentage`, `button.qemu_docker2_101_reboot` |
| [synology_dsm.yaml](synology_dsm.yaml) | Synology DSM integration health normalization for Carlo-NAS01 and Carlo-NVR, with outage-aware Joanna-first handling for lone post-outage volume warnings and Repairs escalation for persistent or non-outage problems. | `binary_sensor.carlo_*_synology_problem`, `sensor.carlo_*_synology_problem_summary`, `binary_sensor.powerwall_grid_status`, `repairs.create`, `script.joanna_dispatch` |
| [infrastructure.yaml](infrastructure.yaml) | Normalized WAN/DNS/backup/domain/cert health, Nebula Sync and promoted IoT primary/backup Pi-hole consistency monitoring with Joanna dispatch, Glances-backed Docker host disk pressure with Joanna-only warning cleanup and critical Repairs, and website uptime/latency SLO signals for Infrastructure dashboards, plus nightly backup verification and monthly Joanna HA log hygiene review with GitHub issue follow-up. | `sensor.infra_nebula_sync_dns_consistency`, `sensor.infra_pihole_iot_dns_consistency`, `binary_sensor.infra_nebula_sync_degraded`, `binary_sensor.infra_pihole_iot_dns_degraded`, `sensor.docker_*_disk_used_percentage`, `automation.infra_nebula_sync_health_dispatch`, `automation.infra_pihole_iot_dns_drift_dispatch`, `automation.docker_host_disk_pressure_monitor`, `binary_sensor.infra_website_uptime_slo_breach`, `binary_sensor.infra_website_latency_degraded`, `automation.infra_backup_nightly_verification`, `script.joanna_dispatch` |
| [infrastructure.yaml](infrastructure.yaml) | Normalized WAN/DNS/backup/domain/cert health, Nebula Sync and promoted IoT primary/backup Pi-hole consistency monitoring with Joanna dispatch, Glances-backed Docker host disk pressure with Joanna-only warning cleanup and critical Repairs, and website uptime/latency SLO signals for Infrastructure dashboards, plus nightly backup verification and monthly Joanna HA log hygiene review with public-safe GitHub issue follow-up. | `sensor.infra_nebula_sync_dns_consistency`, `sensor.infra_pihole_iot_dns_consistency`, `binary_sensor.infra_nebula_sync_degraded`, `binary_sensor.infra_pihole_iot_dns_degraded`, `sensor.docker_*_disk_used_percentage`, `automation.infra_nebula_sync_health_dispatch`, `automation.infra_pihole_iot_dns_drift_dispatch`, `automation.docker_host_disk_pressure_monitor`, `binary_sensor.infra_website_uptime_slo_breach`, `binary_sensor.infra_website_latency_degraded`, `automation.infra_backup_nightly_verification`, `script.joanna_dispatch` |
| [onenote_indexer.yaml](onenote_indexer.yaml) | Dedicated-appliance OneNote indexer health/status monitoring for Joanna, explicit index-health confirmation, failure-repair automation, and a daily duplicate-delete maintenance request. | `sensor.onenote_indexer_last_job_status`, `binary_sensor.onenote_indexer_last_job_successful`, `binary_sensor.onenote_indexer_index_healthy` |
| [mqtt_status.yaml](mqtt_status.yaml) | Command-line MQTT broker reachability probe with Spook Repairs escalation and Joanna troubleshooting dispatch on outage. | `binary_sensor.mqtt_status_raw`, `binary_sensor.mqtt_broker_problem`, `repairs.create`, `rest_command.bearclaw_command` |
| [mariadb.yaml](mariadb.yaml) | MariaDB recorder health and capacity snapshots with hourly live metrics, weekly admin/recorder polling, and stats-ready numeric sensors. | `sensor.mariadb_status`, `sensor.database_size` |
+12 -9
@@ -12,7 +12,7 @@
# Notes: Nightly Duplicati verification runs at 08:00 after the 05:30 Duplicati job and docker_14 reboot window.
# Notes: Duplicati transport/API errors are logged only; repairs are reserved for proven failed or stale backups.
# Notes: Duplicati failure Repairs enable a recovery poll that clears the Repair after a later successful run.
# Notes: Monthly HA log hygiene review requests Telegram + GitHub issue follow-up only; Joanna must wait for approval before any changes.
# Notes: Monthly HA log hygiene review requests Telegram + public-safe GitHub issue follow-up only; Joanna must wait for approval before any changes.
# Notes: Numeric WAN telemetry exposes state_class so recorder can keep long-term statistics.
# Notes: Docker host root disk usage uses Glances-backed normalized sensors; raw Glances sensors are recorder/logbook-filtered.
# Notes: Disk-pressure dispatch allows bounded safe cleanup of disposable caches and old generated backup artifacts, but not live data or restarts.
@@ -995,7 +995,7 @@ automation:
- alias: "Infrastructure - Monthly HA Log Hygiene Review"
id: infra_monthly_log_hygiene_review
description: "Ask Joanna monthly to review Home Assistant logs, create a GitHub issue with noisy entries, and send Telegram recommendations only."
description: "Ask Joanna monthly to review Home Assistant logs, create a public-safe GitHub issue with noisy entries, and send Telegram recommendations only."
mode: single
trigger:
- platform: time
@@ -1010,11 +1010,11 @@ automation:
data:
trigger_context: "{{ trigger_context }}"
source: "home_assistant_automation.infra_monthly_log_hygiene_review"
summary: "Monthly Home Assistant log hygiene review with GitHub issue and Telegram follow-up"
summary: "Monthly Home Assistant log hygiene review with public-safe GitHub issue and Telegram follow-up"
diagnostics: >-
schedule=day_1@03:20:00,
review_scope=available_home_assistant_logs,
desired_outputs=telegram_follow_up+github_issue,
desired_outputs=telegram_follow_up+public_safe_github_issue,
github_repo=CCOSTAN/Home-AssistantConfig,
approval_required_before_changes=true
request: >-
@@ -1022,11 +1022,14 @@ automation:
low-value entries that could be safely suppressed, filtered, slowed, deduplicated, or
retired. Focus on practical Home Assistant-side changes such as recorder exclusions,
logger filtering, scan-interval reductions, entity retirement, or automation de-noising.
Create or refresh a GitHub issue in CCOSTAN/Home-AssistantConfig that captures the noisy
entries, estimated frequency, why each candidate is low-value, and the exact repo files
or integrations likely to change. Then send Carlo a concise Telegram summary with the top
recommendations and the GitHub issue number or link. Do not make any changes from this
review. Wait for explicit follow-up approval first.
Create or refresh a public-safe GitHub issue in CCOSTAN/Home-AssistantConfig that captures
the noisy entries, estimated frequency, why each candidate is low-value, and the exact repo
files or integrations likely to change. Before publishing, redact or generalize all person
names, family/member names, location names, addresses, GPS coordinates, zone labels, and
device-tracker friendly names from log evidence; use counts, generic roles, and public repo
paths instead. Then send Carlo a concise Telegram summary with the top recommendations and
the GitHub issue number or link. Do not make any changes from this review. Wait for explicit
follow-up approval first.
- service: script.send_to_logbook
data:
topic: "HOME ASSISTANT"
+69 -59
@@ -9,9 +9,9 @@
# Related Issue: 1584
# Related Issue: 1798
# Notes: Creates HA repair issues when proxmox nodes report updates.
# Notes: Proxmox update activity also writes HA persistent notifications.
# Notes: Proxmox update activity writes one final HA success notification per patched host.
# Notes: Adds normalized runtime + disk health signals for dashboard/alerts.
# Notes: Joanna dispatch handles Proxmox updates plus sustained runtime/disk degradations.
# Notes: Joanna dispatch handles overnight Proxmox updates plus sustained runtime/disk degradations.
# Notes: Normalized disk usage sensors expose state_class for long-term trend rollups.
######################################################################
template:
@@ -114,7 +114,7 @@ automation:
- alias: "Proxmox Updates Repair Issues"
id: proxmox_updates_repair
description: "Track repair issues when Proxmox hosts report updates."
description: "Track repair issues when Proxmox hosts report updates, then notify once per host when updates clear."
mode: restart
trigger:
- platform: state
@@ -151,51 +151,57 @@ automation:
description: >
{{ trigger.entity_id }} is ON, indicating pending updates on {{ node_name }}.
Apply updates in Proxmox, then reload this sensor to clear the issue.
- service: persistent_notification.create
data:
notification_id: "{{ issue_id }}"
title: "{{ node_name }} Proxmox updates available"
message: >
{{ node_name }} reports pending Proxmox updates via {{ trigger.entity_id }}.
Joanna upgrade orchestration will run after the cluster update state settles.
default:
- service: repairs.remove
continue_on_error: true
data:
issue_id: "{{ issue_id }}"
- service: persistent_notification.dismiss
continue_on_error: true
data:
notification_id: "proxmox_updates_joanna_dispatch"
- service: persistent_notification.create
data:
notification_id: "{{ issue_id }}"
title: "{{ node_name }} Proxmox updates cleared"
title: "{{ node_name }} Proxmox updates applied"
message: >
{{ node_name }} update telemetry returned to {{ trigger.to_state.state }}.
Repairs state has been cleared and Home Assistant now considers this host patched.
{{ node_name }} updates were successfully applied.
Update telemetry is now {{ trigger.to_state.state }}, and the repair issue has been cleared.
- service: script.send_to_logbook
data:
topic: "PROXMOX"
message: "{{ node_name }} has been Patched"
message: "{{ node_name }} Proxmox updates were successfully applied."
- alias: "Proxmox Updates Joanna Dispatch"
id: proxmox_updates_joanna_dispatch
description: "Dispatch Joanna when Proxmox host updates are available, with kernel-refresh routing hints."
description: "Log when Proxmox updates appear, then dispatch Joanna overnight if updates remain."
mode: restart
trigger:
- platform: state
id: detected
entity_id:
- binary_sensor.node_proxmox1_updates_packages
- binary_sensor.node_proxmox02_updates_packages
to: "on"
- platform: time
id: overnight
at: "02:15:00"
condition:
- condition: template
value_template: >-
{{ is_state('binary_sensor.node_proxmox1_updates_packages', 'on') or
is_state('binary_sensor.node_proxmox02_updates_packages', 'on') }}
action:
- delay: "00:01:00"
- condition: template
value_template: >-
{{ is_state('binary_sensor.node_proxmox1_updates_packages', 'on') or
is_state('binary_sensor.node_proxmox02_updates_packages', 'on') }}
- choose:
- conditions:
- condition: trigger
id: detected
sequence:
- delay: "00:01:00"
- condition: template
value_template: >-
{{ is_state('binary_sensor.node_proxmox1_updates_packages', 'on') or
is_state('binary_sensor.node_proxmox02_updates_packages', 'on') }}
- variables:
proxmox1_updates_count: "{{ states('sensor.node_proxmox1_total_updates') | int(0) }}"
proxmox02_updates_count: "{{ states('sensor.node_proxmox02_total_updates') | int(0) }}"
@@ -230,7 +236,9 @@ automation:
structured_request: |-
PROXMOX_HOST_UPGRADE_REQUEST
tracking_issue=https://github.com/CCOSTAN/Home-AssistantConfig/issues/1798
triggered_entity={{ trigger.entity_id }}
triggered_entity={{ trigger.entity_id | default('time.proxmox_updates_overnight_window', true) }}
dispatch_reason={{ trigger.id }}
dispatch_window=overnight_02:15_local
cluster_nodes=ProxMox1,ProxMox02
proxmox1_updates_sensor=binary_sensor.node_proxmox1_updates_packages
proxmox1_updates_state={{ states('binary_sensor.node_proxmox1_updates_packages') }}
@@ -242,44 +250,46 @@ automation:
proxmox02_updates={{ proxmox02_updates_summary }}
kernel_update_detected={{ (kernel_update_packages | trim) != 'none detected' }}
kernel_update_packages={{ kernel_update_packages | trim }}
required_policy=Inspect live Proxmox cluster health, storage, VM placement, and node status before changing anything. Patch hosts one node at a time and verify each node returns healthy before moving to the next. If kernel packages are present or a reboot is required after kernel updates, use the $kernel-refresh skill and follow its ordered node-cycling workflow. Verify the live placement of Frigate/Docker14 VM 101, stop it before rebooting the host that owns it, and start it only after the host cycle and VM migrations are complete. Report progress, failures, resume state, and final verification.
- service: script.send_to_logbook
data:
topic: "PROXMOX"
message: >-
Proxmox updates detected on one or more hosts. Joanna upgrade orchestration requested.
Kernel refresh: {{ 'yes' if (kernel_update_packages | trim) != 'none detected' else 'not detected from HA package list' }}.
- service: persistent_notification.create
data:
notification_id: "proxmox_updates_joanna_dispatch"
title: "Proxmox updates detected"
message: |-
Joanna upgrade orchestration requested for Proxmox host updates.
ProxMox1: {{ proxmox1_updates_count }} updates - {{ proxmox1_updates_summary | trim }}
ProxMox02: {{ proxmox02_updates_count }} updates - {{ proxmox02_updates_summary | trim }}
Kernel refresh: {% if (kernel_update_packages | trim) != 'none detected' %}yes - {{ kernel_update_packages | trim }}{% else %}not detected from HA package list{% endif %}
- service: script.joanna_dispatch
data:
trigger_context: "HA automation proxmox_updates_joanna_dispatch (Proxmox Updates Joanna Dispatch)"
source: "home_assistant_automation.proxmox_updates_joanna_dispatch"
summary: >-
Proxmox host updates detected. Kernel refresh
{{ 'required' if (kernel_update_packages | trim) != 'none detected' else 'not detected from HA package list' }}.
entity_ids:
- "binary_sensor.node_proxmox1_updates_packages"
- "sensor.node_proxmox1_total_updates"
- "binary_sensor.node_proxmox02_updates_packages"
- "sensor.node_proxmox02_total_updates"
- "binary_sensor.proxmox1_runtime_healthy"
- "binary_sensor.proxmox02_runtime_healthy"
- "binary_sensor.qemu_docker2_101_status"
diagnostics: >-
proxmox1_updates_count={{ proxmox1_updates_count }},
proxmox02_updates_count={{ proxmox02_updates_count }},
kernel_update_detected={{ (kernel_update_packages | trim) != 'none detected' }},
kernel_update_packages={{ kernel_update_packages | trim }}
request: "{{ structured_request }}"
required_policy=This dispatch is Carlo's authorization to install pending same-release Proxmox package updates after live preflight passes; do not stop after read-only validation. Inspect live Proxmox cluster health, storage, VM placement, and node status before changing anything. Patch hosts one node at a time during the overnight window and verify each node remains healthy before moving to the next. Do not perform major-version upgrades, repository migrations, destructive cleanup, or force operations from this request. If the live preflight is unhealthy, pause and report instead of patching. If kernel packages are present or a reboot is required after kernel updates, use the $kernel-refresh skill and follow its ordered node-cycling workflow. Verify the live placement of Frigate/Docker14 VM 101, stop it before rebooting the host that owns it, and start it only after the host cycle and VM migrations are complete. After each host, verify the package versions/update list, then report progress, failures, resume state, package versions, and final verification.
- choose:
- conditions:
- condition: trigger
id: overnight
sequence:
- service: script.send_to_logbook
data:
topic: "PROXMOX"
message: >-
Overnight Proxmox update window reached. Joanna patch orchestration requested.
Kernel refresh: {{ 'yes' if (kernel_update_packages | trim) != 'none detected' else 'not detected from HA package list' }}.
- service: script.joanna_dispatch
data:
trigger_context: "HA automation proxmox_updates_joanna_dispatch (Proxmox Updates Joanna Dispatch - Overnight)"
source: "home_assistant_automation.proxmox_updates_joanna_dispatch"
summary: >-
Overnight Proxmox patch window reached. Patch hosts one at a time.
Kernel refresh {{ 'required' if (kernel_update_packages | trim) != 'none detected' else 'not detected from HA package list' }}.
entity_ids:
- "binary_sensor.node_proxmox1_updates_packages"
- "sensor.node_proxmox1_total_updates"
- "binary_sensor.node_proxmox02_updates_packages"
- "sensor.node_proxmox02_total_updates"
- "binary_sensor.proxmox1_runtime_healthy"
- "binary_sensor.proxmox02_runtime_healthy"
- "binary_sensor.qemu_docker2_101_status"
diagnostics: >-
proxmox1_updates_count={{ proxmox1_updates_count }},
proxmox02_updates_count={{ proxmox02_updates_count }},
kernel_update_detected={{ (kernel_update_packages | trim) != 'none detected' }},
kernel_update_packages={{ kernel_update_packages | trim }}
request: "{{ structured_request }}"
default:
- service: script.send_to_logbook
data:
topic: "PROXMOX"
message: >-
Proxmox updates detected on one or more hosts. Joanna patch dispatch scheduled for 02:15 if updates remain.
Kernel refresh: {{ 'yes' if (kernel_update_packages | trim) != 'none detected' else 'not detected from HA package list' }}.
- alias: "Proxmox Runtime Repair Issues"
id: proxmox_runtime_repairs