Admin · Platform health

The platform-health pages give admins visibility into the running services and recent incidents.

Mitch Wigham
Updated 24 June 2026

Uptime

/uptime

The user-facing Uptime monitoring page. Anyone with the watchdog feature can configure three kinds of monitor:

Kind        | What it checks               | Notes
⚙️ Service  | Internal docker service URL  | Supports auto-restart of the named container
🌐 Website  | External HTTP/HTTPS URL      | No restart; pure status check
🖥️ Device   | RMM endpoint freshness       | OK while the device's last-seen timestamp is within the chosen window (default 5 min)

Common controls per target:

  • Name + interval (poll cadence) + failure threshold
  • Public flag → target appears on /status for customers
  • Group + display order for the public-page layout
  • ⟳ Trigger an ad-hoc check
  • ⏻ Manually restart the named container (Service kind only)
  • Edit / remove

Header stats: total targets, OK / FAIL counts, open incidents, restart attempts in the last 24 h, rolling 24 h uptime %.

📷 Screenshot placeholder: screenshots/uptime.png

Device-kind targets

When you pick Device, the form swaps the URL input for an RMM device dropdown plus a freshness window (seconds). The watchdog poller flips the target to FAIL when now - device.lastSeenAt exceeds the window — which is what you want when you can't reach the endpoint with HTTP but the agent is still expected to phone home.

⚠️ Caution. A FAIL on a device-kind target means the agent has gone silent, not necessarily that the box is off — flaky network can trip it. Pair with the device's RMM alerts for the full picture.
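
The freshness rule for Device-kind targets can be sketched as follows. This is a minimal illustration of the comparison described above, not the poller's actual code; the function name `device_status` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_WINDOW_SECONDS = 300  # the 5-minute default mentioned for Device targets


def device_status(last_seen_at: datetime, now: datetime,
                  window_seconds: int = DEFAULT_WINDOW_SECONDS) -> str:
    """Return "FAIL" once now - last_seen_at exceeds the window, else "OK"."""
    age = now - last_seen_at
    return "FAIL" if age > timedelta(seconds=window_seconds) else "OK"


now = datetime(2026, 6, 24, 12, 0, 0, tzinfo=timezone.utc)
print(device_status(now - timedelta(minutes=4), now))  # OK
print(device_status(now - timedelta(minutes=6), now))  # FAIL
```

A device seen exactly at the edge of the window still counts as OK in this sketch, because the text says the target fails only when the age exceeds the window.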

/admin/health and /uptime render the same component against the watchdog service. /uptime is that page surfaced outside the admin area, so it can sit in the main sidebar for day-to-day operational monitoring.

Health

/admin/health

The admin-side view of the watchdog. It shows the monitor targets you have configured (the Service / Website / Device kinds described above), not a fixed list of backend services — what appears here is whatever has been added as a target.

+------------------------------------------------------------+
| Name              Kind       Status  Latency  Last checked |
|------------------------------------------------------------|
| portal            ⚙ Service  ✓ OK    14ms     12:04:31     |
| auth-service      ⚙ Service  ✓ OK    12ms     12:04:31     |
| helpdesk-service  ⚙ Service  ✓ OK    18ms     12:04:31     |
+------------------------------------------------------------+

📷 Screenshot placeholder: screenshots/admin-health.png

Each target row carries its last status, last latency, last check time and consecutive-failure count. You can add, edit and remove targets, trigger an ad-hoc check, and (for Service-kind targets with a container name) trigger a restart. Header stats mirror the /uptime page: total targets, OK / FAIL counts, open incidents, restarts in the last 24 h and the rolling 24 h uptime %.

A fresh seeded install ships three Service targets — portal, auth-service and helpdesk-service — each pointed at the service's /health endpoint.

Status page

/status is the public version of health — what your customers see during incidents. It lists the targets you have flagged Public (grouped by their Group name) plus any RMM devices flagged for the public status page, each with a status pip and — for monitor targets — a rolling check-history bar and a 7-day uptime %. No internal details (no per-service host info, no credentials).
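
As a rough illustration of how the public list could be assembled, the sketch below filters targets by the Public flag and groups them by Group name. The `Target` fields, the sample group names and the `internal-db` target are hypothetical, and ordering targets by display order within each group is an assumption rather than documented behaviour.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Target:
    name: str
    group: str
    display_order: int
    public: bool


def public_groups(targets: List[Target]) -> Dict[str, List[str]]:
    """Group the names of Public targets by group, ordered by display order."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for t in sorted(targets, key=lambda t: (t.display_order, t.name)):
        if t.public:  # non-public targets never reach the status page
            groups[t.group].append(t.name)
    return dict(groups)


targets = [
    Target("portal", "Core", 2, True),
    Target("auth-service", "Core", 1, True),
    Target("helpdesk-service", "Support", 1, True),
    Target("internal-db", "Core", 3, False),
]
print(public_groups(targets))
# {'Core': ['auth-service', 'portal'], 'Support': ['helpdesk-service']}
```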

Watchdog

The watchdog-service runs as a separate process (port 3019). It runs a poll tick every 10 s (POLL_TICK_SECONDS); on each tick it checks any target whose own interval is due — the per-target default is 30 s. Each check is one of:

  • Service / Website — an HTTP probe of the target URL
  • Device — a freshness check against an RMM device's last-seen timestamp
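
The tick-versus-interval relationship can be sketched like this: on every tick the poller looks for targets whose own interval has elapsed since their last check. The `Target` class and `due_targets` function are hypothetical names, and treating a never-checked target as immediately due is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional

POLL_TICK_SECONDS = 10         # tick cadence named in the text
DEFAULT_INTERVAL_SECONDS = 30  # per-target default named in the text


@dataclass
class Target:
    name: str
    interval_seconds: int = DEFAULT_INTERVAL_SECONDS
    last_checked_at: Optional[float] = None  # epoch seconds; None = never checked


def due_targets(targets: List[Target], now: float) -> List[Target]:
    """Return the targets whose interval has elapsed at this tick."""
    return [
        t for t in targets
        if t.last_checked_at is None
        or now - t.last_checked_at >= t.interval_seconds
    ]


targets = [
    Target("portal", last_checked_at=100.0),
    Target("auth-service", interval_seconds=60, last_checked_at=100.0),
    Target("helpdesk-service"),
]
print([t.name for t in due_targets(targets, now=130.0)])
# ['portal', 'helpdesk-service']
```

Because the tick is 10 s, a target's effective cadence is rounded up to the next tick after its interval expires.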

When a target's consecutive failures reach its failure threshold, the watchdog:

  • Opens a health incident and flips the target to FAIL on /admin/health and /uptime
  • Shows an incident on /status if the target is Public
  • Raises a CRITICAL RMM alert if an RMM device matches the target's name (so the org-wide alert queue stays the single source of truth)
  • For a Service-kind target with auto-restart on and a container name set, attempts docker restart of that container (only when the service is built with ENABLE_DOCKER_RESTART=true)
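
The threshold logic above might look roughly like the sketch below. It only decides which actions apply and does not perform them. All names are hypothetical, the default threshold of 3 is invented for the example, and firing the actions once (on the check that reaches the threshold) is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Target:
    name: str
    kind: str                        # "service", "website" or "device"
    failure_threshold: int = 3       # hypothetical default, not from the docs
    consecutive_failures: int = 0
    public: bool = False
    auto_restart: bool = False
    container_name: Optional[str] = None


def record_check(target: Target, ok: bool, *, rmm_device_matches: bool = False,
                 docker_restart_enabled: bool = False) -> List[str]:
    """Update the failure counter and return the actions to take, if any."""
    if ok:
        target.consecutive_failures = 0
        return []
    target.consecutive_failures += 1
    # Assumption: actions fire once, on the check that reaches the threshold.
    if target.consecutive_failures != target.failure_threshold:
        return []
    actions = ["open_incident"]
    if target.public:
        actions.append("show_on_status")
    if rmm_device_matches:
        actions.append("raise_critical_rmm_alert")
    if (target.kind == "service" and target.auto_restart
            and target.container_name and docker_restart_enabled):
        actions.append("docker_restart:" + target.container_name)
    return actions


portal = Target("portal", "service", failure_threshold=2, public=True,
                auto_restart=True, container_name="portal")
print(record_check(portal, ok=False, docker_restart_enabled=True))  # []
print(record_check(portal, ok=False, docker_restart_enabled=True))
# ['open_incident', 'show_on_status', 'docker_restart:portal']
```

The `docker_restart_enabled` flag stands in for the ENABLE_DOCKER_RESTART=true build setting: without it, the restart action is never returned even when auto-restart is on.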

The watchdog also writes a heartbeat file so an external monitor can detect a dead watchdog.
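
An external monitor could use the heartbeat along these lines. The article does not state the file's path or format, so this sketch assumes the simplest scheme, checking the file's modification time; the function name, the path and the 30-second limit are all hypothetical.

```python
import os
import tempfile
import time
from typing import Optional


def watchdog_alive(heartbeat_path: str, max_age_seconds: float,
                   now: Optional[float] = None) -> bool:
    """Return True if the heartbeat file exists and was modified recently."""
    try:
        mtime = os.path.getmtime(heartbeat_path)
    except OSError:
        return False  # a missing or unreadable file counts as a dead watchdog
    current = time.time() if now is None else now
    return current - mtime <= max_age_seconds


# Demo against a temporary file standing in for the real heartbeat.
with tempfile.NamedTemporaryFile(delete=False) as f:
    demo_path = f.name
fresh = watchdog_alive(demo_path, max_age_seconds=30)
stale = watchdog_alive(demo_path, max_age_seconds=30,
                       now=os.path.getmtime(demo_path) + 100)
os.remove(demo_path)
missing = watchdog_alive(demo_path, max_age_seconds=30)
print(fresh, stale, missing)  # True False False
```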

Messaging health

/admin/messaging

The chat / messaging service has its own health view:

  • Connected users (live count)
  • Messages/sec
  • Channel count
  • WebSocket reconnect rate (any sustained spike indicates trouble)

Meetings health

/admin/meetings

  • Active meetings
  • Average per-meeting attendees
  • Recording storage used
  • Jitsi cluster status (if multi-host)

Studio service health

/admin/studio

Reports the health of the design-service that renders previews. It is usually green; if it turns red, branding edits won't preview, but the rest of the platform keeps working.

API explorer

/api-explorer

OpenAPI documentation for every backend service. Use it to:

  • Confirm a service is reachable from your browser
  • Test specific endpoints with the Try it button (auth headers applied automatically from your session)
  • Find request/response shapes when integrating

📷 Screenshot placeholder: screenshots/api-explorer.png

Logs

The platform writes structured JSON logs to stdout. On a docker-compose install, the runbook ../runbooks/portal.md shows where they go and how to tail them.
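
Because the logs are one JSON object per line, they are easy to filter once captured. The sketch below is illustrative only: the field names `level`, `service` and `msg` are assumptions, as the article does not document the log schema.

```python
import json
from typing import Any, Dict, Iterable, List


def filter_logs(lines: Iterable[str], **match: Any) -> List[Dict[str, Any]]:
    """Return parsed log entries whose fields equal every given key/value."""
    entries = []
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip any non-JSON output mixed into stdout
        if isinstance(entry, dict) and all(entry.get(k) == v for k, v in match.items()):
            entries.append(entry)
    return entries


sample = [
    '{"level": "error", "service": "portal", "msg": "upstream timeout"}',
    "plain text line",
    '{"level": "info", "service": "portal", "msg": "started"}',
]
print(filter_logs(sample, level="error"))
# [{'level': 'error', 'service': 'portal', 'msg': 'upstream timeout'}]
```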

Common workflows

Investigate a slow page

  1. /admin/health → check the latency column on each monitor target.
  2. Open the target → review its recent checks and incident history.
  3. Cross-reference with the service logs (see the portal runbook).

After an incident

  1. Open /admin/audit → filter to the incident window.
  2. Cross-reference with /admin/health restart history.
  3. Post a public update on /status (Admin → Status page → New incident).

Check that the relay is up

  • Add a Service-kind target on /admin/health pointed at the relay's /health endpoint, or
  • Hit the relay's /health endpoint directly — it returns { "status": "ok", "service": "relay-service", ... }.
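
A scripted version of the direct check could look like the sketch below. It relies only on the two response fields shown above, `status` and `service`; the function names are hypothetical, and the URL you pass in depends on where the relay is hosted in your install.

```python
import json
import urllib.request


def is_healthy(body: str, expected_service: str = "relay-service") -> bool:
    """Return True if the body is a JSON object reporting status "ok"."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    return (isinstance(payload, dict)
            and payload.get("status") == "ok"
            and payload.get("service") == expected_service)


def check_relay(url: str, timeout: float = 5.0) -> bool:
    """Fetch the relay's /health URL and validate the response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200 and is_healthy(resp.read().decode("utf-8"))
    except (OSError, ValueError):
        return False  # unreachable, timed out, HTTP error or undecodable body


print(is_healthy('{"status": "ok", "service": "relay-service"}'))    # True
print(is_healthy('{"status": "down", "service": "relay-service"}'))  # False
```

Extra fields in the response are ignored, so the check keeps working if the endpoint returns more than the two fields used here.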

Permissions

Action               | Role
View admin health    | admin
Post status updates  | admin
View public /status  | anyone
API Explorer         | admin (logged-in default; some endpoints public)
