v1.1.0

07/04/2026

Attention

Reply CMP v1.1.0 is now available. This release delivers the native Monitoring module: fully reimagined operational dashboards built on top of the Azure Monitor, AWS CloudWatch, and GCP Cloud Monitoring APIs, alongside a complete Alert Rules engine with email and in-app notifications.

What’s Changed

🚀 New Features

Native Monitoring Dashboards

The legacy Grafana-embedded monitoring experience has been replaced with a first-class native dashboard engine that queries cloud metrics and logs directly through provider APIs — no external dependency required.

Dashboards

Free-form 12-column grid canvas with drag-and-drop widget layout
Import / export dashboards as .welkin-dashboard JSON files to share configurations across tenants or environments
Set a default dashboard as the landing page when opening Monitoring
Per-tenant dashboard library with name, description, and timestamps

Widget types: Timeseries, Stat, Table, Alert List

Supported query types:

Provider	Query types
Azure	Azure Monitor Metrics, Log Analytics (KQL), Azure Alerts Management
AWS	CloudWatch Metrics, CloudWatch Alarms, CloudWatch Logs Insights
GCP	Cloud Monitoring Metrics, Alert Policies

Query features:

Multi-connection queries: target the same metric across multiple connections in a single widget
Configurable time range per widget (presets + custom absolute range)
Query results cached for 5 minutes for low-latency dashboard refresh
Automatic type detection for Log Analytics workspaces; selectable workspaces for cross-workspace KQL
Dimension filtering for Azure Metrics; alignment period and reducer selection for GCP Metrics
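
As a rough illustration of the 5-minute result cache listed above, the sketch below keeps query results in memory and re-runs a query only once its entry is older than the TTL. The `QueryCache` class, its `get_or_run` method, and the shape of the cache key are hypothetical names chosen for this example; these notes do not describe how the CMP cache is actually implemented.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple

CACHE_TTL_SECONDS = 5 * 60  # the 5-minute window stated in the release notes


class QueryCache:
    """Tiny in-memory TTL cache (illustrative only, not the CMP implementation)."""

    def __init__(self, ttl: float = CACHE_TTL_SECONDS,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl
        self._clock = clock  # injectable so the expiry logic can be tested
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_run(self, key: Hashable, run_query: Callable[[], Any]) -> Any:
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # entry is still fresh: skip the provider call
        result = run_query()
        self._entries[key] = (now, result)
        return result
```

A key in this sketch would need to cover everything that changes the result, for example the connection, the query text, and the time range; that choice is an assumption, not documented behaviour.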

Alert Rules

Automated threshold monitoring across all three cloud providers.

Configure a query, an evaluation condition (Gt / Gte / Lt / Lte / Eq / Any Result), and a schedule (minimum 1-minute interval)
State machine: Unknown → OK → Firing → OK with automatic resolution
Evaluation targets: Last Value, Average, Count — reduce any query result to a single comparable number
On-demand evaluation: trigger an immediate evaluation from the rule detail page without waiting for the next scheduled run
Alert history: last 50 state transitions recorded per rule (Firing / Resolved / Error), with evaluated value and message
Quota: 50 rules per tenant
Notifications (per-rule):
- In-app notification on fire and resolve (enabled by default)
- Email notification with HTML and plain-text templates, fired-state colour coding, human-readable condition summary, and contextual next-steps guidance
- Webhook and Microsoft Teams channels via registered Webhooks
Cooldown: per-rule 30-minute delivery cooldown prevents notification storms during sustained firing; spurious Unknown → OK resolved notifications are suppressed
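
The evaluation flow described above (reduce the query result, compare it with the condition, then move the state machine) might look roughly like the following. This is a minimal sketch: the function names are hypothetical, and treating an empty result as "not breached" is an assumption made for the example rather than documented behaviour.

```python
import operator
from statistics import mean
from typing import List, Optional, Tuple

# Condition names as listed in the release notes.
_OPS = {"Gt": operator.gt, "Gte": operator.ge,
        "Lt": operator.lt, "Lte": operator.le, "Eq": operator.eq}


def reduce_values(values: List[float], target: str) -> Optional[float]:
    """Collapse a query result to one number: Last Value, Average, or Count."""
    if target == "Count":
        return float(len(values))
    if not values:
        return None  # nothing to reduce
    if target == "Last Value":
        return values[-1]
    if target == "Average":
        return mean(values)
    raise ValueError(f"unknown evaluation target: {target}")


def is_breached(values: List[float], target: str, op: str,
                threshold: float = 0.0) -> bool:
    if op == "Any Result":
        return len(values) > 0
    reduced = reduce_values(values, target)
    if reduced is None:
        return False  # assumption: an empty result does not breach a threshold
    return _OPS[op](reduced, threshold)


def transition(previous: str, breached: bool) -> Tuple[str, Optional[str]]:
    """Return (new_state, notification), where notification may be None."""
    current = "Firing" if breached else "OK"
    if current == previous:
        return current, None
    if current == "Firing":
        return current, "fired"
    # Only Firing -> OK is a real resolution; Unknown -> OK stays silent.
    return current, ("resolved" if previous == "Firing" else None)
```

For example, `transition("Unknown", False)` yields `("OK", None)`, which mirrors the suppression of spurious resolved notifications, while `transition("Firing", False)` yields `("OK", "resolved")`.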

Create an alert from a dashboard widget: use the widget context menu → “Create Alert” to pre-fill the query configuration automatically.

Webhooks

A new Webhooks management section (Administration → Webhooks) lets you register outbound webhook endpoints, Microsoft Teams connectors, and PagerDuty integrations that can be targeted by Alert Rules.

Supports three channel types:
- Webhook — generic HTTPS POST with HMAC-SHA256 request signing
- Microsoft Teams — Adaptive Card / MessageCard via channel connector
- PagerDuty — native integration via Events API v2; incidents are opened on fire and automatically resolved when the rule returns to OK
Per-channel delivery history shows the last 200 delivery attempts with HTTP status codes and error messages
Test delivery sends a sample payload to verify connectivity before attaching a channel to a rule
Outbound URLs are validated for HTTPS and must not resolve to private IP ranges (SSRF protection)
Channels can be temporarily disabled without removing them from configured rules
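
Two of the points above, HMAC-SHA256 request signing and the HTTPS / private-range URL check, can be sketched with the Python standard library. The function names are hypothetical, and the details (signing the raw body, hex encoding, which address classes are rejected) are assumptions for illustration; the exact signature format used by Reply CMP is not specified in these notes.

```python
import hashlib
import hmac
import ipaddress
import socket
from typing import Callable, Iterable
from urllib.parse import urlsplit


def sign_body(secret: bytes, body: bytes) -> str:
    """Hex HMAC-SHA256 digest of the raw request body (assumed format)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, body: bytes, received_hex: str) -> bool:
    """Receiver-side check; compare_digest avoids timing side channels."""
    expected = sign_body(secret, body).encode()
    return hmac.compare_digest(expected, received_hex.encode())


def _resolve(host: str) -> Iterable[str]:
    # Default resolver: every address the host name maps to.
    return {info[4][0] for info in socket.getaddrinfo(host, 443)}


def _is_internal(address: str) -> bool:
    ip = ipaddress.ip_address(address.split("%")[0])  # drop any IPv6 zone id
    return ip.is_private or ip.is_loopback or ip.is_link_local


def is_allowed_url(url: str,
                   resolve: Callable[[str], Iterable[str]] = _resolve) -> bool:
    """Accept only HTTPS URLs whose host resolves to public addresses."""
    parts = urlsplit(url)
    if parts.scheme != "https" or not parts.hostname:
        return False
    try:
        addresses = list(resolve(parts.hostname))
    except OSError:
        return False  # unresolvable host: fail closed
    if not addresses:
        return False
    return not any(_is_internal(a) for a in addresses)
```

A receiver would call `verify_signature` with the shared secret, the raw bytes of the request body, and the signature value taken from the request before trusting the payload.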

Cost Spike Detection

Automatic anomaly detection added to the FinOps ingestion pipeline.

After every cost refresh, the platform compares actual costs to ML-generated forecasts and raises a spike notification when both thresholds are exceeded simultaneously:
- Percentage increase ≥ 25 % above forecast (configurable)
- Absolute increase ≥ 2.00 (reporting currency) above forecast (configurable)
Group-level detection: when enabled, the engine checks spend within each allocation group and reports which groups crossed the threshold, not just the tenant total
Notification channels: in-app, email (with HTML template listing affected groups, actual vs forecast, and excess amounts), and registered Webhooks
Configuration: per-tenant settings at FinOps → Assess → Settings → Cost Spike Alerts — enable/disable, adjust thresholds, toggle group detection, and configure recipients
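
To make the two-threshold rule concrete, here is a minimal sketch of the check using the default values quoted above. The names `SpikeThresholds`, `is_cost_spike`, and `spiking_groups` are hypothetical, and the handling of a zero or negative forecast is an assumption, since these notes do not cover that case.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class SpikeThresholds:
    percent: float = 25.0   # default percentage increase above forecast
    absolute: float = 2.00  # default absolute increase, in reporting currency


def is_cost_spike(actual: float, forecast: float,
                  t: SpikeThresholds = SpikeThresholds()) -> bool:
    """True only when BOTH the absolute and the percentage threshold are met."""
    excess = actual - forecast
    if excess < t.absolute:
        return False
    if forecast <= 0:
        # Assumption: with no positive forecast the percentage is undefined,
        # so the absolute threshold alone decides.
        return True
    return excess / forecast * 100.0 >= t.percent


def spiking_groups(costs: Dict[str, Tuple[float, float]],
                   t: SpikeThresholds = SpikeThresholds()) -> List[str]:
    """Group-level detection: names of groups whose (actual, forecast) spiked."""
    return [name for name, (actual, forecast) in costs.items()
            if is_cost_spike(actual, forecast, t)]
```

With the defaults, an actual cost of 130 against a forecast of 100 is a spike (30 and 30 %), 5.50 against 4.00 is not (37.5 % but only 1.50 in absolute terms), and 1010 against 1000 is not (10 in absolute terms but only 1 %).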

CMP Agent: expanded capabilities

The CMP Agent’s MCP server has been substantially extended. New domains available from the agent panel:

Live Monitoring Queries — query_metrics, query_logs, query_alerts, list_metric_namespaces, list_metrics, list_workspaces: run real-time metric, log, and alert queries directly against Azure Monitor, AWS CloudWatch, and GCP Cloud Monitoring APIs; supports filtering, aggregation, time binning, and workspace discovery
Alert Rule Management — list_alert_rules, get_alert_rule, get_alert_rule_history, toggle_alert_rule, evaluate_alert_rule_now: inspect alert rule configurations and state history, enable/disable rules, and trigger on-demand evaluations
Automation Policies — list_policy_definitions, index_policy_instances, get_policy_instance_jobs: query available policy blueprints, active policy instances, and per-instance execution job history with credits used
Cloud Reservations — index_reservation_costs: payment lifecycle of Azure Reserved Instances, AWS Reserved Instances, and GCP Commitments with expected next payment dates
Provisioning — index_deployments, index_catalog_items: inspect provisioned infrastructure stacks and available catalog templates per provider

Cooldown guard fix: failed or cancelled operations no longer block re-scheduling; only a Completed operation triggers the cooldown window
Concurrent operation detection: reworked to distinguish between a superseded operation (own ID was cancelled by a newer run — silently skip) and a genuinely concurrent operation (different ID still running — emit Cancelled and return an error)
Bulk scheduling performance: BulkScheduleIngestionAsync now batches the status query for all connections into a single DB round-trip instead of N individual queries
Expiration check: connections with a past ExpirationDate now return InvalidOperationError("Connection expired") immediately instead of attempting to run
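
The cooldown and expiration rules above can be read as a small pre-flight check before an operation is scheduled. The sketch below is only an interpretation: `can_schedule` and its parameters are hypothetical, the cooldown length is passed in because these notes do not state it, and the Python exception stands in for the `InvalidOperationError("Connection expired")` result mentioned above.

```python
from datetime import datetime, timedelta
from typing import Optional


class InvalidOperationError(Exception):
    """Python stand-in for the error named in the release notes."""


def can_schedule(now: datetime,
                 expiration_date: Optional[datetime],
                 last_status: Optional[str],
                 last_finished_at: Optional[datetime],
                 cooldown: timedelta) -> bool:
    # Expired connections are rejected immediately, before anything runs.
    if expiration_date is not None and expiration_date < now:
        raise InvalidOperationError("Connection expired")
    # Only a Completed operation starts the cooldown window;
    # Failed or Cancelled operations never block re-scheduling.
    if last_status == "Completed" and last_finished_at is not None:
        return now - last_finished_at >= cooldown
    return True
```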

⚙️ Automation module overhaul

The Policy module has been redesigned as the Automation module with a new versioned policy engine written in F#.

New policy type — Unused Volumes:

Detects and deletes orphaned (unattached) storage volumes on a schedule
Supports Azure Managed Disks, AWS EBS Volumes, and GCP Persistent Disks
Processes only volumes with no active compute attachment; attached volumes are always skipped

Policy engine changes:

Built-in policy definitions are now semantically versioned (SemVer); clients can pin to a specific version or always use latest
Policy instances are identified by UUID; three built-in policies ship with stable IDs (Start, Stop, Unused Volumes)
Enforce Tags policy type has been removed
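
Pinning to a version versus always using the latest can be illustrated with a small resolver. This is a sketch under stated assumptions: `parse_version` and `resolve_version` are hypothetical helpers, and only plain `MAJOR.MINOR.PATCH` strings are handled (no pre-release or build suffixes).

```python
from typing import Iterable, Optional, Tuple


def parse_version(text: str) -> Tuple[int, int, int]:
    """Turn '1.10.0' into (1, 10, 0) so versions compare numerically."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def resolve_version(available: Iterable[str],
                    pinned: Optional[str] = None) -> str:
    """Return the pinned version if given, otherwise the highest one."""
    versions = list(available)
    if pinned is not None:
        if pinned not in versions:
            raise LookupError(f"version {pinned} is not published")
        return pinned
    return max(versions, key=parse_version)
```

Comparing parsed tuples rather than raw strings matters here: as strings, `"1.9.3"` would sort above `"1.10.0"`.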

Quota and credits system:

Each resource action (start, stop, or volume delete) now consumes one policy credit
Default quota: 100 policy instances and 10,000 monthly credits per tenant
Credits reset on the first of each month; balance is visible in Tenant → Quota
Executions that exceed the monthly credit limit are skipped with a CreditLimitReached status
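
The credit accounting above can be sketched as a simple ledger. `CreditLedger`, `try_consume`, and the `"Completed"` status are hypothetical names; skipping the whole execution when it would cross the limit is one reading of the rule, not a confirmed detail. Only the `CreditLimitReached` status and the 10,000 default come from these notes.

```python
from dataclasses import dataclass

MONTHLY_CREDITS = 10_000  # default monthly quota per tenant


@dataclass
class CreditLedger:
    limit: int = MONTHLY_CREDITS
    used: int = 0

    def try_consume(self, actions: int) -> str:
        """One credit per resource action (start, stop, or volume delete)."""
        if self.used + actions > self.limit:
            # Assumption: the execution is skipped as a whole.
            return "CreditLimitReached"
        self.used += actions
        return "Completed"  # hypothetical success status

    def reset(self) -> None:
        """Called on the first of each month."""
        self.used = 0
```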

Permission rename:

Old permission	New permission
`Policy.Policy / Read`	`Automation.PolicyInstance / Read`
`Policy.Policy / Write`	`Automation.PolicyInstance / Write`
`Policy.Policy / Delete`	`Automation.PolicyInstance / Delete`
`Policy.Execution / Read`	(merged into `Automation.PolicyInstance / Read`)

Attention

If you have assigned the Policy Contributor or Policy Reader role to users, those roles now grant the renamed Automation.PolicyInstance permissions. No re-assignment is needed — the roles are backward-compatible. However, any custom integrations or scripts that check for Policy.Policy permissions must be updated.
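
For scripts that still check the old permission names, a lookup table built from the rename table above is one way to migrate. The `migrate_permission` helper and the `(resource, action)` tuple representation are illustrative assumptions; adapt them to however your integration stores permissions.

```python
from typing import Dict, Tuple

Permission = Tuple[str, str]  # (resource, action)

# Mapping taken from the permission rename table in these release notes.
RENAMED_PERMISSIONS: Dict[Permission, Permission] = {
    ("Policy.Policy", "Read"): ("Automation.PolicyInstance", "Read"),
    ("Policy.Policy", "Write"): ("Automation.PolicyInstance", "Write"),
    ("Policy.Policy", "Delete"): ("Automation.PolicyInstance", "Delete"),
    # Policy.Execution / Read was merged into the PolicyInstance read permission.
    ("Policy.Execution", "Read"): ("Automation.PolicyInstance", "Read"),
}


def migrate_permission(resource: str, action: str) -> Permission:
    """Return the v1.1.0 name for a permission; unknown ones pass through."""
    return RENAMED_PERMISSIONS.get((resource, action), (resource, action))
```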

🔐 Permissions

New permission resources added in v1.1.0:

Permission	Who needs it
`Monitoring.MonitoringDashboard / Read`	View dashboards, run widget queries
`Monitoring.MonitoringDashboard / Write`	Create, edit, import, export, set default dashboards
`Monitoring.AlertRule / Read`	View alert rules and alert history
`Monitoring.AlertRule / Write`	Create, edit, toggle, and evaluate alert rules
`Monitoring.AlertRule / Delete`	Delete alert rules
`Onboarding.Webhook / Read`	View webhook and Teams channels and delivery history
`Onboarding.Webhook / Write`	Create, edit, test, enable/disable, and delete webhooks

The built-in Monitoring Reader and Monitoring Contributor roles have been updated to include the Monitoring permissions automatically. The new Onboarding.Webhook permissions are included in the Owner and Contributor roles — no manual role updates required.

🐛 Bug Fixes

Fixed scheduled alert evaluation failing silently in the Azure Functions background worker due to missing HTTP user context (background worker now passes tenantId directly, bypassing the HTTP user context requirement)
Fixed AlertList queries with 0 active alerts returning NoData instead of OK for AnyResult and threshold conditions, preventing automatic resolution
Fixed CreateDashboard with IsDefault = true leaving the tenant without a default dashboard when the insert step failed (now atomic via a database transaction)

📚 Documentation

Monitoring Dashboards guide — completely rewritten for the native dashboard engine
New Alert Rules guide
New Webhooks guide — includes PagerDuty native integration
New Unused Volumes guide
Create Automation Policies updated for new policy engine and Unused Volumes type
CMP Agent guide expanded to cover all new domains: live monitoring queries, alert rule management, automation, reservations, and provisioning
Assess — Cost Overview updated with the new Cost Spike Detection configuration section
Roles & Permissions reference updated with Monitoring, Onboarding.Webhook, and renamed Automation.PolicyInstance resources