v1.1.0#
07/04/2026
Attention
Reply CMP v1.1.0 is now available. v1.1.0 delivers the native Monitoring module: fully reimagined operational dashboards built on top of Azure Monitor, AWS CloudWatch, and GCP Cloud Monitoring APIs, alongside a complete Alert Rules engine with email and in-app notifications.
What’s Changed#
🚀 New Features#
Native Monitoring Dashboards#
The legacy Grafana-embedded monitoring experience has been replaced with a first-class native dashboard engine that queries cloud metrics and logs directly through provider APIs — no external dependency required.
Dashboards
Free-form 12-column grid canvas with drag-and-drop widget layout
Import / export dashboards as `.welkin-dashboard` JSON files to share configurations across tenants or environments
Set a default dashboard as the landing page when opening Monitoring
Per-tenant dashboard library with name, description, and timestamps
Widget types: Timeseries, Stat, Table, Alert List
Supported query types:
| Provider | Query types |
|---|---|
| Azure | Azure Monitor Metrics, Log Analytics (KQL), Azure Alerts Management |
| AWS | CloudWatch Metrics, CloudWatch Alarms, CloudWatch Logs Insights |
| GCP | Cloud Monitoring Metrics, Alert Policies |
Query features:
Multi-connection queries: target the same metric across multiple connections in a single widget
Configurable time range per widget (presets + custom absolute range)
Query results cached for 5 minutes for low-latency dashboard refresh
Auto-type-detection for Log Analytics workspaces; selectable workspaces for cross-workspace KQL
Dimension filtering for Azure Metrics; alignment period and reducer selection for GCP Metrics
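The 5-minute result cache mentioned above can be pictured as a small time-to-live cache keyed by query. The sketch below is only an illustration: the `TTLCache` class, its method names, and the choice of cache key are hypothetical and are not taken from the product.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Hypothetical in-memory cache that keeps each result for a fixed time."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds          # 300 s = the 5 minutes from the notes
        self._clock = clock              # injectable so tests can control time
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_fetch(self, key: Hashable, fetch: Callable[[], Any]) -> Any:
        """Return the cached value for key, calling fetch() only when stale."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        value = fetch()
        self._entries[key] = (now, value)
        return value
```

A caller would build the key from whatever uniquely identifies a widget query (for example provider, connection, query text, and time range; this composition is an assumption) and pass the provider call as `fetch`.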
Alert Rules#
Automated threshold monitoring across all three cloud providers.
Configure a query, an evaluation condition (Gt / Gte / Lt / Lte / Eq / Any Result), and a schedule (minimum 1-minute interval)
State machine: Unknown → OK → Firing → OK with automatic resolution
Evaluation targets: Last Value, Average, Count — reduce any query result to a single comparable number
On-demand evaluation: trigger an immediate evaluation from the rule detail page without waiting for the next scheduled run
Alert history: last 50 state transitions recorded per rule (Firing / Resolved / Error), with evaluated value and message
Quota: 50 rules per tenant
Notifications (per-rule):
In-app notification on fire and resolve (enabled by default)
Email notification with HTML and plain-text templates, fired-state colour coding, human-readable condition summary, and contextual next-steps guidance
Webhook and Microsoft Teams channels via registered Webhooks
Cooldown: per-rule 30-minute delivery cooldown prevents notification storms during sustained firing; spurious Unknown → OK resolved notifications are suppressed
Create an alert from a dashboard widget: use the widget context menu → “Create Alert” to pre-fill the query configuration automatically.
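To make the evaluation flow concrete, here is a minimal sketch of how a query result could be reduced to one number, compared against a threshold, and mapped to a rule state. The function and type names are hypothetical, and treating an empty result as OK is an assumption (it matches the behaviour described in the bug fixes of this release, but the actual engine may differ in detail).

```python
from enum import Enum
from statistics import fmean
from typing import Optional, Sequence


class State(Enum):
    UNKNOWN = "Unknown"
    OK = "OK"
    FIRING = "Firing"


# Comparison operators named after the conditions listed in the release notes.
CONDITIONS = {
    "Gt": lambda value, threshold: value > threshold,
    "Gte": lambda value, threshold: value >= threshold,
    "Lt": lambda value, threshold: value < threshold,
    "Lte": lambda value, threshold: value <= threshold,
    "Eq": lambda value, threshold: value == threshold,
}


def reduce_series(values: Sequence[float], target: str) -> Optional[float]:
    """Collapse a query result into one number (Last Value, Average, Count)."""
    if target == "Count":
        return float(len(values))
    if not values:
        return None  # nothing to reduce
    if target == "Last Value":
        return values[-1]
    if target == "Average":
        return fmean(values)
    raise ValueError(f"unknown evaluation target: {target}")


def evaluate(values: Sequence[float], target: str, condition: str,
             threshold: float = 0.0) -> State:
    """Return the rule state produced by one evaluation run."""
    if condition == "Any Result":
        return State.FIRING if len(values) > 0 else State.OK
    reduced = reduce_series(values, target)
    if reduced is None:
        return State.OK  # assumption: an empty result does not fire
    return State.FIRING if CONDITIONS[condition](reduced, threshold) else State.OK


def notification_for(previous: State, current: State) -> Optional[str]:
    """Decide which notification, if any, a state transition should send."""
    if current is State.FIRING and previous is not State.FIRING:
        return "fired"
    if current is State.OK and previous is State.FIRING:
        return "resolved"
    return None  # covers Unknown -> OK, which is suppressed per the notes
```

For example, `evaluate([1, 2, 9], "Last Value", "Gt", 5)` yields `State.FIRING`, while the same series with the `Average` target stays `State.OK` because the mean is 4. The 30-minute delivery cooldown is not modelled here.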
Webhooks#
A new Webhooks management section (Administration → Webhooks) lets you register outbound webhook endpoints and Microsoft Teams connectors that can be targeted by Alert Rules.
Supports three channel types:
Webhook — generic HTTPS POST with HMAC-SHA256 request signing
Microsoft Teams — Adaptive Card / MessageCard via channel connector
PagerDuty — native integration via Events API v2; incidents are opened on fire and automatically resolved when the rule returns to OK
Per-channel delivery history shows the last 200 delivery attempts with HTTP status codes and error messages
Test delivery sends a sample payload to verify connectivity before attaching a channel to a rule
Outbound URLs are validated for HTTPS and must not resolve to private IP ranges (SSRF protection)
Channels can be temporarily disabled without removing them from configured rules
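On the receiving side, a generic webhook endpoint would typically recompute the HMAC-SHA256 signature over the raw request body and compare it with the value sent by the platform. The sketch below shows that check using only the Python standard library. The release notes do not specify the signature header name or the encoding, so the hex encoding and the function names here are assumptions to verify against the Webhooks guide.

```python
import hashlib
import hmac


def sign_payload(secret: bytes, body: bytes) -> str:
    """Compute an HMAC-SHA256 signature of the raw body (hex is an assumption)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, body: bytes, received_signature: str) -> bool:
    """Check a received signature in constant time to avoid timing leaks."""
    expected = sign_payload(secret, body)
    return hmac.compare_digest(expected.encode(), received_signature.encode())
```

The signature should be computed over the exact bytes received, before any JSON parsing, because re-serialising the payload can change whitespace or key order and invalidate the comparison.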
Cost Spike Detection#
Automatic anomaly detection added to the FinOps ingestion pipeline.
After every cost refresh, the platform compares actual costs to ML-generated forecasts and raises a spike notification when both thresholds are exceeded simultaneously:
Percentage increase ≥ 25 % above forecast (configurable)
Absolute increase ≥ 2.00 (reporting currency) above forecast (configurable)
Group-level detection: when enabled, the engine checks spend within each allocation group and reports which groups crossed the threshold, not just the tenant total
Notification channels: in-app, email (with HTML template listing affected groups, actual vs forecast, and excess amounts), and registered Webhooks
Configuration: per-tenant settings at FinOps → Assess → Settings → Cost Spike Alerts — enable/disable, adjust thresholds, toggle group detection, and configure recipients
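The two-threshold rule can be illustrated with a short sketch: a spike is reported only when the excess over forecast passes both the percentage and the absolute threshold. The names below are hypothetical, the defaults mirror the values stated above, and the handling of a zero forecast is an assumption rather than documented behaviour.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SpikeThresholds:
    min_percent: float = 25.0   # percentage increase above forecast
    min_absolute: float = 2.00  # absolute increase in reporting currency


def is_cost_spike(actual: float, forecast: float,
                  thresholds: SpikeThresholds = SpikeThresholds()) -> bool:
    """Return True only when both thresholds are exceeded at the same time."""
    excess = actual - forecast
    if excess < thresholds.min_absolute:
        return False
    if forecast <= 0:
        # Assumption: any qualifying spend over a zero forecast counts as a spike.
        return True
    return excess / forecast * 100 >= thresholds.min_percent


def spiking_groups(actual_by_group: Dict[str, float],
                   forecast_by_group: Dict[str, float],
                   thresholds: SpikeThresholds = SpikeThresholds()) -> List[str]:
    """Group-level detection: list the allocation groups that crossed both thresholds."""
    return sorted(
        group for group, actual in actual_by_group.items()
        if is_cost_spike(actual, forecast_by_group.get(group, 0.0), thresholds)
    )
```

Requiring both conditions filters out two kinds of noise: a 50 % jump from 1.00 to 1.50 is too small in absolute terms, and a 10.00 increase on a 1,000.00 forecast is too small in relative terms.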
CMP Agent — expanded capabilities#
The CMP Agent’s MCP server has been substantially extended. New domains available from the agent panel:
Live Monitoring Queries (`query_metrics`, `query_logs`, `query_alerts`, `list_metric_namespaces`, `list_metrics`, `list_workspaces`): run real-time metric, log, and alert queries directly against Azure Monitor, AWS CloudWatch, and GCP Cloud Monitoring APIs; supports filtering, aggregation, time binning, and workspace discovery
Alert Rule Management (`list_alert_rules`, `get_alert_rule`, `get_alert_rule_history`, `toggle_alert_rule`, `evaluate_alert_rule_now`): inspect alert rule configurations and state history, enable/disable rules, and trigger on-demand evaluations
Automation Policies (`list_policy_definitions`, `index_policy_instances`, `get_policy_instance_jobs`): query available policy blueprints, active policy instances, and per-instance execution job history with credits used
Cloud Reservations (`index_reservation_costs`): payment lifecycle of Azure Reserved Instances, AWS Reserved Instances, and GCP Commitments with expected next payment dates
Provisioning (`index_deployments`, `index_catalog_items`): inspect provisioned infrastructure stacks and available catalog templates per provider
Cooldown guard fix: failed or cancelled operations no longer block re-scheduling; only a `Completed` operation triggers the cooldown window
Concurrent operation detection: reworked to distinguish between a superseded operation (own ID was cancelled by a newer run; silently skip) and a genuinely concurrent operation (different ID still running; emit Cancelled and return an error)
Bulk scheduling performance: `BulkScheduleIngestionAsync` now batches the status query for all connections into a single DB round-trip instead of N individual queries
Expiration check: connections with a past `ExpirationDate` now return `InvalidOperationError("Connection expired")` immediately instead of attempting to run
⚙️ Automation module overhaul#
The Policy module has been redesigned as the Automation module with a new versioned policy engine written in F#.
New policy type — Unused Volumes:
Detects and deletes orphaned (unattached) storage volumes on a schedule
Supports Azure Managed Disks, AWS EBS Volumes, and GCP Persistent Disks
Processes only volumes with no active compute attachment; attached volumes are always skipped
Policy engine changes:
Built-in policy definitions are now semantically versioned (SemVer); clients can pin to a specific version or always use latest
Policy instances are identified by UUID; three built-in policies ship with stable IDs (Start, Stop, Unused Volumes)
Enforce Tags policy type has been removed
Quota and credits system:
Each resource action (start, stop, or volume delete) now consumes one policy credit
Default quota: 100 policy instances and 10,000 monthly credits per tenant
Credits reset on the first of each month; balance is visible in Tenant → Quota
Executions that exceed the monthly credit limit are skipped with a `CreditLimitReached` status
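As a rough illustration of the credit check, the sketch below skips an execution when the credits it would need exceed what remains of the monthly quota. Everything other than the `CreditLimitReached` status and the 10,000 default is hypothetical, including the `Completed` status name and the assumption that an execution is skipped as a whole rather than partially run.

```python
from dataclasses import dataclass

MONTHLY_CREDITS = 10_000  # default monthly credits per tenant, per the notes


@dataclass(frozen=True)
class ExecutionResult:
    status: str
    credits_consumed: int


def run_execution(resource_actions: int, credits_used: int,
                  monthly_limit: int = MONTHLY_CREDITS) -> ExecutionResult:
    """Charge one credit per resource action, or skip if the quota would be exceeded."""
    if credits_used + resource_actions > monthly_limit:
        # Assumption: the whole execution is skipped and no credits are charged.
        return ExecutionResult("CreditLimitReached", 0)
    return ExecutionResult("Completed", resource_actions)
```

With 9,990 credits already used, an execution touching 10 resources would still run, while one touching 11 would be skipped under these assumptions.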
Permission rename:
Old permission |
New permission |
|---|---|
|
|
|
|
|
|
|
(merged into |
Attention
If you have assigned the Policy Contributor or Policy Reader role to users, those roles now grant the renamed Automation.PolicyInstance permissions. No re-assignment is needed — the roles are backward-compatible. However, any custom integrations or scripts that check for Policy.Policy permissions must be updated.
🔐 Permissions#
New permission resources added in v1.1.0:
| Permission | Who needs it |
|---|---|
| | View dashboards, run widget queries |
| | Create, edit, import, export, set default dashboards |
| | View alert rules and alert history |
| | Create, edit, toggle, and evaluate alert rules |
| | Delete alert rules |
| | View webhook and Teams channels and delivery history |
| | Create, edit, test, enable/disable, and delete webhooks |
The built-in Monitoring Reader and Monitoring Contributor roles have been updated to include the Monitoring permissions automatically. The new Onboarding.Webhook permissions are included in the Owner and Contributor roles — no manual role updates required.
🐛 Bug Fixes#
Fixed scheduled alert evaluation failing silently in the Azure Functions background worker due to missing HTTP user context (background worker now passes `tenantId` directly, bypassing the HTTP user context requirement)
Fixed `AlertList` queries with 0 active alerts returning `NoData` instead of `OK` for `AnyResult` and threshold conditions, preventing automatic resolution
Fixed `CreateDashboard` with `IsDefault = true` leaving the tenant without a default dashboard when the insert step failed (now atomic via a database transaction)
📚 Documentation#
Monitoring Dashboards guide — completely rewritten for the native dashboard engine
New Webhooks guide — includes PagerDuty native integration
Create Automation Policies updated for new policy engine and Unused Volumes type
CMP Agent guide expanded to cover all new domains: live monitoring queries, alert rule management, automation, reservations, and provisioning
Assess — Cost Overview updated with the new Cost Spike Detection configuration section
Roles & Permissions reference updated with Monitoring, Onboarding.Webhook, and renamed Automation.PolicyInstance resources