
Troubleshooting — Pragmatic.ControlPlane

FAQ, connection issues, stale hosts, health reporting, and recovery procedures.


Q: Do I need the control plane for a single-instance deployment?


No. All interfaces (IHostIdentity, IHostStatus, IControlPlane) have NoOp defaults registered by Pragmatic.Composition.Host. A monolith works without any UseControlPlane() call.

Q: Can the hub run in the same process as the application?


Yes. This is the “embedded hub” mode. Add both Pragmatic.ControlPlane.SignalR and Pragmatic.ControlPlane.Client to the same host. The ControlPlaneConnectionService defers connection until after Kestrel starts listening, avoiding self-connection deadlocks.

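Q: What happens if the hub goes down?

Clients degrade to NoOp behavior, and each host continues serving requests independently. When the hub comes back, clients auto-reconnect (with exponential backoff) and re-register. No registration data is lost, because the hub is stateless: its in-memory registry is rebuilt from the re-registrations. The in-memory command audit log, however, does not survive a hub restart.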

Q: Can I query connected hosts without SignalR?


Yes. The hub exposes REST fallback endpoints:

# All hosts
curl https://control-plane:5100/_pragmatic/control-plane/hosts
# Specific host
curl https://control-plane:5100/_pragmatic/control-plane/hosts/{hostId}
# Audit trail
curl https://control-plane:5100/_pragmatic/control-plane/audit

Q: How do I push maintenance mode to all hosts?


Send a MaintenanceCommand via the hub:

var error = await controlPlane.SendCommandAsync(
    targetHostId: "*", // broadcast
    new MaintenanceCommand(Enable: true),
    ct);

Or use the hub method directly (server-side):

await hub.Clients.All.OnCommand("admin", "MaintenanceCommand",
    JsonSerializer.Serialize(new MaintenanceCommand(Enable: true)));

Q: How do I scale the hub for high availability?


Add a SignalR backplane:

builder.Services.AddSignalR()
    .AddStackExchangeRedis("redis:6379");

This syncs the registry across hub instances. Without a backplane, each hub instance has its own isolated registry.


Symptom: Failed to connect to control plane at {HubUrl}, operating in degraded mode

Checklist:

| # | Check | How |
|---|-------|-----|
| 1 | Hub is running | curl https://hub-url/_pragmatic/control-plane/hosts |
| 2 | URL is correct | Verify HubUrl in appsettings.json; it must include the full path /_pragmatic/control-plane |
| 3 | API key matches | Compare the hub and client ApiKey values |
| 4 | Network reachable | ping hub-host or telnet hub-host 5100 |
| 5 | TLS certificate valid | Check the logs for certificate errors |
| 6 | Firewall allows port | Verify that port 5100 (or the configured port) is open |
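The URL check is the easiest one to get wrong, because a HubUrl without the path still looks plausible. Below is a minimal sketch of a local sanity check. It assumes that the only rule is the one stated in the checklist: an absolute http or https URL whose path ends with /_pragmatic/control-plane. `LooksLikeValidHubUrl` is a hypothetical helper, not part of Pragmatic.ControlPlane.

```csharp
using System;

// Hypothetical helper (not a Pragmatic.ControlPlane API): encodes the checklist rule that
// HubUrl must be an absolute http(s) URL that includes the full /_pragmatic/control-plane path.
static bool LooksLikeValidHubUrl(string hubUrl)
{
    if (!Uri.TryCreate(hubUrl, UriKind.Absolute, out var uri))
        return false;

    if (uri.Scheme != Uri.UriSchemeHttps && uri.Scheme != Uri.UriSchemeHttp)
        return false;

    // Ignore a trailing slash so ".../control-plane/" is accepted as well.
    return uri.AbsolutePath.TrimEnd('/')
        .EndsWith("/_pragmatic/control-plane", StringComparison.Ordinal);
}

Console.WriteLine(LooksLikeValidHubUrl("https://control-plane:5100/_pragmatic/control-plane"));
Console.WriteLine(LooksLikeValidHubUrl("https://control-plane:5100")); // path missing
```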

Symptom: Logs show repeated Control plane connection lost, reconnecting... followed by Reconnected to control plane, re-registering...

Causes:

  1. Network instability — Intermittent connectivity between client and hub
  2. Hub under load — Too many connected hosts overwhelming the SignalR hub
  3. Keep-alive timeout — SignalR keep-alive not matching infrastructure timeouts (load balancers, proxies)

Fix: Increase the heartbeat interval to reduce hub load:

cp.WithHeartbeatInterval(TimeSpan.FromSeconds(60));

For load balancers, make sure WebSocket connections are not terminated prematurely: configure the idle timeout to at least 2x the heartbeat interval.

Symptom: Control plane connection closed permanently, operating in degraded mode

Cause: The client exhausted all reconnection attempts (MaxReconnectAttempts).

Fix: Increase the maximum number of reconnection attempts:

cp.WithMaxReconnectAttempts(20); // Default is 10

The exponential backoff doubles the delay after each attempt and caps it at 30 seconds per attempt:

Attempt 1: 1s, Attempt 2: 2s, Attempt 3: 4s, Attempt 4: 8s, Attempt 5: 16s, Attempt 6+: 30s
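This sketch estimates how long a client keeps retrying before it gives up. It assumes the delay doubles from 1 second and is capped at 30 seconds, which is what the first attempts listed above suggest. The client's actual formula, including any jitter, may differ, and the function names are illustrative only.

```csharp
using System;
using System.Linq;

// Assumed formula: delay = min(2^(attempt - 1) seconds, 30 seconds). Jitter is not modeled.
static TimeSpan BackoffDelay(int attempt)
{
    if (attempt < 1)
        throw new ArgumentOutOfRangeException(nameof(attempt));

    // Clamp the exponent so 2^n stays small for very large attempt numbers.
    double seconds = Math.Pow(2, Math.Min(attempt - 1, 30));
    return TimeSpan.FromSeconds(Math.Min(seconds, 30));
}

// Total waiting time across all attempts before the client gives up.
static TimeSpan TotalBackoff(int maxAttempts) =>
    TimeSpan.FromSeconds(Enumerable.Range(1, maxAttempts)
        .Sum(a => BackoffDelay(a).TotalSeconds));

for (int attempt = 1; attempt <= 7; attempt++)
    Console.WriteLine($"Attempt {attempt}: {BackoffDelay(attempt).TotalSeconds}s");

Console.WriteLine($"10 attempts: {TotalBackoff(10)}");
Console.WriteLine($"20 attempts: {TotalBackoff(20)}");
```

Under this assumed formula, the default of 10 attempts adds up to 181 seconds (about 3 minutes) of waiting before the connection closes permanently. That figure does not count the time each connection attempt itself takes. Raising the limit to 20 stretches the wait to 481 seconds (about 8 minutes).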

Hosts showing as connected but actually down


Symptom: GET /_pragmatic/control-plane/hosts shows a host with an old LastHeartbeat timestamp, but the host process is dead.

Cause: The stale eviction interval has not elapsed yet.

Fix: Reduce the stale eviction interval on the hub:

hub.WithStaleEvictionInterval(TimeSpan.FromMinutes(1));

After the interval elapses, StaleHostEvictionService removes the host and broadcasts OnHostDisconnected.
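The following is a simplified sketch of one eviction pass. It assumes the hub compares each host's LastHeartbeat with the current time and evicts hosts whose last heartbeat is older than StaleEvictionInterval. The dictionary registry and `EvictStale` are illustrative and do not reflect StaleHostEvictionService's actual code. Under this assumption, a dead host disappears roughly one interval after its last heartbeat, plus however long the hub takes to run its next check.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative hub registry: HostId -> time the last heartbeat was received.
var lastHeartbeat = new Dictionary<string, DateTimeOffset>();

// One eviction pass. Assumption: a host is stale once its last heartbeat is older than
// staleAfter. Returns the evicted IDs so the caller can broadcast OnHostDisconnected for each.
List<string> EvictStale(DateTimeOffset now, TimeSpan staleAfter)
{
    var stale = lastHeartbeat
        .Where(entry => now - entry.Value > staleAfter)
        .Select(entry => entry.Key)
        .ToList(); // materialize before removing entries from the dictionary

    foreach (var hostId in stale)
        lastHeartbeat.Remove(hostId);

    return stale;
}

var checkTime = DateTimeOffset.UtcNow;
lastHeartbeat["host-a"] = checkTime.AddSeconds(-20); // still heartbeating
lastHeartbeat["host-b"] = checkTime.AddMinutes(-5);  // process died five minutes ago

var evicted = EvictStale(checkTime, TimeSpan.FromMinutes(1));
Console.WriteLine($"Evicted: {string.Join(", ", evicted)}");
```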

Hosts repeatedly appearing and disappearing


Symptom: A host shows up, then is evicted, then re-registers, in a loop.

Causes:

  1. Heartbeat not reaching hub — Network issues between specific client and hub
  2. Eviction interval too short — See Common Mistake #4
  3. Client GC pauses — Long GC pauses delay heartbeats past the eviction threshold

Diagnosis: Compare HeartbeatInterval (client) with StaleEvictionInterval (hub). The eviction interval should be at least 4x the heartbeat interval.
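This page gives two sizing rules. The hub's StaleEvictionInterval should be at least 4x the client HeartbeatInterval, and a load balancer's idle timeout should be at least 2x the HeartbeatInterval. The minimal sketch below checks both rules up front. `CheckTimings` is a hypothetical helper, not a built-in Pragmatic.ControlPlane validation.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical startup check (not a Pragmatic.ControlPlane API) encoding the sizing rules:
//   hub StaleEvictionInterval  >= 4x client HeartbeatInterval
//   load balancer idle timeout >= 2x client HeartbeatInterval
static List<string> CheckTimings(TimeSpan heartbeat, TimeSpan staleEviction, TimeSpan? idleTimeout)
{
    var problems = new List<string>();

    if (staleEviction < heartbeat * 4)
        problems.Add($"StaleEvictionInterval {staleEviction} is below 4x HeartbeatInterval ({heartbeat * 4}).");

    // The idle timeout is optional: skip the rule when no load balancer sits in front of the hub.
    if (idleTimeout is { } idle && idle < heartbeat * 2)
        problems.Add($"Idle timeout {idle} is below 2x HeartbeatInterval ({heartbeat * 2}).");

    return problems;
}

// Heartbeat raised to 60s while the hub still evicts after 1 minute.
foreach (var problem in CheckTimings(TimeSpan.FromSeconds(60), TimeSpan.FromMinutes(1), TimeSpan.FromSeconds(100)))
    Console.WriteLine(problem);
```

The example covers the case where HeartbeatInterval is raised to 60 seconds as a fix for reconnect loops. If the hub still uses a 1-minute StaleEvictionInterval, the 4x rule is violated, and hosts would likely be evicted between heartbeats and re-register in a loop.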


Symptom: A host always reports Healthy, even when a dependency it needs is down.

Cause: No IHostHealthContributor implementations are registered. Without contributors, the aggregator defaults to Healthy.

Fix: Register health contributors for critical dependencies:

services.AddSingleton<IHostHealthContributor, DatabaseHealthContributor>();

Symptom: Health check fails with an unhandled exception from a contributor.

Cause: The contributor’s CheckAsync method throws instead of returning Unhealthy.

Fix: Always catch exceptions in health contributors:

public async Task<HealthContribution> CheckAsync(CancellationToken ct)
{
    try
    {
        // Check the dependency here (await the actual probe, passing ct).
        return new HealthContribution(Name, ContributorHealthStatus.Healthy);
    }
    catch (Exception ex)
    {
        // Report the failure instead of letting the exception escape the contributor.
        return new HealthContribution(Name, ContributorHealthStatus.Unhealthy, ex.Message);
    }
}

The HostHealthAggregator does catch contributor exceptions, but a clean return is preferred.
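This compact sketch of an aggregator shows both behaviors described above: it defaults to Healthy when no contributors are registered, and it catches contributor exceptions. It is illustrative only and is not HostHealthAggregator's code. To keep the sketch self-contained, it models each contributor as a name plus a `Func` that returns true for Healthy, instead of using the library's IHostHealthContributor and HealthContribution types. It also assumes that a single unhealthy or throwing contributor makes the whole host Unhealthy.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Illustrative aggregator (not HostHealthAggregator). Assumption: the host is Unhealthy
// if any contributor reports Unhealthy (false) or throws.
static async Task<(bool Healthy, List<string> Failures)> AggregateAsync(
    IReadOnlyList<(string Name, Func<CancellationToken, Task<bool>> Check)> contributors,
    CancellationToken ct)
{
    var failures = new List<string>();

    foreach (var (name, check) in contributors)
    {
        try
        {
            if (!await check(ct))
                failures.Add($"{name}: unhealthy");
        }
        catch (Exception ex)
        {
            // A throwing contributor counts as Unhealthy instead of failing the whole check.
            failures.Add($"{name}: {ex.Message}");
        }
    }

    // With no contributors there are no failures, so the host reports Healthy by default.
    return (failures.Count == 0, failures);
}

var registered = new List<(string Name, Func<CancellationToken, Task<bool>> Check)>
{
    ("database", _ => Task.FromResult(true)),
    ("cache", _ => throw new InvalidOperationException("connection refused")),
};

var (hostHealthy, hostFailures) = await AggregateAsync(registered, CancellationToken.None);
Console.WriteLine($"Healthy: {hostHealthy}; failures: {string.Join(", ", hostFailures)}");

var (emptyHealthy, _) = await AggregateAsync(
    new List<(string Name, Func<CancellationToken, Task<bool>> Check)>(), CancellationToken.None);
Console.WriteLine($"No contributors, Healthy: {emptyHealthy}");
```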


Symptom: SendCommandAsync returns null (success) but the target host does not execute the command.

Checklist:

  1. Target host connected? Check GetAllHostsAsync() for the target HostId.
  2. Handler registered? Verify that IHostCommandHandler<T> is registered in DI on the target host.
  3. Command type matches? The type name must match between sender and receiver.
  4. Dispatcher present? IHostCommandDispatcher must be registered (this is automatic with Composition.Host).
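Checks 2 and 3 both depend on how the receiving host routes a command. The minimal sketch below assumes the host looks up a handler by the command's type name and then reads the JSON payload. That assumption is consistent with the OnCommand(sender, typeName, json) call shown earlier, but the real HostCommandDispatcher may work differently. The handler dictionary and `Dispatch` function are hypothetical stand-ins for the IHostCommandHandler<T> registrations.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical stand-in for IHostCommandHandler<T> registrations on the target host:
// handlers are keyed by the command's type name and receive the JSON payload.
var handlers = new Dictionary<string, Action<JsonElement>>(StringComparer.Ordinal)
{
    ["MaintenanceCommand"] = element =>
        Console.WriteLine($"Maintenance enabled: {element.GetProperty("Enable").GetBoolean()}"),
};

// Returns false when no handler matches the type name. Nothing is reported back to the
// sender in this sketch, so from the sender's side the command still looks accepted.
bool Dispatch(string commandType, string json)
{
    if (!handlers.TryGetValue(commandType, out var handler))
    {
        Console.WriteLine($"No handler for '{commandType}', command ignored.");
        return false;
    }

    using var doc = JsonDocument.Parse(json);
    handler(doc.RootElement);
    return true;
}

var maintenanceJson = JsonSerializer.Serialize(new { Enable = true });

Dispatch("MaintenanceCommand", maintenanceJson);
// A sender that uses a different type name (e.g. namespace-qualified) matches no handler.
Dispatch("Ops.Commands.MaintenanceCommand", maintenanceJson);
```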

“No IHostCommandDispatcher registered” in logs


Cause: HostCommandDispatcher is registered by Pragmatic.Composition.Host. If the host does not reference Composition.Host, incoming commands are dropped, and this log message is the only indication.

Fix: Ensure the host project references Pragmatic.Composition.Host (it should, for PragmaticApp.RunAsync).

Command audit shows “Host not connected”


Cause: The target host ID does not match any registered host.

Fix: Use the correct HostId. Query GET /_pragmatic/control-plane/hosts to find the current host IDs. Remember that HostId is regenerated on every restart (Guid7, a version 7 GUID), so an ID recorded before a restart no longer matches.


Two hosts both claiming migration leadership


Symptom: Both hosts log “Migration leadership claimed for {Database} by {HostId}”.

Cause: When the hub is not used for migration coordination, DatabaseLeaderElection operates independently. It only prevents two leaders if both hosts contend for the same lock in the __PragmaticLock table. If each host sees a different lock table, both can claim leadership.

Diagnosis: Check if both hosts are using the same connection string. Different connection strings (even to the same database) may not share the lock table.

Migration leadership not released after completion


Symptom: New hosts cannot claim migration leadership. The hub shows a leader but that host is no longer migrating.

Cause: The host completed migrations but did not call ReleaseMigrationLeadership. This can happen if the host crashes between migration completion and release.

Fix for hub-based leadership: The ControlPlaneHub.OnDisconnectedAsync handler calls CleanupMigrationLeadership automatically when a host disconnects. If the host is still connected but stuck, restart it.

Fix for DatabaseLeaderElection: The lock expires after the configured timeout (default: 5 minutes). Wait for expiry, or manually release:

UPDATE "__PragmaticLock"
SET "HolderId" = NULL, "AcquiredAt" = NULL, "ExpiresAt" = NULL
WHERE "LockName" = 'migration';
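Because the claim is a lease, a crashed leader blocks other hosts only until the lock expires. The simplified in-memory sketch below shows that behavior, with a lock row that has HolderId and ExpiresAt as in the SQL above. A claim succeeds if the lock is free, has expired, or is already held by the caller. The names are illustrative, and the sketch does not show how DatabaseLeaderElection is implemented. A database-backed version needs the check-and-set to be atomic (for example, a single conditional UPDATE), which this in-process sketch does not attempt.

```csharp
using System;

#nullable enable

// Illustrative single lock row, mirroring the HolderId / ExpiresAt columns above.
(string? HolderId, DateTimeOffset? ExpiresAt) migrationLock = (null, null);

// A claim succeeds when the lock is free, expired, or already held by this host.
// A real implementation must make this check-and-set atomic in the database.
bool TryClaim(string hostId, DateTimeOffset now, TimeSpan leaseTimeout)
{
    bool free = migrationLock.HolderId is null
        || migrationLock.ExpiresAt is null
        || migrationLock.ExpiresAt <= now;

    if (!free && migrationLock.HolderId != hostId)
        return false;

    migrationLock = (hostId, now + leaseTimeout);
    return true;
}

// Release clears the lock only if the caller is the current holder.
void Release(string hostId)
{
    if (migrationLock.HolderId == hostId)
        migrationLock = (null, null);
}

var start = DateTimeOffset.UtcNow;
var timeout = TimeSpan.FromMinutes(5); // documented default

Console.WriteLine(TryClaim("host-a", start, timeout));               // host-a becomes leader
Console.WriteLine(TryClaim("host-b", start.AddMinutes(1), timeout)); // refused: lease still valid
// host-a crashes without calling Release; once the lease expires, host-b takes over.
Console.WriteLine(TryClaim("host-b", start.AddMinutes(6), timeout));
```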

When the hub restarts:

  1. All clients detect disconnection and start reconnecting
  2. On reconnect, each client re-registers its HostInfo
  3. The registry is rebuilt from scratch (stateless hub)
  4. Migration leadership state is lost — but DatabaseLeaderElection provides the safety net
  5. Command audit log is lost (in-memory)

For persistent audit, consider forwarding audit entries to an external store.

When a client host restarts:

  1. The hub detects disconnection via OnDisconnectedAsync
  2. Migration leadership held by that host is released
  3. Other hosts are notified via OnHostDisconnected
  4. When the host starts again, it gets a new HostId and re-registers

When all hosts restart simultaneously:

  1. The hub starts first (if dedicated) or with the primary host (if embedded)
  2. Clients connect after ApplicationStarted
  3. Migration leadership is contested — DatabaseLeaderElection ensures only one wins
  4. After migrations complete, all hosts transition to Serving

| # | Check | How |
|---|-------|-----|
| 1 | Hub is running | curl https://hub/_pragmatic/control-plane/hosts |
| 2 | Client is connected | Check IControlPlane.IsConnected or the logs |
| 3 | API keys match | Compare the hub and client configuration |
| 4 | Heartbeats flowing | Check LastHeartbeat in the host list |
| 5 | Stale eviction configured | Hub StaleEvictionInterval >= 4x client HeartbeatInterval |
| 6 | Health contributors registered | Check DI for IHostHealthContributor |
| 7 | Commands dispatched | Check /_pragmatic/control-plane/audit |
| 8 | Migration leadership clean | Check the hub registry or the __PragmaticLock table |

For detailed control plane diagnostics:

{
  "Logging": {
    "LogLevel": {
      "Pragmatic.ControlPlane": "Debug",
      "Pragmatic.ControlPlane.Client": "Debug",
      "Pragmatic.ControlPlane.SignalR": "Debug",
      "Microsoft.AspNetCore.SignalR": "Debug"
    }
  }
}

At Debug level, the client logs every heartbeat attempt, reconnection, and command dispatch. The hub logs every registration, state change, and stale eviction check.