Troubleshooting
Practical problem/solution guide for Pragmatic.Resilience. Each section covers a common issue, the likely causes, and the fix.
Retries Not Happening
Your pipeline has retry configured, but exceptions are not retried; they propagate immediately.
Checklist
- Is MaxRetries set to 0? A value of 0 disables retries entirely. The default is 3.
- Is ShouldRetry filtering out the exception? If you configured a ShouldRetry predicate, verify it returns true for the exception type you are seeing:

      o.Retry = new RetryOptions
      {
          ShouldRetry = ex => ex is HttpRequestException // Only retries HttpRequestException
      };

  An IOException or TimeoutRejectedException would not be retried with this filter.
- Is the exception OperationCanceledException from external cancellation? The retry strategy deliberately does not retry cancellation from the caller’s CancellationToken, because the caller asked to cancel.
- Is the operation returning a Result failure instead of throwing? Resilience strategies only trigger on exceptions. If your code returns Result.Failure(...) instead of throwing, the pipeline sees a successful execution. This is by design for business errors.
- Is the pipeline actually applied? When using [ResiliencePolicy("name")], verify the policy name matches a configured policy. Unknown names resolve to PassthroughPipeline, which does nothing.
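A ShouldRetry filter can be checked in isolation before it is wired into a pipeline. The following is a minimal sketch, assuming ShouldRetry accepts a Func<Exception, bool> as the checklist snippet suggests. The predicate and the chosen exception types are illustrative, and the BCL TimeoutException stands in for the library's TimeoutRejectedException so the sketch compiles without the package.

```csharp
using System;
using System.IO;
using System.Net.Http;

// Hypothetical predicate: treats HTTP, I/O, and timeout failures as retryable.
// Assumption: ShouldRetry is a Func<Exception, bool>.
Func<Exception, bool> shouldRetry = ex =>
    ex is HttpRequestException or IOException or TimeoutException;

// Evaluate the predicate against the exception types you actually observe.
Console.WriteLine(shouldRetry(new HttpRequestException("503")));     // True
Console.WriteLine(shouldRetry(new IOException("connection reset"))); // True
Console.WriteLine(shouldRetry(new ArgumentException("bad input")));  // False
```

If the exception you see in production evaluates to False here, the filter is the reason it is not retried.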
Circuit Breaker Never Opens
The downstream service is clearly failing, but the circuit stays closed and requests keep going through.
Checklist
- Are you using a shared state store? If using the fluent builder, ensure the InMemoryCircuitBreakerStateStore instance is shared across calls (static field, DI singleton). A new instance per call has no accumulated failure history.
- Is ShouldHandle filtering out the failures? If the circuit breaker’s ShouldHandle predicate does not match the exception type, failures are not counted:

      o.CircuitBreaker = new CircuitBreakerOptions
      {
          ShouldHandle = ex => ex is HttpRequestException // IOException not counted
      };

- Is FailureThreshold too high? The default is 5 consecutive failures. If errors are intermixed with successes, the consecutive count resets on each success and never reaches the threshold.
- Is OperationCanceledException being thrown? External cancellation (caller’s token) is not counted as a circuit breaker failure. This is by design: a cancelled request does not indicate a downstream failure.
- Are multiple circuits being created for the same service? The circuit key defaults to OperationKey ?? OperationName. If two operations call the same service but use different operation names (and no shared OperationKey), they maintain separate circuits. Set OperationKey to the same value for operations that should share a circuit:

      new ResilienceContext { OperationName = "GetUser", OperationKey = "user-service" }
      new ResilienceContext { OperationName = "UpdateUser", OperationKey = "user-service" }
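A small model of consecutive-failure counting shows why intermixed successes can keep the circuit closed. This is a sketch of the counting rule described in the checklist, not the library's implementation. The OpensCircuit function and its parameters are hypothetical.

```csharp
using System;

// Hypothetical model of consecutive-failure counting (not the library's code).
// Returns true if the circuit would open for the given outcome sequence,
// where 'true' means the call succeeded and 'false' means it failed.
static bool OpensCircuit(bool[] outcomes, int failureThreshold)
{
    var consecutiveFailures = 0;
    foreach (var succeeded in outcomes)
    {
        if (succeeded)
        {
            consecutiveFailures = 0; // any success resets the streak
            continue;
        }

        consecutiveFailures++;
        if (consecutiveFailures >= failureThreshold)
        {
            return true; // threshold reached: circuit opens
        }
    }

    return false;
}

// 8 failures out of 10 calls, but never 5 in a row: the circuit stays closed.
bool[] flaky = { false, false, false, false, true, false, false, false, false, true };
Console.WriteLine(OpensCircuit(flaky, failureThreshold: 5)); // False

// 5 failures in a row: the circuit opens.
bool[] down = { false, false, false, false, false };
Console.WriteLine(OpensCircuit(down, failureThreshold: 5)); // True
```

When a service fails in the first pattern, lowering FailureThreshold is the setting from this checklist that changes the outcome.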
Circuit Breaker Opens Too Aggressively
The circuit opens after what seems like a small number of failures.
Possible Causes
Low failure threshold. With FailureThreshold = 1, a single transient exception opens the circuit. Increase to 3-5 for typical external services.
Long break duration. With BreakDuration = TimeSpan.FromMinutes(5), one bad period locks out the service for 5 minutes. Start with 15-30 seconds and adjust based on observed recovery times.
No ShouldHandle filter. Without a filter, every exception counts — including ArgumentException, InvalidOperationException, or other programming errors that are not transient.
Timeout Fires But Operation Continues Running
After a TimeoutRejectedException, the operation keeps running in the background (visible via logs or resource usage).
You are using TimeoutType.Pessimistic. Pessimistic timeout races Task.Delay against the operation via Task.WhenAny. When the delay wins, the pipeline throws TimeoutRejectedException, but the operation task is not truly cancelled — it continues until it completes or the process exits.
Switch to TimeoutType.Optimistic (the default). Optimistic timeout creates a linked CancellationToken and cancels it after the duration. Operations that honor cancellation tokens will cancel cooperatively.
Use pessimistic timeout only for operations that genuinely do not support cancellation (legacy synchronous code, third-party libraries that ignore tokens).
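The difference between the two timeout types can be reproduced with standard .NET primitives. The sketch below illustrates the two techniques described above (racing Task.Delay via Task.WhenAny versus cancelling a linked token). It is not the library's code, and SlowOperationAsync is a hypothetical operation that honors its token.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical operation that honors its CancellationToken.
static async Task<string> SlowOperationAsync(CancellationToken ct)
{
    await Task.Delay(TimeSpan.FromSeconds(2), ct); // throws if ct is cancelled
    return "done";
}

var timeout = TimeSpan.FromMilliseconds(100);

// Pessimistic style: race the operation against a delay. The operation is
// not cancelled when the delay wins, so it keeps running in the background.
var abandoned = SlowOperationAsync(CancellationToken.None);
var winner = await Task.WhenAny(abandoned, Task.Delay(timeout));
Console.WriteLine($"Pessimistic: timed out = {winner != abandoned}, still running = {!abandoned.IsCompleted}");

// Optimistic style: cancel a linked token after the timeout so the
// operation stops cooperatively.
using var cts = CancellationTokenSource.CreateLinkedTokenSource(CancellationToken.None);
cts.CancelAfter(timeout);
var cancelled = false;
try
{
    await SlowOperationAsync(cts.Token);
}
catch (OperationCanceledException)
{
    cancelled = true; // Task.Delay observed the cancelled token
}
Console.WriteLine($"Optimistic: operation cancelled = {cancelled}");
```

In the pessimistic half the abandoned task is still running when the timeout is reported, which matches the symptom described in this section.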
Pipeline Returns PassthroughPipeline (No Resilience Applied)
GetPipeline("name") returns PassthroughPipeline.Instance, and your operation runs without any resilience wrapping.
Checklist
- Does the policy name exist in configuration? Check appsettings.json under Resilience:Policies. Policy names are case-insensitive.
- Is AddPragmaticResilience() called? Without DI registration, there is no IResiliencePipelineProvider and no configuration binding.
- Is the configuration section named correctly? The root key is "Resilience", not "ResilienceOptions" or "Pragmatic:Resilience":

      {
        "Resilience": {
          "Policies": { "my-policy": { ... } }
        }
      }

- Is there a default policy? If no named policy matches and Resilience:Default is not set, the provider returns PassthroughPipeline. Set a default for a baseline timeout:

      {
        "Resilience": {
          "Default": {
            "Timeout": { "Timeout": "00:00:30" }
          }
        }
      }

- Was a fluent override cleared? AddPolicy() on the provider takes precedence over configuration. If a fluent override was registered and then removed, the cache may still hold the old pipeline. Fluent overrides invalidate the cache, but race conditions during startup could cause stale entries.
BulkheadRejectedException When Load Is Low
Requests are rejected by the bulkhead even though overall load is light.
Possible Causes
MaxConcurrency is too low. The default is 10. If your operation takes 1 second and you have 15 concurrent users, 5 requests are rejected. Increase MaxConcurrency to match your expected concurrency.
Queue is disabled. With MaxQueuedActions = 0 (the default), any request that cannot acquire a semaphore slot immediately is rejected. Set MaxQueuedActions > 0 and QueueTimeout > TimeSpan.Zero to queue overflow.
Semaphore leak. If an operation throws before the finally block releases the semaphore (should not happen with the built-in strategy, but possible with custom strategies), slots are leaked and never recovered. The BulkheadStrategy uses try/finally to prevent this.
Shared bulkhead across unrelated operations. If multiple operations use the same policy name but have different concurrency requirements, they share the same SemaphoreSlim. A burst of requests to one operation starves the other.
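The rejection and release behaviour described above can be modelled with a plain SemaphoreSlim. This is a hedged sketch, not the BulkheadStrategy source: RunGuardedAsync is a hypothetical helper, the limit of 2 is chosen for brevity, and rejection is counted rather than thrown as BulkheadRejectedException.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a bulkhead with a concurrency limit of 2 and no queue.
var slots = new SemaphoreSlim(initialCount: 2, maxCount: 2);
var rejected = 0;

async Task RunGuardedAsync(Func<Task> operation)
{
    // Wait(0) returns immediately: false means no slot is free, so reject.
    if (!slots.Wait(0))
    {
        rejected++;
        return;
    }

    try
    {
        await operation();
    }
    finally
    {
        slots.Release(); // always return the slot, even if the operation throws
    }
}

// Three calls start at once; only two slots exist, so one is rejected.
var gate = new TaskCompletionSource();
var calls = new[]
{
    RunGuardedAsync(() => gate.Task),
    RunGuardedAsync(() => gate.Task),
    RunGuardedAsync(() => gate.Task),
};
gate.SetResult();
await Task.WhenAll(calls);

Console.WriteLine($"Rejected: {rejected}, free slots afterwards: {slots.CurrentCount}");
// Rejected: 1, free slots afterwards: 2
```

A custom strategy that omits the try/finally would leave the free-slot count below the maximum after a failure, which is the leak described above.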
[ResiliencePolicy] Attribute Has No Effect
You annotated a DomainAction with [ResiliencePolicy("name")], but the action runs without resilience.
Checklist
- Is Pragmatic.Resilience referenced in the project? The source generator (SG) detects Pragmatic.Resilience via FeatureDetector. Without the package reference, the SG does not generate resilience wrapping.
- Is the source generator referenced? Verify your .csproj has:

      <ProjectReference Include="...\Pragmatic.SourceGenerator.csproj"
                        OutputItemType="Analyzer"
                        ReferenceOutputAssembly="false" />

- Is the class partial? The SG generates code into a partial class. Without partial, the generated code cannot merge.
- Is the class a DomainAction or Mutation? The [ResiliencePolicy] attribute only has effect on classes that the SG processes as actions. A plain class with [ResiliencePolicy] is ignored.
- Does the policy name exist? If the name does not match any configured policy and there is no default, the pipeline is PassthroughPipeline (no-op). Check the logs for the operation name: if no retry or timeout log messages appear, the pipeline is passthrough.
Configuration Not Binding from appsettings.json
Policies defined in appsettings.json are ignored, and all pipelines use defaults or passthrough.
Checklist
- Is the JSON path correct? The configuration binds from "Resilience" at the root level:

      {
        "Resilience": {
          "Policies": {
            "my-policy": { ... }
          }
        }
      }

  Not "Pragmatic:Resilience" or "ResilienceOptions".
- Are TimeSpan values formatted correctly? Use the "hh:mm:ss" or "hh:mm:ss.fff" format:

      "Timeout": "00:00:10"        // 10 seconds
      "BaseDelay": "00:00:00.200"  // 200 milliseconds
      "BreakDuration": "00:00:30"  // 30 seconds

  Not "10s" or "200ms"; those are not valid TimeSpan formats for JSON deserialization.
- Is AddPragmaticResilience() called? This registers the IOptions<ResilienceOptions> binding. Without it, the configuration section is never read.
- Are enum values spelled correctly? BackoffType values are "Constant", "Linear", "Exponential". TimeoutType values are "Optimistic", "Pessimistic". These are case-insensitive.
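Candidate TimeSpan strings can be checked before they go into appsettings.json. The sketch below assumes the configuration binder parses these values with the standard .NET TimeSpan parser under the invariant culture, which this guide does not state explicitly.

```csharp
using System;
using System.Globalization;

// Candidate values as they would appear in appsettings.json.
string[] candidates = { "00:00:10", "00:00:00.200", "10s", "200ms" };

foreach (var text in candidates)
{
    // Assumption: the binder uses the standard TimeSpan parser.
    var ok = TimeSpan.TryParse(text, CultureInfo.InvariantCulture, out var value);
    Console.WriteLine(ok ? $"{text} -> {value.TotalMilliseconds} ms" : $"{text} -> invalid");
}
```

The first two values parse to 10000 ms and 200 ms, while "10s" and "200ms" are reported as invalid.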
Can I use Pragmatic.Resilience without DomainActions?
Yes. Inject IResiliencePipelineProvider into any service and call GetPipeline("name"). You can also use ResiliencePipelineBuilder directly without DI at all.
Can I share a circuit breaker across multiple operations?
Yes. Set OperationKey on the ResilienceContext to the same value for operations that call the same downstream service. The circuit key defaults to OperationKey ?? OperationName.
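The key resolution rule can be shown with a small stand-in function. CircuitKey is a hypothetical helper that mirrors the documented expression OperationKey ?? OperationName. It is not part of the library.

```csharp
#nullable enable
using System;

// Hypothetical helper mirroring the documented rule: OperationKey ?? OperationName.
static string CircuitKey(string operationName, string? operationKey) =>
    operationKey ?? operationName;

// Without a shared OperationKey, each operation gets its own circuit.
Console.WriteLine(CircuitKey("GetUser", null));    // GetUser
Console.WriteLine(CircuitKey("UpdateUser", null)); // UpdateUser

// With the same OperationKey, both operations resolve to one circuit.
Console.WriteLine(CircuitKey("GetUser", "user-service"));    // user-service
Console.WriteLine(CircuitKey("UpdateUser", "user-service")); // user-service
```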
Does the retry strategy retry on OperationCanceledException?
No. If the caller’s CancellationToken is cancelled, the retry strategy propagates the exception immediately. This is by design: the caller explicitly asked to cancel.
Can I use Fallback with DI configuration?
No. FallbackStrategy<TResult> requires a typed factory delegate (Func<Exception, ResilienceContext, CancellationToken, Task<TResult>>), which cannot be expressed in JSON. Use the fluent builder (ResiliencePipelineBuilder.AddStrategy(new FallbackStrategy<T>(...))) for fallback.
What happens if I reference a policy name that does not exist?
GetPipeline("nonexistent") returns PassthroughPipeline.Instance, a no-op singleton that executes the operation directly with zero overhead. No exception, no warning, no logging. This is intentional to allow gradual adoption: annotate actions with [ResiliencePolicy("name")] before configuring the policy. The action runs unprotected until the policy is configured.
How do I test my resilience configuration?
Build the pipeline and execute a failing operation:

    var pipeline = new ResiliencePipelineBuilder()
        .AddRetry(o => { o.MaxRetries = 2; o.BaseDelay = TimeSpan.FromMilliseconds(10); })
        .Build();

    var attempt = 0;
    await Assert.ThrowsAsync<RetryExhaustedException>(() =>
        pipeline.ExecuteAsync<string>((ctx, ct) =>
        {
            attempt++;
            throw new HttpRequestException("Simulated failure");
        }, new ResilienceContext { OperationName = "Test" }));

    Assert.Equal(3, attempt); // 1 initial + 2 retries

How do I implement a distributed circuit breaker?
Implement ICircuitBreakerStateStore with a Redis or database backend. Register it before AddPragmaticResilience():

    services.AddSingleton<ICircuitBreakerStateStore, RedisCircuitBreakerStateStore>();
    services.AddPragmaticResilience(); // Uses TryAddSingleton, so your registration wins

The interface methods (GetSnapshotAsync, RecordSuccessAsync, RecordFailureAsync, TransitionToAsync) are all async to support distributed stores.
Getting Help
- GitHub Issues: github.com/nicola-pragmatic/Pragmatic.Design/issues
- Concepts Guide: See concepts.md for architecture and strategy composition.
- Policy Reference: See policies.md for all strategy options and examples.
- Common Mistakes: See common-mistakes.md for frequent errors with Wrong/Right patterns.