Pragmatic.Resilience
Native, AOT-safe resilience library with strategy composition, fluent builder, DI integration, and source generator support.
The Problem
Distributed systems fail. HTTP calls time out, databases go down, third-party APIs return errors. Without resilience patterns, every failure propagates immediately to the user. The standard approach (wrapping calls in Polly policies or hand-rolling try/catch with retry loops) scatters resilience logic across the codebase, pulls in external dependencies, and treats every failure the same regardless of whether it is transient (network blip) or permanent (validation error).

```csharp
// Without Pragmatic: manual resilience for every external call
var retryPolicy = Policy.Handle<HttpRequestException>()
    .WaitAndRetryAsync(3, attempt => TimeSpan.FromMilliseconds(200 * Math.Pow(2, attempt)));
var circuitBreaker = Policy.Handle<HttpRequestException>()
    .CircuitBreakerAsync(5, TimeSpan.FromSeconds(30));
var timeout = Policy.TimeoutAsync(10);
var combined = Policy.WrapAsync(timeout, circuitBreaker, retryPolicy);

// Must repeat for every service, every action, every call site
await combined.ExecuteAsync(ct => http.PostAsJsonAsync("/charges", request, ct), ct);
```

The Solution

Pragmatic.Resilience inverts the model. You declare a policy name, the framework composes the pipeline. One attribute, five strategies, zero manual wiring.

```csharp
// With Pragmatic: declare the policy, the SG wires the pipeline
[DomainAction]
[ResiliencePolicy("payment-gateway")]
public partial class ChargeCustomerAction : DomainAction<PaymentResult>
{
    public override async Task<Result<PaymentResult, IError>> Execute(CancellationToken ct)
    {
        // Wrapped by the "payment-gateway" pipeline automatically.
        // Exceptions trigger retry + circuit breaker.
        // Result failures (validation, not found) pass through unchanged.
        var response = await http.PostAsJsonAsync("/charges", request, ct);
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadFromJsonAsync<PaymentResult>(ct);
    }
}
```

The policy is defined once in configuration. No code changes are needed to tune retry counts, timeouts, or circuit breaker thresholds:

```json
{
  "Resilience": {
    "Policies": {
      "payment-gateway": {
        "Timeout": { "Timeout": "00:00:10" },
        "Retry": { "MaxRetries": 3, "BackoffType": "Exponential", "BaseDelay": "00:00:00.200" },
        "CircuitBreaker": { "FailureThreshold": 5, "BreakDuration": "00:00:30" }
      }
    }
  }
}
```

Native, AOT-safe, zero external dependencies. Only exceptions trigger resilience strategies; `Result<T, E>` failures are business errors and pass through unchanged. Unknown policy names resolve to `PassthroughPipeline` with zero overhead.
Installation
```shell
dotnet add package Pragmatic.Resilience
```

For source generator integration with `[ResiliencePolicy]`:

```xml
<ProjectReference Include="..\Pragmatic.SourceGenerator\src\Pragmatic.SourceGenerator\Pragmatic.SourceGenerator.csproj"
                  OutputItemType="Analyzer"
                  ReferenceOutputAssembly="false" />
```

Feature Catalog
| Problem | Solution |
|---|---|
| Transient failures in external calls | RetryStrategy with configurable backoff and jitter |
| Operations hanging indefinitely | TimeoutStrategy with optimistic or pessimistic cancellation |
| Cascading failures from unhealthy dependencies | CircuitBreakerStrategy with pluggable state store |
| Resource exhaustion from unbounded concurrency | BulkheadStrategy with SemaphoreSlim-based limiter |
| Hard failures that need a graceful degradation path | FallbackStrategy<TResult> with alternative value factory |
| Manual pipeline wiring per action | [ResiliencePolicy("name")] attribute + source generator |
| Configuration scattered across code | Named policies in appsettings.json with IOptions<T> binding |
| No observability into resilience behavior | Built-in ActivitySource, Meter instruments, and [LoggerMessage] logging |
Quick Start
Fluent Builder (No DI)

```csharp
var stateStore = new InMemoryCircuitBreakerStateStore();

var pipeline = new ResiliencePipelineBuilder()
    .AddRetry(o => { o.MaxRetries = 3; o.BackoffType = BackoffType.Exponential; })
    .AddTimeout(o => o.Timeout = TimeSpan.FromSeconds(5))
    .AddCircuitBreaker(stateStore, o => { o.FailureThreshold = 5; o.BreakDuration = TimeSpan.FromSeconds(30); })
    .Build();

var result = await pipeline.ExecuteAsync(
    (ctx, ct) => httpClient.GetStringAsync(url, ct),
    new ResilienceContext { OperationName = "FetchData" });
```

DI Integration
```csharp
services.AddPragmaticResilience(options =>
{
    options.Policies["external-api"] = new ResiliencePolicyOptions
    {
        Timeout = new() { Timeout = TimeSpan.FromSeconds(10) },
        Retry = new() { MaxRetries = 3, BaseDelay = TimeSpan.FromMilliseconds(200) },
        CircuitBreaker = new() { FailureThreshold = 5, BreakDuration = TimeSpan.FromSeconds(30) }
    };
});
```

Resolve and use a named pipeline at runtime:

```csharp
// HttpClient is injected alongside the pipeline provider so the sample compiles on its own.
public class ExternalApiClient(IResiliencePipelineProvider pipelines, HttpClient httpClient)
{
    public async Task<string> FetchAsync(string url, CancellationToken ct)
    {
        var pipeline = pipelines.GetPipeline("external-api");

        return await pipeline.ExecuteAsync(
            (ctx, token) => httpClient.GetStringAsync(url, token),
            new ResilienceContext { OperationName = "FetchData" },
            ct);
    }
}
```

You can also register named policies individually:

```csharp
services.AddPragmaticResilience();

services.AddResiliencePolicy("external-api", o =>
{
    o.Timeout = new() { Timeout = TimeSpan.FromSeconds(10) };
    o.Retry = new() { MaxRetries = 3 };
});
```

Source Generator
Annotate a `DomainAction` with `[ResiliencePolicy]` to automatically wrap execution with the named pipeline:

```csharp
[DomainAction]
[ResiliencePolicy("external-api")]
public partial class FetchUserAction : DomainAction<UserDto>
{
    public override async Task<Result<UserDto, IError>> Execute(CancellationToken ct)
    {
        // This execution is wrapped by the "external-api" resilience pipeline.
        // Exceptions trigger retry/circuit breaker; Result failures pass through.
        var response = await httpClient.GetAsync("/users/123", ct);
        // ...
    }
}
```

Strategies
Strategies are ordered by their `Order` value (ascending). Lower order = more external, wrapping more of the pipeline:
| Strategy | Order | Purpose | Default | Exception |
|---|---|---|---|---|
| Timeout | 100 | Cancel if total time exceeded | 30s, Optimistic | TimeoutRejectedException |
| Bulkhead | 200 | Limit concurrent executions | MaxConcurrency=10, MaxQueued=0 | BulkheadRejectedException |
| Circuit Breaker | 300 | Reject fast if service is unhealthy | Threshold=5, Break=30s | CircuitBrokenException |
| Retry | 400 | Retry on transient failures | MaxRetries=3, BaseDelay=200ms, Exponential | RetryExhaustedException |
| Fallback | 500 | Provide alternative value on failure | — | — |
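
The ordering rule can be illustrated with a small sketch that does not depend on the library. `IStep`, `LabelStep`, and `PipelineSketch` are hypothetical stand-ins, not Pragmatic.Resilience types; the sketch only shows one way that sorting by `Order` and wrapping from the inside out makes the lowest `Order` the outermost wrapper.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in for a strategy: an Order plus a wrapper around the next delegate.
public interface IStep
{
    int Order { get; }
    Task<T> ExecuteAsync<T>(Func<Task<T>> next);
}

// Records its name when entered so the nesting order is observable.
public sealed class LabelStep(string name, int order, List<string> trace) : IStep
{
    public int Order => order;

    public async Task<T> ExecuteAsync<T>(Func<Task<T>> next)
    {
        trace.Add(name);
        return await next();
    }
}

public static class PipelineSketch
{
    // Walk the steps from highest to lowest Order, wrapping as we go,
    // so the lowest Order ends up as the outermost wrapper.
    public static Func<Task<T>> Compose<T>(IEnumerable<IStep> steps, Func<Task<T>> operation)
    {
        var pipeline = operation;
        foreach (var step in steps.OrderByDescending(s => s.Order))
        {
            var inner = pipeline; // capture the current delegate before wrapping it
            pipeline = () => step.ExecuteAsync(inner);
        }
        return pipeline;
    }
}
```

With steps registered as Retry (400), Timeout (100), CircuitBreaker (300), the composed delegate enters Timeout first, then CircuitBreaker, then Retry, regardless of registration order.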
Execution Order
```
Request
   |
   v
Timeout (Order 100) -----> cancels if total time exceeded
   |
   v
Bulkhead (Order 200) ----> rejects if max concurrency reached
   |
   v
CircuitBreaker (Order 300) -> rejects if circuit is open
   |
   v
Retry (Order 400) -------> retries on transient exception
   |
   v
Fallback (Order 500) ----> catches exception, returns alternative
   |
   v
Operation
```

Retry Strategy
Retries on exceptions with configurable backoff and jitter.
| Option | Default | Description |
|---|---|---|
| `MaxRetries` | 3 | Maximum retry attempts (0 = no retries) |
| `BaseDelay` | 200ms | Base delay between retries |
| `BackoffType` | Exponential | Constant, Linear, or Exponential |
| `MaxDelay` | 30s | Upper bound on delay (prevents unbounded growth) |
| `UseJitter` | true | Decorrelated jitter (AWS recommendation) to prevent thundering herd |
| `ShouldRetry` | null (all) | Predicate to filter which exceptions trigger retry |

Backoff formulas:
- Constant: `baseDelay`
- Linear: `baseDelay * (attempt + 1)`
- Exponential: `baseDelay * 2^attempt`
When UseJitter is enabled, the computed delay is multiplied by a random factor in [0.5, 1.5) (decorrelated jitter).
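
The formulas above can be worked through in a short sketch. `BackoffKind` and `BackoffSketch` are hypothetical names used for illustration only. Two points are assumptions rather than documented behavior: `attempt` is treated as zero-based, and `MaxDelay` is applied after the jitter factor.

```csharp
using System;

public enum BackoffKind { Constant, Linear, Exponential }

public static class BackoffSketch
{
    // Assumption: attempt is zero-based (0 for the first retry).
    // jitterFactor is expected in [0.5, 1.5); pass null to disable jitter.
    public static TimeSpan ComputeDelay(
        BackoffKind kind, TimeSpan baseDelay, int attempt, TimeSpan maxDelay, double? jitterFactor = null)
    {
        double ms = kind switch
        {
            BackoffKind.Constant => baseDelay.TotalMilliseconds,
            BackoffKind.Linear => baseDelay.TotalMilliseconds * (attempt + 1),
            BackoffKind.Exponential => baseDelay.TotalMilliseconds * Math.Pow(2, attempt),
            _ => throw new ArgumentOutOfRangeException(nameof(kind))
        };

        if (jitterFactor is double factor)
        {
            ms *= factor;
        }

        // Assumption: the upper bound is applied last, so jitter never exceeds maxDelay.
        return TimeSpan.FromMilliseconds(Math.Min(ms, maxDelay.TotalMilliseconds));
    }

    // Produces a multiplier in [0.5, 1.5).
    public static double NextJitterFactor(Random random) => 0.5 + random.NextDouble();
}
```

With the defaults from the table (200ms base, 30s cap) and jitter disabled, exponential backoff under these assumptions gives 200ms, 400ms, and 800ms for attempts 0, 1, and 2, while linear backoff gives 200ms, 400ms, and 600ms.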
Timeout Strategy
Cancels the operation if it exceeds the configured duration.
| Option | Default | Description |
|---|---|---|
| `Timeout` | 30s | Maximum allowed duration |
| `TimeoutType` | Optimistic | Cancellation approach |

Timeout types:
- Optimistic: Creates a linked `CancellationToken` and cancels it after the timeout. Preferred for operations that honor cancellation.
- Pessimistic: Races `Task.Delay` against the operation via `Task.WhenAny`. For operations that do not honor cancellation. The operation may continue running in the background.
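
The difference between the two timeout types can be sketched with plain BCL primitives. `TimeoutSketch` is a hypothetical helper, not the library's `TimeoutStrategy`, and it throws the standard `TimeoutException` as a stand-in for `TimeoutRejectedException`.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class TimeoutSketch
{
    // Optimistic: link a token to the caller's and cancel it when the timeout elapses.
    // This only works if the operation observes the token.
    public static async Task<T> OptimisticAsync<T>(
        Func<CancellationToken, Task<T>> operation, TimeSpan timeout, CancellationToken ct = default)
    {
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(ct);
        cts.CancelAfter(timeout);
        try
        {
            return await operation(cts.Token);
        }
        catch (OperationCanceledException) when (cts.IsCancellationRequested && !ct.IsCancellationRequested)
        {
            // Cancellation came from the timer, not from the caller.
            throw new TimeoutException($"Operation exceeded {timeout.TotalMilliseconds} ms.");
        }
    }

    // Pessimistic: race the operation against a delay. If the delay wins,
    // the operation is abandoned and may keep running in the background.
    public static async Task<T> PessimisticAsync<T>(
        Func<CancellationToken, Task<T>> operation, TimeSpan timeout, CancellationToken ct = default)
    {
        using var delayCts = CancellationTokenSource.CreateLinkedTokenSource(ct);
        var work = operation(ct);
        var delay = Task.Delay(timeout, delayCts.Token);

        var winner = await Task.WhenAny(work, delay);
        if (winner == work)
        {
            delayCts.Cancel(); // stop the timer; it is no longer needed
            return await work;
        }

        ct.ThrowIfCancellationRequested(); // the caller cancelled, not the timer
        throw new TimeoutException($"Operation exceeded {timeout.TotalMilliseconds} ms.");
    }
}
```

The pessimistic variant returns control to the caller on time even when the operation ignores its token, at the cost of leaving that work running unobserved.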
Circuit Breaker Strategy
Opens after consecutive failures, rejects requests while open, allows a probe after the break duration elapses.
| Option | Default | Description |
|---|---|---|
| `FailureThreshold` | 5 | Consecutive failures before opening |
| `BreakDuration` | 30s | How long the circuit stays open |
| `ShouldHandle` | null (all) | Predicate to filter which exceptions count as failures |

State machine:

```
Closed ---[threshold failures]--> Open ---[break elapsed]--> HalfOpen
  ^                                                             |
  |                                                             |
  +----[probe succeeds]----<------<------<------<------<-------+
                                                                |
Open <----[probe fails]----<------<------<------<------<-------+
```

The state store is pluggable via `ICircuitBreakerStateStore`. The default `InMemoryCircuitBreakerStateStore` is a thread-safe, per-process singleton. For distributed scenarios, implement the interface with Redis or a database backend.
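
The transitions in the state machine can be sketched as a small class. `CircuitSketch` and `CircuitState` are hypothetical illustrations: the class is single-threaded, keeps its state in fields rather than in an `ICircuitBreakerStateStore`, and does not limit the half-open state to a single probe. The caller passes the clock in so the behavior is deterministic.

```csharp
using System;

public enum CircuitState { Closed, Open, HalfOpen }

// Hypothetical, single-threaded illustration of the transitions; not the library's implementation.
public sealed class CircuitSketch(int failureThreshold, TimeSpan breakDuration)
{
    private int _consecutiveFailures;
    private DateTimeOffset _openedAt;

    public CircuitState State { get; private set; } = CircuitState.Closed;

    // Returns false when the call must be rejected because the circuit is open.
    public bool TryEnter(DateTimeOffset now)
    {
        if (State == CircuitState.Open && now - _openedAt >= breakDuration)
        {
            State = CircuitState.HalfOpen; // break elapsed: let a probe through
        }
        return State != CircuitState.Open;
    }

    public void RecordSuccess()
    {
        _consecutiveFailures = 0;
        State = CircuitState.Closed; // a successful call (or probe) closes the circuit
    }

    public void RecordFailure(DateTimeOffset now)
    {
        _consecutiveFailures++;
        if (State == CircuitState.HalfOpen || _consecutiveFailures >= failureThreshold)
        {
            State = CircuitState.Open; // failed probe, or threshold reached
            _openedAt = now;
        }
    }
}
```

A distributed implementation would need the failure count and the opened-at timestamp to live in the shared store so that every process sees the same transitions.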
Bulkhead Strategy
Limits concurrent executions using `SemaphoreSlim`.
| Option | Default | Description |
|---|---|---|
| `MaxConcurrency` | 10 | Maximum concurrent executions |
| `MaxQueuedActions` | 0 | Overflow queue size (0 = no queue) |
| `QueueTimeout` | `TimeSpan.Zero` | Maximum wait time in queue |
When all slots are taken and the queue is full (or disabled), the request is rejected immediately with BulkheadRejectedException.
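
A minimal sketch of the no-queue case (`MaxQueuedActions = 0`) shows how a `SemaphoreSlim` can act as the limiter. `BulkheadSketch` is a hypothetical class, and it throws `InvalidOperationException` as a stand-in for `BulkheadRejectedException`; queueing and `QueueTimeout` are deliberately left out.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public sealed class BulkheadSketch(int maxConcurrency) : IDisposable
{
    private readonly SemaphoreSlim _slots = new(maxConcurrency, maxConcurrency);

    public async Task<T> ExecuteAsync<T>(
        Func<CancellationToken, Task<T>> operation, CancellationToken ct = default)
    {
        // A zero timeout takes a slot if one is free and otherwise returns false at once,
        // which models immediate rejection when there is no queue.
        if (!await _slots.WaitAsync(TimeSpan.Zero, ct))
        {
            throw new InvalidOperationException("Bulkhead full: max concurrency reached.");
        }

        try
        {
            return await operation(ct);
        }
        finally
        {
            _slots.Release(); // always free the slot, even if the operation throws
        }
    }

    public void Dispose() => _slots.Dispose();
}
```

Releasing in `finally` matters: a slot leaked on an exception would permanently reduce the available concurrency.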
Fallback Strategy
Catches exceptions and provides an alternative result. This is a generic strategy (`FallbackStrategy<TResult>`) that only activates when the result type matches.

```csharp
var fallbackOptions = new FallbackOptions<UserDto>
{
    FallbackAction = (ex, ctx, ct) => Task.FromResult(UserDto.Default),
    ShouldHandle = ex => ex is HttpRequestException,
    OnFallback = (ex, ctx) => logger.LogWarning("Using fallback for {Op}", ctx.OperationName)
};

builder.AddStrategy(new FallbackStrategy<UserDto>(fallbackOptions));
```

| Option | Description |
|---|---|
| `FallbackAction` | Factory that produces the fallback value (required) |
| `ShouldHandle` | Predicate to filter which exceptions trigger the fallback |
| `OnFallback` | Callback invoked when fallback is used (for logging/metrics) |
Custom Strategies
Implement `IResilienceStrategy` and add it to the pipeline:

```csharp
public class RateLimitStrategy : IResilienceStrategy
{
    public int Order => 150; // Between Timeout and Bulkhead

    public async Task<TResult> ExecuteAsync<TResult>(
        Func<ResilienceContext, CancellationToken, Task<TResult>> next,
        ResilienceContext context,
        CancellationToken ct)
    {
        // Your logic here
        return await next(context, ct);
    }
}

var pipeline = new ResiliencePipelineBuilder()
    .AddStrategy(new RateLimitStrategy())
    .AddRetry()
    .Build();
```

Configuration
appsettings.json

```json
{
  "Resilience": {
    "Default": {
      "Timeout": { "Timeout": "00:00:30", "TimeoutType": "Optimistic" }
    },
    "Policies": {
      "external-api": {
        "Timeout": { "Timeout": "00:00:10", "TimeoutType": "Optimistic" },
        "Retry": {
          "MaxRetries": 3,
          "BaseDelay": "00:00:00.200",
          "BackoffType": "Exponential",
          "MaxDelay": "00:00:30",
          "UseJitter": true
        },
        "CircuitBreaker": { "FailureThreshold": 5, "BreakDuration": "00:00:30" }
      },
      "database": {
        "Timeout": { "Timeout": "00:00:05" },
        "Retry": { "MaxRetries": 2, "BackoffType": "Constant", "BaseDelay": "00:00:00.100" }
      }
    }
  }
}
```

Resolution Order
When `IResiliencePipelineProvider.GetPipeline(name)` is called, the first match in this order wins:
- Fluent overrides: policies registered via `AddPolicy()` on the provider
- Configuration: policies from the `ResilienceOptions.Policies` dictionary
- Default: `ResilienceOptions.Default`, if set
- Passthrough: `PassthroughPipeline.Instance` (zero overhead, no wrapping)
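
The lookup chain can be sketched as a first-match-wins function. `ResolutionSketch` is hypothetical and uses plain strings in place of pipelines, so it only illustrates the precedence, not how the provider builds or caches pipelines.

```csharp
#nullable enable
using System.Collections.Generic;

public static class ResolutionSketch
{
    // Returns a label for whichever source would supply the pipeline for a policy name.
    public static string Resolve(
        string name,
        IReadOnlyDictionary<string, string> fluentOverrides,
        IReadOnlyDictionary<string, string> configuredPolicies,
        string? defaultPolicy)
    {
        // 1. Fluent overrides take precedence over everything else.
        if (fluentOverrides.TryGetValue(name, out var fromFluent)) return fromFluent;

        // 2. Then named policies from configuration.
        if (configuredPolicies.TryGetValue(name, out var fromConfig)) return fromConfig;

        // 3. Then the default policy if one is set, otherwise 4. passthrough.
        return defaultPolicy ?? "passthrough";
    }
}
```

One consequence of this order is that a misspelled policy name does not fail: it silently falls through to the default or to passthrough, so a missing policy shows up as absent resilience behavior rather than as an error.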
Error Types
Resilience errors implement `Pragmatic.Result.Error` for integration with the Result pattern:
| Error | Code | HTTP Status | When |
|---|---|---|---|
| `TimeoutError` | `TIMEOUT` | 504 | Operation exceeded timeout duration |
| `RetryExhaustedError` | `RETRY_EXHAUSTED` | 503 | All retry attempts failed |
| `CircuitBrokenError` | `CIRCUIT_BROKEN` | 503 | Circuit is open, requests rejected |
| `BulkheadRejectedError` | `BULKHEAD_REJECTED` | 429 | Max concurrency exceeded |

Each strategy also throws a corresponding exception (TimeoutRejectedException, RetryExhaustedException, CircuitBrokenException, BulkheadRejectedException) for pipeline-level control flow. The error records are for mapping to Result<T, E> at the action/endpoint layer.
Observability
Distributed Tracing
ActivitySource: `"Pragmatic.Resilience"`

Each pipeline execution creates an activity `Resilience.{policyName}` with tags:
- `policy.name`: the resolved policy name
- `outcome`: `success`, `retry`, or `exception`
- `attempt`: retry attempt number (if retried)
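
For a quick look at these activities without a full telemetry stack, the BCL `ActivityListener` can subscribe to the source by name. `TracingSketch` is a hypothetical helper; only the source name and the `outcome` tag come from the description above.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class TracingSketch
{
    // Subscribes to the "Pragmatic.Resilience" ActivitySource and records each finished activity.
    public static ActivityListener Listen(List<string> sink)
    {
        var listener = new ActivityListener
        {
            // Only listen to the resilience source.
            ShouldListenTo = source => source.Name == "Pragmatic.Resilience",

            // Sample everything so activities are actually created and tags are recorded.
            Sample = (ref ActivityCreationOptions<ActivityContext> options) => ActivitySamplingResult.AllData,

            // Called when an activity ends; read the name and the outcome tag.
            ActivityStopped = activity =>
                sink.Add($"{activity.OperationName} outcome={activity.GetTagItem("outcome")}")
        };

        ActivitySource.AddActivityListener(listener);
        return listener; // dispose to unsubscribe
    }
}
```

In an application that already uses OpenTelemetry, registering the same source name with the tracer provider would typically replace a hand-written listener like this one.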
Metrics
Meter: `"Pragmatic.Resilience"`
| Instrument | Type | Name | Description |
|---|---|---|---|
| Pipeline duration | Histogram | pragmatic.resilience.duration | Execution duration in ms |
| Pipeline executions | Counter | pragmatic.resilience.executions | Total pipeline executions |
| Retry attempts | Counter | pragmatic.resilience.retry_attempts | Total retry attempts |
| Circuit rejections | Counter | pragmatic.resilience.circuit_rejections | Requests rejected by open circuits |
| Timeouts | Counter | pragmatic.resilience.timeouts | Total timeout occurrences |
| Bulkhead rejections | Counter | pragmatic.resilience.bulkhead_rejections | Requests rejected by bulkhead |
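
The counters can be observed in-process with the BCL `MeterListener`. `MetricsSketch` is a hypothetical helper, and it assumes the counters report `long` values; the duration histogram likely uses a different numeric type and would need its own callback.

```csharp
using System;
using System.Collections.Concurrent;
using System.Diagnostics.Metrics;

public static class MetricsSketch
{
    // Sums long-valued measurements per instrument name for the "Pragmatic.Resilience" meter.
    public static MeterListener Listen(ConcurrentDictionary<string, long> totals)
    {
        var listener = new MeterListener();

        // Opt in to every instrument published by the resilience meter.
        listener.InstrumentPublished = (instrument, l) =>
        {
            if (instrument.Meter.Name == "Pragmatic.Resilience")
            {
                l.EnableMeasurementEvents(instrument);
            }
        };

        // Assumption: counters are created as Counter<long>.
        listener.SetMeasurementEventCallback<long>((instrument, value, tags, state) =>
            totals.AddOrUpdate(instrument.Name, value, (_, current) => current + value));

        listener.Start();
        return listener; // dispose to stop listening
    }
}
```

A rising `pragmatic.resilience.retry_attempts` total alongside a flat `pragmatic.resilience.circuit_rejections` total would suggest transient failures that retries are absorbing before the breaker threshold is reached.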
Structured Logging
All log messages use `[LoggerMessage]` source-generated partial methods for zero-allocation structured logging:
| Event | Level | Message |
|---|---|---|
| Retry attempt | Warning | Retry attempt {N}/{Max} for {Op} after {Delay}ms |
| Timeout | Warning | Operation {Op} timed out after {Timeout}ms |
| Retry exhausted | Error | All {Max} retry attempts exhausted for {Op} |
| Circuit rejected | Warning | Circuit '{Key}' rejected request -- circuit is open |
| Circuit opened | Warning | Circuit '{Key}' opened after {N} consecutive failures |
| Bulkhead rejected | Warning | Bulkhead rejected '{Op}' -- max concurrency {N} reached |
| Fallback used | Information | Fallback used for '{Op}'. Original error: {Msg} |
Design Decisions
| Decision | Rationale |
|---|---|
| Native implementation, not a Polly wrapper | AOT-safe, zero external dependencies, full control over strategy composition |
| Only exceptions trigger resilience | Result<T, E> failures are business errors (validation, not found) — retrying them is wrong |
| `PassthroughPipeline` for unknown policies | Zero overhead when no resilience is configured; no runtime errors for missing policies |
| Strategies sorted by `Order` ascending | Lower order = more external wrapper. Timeout at 100 wraps everything; Retry at 400 is close to the operation |
| Pluggable `ICircuitBreakerStateStore` | In-memory default for single-process; swap to Redis/DB for distributed circuit state |
| Thread-local `Random` for jitter | Avoids lock contention on `Random.Shared` in high-throughput retry scenarios |
| `ResilienceContext` with `Properties` dictionary | Cross-strategy communication without coupling strategies to each other |
Cross-Module Integration
| With Module | Integration |
|---|---|
| Pragmatic.Actions | [ResiliencePolicy("name")] on DomainAction wraps execution with named pipeline |
| Pragmatic.Result | Error records (TimeoutError, etc.) integrate with Result<T, E> |
| Pragmatic.Composition | Auto-registered by SG when referenced; AddPragmaticResilience() in IStartupStep for custom config |
Samples
See `samples/Pragmatic.Resilience.Samples/` for 7 runnable scenarios: fluent builder, named policies, SG attributes, error types, retry demo (transient failure recovery with backoff), timeout demo (combined strategies), and circuit breaker demo (fail-fast after threshold).
Documentation
- Architecture and Concepts: The Problem, The Solution, strategy pipeline, configuration model
- Getting Started: Install and configure your first resilience policy
- Policies Reference: Detailed reference for each strategy with all options
- Common Mistakes: Frequent errors with Wrong/Right/Why format
- Troubleshooting: Problem/checklist guide for runtime issues
Requirements
- .NET 10.0+
- `Pragmatic.Result` (for error types)
- `Pragmatic.SourceGenerator` analyzer (for `[ResiliencePolicy]` integration)
License
Part of the Pragmatic.Design ecosystem.