
Resilience Policies

Detailed reference for all resilience strategies in Pragmatic.Resilience. Strategies are composable and applied in ascending Order: the lower the Order, the more of the pipeline the strategy wraps (it sits further toward the outside).

Request
   |
   v
Timeout (Order 100) --------> cancels if total time exceeded
   |
   v
Bulkhead (Order 200) -------> rejects if max concurrency reached
   |
   v
CircuitBreaker (Order 300) -> rejects if circuit is open
   |
   v
Retry (Order 400) ----------> retries on transient exception
   |
   v
Fallback (Order 500) -------> catches exception, returns alternative
   |
   v
Operation

All strategies implement IResilienceStrategy:

public interface IResilienceStrategy
{
    int Order { get; }

    Task<TResult> ExecuteAsync<TResult>(
        Func<ResilienceContext, CancellationToken, Task<TResult>> next,
        ResilienceContext context,
        CancellationToken ct);
}
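
The builder composes strategies by folding them around the operation, so that the lowest Order is wrapped last and ends up outermost. A minimal sketch of that fold (illustrative only; the actual builder internals are not part of this reference):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

static class PipelineSketch
{
    // Illustrative: wrap from the inside out, highest Order first,
    // so the lowest Order (e.g. Timeout, 100) is outermost.
    public static Func<ResilienceContext, CancellationToken, Task<TResult>> Compose<TResult>(
        IEnumerable<IResilienceStrategy> strategies,
        Func<ResilienceContext, CancellationToken, Task<TResult>> operation)
    {
        var pipeline = operation;
        foreach (var strategy in strategies.OrderByDescending(s => s.Order))
        {
            var next = pipeline;      // capture the inner part of the pipeline
            var current = strategy;
            pipeline = (ctx, ct) => current.ExecuteAsync(next, ctx, ct);
        }
        return pipeline;
    }
}
```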

Retry

Retries on exceptions with configurable backoff and jitter.

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| MaxRetries | int | 3 | Maximum retry attempts (0 = no retries, max 100) |
| BaseDelay | TimeSpan | 200ms | Base delay between retries |
| BackoffType | BackoffType | Exponential | Constant, Linear, or Exponential |
| MaxDelay | TimeSpan | 30s | Upper bound on delay (prevents unbounded growth) |
| UseJitter | bool | true | Decorrelated jitter to prevent thundering herd |
| ShouldRetry | Func<Exception, bool>? | null (all) | Predicate to filter which exceptions trigger retry |
The delay before a given retry depends on BackoffType (attempt is the 0-based attempt number):
  • Constant: baseDelay
  • Linear: baseDelay * (attempt + 1)
  • Exponential: baseDelay * 2^attempt

When UseJitter is enabled, the computed delay is multiplied by a random factor in [0.5, 1.5) (decorrelated jitter, per the AWS recommendation). Thread-local Random avoids lock contention in high-throughput scenarios.
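
Combining the backoff formulas and the jitter rule, the per-attempt delay can be sketched as follows (ComputeDelay is a hypothetical helper, not a library API):

```csharp
using System;

static class DelaySketch
{
    // Thread-local Random, per the note above about avoiding lock contention.
    [ThreadStatic] private static Random? _random;

    // attempt is 0-based; mirrors the backoff formulas above.
    public static TimeSpan ComputeDelay(RetryOptions o, int attempt)
    {
        var delay = o.BackoffType switch
        {
            BackoffType.Constant    => o.BaseDelay,
            BackoffType.Linear      => o.BaseDelay * (attempt + 1),
            BackoffType.Exponential => o.BaseDelay * Math.Pow(2, attempt),
            _                       => o.BaseDelay
        };
        if (delay > o.MaxDelay) delay = o.MaxDelay;   // clamp unbounded growth

        if (!o.UseJitter) return delay;
        _random ??= new Random();
        var factor = 0.5 + _random.NextDouble();       // random factor in [0.5, 1.5)
        return TimeSpan.FromMilliseconds(delay.TotalMilliseconds * factor);
    }
}
```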

Throws RetryExhaustedException when all attempts are exhausted. The inner exception contains the last failure.

services.AddResiliencePolicy("transient-calls", o =>
{
    o.Retry = new RetryOptions
    {
        MaxRetries = 5,
        BaseDelay = TimeSpan.FromMilliseconds(100),
        BackoffType = BackoffType.Exponential,
        MaxDelay = TimeSpan.FromSeconds(10),
        UseJitter = true,
        ShouldRetry = ex => ex is HttpRequestException or TimeoutException
    };
});

Timeout

Cancels the operation if it exceeds the configured duration.

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| Timeout | TimeSpan | 30s | Maximum allowed duration |
| TimeoutType | TimeoutType | Optimistic | Cancellation approach |
  • Optimistic — Creates a linked CancellationToken and cancels it after the timeout. Preferred for operations that honor cancellation tokens (most async .NET APIs).
  • Pessimistic — Races Task.Delay against the operation via Task.WhenAny. For operations that do not honor cancellation (e.g., legacy synchronous code wrapped in a task). The operation may continue running in the background after timeout.

Throws TimeoutRejectedException when the timeout is exceeded.
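
The pessimistic race can be sketched with Task.WhenAny (illustrative; assumes TimeoutRejectedException accepts a message string):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class PessimisticTimeoutSketch
{
    public static async Task<TResult> ExecuteAsync<TResult>(
        Func<CancellationToken, Task<TResult>> operation,
        TimeSpan timeout,
        CancellationToken ct)
    {
        var operationTask = operation(ct);
        var delayTask = Task.Delay(timeout, ct);

        var winner = await Task.WhenAny(operationTask, delayTask);
        if (winner == delayTask)
        {
            // The operation keeps running in the background; we simply stop
            // awaiting it. Its eventual exception, if any, goes unobserved.
            throw new TimeoutRejectedException($"Operation exceeded {timeout}.");
        }
        return await operationTask; // propagate result or original exception
    }
}
```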

o.Timeout = new TimeoutOptions
{
    Timeout = TimeSpan.FromSeconds(5),
    TimeoutType = TimeoutType.Optimistic
};

Circuit Breaker

Opens after a run of consecutive failures, rejects all requests while open, and allows a single probe once the break duration elapses.

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| FailureThreshold | int | 5 | Consecutive failures before opening |
| BreakDuration | TimeSpan | 30s | How long the circuit stays open |
| ShouldHandle | Func<Exception, bool>? | null (all) | Predicate to filter which exceptions count as failures |
Closed --[threshold failures]--> Open --[break elapsed]--> HalfOpen
   ^                               ^                           |
   |                               +-------[probe fails]-------+
   +---------------------[probe succeeds]----------------------+
  • Closed: normal operation. Failures are counted.
  • Open: all requests rejected immediately with CircuitBrokenException.
  • HalfOpen: one probe request allowed through. If it succeeds, circuit closes. If it fails, circuit reopens.
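
The transitions above can be expressed as a pure function (illustrative only, not the library's implementation; the Open to HalfOpen transition is time-driven and happens separately when BreakDuration elapses):

```csharp
enum CircuitState { Closed, Open, HalfOpen }

static class CircuitSketch
{
    // Next state and consecutive-failure count after one call completes.
    public static (CircuitState State, int Failures) Transition(
        CircuitState state, bool success, int failures, int threshold) =>
        (state, success) switch
        {
            (CircuitState.Closed, true)  => (CircuitState.Closed, 0),
            (CircuitState.Closed, false) => failures + 1 >= threshold
                ? (CircuitState.Open, failures + 1)      // threshold reached: open
                : (CircuitState.Closed, failures + 1),   // keep counting
            (CircuitState.HalfOpen, true)  => (CircuitState.Closed, 0), // probe ok
            (CircuitState.HalfOpen, false) => (CircuitState.Open, 0),   // reopen
            _ => (state, failures)       // Open: calls are rejected, nothing changes
        };
}
```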

Circuit breaker state is managed by ICircuitBreakerStateStore. The default InMemoryCircuitBreakerStateStore is thread-safe and per-process. For distributed scenarios (multiple instances sharing circuit state), implement the interface with Redis or a database backend.

Throws CircuitBrokenException when the circuit is open and a request is rejected.

o.CircuitBreaker = new CircuitBreakerOptions
{
    FailureThreshold = 3,
    BreakDuration = TimeSpan.FromSeconds(60),
    ShouldHandle = ex => ex is not ArgumentException // Don't count argument errors
};

Bulkhead

Limits concurrent executions using SemaphoreSlim.

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| MaxConcurrency | int | 10 | Maximum concurrent executions |
| MaxQueuedActions | int | 0 | Overflow queue size (0 = no queue) |
| QueueTimeout | TimeSpan | TimeSpan.Zero | Maximum wait time in queue |

When all slots are taken and the queue is full (or disabled), the request is rejected immediately with BulkheadRejectedException.
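
A sketch of that admission logic with SemaphoreSlim (illustrative; bounding the waiter count to MaxQueuedActions is omitted for brevity, and the BulkheadRejectedException constructor is assumed):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class BulkheadSketch
{
    public static async Task<TResult> ExecuteAsync<TResult>(
        SemaphoreSlim slots,          // created with initialCount = MaxConcurrency
        TimeSpan queueTimeout,        // TimeSpan.Zero => reject immediately
        Func<CancellationToken, Task<TResult>> operation,
        CancellationToken ct)
    {
        // WaitAsync with a zero timeout returns false at once when no slot
        // is free, which yields the immediate-rejection behavior above.
        if (!await slots.WaitAsync(queueTimeout, ct))
            throw new BulkheadRejectedException("Max concurrency reached.");
        try
        {
            return await operation(ct);
        }
        finally
        {
            slots.Release();
        }
    }
}
```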

o.Bulkhead = new BulkheadOptions
{
    MaxConcurrency = 5,
    MaxQueuedActions = 10,
    QueueTimeout = TimeSpan.FromSeconds(2)
};

Fallback

Catches exceptions and provides an alternative result. This is a generic strategy (FallbackStrategy<TResult>) that only activates when the result type matches.

| Option | Type | Description |
| --- | --- | --- |
| FallbackAction | Func<Exception, ResilienceContext, CancellationToken, Task<TResult>> | Factory that produces the fallback value (required) |
| ShouldHandle | Func<Exception, bool>? | Predicate to filter which exceptions trigger the fallback |
| OnFallback | Action<Exception, ResilienceContext>? | Callback invoked when fallback is used (for logging/metrics) |

Fallback is not configurable via ResiliencePolicyOptions (JSON/DI). It must be added via the fluent builder because it requires a typed factory delegate.

var pipeline = new ResiliencePipelineBuilder()
    .AddRetry()
    .AddStrategy(new FallbackStrategy<UserDto>(new FallbackOptions<UserDto>
    {
        FallbackAction = (ex, ctx, ct) => Task.FromResult(UserDto.Default),
        ShouldHandle = ex => ex is HttpRequestException,
        OnFallback = (ex, ctx) => logger.LogWarning("Using fallback for {Op}", ctx.OperationName)
    }))
    .Build();

Custom Strategies

Implement IResilienceStrategy to create custom strategies:

public class RateLimitStrategy : IResilienceStrategy
{
    public int Order => 150; // Between Timeout (100) and Bulkhead (200)

    public async Task<TResult> ExecuteAsync<TResult>(
        Func<ResilienceContext, CancellationToken, Task<TResult>> next,
        ResilienceContext context,
        CancellationToken ct)
    {
        // Custom logic here
        return await next(context, ct);
    }
}

var pipeline = new ResiliencePipelineBuilder()
    .AddStrategy(new RateLimitStrategy())
    .AddRetry()
    .Build();

ResilienceContext

ResilienceContext enables cross-strategy communication without coupling strategies to each other:

var context = new ResilienceContext
{
    OperationName = "FetchUserData"
};

The OperationName is used in logging, metrics, and tracing.
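
Assuming the pipeline exposes an ExecuteAsync that mirrors IResilienceStrategy's signature (an assumption; check the actual pipeline API), the context and its OperationName flow through every strategy. userClient here is a hypothetical dependency:

```csharp
var context = new ResilienceContext { OperationName = "FetchUserData" };

// Hypothetical call shape mirroring IResilienceStrategy.ExecuteAsync.
var user = await pipeline.ExecuteAsync(
    (ctx, ct) => userClient.GetUserAsync(ctx.OperationName, ct),
    context,
    cancellationToken);
```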

Policy Composition

A ResiliencePolicyOptions composes multiple strategies into a single pipeline:

new ResiliencePolicyOptions
{
    Timeout = new() { ... },        // null = no timeout
    Bulkhead = new() { ... },       // null = no bulkhead
    CircuitBreaker = new() { ... }, // null = no circuit breaker
    Retry = new() { ... }           // null = no retry
}

Set any strategy to null to exclude it from the pipeline. Only non-null strategies are composed.
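
For example, a policy with only timeout and retry (a sketch using the option types documented above):

```csharp
// Bulkhead and CircuitBreaker remain null, so only Timeout and Retry are composed.
var options = new ResiliencePolicyOptions
{
    Timeout = new TimeoutOptions { Timeout = TimeSpan.FromSeconds(5) },
    Retry   = new RetryOptions   { MaxRetries = 2 }
};
```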

Observability

ActivitySource: "Pragmatic.Resilience". Each pipeline execution creates an activity named Resilience.{policyName} with the tags policy.name, outcome, and attempt.

Meter: "Pragmatic.Resilience".

| Instrument | Type | Name |
| --- | --- | --- |
| Pipeline duration | Histogram | pragmatic.resilience.duration |
| Pipeline executions | Counter | pragmatic.resilience.executions |
| Retry attempts | Counter | pragmatic.resilience.retry_attempts |
| Circuit rejections | Counter | pragmatic.resilience.circuit_rejections |
| Timeouts | Counter | pragmatic.resilience.timeouts |
| Bulkhead rejections | Counter | pragmatic.resilience.bulkhead_rejections |
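
To export these signals with OpenTelemetry, register the source and meter names; AddSource and AddMeter are standard OpenTelemetry builder methods:

```csharp
using OpenTelemetry.Metrics;
using OpenTelemetry.Trace;

services.AddOpenTelemetry()
    .WithTracing(t => t.AddSource("Pragmatic.Resilience"))
    .WithMetrics(m => m.AddMeter("Pragmatic.Resilience"));
```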

Logging

All log messages use [LoggerMessage] source-generated partial methods:

| Level | Message |
| --- | --- |
| Warning | Retry attempt {N}/{Max} for {Op} after {Delay}ms |
| Warning | Operation {Op} timed out after {Timeout}ms |
| Error | All {Max} retry attempts exhausted for {Op} |
| Warning | Circuit '{Key}' rejected request -- circuit is open |
| Warning | Circuit '{Key}' opened after {N} consecutive failures |
| Warning | Bulkhead rejected '{Op}' -- max concurrency {N} reached |
| Information | Fallback used for '{Op}'. Original error: {Msg} |