HEITS.digital - Implementing a simple retry mechanism using TDD

Dragos Rogojan

TDD Enthusiast & Senior Software Engineer

19 Sep 2022

Are you interested in what the thought process should be to implement a simple exponential backoff retry mechanism in C# using TDD? Ok. Cool. Let me walk you through a practical example. Feel free to tag along at your own pace.

I needed such a retry mechanism in a Microsoft Dynamics Plugin project. Since there is no easy way to use an "off the shelf" NuGet package such as Polly without resorting to ILMerge, I got the opportunity to implement one myself. Fun fact: while writing this article, I found out that the Dependent Assembly plug-ins feature is in preview.

Before jumping into the Red, Green, Refactor cycle of TDD, let's do some thinking. So: "Think, Red, Green, Refactor", as an amazing individual I worked with used to say.

My use case was that of being able to retry the execution of an OrganizationRequest using an IOrganizationService instance. Below is an example of a method I needed to add a retry mechanism to.

public static OrganizationResponse ExecuteRequest(this IOrganizationService service, OrganizationRequest request)
{
    return service.Execute(request);
}

Thinking in action

What kind of signature do I need for the method I'm retrying? I can use a Func<T>. For example, I want to retry this particular Func<T>: () => service.Execute(request)
How many retries should I do?
How do I provide the wait time? Maybe I want to provide different kinds of wait times per retry.
How do I wait? I can use Thread.Sleep(int millisecondsTimeout). But I want my unit tests to be fast and not really wait. Thread.Sleep looks like an Action<int>. Let's see how I can use that to my advantage.
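
The last question points at a seam: because Thread.Sleep(int) fits Action<int>, production code can pass the real method while a test passes a stand-in that only records the requested time. Here is a minimal sketch of that idea. WaitSeamDemo and PauseFor are hypothetical names invented for this illustration; they are not part of the retry mechanism built below.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class WaitSeamDemo
{
    // Hypothetical helper: pauses through whatever 'wait' action the caller injects.
    public static void PauseFor(int milliseconds, Action<int> wait)
    {
        wait(milliseconds);
    }

    public static void Main()
    {
        // Production style: Thread.Sleep(int) converts to Action<int> as a method group.
        Action<int> realWait = Thread.Sleep;
        PauseFor(10, realWait); // really blocks for about 10 ms

        // Test style: record the requested time instead of sleeping.
        var recorded = new List<int>();
        PauseFor(5000, recorded.Add); // returns immediately
        Console.WriteLine(recorded[0]); // prints 5000
    }
}
```

The caller decides whether time really passes, which is what keeps the unit tests fast.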

Let me try defining what the new method would look like. It should wait and retry, using a provided wait time, when executing an OrganizationRequest:

public static OrganizationResponse ExecuteWithWaitAndRetry(this IOrganizationService service,
    OrganizationRequest request, int retries,
    Func<int, int> waitTimeProvider, Action<int> wait)
{
    return WaitAndRetry(() => service.Execute(request), retries, waitTimeProvider, wait);
}

I will implement the generic T WaitAndRetry<T>(Func<T> function, int retries, Func<int, int> waitTimeProvider, Action<int> wait) method using TDD. I will use xUnit as the test framework and FluentAssertions as the assertion library. I will also use mutation testing with Stryker to evaluate the quality of the resulting test suite.

Fleshing out the first test

What should my first test look like? What should it test? I should start with a very simple scenario. What if the first execution is successful, meaning no retries needed? Let's see.

Test code

using FluentAssertions;
using Xunit;

public class WaitAndRetryTests
{
    [Fact]
    public void Should_Return_The_Result_When_The_First_Try_Is_Successful()
    {
        // The wait time provider and the wait action are placeholders for now.
        int result = RetryMechanism.WaitAndRetry<int>(
            () => { return 123; }, 3, (_) => _, (_) => { });

        result.Should().Be(123);
    }
}

Note how I ignored the waitTimeProvider and wait parameters for now. I'm not interested in them for the current behavior I'm testing - just the return value. Once I need them, I will give them meaningful values.

Production code

using System;

public class RetryMechanism
{
    public static T WaitAndRetry<T>(Func<T> function,
        int retries, Func<int, int> waitTimeProvider, Action<int> wait)
    {
        return function();
    }
}

I now have a method that just executes the function received as the first parameter and returns its result.

Next test

What if the initial try throws and the first retry is successful? This puts me in an interesting situation: how do I give the test multiple executions of a function, each with a different result? I can use a Queue<T>. The Func<T> to be retried is then a dequeue operation followed by an execution of the dequeued function.

Test code

public class WaitAndRetryTests
{
    // ...
    [Fact]
    public void Should_Return_The_Result_After_Retrying_One_Time()
    {
        // Each queued function is the outcome of one execution, in order.
        Queue<Func<int>> functionExecutionsQueue = new Queue<Func<int>>();
        functionExecutionsQueue.Enqueue(
            () => throw new Exception("Some error on the initial try"));
        functionExecutionsQueue.Enqueue(() => 123);

        int result = RetryMechanism.WaitAndRetry<int>(
            () => functionExecutionsQueue.Dequeue()(), 3, (_) => _, (_) => { });

        result.Should().Be(123);
    }
}

Production code

You might rush ahead and think that I need some kind of loop. Not yet. Just wrapping the function execution in a try/catch block and executing it again in the catch is enough.

public class RetryMechanism
{
    public static T WaitAndRetry<T>(Func<T> function,
        int retries, Func<int, int> waitTimeProvider, Action<int> wait)
    {
        try
        {
            return function();
        }
        catch
        {
            return function();
        }
    }
}

I now have a method that knows how to do one retry if the initial try throws.

Next test

The initial try and the first retry throw, but the second retry is successful.

Test code

public class WaitAndRetryTests
{
    // ...
    [Fact]
    public void Should_Return_The_Result_After_Retrying_Two_Times()
    {
        Queue<Func<int>> functionExecutionsQueue = new Queue<Func<int>>();
        functionExecutionsQueue.Enqueue(
            () => throw new Exception("Some error on the initial try"));
        functionExecutionsQueue.Enqueue(
            () => throw new Exception("Some error on the first retry"));
        functionExecutionsQueue.Enqueue(() => 123);

        int result = RetryMechanism.WaitAndRetry<int>(
            () => functionExecutionsQueue.Dequeue()(), 3, (_) => _, (_) => { });

        result.Should().Be(123);
    }
}

Production code

If you said that now it's time for a loop, you are correct.

public class RetryMechanism
{
    public static T WaitAndRetry<T>(Func<T> function,
        int retries, Func<int, int> waitTimeProvider, Action<int> wait)
    {
        do
        {
            try
            {
                return function();
            }
            catch
            {
                // Swallow the failure and loop to try again.
            }
        } while (true);
    }
}

I now have a method that knows how to do two retries if the initial try and the first retry throw.

Next test

The initial execution and the first and second retries throw, but the third retry is successful. I think you got the pattern by now.

Test code

public class WaitAndRetryTests
{
    // ...
    [Fact]
    public void Should_Return_The_Result_After_Retrying_Three_Times()
    {
        Queue<Func<int>> functionExecutionsQueue = new Queue<Func<int>>();
        functionExecutionsQueue.Enqueue(
            () => throw new Exception("Some error on the initial try"));
        functionExecutionsQueue.Enqueue(
            () => throw new Exception("Some error on the first retry"));
        functionExecutionsQueue.Enqueue(
            () => throw new Exception("Some error on the second retry"));
        functionExecutionsQueue.Enqueue(() => 123);

        int result = RetryMechanism.WaitAndRetry<int>(
            () => functionExecutionsQueue.Dequeue()(), 3, (_) => _, (_) => { });

        result.Should().Be(123);
    }
}

Production code

Same as the previous snippet. This test passed without any changes to the production code. The production code is generic enough to handle an infinite number of retries, since I do not yet use the provided parameter for the number of retries. Let's see how I can address this in the next test.

I now have a method that knows how to do three retries if the initial try and the first and second retries throw.

Next test

The initial execution and all three retries throw. I tried and retried, but I should give up and throw as well.

Test code

public class WaitAndRetryTests
{
    // ...
    [Fact]
    public void Should_Throw_When_All_The_Retries_Are_Depleted()
    {
        Queue<Func<int>> functionExecutionsQueue = new Queue<Func<int>>();
        functionExecutionsQueue.Enqueue(
            () => throw new Exception("Some error on the initial try"));
        functionExecutionsQueue.Enqueue(
            () => throw new Exception("Some error on the first retry"));
        functionExecutionsQueue.Enqueue(
            () => throw new Exception("Some error on the second retry"));
        functionExecutionsQueue.Enqueue(
            () => throw new Exception("Some error on the third retry"));

        var executingWithRetry = () => RetryMechanism.WaitAndRetry<int>(
            () => functionExecutionsQueue.Dequeue()(), 3, (_) => _, (_) => { });

        executingWithRetry.Should().Throw<Exception>().WithMessage("Some error on the third retry");
    }
}

Production code

It's now time to use the retries parameter in an exit condition.

public class RetryMechanism
{
    public static T WaitAndRetry<T>(Func<T> function,
        int retries, Func<int, int> waitTimeProvider, Action<int> wait)
    {
        var retry = 0;
        do
        {
            try
            {
                return function();
            }
            catch
            {
                // Out of retries: rethrow the last failure.
                if (retry == retries)
                {
                    throw;
                }
                retry++;
            }
        } while (true);
    }
}

I now have a method that knows how to do as many retries as the provided retries parameter allows and that throws when all the retries are depleted.
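
One consequence of this exit condition is worth spelling out: retries counts only the retries, so the function runs at most retries + 1 times. The sketch below repeats the WaitAndRetry body from the snippet above inside a hypothetical AttemptCountDemo class, so that it compiles on its own, and counts the calls made to a function that always fails. CountAttempts is a helper invented for this illustration.

```csharp
using System;

public static class AttemptCountDemo
{
    // Same logic as the WaitAndRetry step above, repeated so the sketch is self-contained.
    public static T WaitAndRetry<T>(Func<T> function,
        int retries, Func<int, int> waitTimeProvider, Action<int> wait)
    {
        var retry = 0;
        do
        {
            try
            {
                return function();
            }
            catch
            {
                if (retry == retries)
                {
                    throw;
                }
                retry++;
            }
        } while (true);
    }

    // Hypothetical helper: counts how often an always-failing function is invoked.
    public static int CountAttempts(int retries)
    {
        var attempts = 0;
        try
        {
            WaitAndRetry<int>(
                () => { attempts++; throw new InvalidOperationException("always fails"); },
                retries, _ => 0, _ => { });
        }
        catch (InvalidOperationException)
        {
            // Expected: the last failure is rethrown once the retries are used up.
        }
        return attempts;
    }

    public static void Main()
    {
        Console.WriteLine(CountAttempts(3)); // prints 4: the initial try plus three retries
        Console.WriteLine(CountAttempts(0)); // prints 1: only the initial try
    }
}
```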

Next test

I know how to do retries and throw when they are depleted, but what about waiting? Let's fix that.

Test code

public class WaitAndRetryTests
{
    // ...
    [Fact]
    public void Should_Wait_Using_The_Provided_Wait_Time_For_Each_Retry()
    {
        Queue<Func<int>> functionExecutionsQueue = new Queue<Func<int>>();
        functionExecutionsQueue.Enqueue(
            () => throw new Exception("Some error on the initial try"));
        functionExecutionsQueue.Enqueue(
            () => throw new Exception("Some error on the first retry"));
        functionExecutionsQueue.Enqueue(
            () => throw new Exception("Some error on the second retry"));
        functionExecutionsQueue.Enqueue(() => 123);

        var exponentialBackOffWaitTimeProvider = (int retry) => (int)Math.Pow(2, retry) * 1000;
        // Record the requested wait times instead of really waiting.
        var waitedTimeQueue = new Queue<int>();
        var wait = (int waitTime) => { waitedTimeQueue.Enqueue(waitTime); };
        RetryMechanism.WaitAndRetry<int>(
            () => functionExecutionsQueue.Dequeue()(), 3, exponentialBackOffWaitTimeProvider, wait);

        waitedTimeQueue.Should().ContainInOrder(1000, 2000, 4000);
    }
}

Production code

Let's wait using the time provided by the waitTimeProvider.

public class RetryMechanism
{
    public static T WaitAndRetry<T>(Func<T> function,
        int retries, Func<int, int> waitTimeProvider, Action<int> wait)
    {
        var retry = 0;
        do
        {
            try
            {
                return function();
            }
            catch
            {
                if (retry == retries)
                {
                    throw;
                }
                // The provider receives the zero-based retry number.
                wait(waitTimeProvider(retry));
                retry++;
            }
        } while (true);
    }
}

I now have a method that knows how to do as many retries as the provided retries parameter allows, that throws when all the retries are depleted, and that waits between retries using the time provided by the waitTimeProvider.

That's it! Congrats if you practiced TDD along with me. In this GitHub repo, you can find a commit for each test together with the production code needed to make it pass.

I can now use the implementation of the retry mechanism to retry an OrganizationRequest:

{
    // get connectionString
    CrmServiceClient crmClient = new CrmServiceClient(connectionString);
    RetrieveEntityChangesRequest req = new RetrieveEntityChangesRequest();

    // set properties on the req like Columns, EntityName, PagingInfo, DataVersion
    var waitTimeProvider = ... // define what kind of waitTime you want to provide
    var resp = crmClient.ExecuteWithWaitAndRetry(req, 3, waitTimeProvider,
        System.Threading.Thread.Sleep);
    // process response
}
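
The snippet above leaves the choice of waitTimeProvider open. As an illustration only, here are a few shapes a Func<int, int> provider could take. The WaitTimeProviders class and its method names are hypothetical and are not part of the article's repository. Only the exponential variant mirrors the provider used in the last test; the constant and capped variants are assumptions about what might be useful.

```csharp
using System;

public static class WaitTimeProviders
{
    // Same wait between all retries.
    public static Func<int, int> Constant(int milliseconds) =>
        retry => milliseconds;

    // Mirrors the provider from the last test: with a base of 1000 it yields 1000, 2000, 4000, ... ms.
    public static Func<int, int> Exponential(int baseMilliseconds) =>
        retry => (int)Math.Pow(2, retry) * baseMilliseconds;

    // Exponential growth limited to a ceiling; the comparison happens in double
    // so a large retry number cannot overflow int.
    public static Func<int, int> CappedExponential(int baseMilliseconds, int maxMilliseconds) =>
        retry => (int)Math.Min(Math.Pow(2, retry) * baseMilliseconds, maxMilliseconds);

    public static void Main()
    {
        var capped = CappedExponential(1000, 3000);
        for (var retry = 0; retry < 4; retry++)
        {
            Console.WriteLine(capped(retry)); // prints 1000, 2000, 3000, 3000 on successive lines
        }
    }
}
```

Because the provider receives the zero-based retry number, any of these can be passed as the waitTimeProvider argument without touching WaitAndRetry itself.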

Stryking the mutants with Stryker

The really cool and valuable outcome when applying TDD is that you naturally get high scores for the Code Coverage and Mutation Testing metrics. You can use Stryker as the mutation tool. I ran dotnet stryker on the unit tests project and I got a 100% mutation score.