
Writing unit tests helps verify the correctness of code. However, most unit tests only test a limited set of pre-defined input values, often just one. Testing with fixed input values is known as example-based testing.

The problem with example-based tests is that they only verify correctness for the pre-defined input values. This can easily lead to an implementation that passes the test for the pre-defined values, but fails for any other value. The implementation could thus be (largely) incorrect, even though the test passes.

Let’s demonstrate this through an example. Note that we’ll use xUnit.net as the test framework in our examples.

Single input value

Suppose we want to write a method that calculates the MD5 hash of a given string. Let’s start by writing a unit test:

[Fact]
public void MD5ReturnsCorrectHash()
{
    var input = "foo";
    var md5 = MD5(input);
    var expected = "acbd18db4cc2f85cedef654fccc4a4d8";
    Assert.Equal(expected, md5);
}

This test method uses a single input value ("foo") to verify the correctness of the MD5() method. Therefore, this is an example-based test. To make this test pass, we can implement the MD5() method as follows:

public static string MD5(string input)
{
    return "acbd18db4cc2f85cedef654fccc4a4d8";
}

Clearly, this implementation does not correctly implement the MD5 algorithm, but it passes our test! The danger of using one input value to verify correctness is that you can make the test pass by just hard-coding the expected result.

Note that in TDD you should actually write the minimal amount of code that makes the test pass, so the above implementation is perfectly reasonable when doing TDD.

In the next section, we’ll see how to strengthen our test by using multiple input values.

Multiple input values

The obvious way to strengthen tests that use one input value is to use several input values. One way to do this is to create copies of the existing test method, but with different input values:

[Fact]
public void MD5ForFooInputReturnsCorrectHash()
{
    var md5 = MD5("foo");
    Assert.Equal("acbd18db4cc2f85cedef654fccc4a4d8", md5);
}

[Fact]
public void MD5ForBarInputReturnsCorrectHash()
{
    var md5 = MD5("bar");
    Assert.Equal("37b51d194a7513e45b56f6524f2d51f2", md5);
}

[Fact]
public void MD5ForBazInputReturnsCorrectHash()
{
    var md5 = MD5("baz");
    Assert.Equal("73feffa4b7f6bb68e44cf984c85f6e88", md5);
}

Although there is nothing wrong with these three tests, we have some code duplication. Luckily, xUnit has the concept of parameterized tests, which allows us to define a single test with more than one input value.

Here is the parameterized test equivalent of our previous three tests:

[Theory]
[InlineData("foo", "acbd18db4cc2f85cedef654fccc4a4d8")]
[InlineData("bar", "37b51d194a7513e45b56f6524f2d51f2")]
[InlineData("baz", "73feffa4b7f6bb68e44cf984c85f6e88")]
public void MD5ReturnsCorrectHash(string input, string expected)
{
    var md5 = MD5(input);
    Assert.Equal(expected, md5);
}

This test differs from our previous tests in several ways:

  • The [Fact] attribute is replaced with the [Theory] attribute. This marks the test as a parameterized test.
  • The test method has two parameters: input and expected, which replace the hard-coded values in our test.
  • Three [InlineData] attributes have been added, one for each input value/expected hash combination.

When xUnit runs this test, it will actually run it three times, with the [InlineData] attributes’ parameters used as the input and expected parameters. Therefore, running our parameterized test results in our test being called three times with the following arguments:

MD5ReturnsCorrectHash("foo", "acbd18db4cc2f85cedef654fccc4a4d8");
MD5ReturnsCorrectHash("bar", "37b51d194a7513e45b56f6524f2d51f2");
MD5ReturnsCorrectHash("baz", "73feffa4b7f6bb68e44cf984c85f6e88");

If we run our parameterized test, it will fail for the "bar" and "baz" input values. To make our test pass, we could again hard-code the expected values:

public static string MD5(string input)
{
    if (input == "foo")
    {
        return "acbd18db4cc2f85cedef654fccc4a4d8";
    }
    if (input == "bar")
    {
        return "37b51d194a7513e45b56f6524f2d51f2";
    }
    if (input == "baz")
    {
        return "73feffa4b7f6bb68e44cf984c85f6e88";
    }
    
    return input;
}

With this modified implementation, the test passes for all three input values. Unfortunately, having multiple tests still allowed us to easily hard-code the implementation; it did not prevent us from incorrectly implementing the MD5 algorithm.

So although having multiple input values leads to stronger tests, we still test only a limited set of input values. Wouldn’t it be great if we could run our tests using all possible input values? Enter property-based testing.

Property-based testing

In property-based testing, we take a different approach to writing our tests. Instead of testing for specific input and output values, we test the properties of the code we are testing. You can think of properties as rules, invariants or requirements. For example, these are some essential properties of the MD5 algorithm:

  1. The hash is 32 characters long.
  2. The hash only contains hexadecimal characters.
  3. Equal inputs have the same hash.
  4. The hash is different from its input.
  5. Similar inputs have significantly different hashes.

So how do we write tests for these properties? First, notice that these five properties are generic: they must be true for all possible input values. Unfortunately, writing tests that actually check all input values is not feasible, as running them would take ages. But if we can’t test all input values, how can we write tests for our properties? Well, we use the next best thing: random input values.

If you test using random input values, hard-coding the expected values is no longer possible as you don’t know beforehand which input values will be tested. This forces you to write a generic implementation that works for any (valid) input value.
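
Conceptually, a property-based test is little more than a loop over randomly generated inputs. Here is a rough sketch (GenerateRandomString is a hypothetical stand-in for the framework’s input generator; 100 runs is a typical default):

for (var i = 0; i < 100; i++)
{
    // Hypothetical generator; a real framework also shrinks and
    // reports failing inputs for us.
    var input = GenerateRandomString();

    // Check a property; for example, property 1 below:
    // the hash is always 32 characters long.
    Assert.Equal(32, MD5(input).Length);
}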

Let’s see how that works by writing property-based tests for all five properties we just defined.

Property 1: hash is 32 characters long

Our first property states that an MD5 hash is a string of length 32. If we would write this as a regular, example-based test, it would look like this:

[Fact]
public void MD5ReturnsStringOfCorrectLength()
{
    var input = "foo";
    var hash = MD5(input);
    Assert.Equal(32, hash.Length);
}

We could strengthen our test by converting it to a parameterized test with several input values:

[Theory]
[InlineData("foo")]
[InlineData("bar")]
[InlineData("baz")]
public void MD5ReturnsStringOfCorrectLength(string input)
{
    var hash = MD5(input);
    Assert.Equal(32, hash.Length);
}

However, as said, property-based tests should work with any input value. A property-based test is thus a parameterized test, but with the pre-defined input values replaced by random input values. Our initial attempt at writing a property-based test might look like this:

[Theory]
public void MD5ReturnsStringOfCorrectLength(string input)
{
    var hash = MD5(input);
    Assert.Equal(32, hash.Length);
}

Unfortunately, if we run this test, we’ll find that xUnit reports an error for parameterized tests without input.

To define a test that works with randomly generated input, we’ll use the FsCheck property-based testing framework. We’ll also install the FsCheck.Xunit package, for easy integration of FsCheck with xUnit.
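
Both packages are available on NuGet; installing the integration package from the Package Manager Console pulls in FsCheck itself as a dependency:

Install-Package FsCheck.Xunit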

Having installed these libraries, we can convert our test to a property-based test by decorating it with the [Property] attribute:

[Property]
public void MD5ReturnsStringOfCorrectLength(string input)
{
    var hash = MD5(input);
    Assert.Equal(32, hash.Length);
}

Note that we don’t explicitly specify the input value(s) to use; FsCheck will (randomly) generate those. Let’s run this test to see what happens:

MD5ReturnsStringOfCorrectLength [FAIL]

    FsCheck.Xunit.PropertyFailedException : 
    Falsifiable, after 1 test (0 shrinks) (StdGen (766423555,296119444)):
    Original:
    ""

    ---- Assert.Equal() Failure
    Expected: 32
    Actual:   0

The test report indicates that our property-based test failed after one test, for the randomly generated empty string ("") input value. To make our empty string pass the test, we’ll pad unknown inputs to a length of 32:

public static string MD5(string input)
{
    if (input == "foo")
    {
        return "acbd18db4cc2f85cedef654fccc4a4d8";
    }
    if (input == "bar")
    {
        return "37b51d194a7513e45b56f6524f2d51f2";
    }
    if (input == "baz")
    {
        return "73feffa4b7f6bb68e44cf984c85f6e88";
    }
    
    return input.PadRight(32);
}

This should fix the empty string problem, so let’s run the test again:

MD5ReturnsStringOfCorrectLength [FAIL]

    FsCheck.Xunit.PropertyFailedException : 
    Falsifiable, after 12 tests (0 shrinks) (StdGen (1087984417,296119448)):
    Original:
    <null>
        
    ---- System.NullReferenceException : Object reference not set to an instance of an object

Hmmm, our test still fails. This time though, the first 11 tests passed, so we made some progress. The test now failed when FsCheck generated the null input value. Let’s fix this:

public static string MD5(string input)
{
    if (input == "foo")
    {
        return "acbd18db4cc2f85cedef654fccc4a4d8";
    }
    if (input == "bar")
    {
        return "37b51d194a7513e45b56f6524f2d51f2";
    }
    if (input == "baz")
    {
        return "73feffa4b7f6bb68e44cf984c85f6e88";
    }
    
    return (input ?? string.Empty).PadRight(32);
}

And we run our test again:

MD5ReturnsStringOfCorrectLength [FAIL]

    FsCheck.Xunit.PropertyFailedException : 
    Falsifiable, after 43 tests (34 shrinks) (StdGen (964736467,296119643)):
    Original:
    "#e3n+[TC9[Jlbs,x=3U!f\~J'i u+)-y>4VLg]uA("
                
    Shrunk:
    "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"

    ---- Assert.Equal() Failure
    Expected: 32
    Actual:   33

Again, more progress, as it now took 43 tests before the test failed for the input value "#e3n+[TC9[Jlbs,x=3U!f\~J'i u+)-y>4VLg]uA(". There is something different about this test report though: it mentions “34 shrinks” and a “shrunk” value. What is that about?

Shrinking

In property-based testing, shrinking is used to find a minimal counter-example that proves the property does not hold for all possible input values. This minimal counter-example is listed as the “shrunk” value in our test report. Before we examine how shrinking works, try to figure out for yourself what’s so special about the “shrunk” "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa" input value mentioned in the test report.

If you guessed that the specified “shrunk” input was special for its length, you’d be correct! The "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa" string contains 33 characters, whose significance becomes apparent if we look at the code in our implementation that processes this input value:

return (input ?? string.Empty).PadRight(32);

If we use "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa" as our input, the PadRight(32) call doesn’t do anything, as our string’s length is already greater than or equal to 32. The input value is therefore returned unmodified, which means the returned string also has length 33, failing our test.
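
To see this concretely, note that PadRight pads a string to a minimum total length, but never truncates it:

Console.WriteLine(new string('a', 31).PadRight(32).Length); // 32
Console.WriteLine(new string('a', 33).PadRight(32).Length); // 33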

The interesting thing about an input value of length 33 is that 33 is the minimum length for which the test fails; strings of length 32 or less all pass the test. As such, the listed “shrunk” value of length 33 is a minimal counter-example to our property.

The main benefit of having a minimal counter-example is that it helps you locate precisely where your implementation breaks down (in our case, for strings of length 33 and greater). You thus spend less time debugging and more time writing code.

Finding the “shrunk” value

So how does FsCheck find this shrunk value? To find out, we’ll have FsCheck output the values it generates. Doing that is easy: we just set the Verbose property of our [Property] attribute to true:

[Property(Verbose = true)]
public void MD5ReturnsStringOfCorrectLength(string input)
{
    var hash = MD5(input);
    Assert.Equal(32, hash.Length);
}

Now if we run our test, we’ll see exactly which values FsCheck generated as test input:

MD5ReturnsStringOfCorrectLength [FAIL]

    FsCheck.Xunit.PropertyFailedException : 
    Falsifiable, after 50 tests (38 shrinks) (StdGen (1153044621,296119646)):
    Original:
    "XqyW\O!Lr%ce3]4=H~=6lG, 5lT\aDz%n9"
    Shrunk:
    "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"

    ---- Assert.Equal() Failure
    Expected: 32
    Actual:   33

    Output:
    0: "5z"
    1: "Y"
    2: "r"
    3: ""
    4: "t9[Q"
    ...
    45: <null>
    46: "Qlbz?|perK"
    47: "XP3vO$-`l"
    48: Y{q6oevZA7"0R
    49: "XqyW\O!Lr%ce3]4=H~=6lG, 5lT\aDz%n9"

    shrink: "qyW\O!Lr%ce3]4=H~=6lG, 5lT\aDz%n9"
    shrink: "yW\O!Lr%ce3]4=H~=6lG, 5lT\aDz%n9"
    shrink: "yW\O!Lr%ce3]4=H~=6lG,5lT\aDz%n9"
    shrink: "W\O!Lr%ce3]4=H~=6lG,5lT\aDz%n9"
    shrink: "\O!Lr%ce3]4=H~=6lG,5lT\aDz%n9"
    shrink: "O!Lr%ce3]4=H~=6lG,5lT\aDz%n9"
    shrink: "O!Lr%ce3]4=H~=6lG,5lT\aDz%n9a"
    shrink: "O!Lr%ce3]4=H~=6lG,5lT\aDz%naa"
    shrink: "O!Lr%ce3]4=H~=6lG,5lT\aDz%aaa"
    shrink: "O!Lr%ce3]4=H~=6lG,5lT\aDzaaaa"
    shrink: "O!Lr%ce3]4=H~=6lG,5lT\aDaaaaa"
    shrink: "O!Lr%ce3]4=H~=6lG,5lT\aaaaaaa"
    shrink: "O!Lr%ce3]4=H~=6lG,5lTaaaaaaaa"
    shrink: "O!Lr%ce3]4=H~=6lG,5lTaaaaaaaaa"
    shrink: "O!Lr%ce3]4=H~=6lG,5laaaaaaaaaa"
    shrink: "O!Lr%ce3]4=H~=6lG,5laaaaaaaaaaa"
    shrink: "O!Lr%ce3]4=H~=6lG,5aaaaaaaaaaaa"
    shrink: "O!Lr%ce3]4=H~=6lG,aaaaaaaaaaaaa"
    shrink: "O!Lr%ce3]4=H~=6lG,aaaaaaaaaaaaaa"
    shrink: "O!Lr%ce3]4=H~=6lGaaaaaaaaaaaaaaa"
    shrink: "O!Lr%ce3]4=H~=6laaaaaaaaaaaaaaaa"
    shrink: "O!Lr%ce3]4=H~=6aaaaaaaaaaaaaaaaa"
    shrink: "O!Lr%ce3]4=H~=aaaaaaaaaaaaaaaaaa"
    shrink: "O!Lr%ce3]4=H~aaaaaaaaaaaaaaaaaaa"
    shrink: "O!Lr%ce3]4=Haaaaaaaaaaaaaaaaaaaa"
    shrink: "O!Lr%ce3]4=aaaaaaaaaaaaaaaaaaaaa"
    shrink: "O!Lr%ce3]4aaaaaaaaaaaaaaaaaaaaaa"
    shrink: "O!Lr%ce3]aaaaaaaaaaaaaaaaaaaaaaa"
    shrink: "O!Lr%ce3aaaaaaaaaaaaaaaaaaaaaaaa"
    shrink: "O!Lr%ceaaaaaaaaaaaaaaaaaaaaaaaaa"
    shrink: "O!Lr%caaaaaaaaaaaaaaaaaaaaaaaaaa"
    shrink: "O!Lr%caaaaaaaaaaaaaaaaaaaaaaaaaaa"
    shrink: "O!Lr%aaaaaaaaaaaaaaaaaaaaaaaaaaaa"
    shrink: "O!Lraaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
    shrink: "O!Laaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
    shrink: "O!aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
    shrink: "Oaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
    shrink: "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"

The test report starts with the 50 randomly generated input values (note: we omitted most for brevity). The last input value ("XqyW\O!Lr%ce3]4=H~=6lG, 5lT\aDz%n9"), listed as input #49, is the first for which the test failed. Having found an input value that fails the test, FsCheck then starts the shrinking process.

To see how shrinking works, let’s list the failing input value and the first six subsequent shrinks:

#   Input                                   Length   Passes test
0   "XqyW\O!Lr%ce3]4=H~=6lG, 5lT\aDz%n9"    33       false
1   "qyW\O!Lr%ce3]4=H~=6lG, 5lT\aDz%n9"     32       true
2   "yW\O!Lr%ce3]4=H~=6lG, 5lT\aDz%n9"      31       true
3   "yW\O!Lr%ce3]4=H~=6lG,5lT\aDz%n9"       30       true
4   "W\O!Lr%ce3]4=H~=6lG,5lT\aDz%n9"        29       true
5   "\O!Lr%ce3]4=H~=6lG,5lT\aDz%n9"         28       true
6   "O!Lr%ce3]4=H~=6lG,5lT\aDz%n9"          27       true

The pattern here is quite obvious: with each shrinking step, the previous value is stripped of its first character, decreasing the length by one. Note that all strings of length 32 or less pass the test. FsCheck will use this fact later on.

The next six shrinks follow a different pattern:

#    Input                              Length   Passes test
7    "O!Lr%ce3]4=H~=6lG,5lT\aDz%n9a"    28       true
8    "O!Lr%ce3]4=H~=6lG,5lT\aDz%naa"    28       true
9    "O!Lr%ce3]4=H~=6lG,5lT\aDz%aaa"    28       true
10   "O!Lr%ce3]4=H~=6lG,5lT\aDzaaaa"    28       true
11   "O!Lr%ce3]4=H~=6lG,5lT\aDaaaaa"    28       true
12   "O!Lr%ce3]4=H~=6lG,5lT\aaaaaaa"    28       true

This time, each shrink step replaces one character with the 'a' character. FsCheck uses this shrink strategy to check if replacing specific characters in the input string can make the test fail.

As changing characters in the input value did not make the test fail, FsCheck then uses the fact that the only input value that failed the test had a length of 33. It therefore modifies its shrinking strategy and starts generating longer input values, working its way back up to an input of length 33. Besides generating longer input values, it additionally applies the one-character-modification shrinking strategy as an extra check. This leads to the following sequence of shrinks:

#    Input                                 Length   Passes test
13   "O!Lr%ce3]4=H~=6lG,5lTaaaaaaaa"       29       true
14   "O!Lr%ce3]4=H~=6lG,5lTaaaaaaaaa"      30       true
15   "O!Lr%ce3]4=H~=6lG,5laaaaaaaaaa"      30       true
16   "O!Lr%ce3]4=H~=6lG,5laaaaaaaaaaa"     31       true
17   "O!Lr%ce3]4=H~=6lG,5aaaaaaaaaaaa"     31       true
18   "O!Lr%ce3]4=H~=6lG,aaaaaaaaaaaaa"     31       true
19   "O!Lr%ce3]4=H~=6lG,aaaaaaaaaaaaaa"    32       true
20   "O!Lr%ce3]4=H~=6lGaaaaaaaaaaaaaaa"    32       true

The 11 shrinks that follow are all strings of length 32 with one character replaced, all of which pass the test. Things start getting interesting from shrink #32, which is a string of length 33 and thus fails the test:

#    Input                                  Length   Passes test
32   "O!Lr%caaaaaaaaaaaaaaaaaaaaaaaaaaa"    33       false

At this point, FsCheck rightly infers that strings of length 33 form the minimal set of counter-examples to our property. The final shrinking steps again use the single character replacement shrinking strategy, to see if changing a single character in an input value of length 33 can make the test pass:

#    Input                                  Length   Passes test
33   "O!Lr%aaaaaaaaaaaaaaaaaaaaaaaaaaaa"    33       false
34   "O!Lraaaaaaaaaaaaaaaaaaaaaaaaaaaaa"    33       false
35   "O!Laaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"    33       false
36   "O!aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"    33       false
37   "Oaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"    33       false
38   "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"    33       false

Even with all characters changed, the input still fails the test. At this point, FsCheck considers the input value "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa" to be the minimal counter-example (or “shrunk”) to our property-based test.

Now that we know our test fails for strings of length 33 (and greater), we can fix our implementation as follows:

public static string MD5(string input)
{
    if (input == "foo")
    {
        return "acbd18db4cc2f85cedef654fccc4a4d8";
    }
    if (input == "bar")
    {
        return "37b51d194a7513e45b56f6524f2d51f2";
    }
    if (input == "baz")
    {
        return "73feffa4b7f6bb68e44cf984c85f6e88";
    }
    
    return (input ?? string.Empty).PadRight(32).Substring(0, 32);
}

Now our test passes for all generated inputs, and we have our first working property-based test!

Clearly, this single property test did not force us to write a correct implementation, but that is perfectly normal. With property-based testing, you usually need to write several property-based tests before you are finally forced to write a correct implementation, so let’s move on to property two.

Property 2: hash contains only hexadecimal characters

Our second property states that the MD5 hash consists of only hexadecimal characters:

[Property]
public void MD5ReturnsStringWithOnlyAlphaNumericCharacters(string input)
{
    var hash = MD5(input);
    var allowed = "0123456789abcdef".ToCharArray();
    Assert.All(hash, c => Assert.Contains(c, allowed));
}

This test should fail for any input other than the "foo", "bar" or "baz" strings, which indeed it does:

MD5ReturnsStringWithOnlyAlphaNumericCharacters [FAIL]

    FsCheck.Xunit.PropertyFailedException : 
    Falsifiable, after 1 test (0 shrinks) (StdGen (961984695,296121694)):
    Original:
    ""

    ---- Assert.All() Failure: 32 out of 32 items in the collection did not pass.
    [31]: Xunit.Sdk.ContainsException: Assert.Contains() Failure
        Not found: ' '
        In value:  Char[] ['0', '1', '2', '3', '4', ...]

Let’s fix this:

public static string MD5(string input)
{
    if (input == "foo")
    {
        return "acbd18db4cc2f85cedef654fccc4a4d8";
    }
    if (input == "bar")
    {
        return "37b51d194a7513e45b56f6524f2d51f2";
    }
    if (input == "baz")
    {
        return "73feffa4b7f6bb68e44cf984c85f6e88";
    }
    
    return "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa";
}

This implementation not only passes this test, but also all other tests. On to property three.

Property 3: same hash for same input

This test verifies that when presented with the same input, the same hash is returned:

[Property]
public void MD5ReturnsSameHashForSameInput(string input)
{
    var hash1 = MD5(input);
    var hash2 = MD5(input);
    Assert.Equal(hash1, hash2);
}

Simple enough, and the current implementation passes this test.

Property 4: hash is different from input

Our next property verifies that the hash is different from the input value (which is an essential property of any hashing algorithm):

[Property]
public void MD5ReturnsStringDifferentFromInput(string input)
{
    var hash = MD5(input);
    Assert.NotEqual(input, hash);
}

At the moment, this test will pass for every input string except "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa". That means that unless FsCheck randomly generates the "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa" input value, the property-based test will pass and we would think our implementation was correct. Let’s demonstrate this by adding an example-based test:

[Fact]
public void MD5ReturnsStringDifferentFromManyAsInput()
{
    var input = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa";
    var hash = MD5(input);
    Assert.NotEqual(input, hash);
}

If we run both the property-based and the example-based tests, only the example-based test fails:

MD5ReturnsStringDifferentFromManyAsInput [FAIL]

    Assert.NotEqual() Failure
    Expected: Not "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
    Actual:   "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"

This is one of the drawbacks of testing with random data: you can have tests that sometimes fail. In such cases, augmenting a property-based test with an example-based test can be quite useful.

The fix is simple of course:

public static string MD5(string input)
{
    if (input == "foo")
    {
        return "acbd18db4cc2f85cedef654fccc4a4d8";
    }
    if (input == "bar")
    {
        return "37b51d194a7513e45b56f6524f2d51f2";
    }
    if (input == "baz")
    {
        return "73feffa4b7f6bb68e44cf984c85f6e88";
    }
    if (input == "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa")
    {
        return "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb";
    }
    
    return "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa";
}

Let’s move on to our last property.

Property 5: similar inputs have non-similar hashes

Our last property states a very important property of MD5 hashes: similar inputs should not return similar hashes. For example, the hash for the string "hello" should be completely different from the hash for the string "hello1".

To write a test for this property, we have to define how to calculate the “difference” between two strings. In this case, we’ll use a simple algorithm that counts the number of positions at which the strings have different characters. Obviously, this simple algorithm would normally not suffice, but it works for our purposes. For brevity, the actual implementation is omitted.
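
As an illustration, a minimal sketch of such a helper might look like this (the Difference name matches its use in the test below; counting extra characters in the longer string as differences is our own assumption):

private static int Difference(string left, string right)
{
    // Count the length difference as differing positions (an assumption).
    var difference = Math.Abs(left.Length - right.Length);

    // Count positions where both strings have a character, but not the same one.
    for (var i = 0; i < Math.Min(left.Length, right.Length); i++)
    {
        if (left[i] != right[i])
        {
            difference++;
        }
    }

    return difference;
}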

The property test looks like this:

[Property]
public void MD5WithSimilarInputsDoesNotReturnSimilarHashes(string input, char addition)
{
    var similar = input + addition;
    var hash1 = MD5(input);
    var hash2 = MD5(similar);
    var difference = Difference(hash1, hash2);
    Assert.InRange(difference, 5, 32);
}

In this test, we let FsCheck generate two values: the input value and a character to append to it. We then calculate the hashes for the original and the modified input, calculate the difference, and verify that the difference is at least 5 characters (a threshold chosen arbitrarily).

If we run our test, it fails:

MD5WithSimilarInputsDoesNotReturnSimilarHashes [FAIL]

    FsCheck.Xunit.PropertyFailedException : 
    Falsifiable, after 1 test (2 shrinks) (StdGen (1928700473,296121700)):
    Original:
    ("D", 'F')
    Shrunk:
    ("", 'a')

    ---- Assert.InRange() Failure
    Range:  (5 - 32)
    Actual: 0

Modifying our implementation to pass this test is still possible, but it becomes increasingly hard. At this point, we’ll give in and correctly implement the MD5 algorithm:

public static string MD5(string input)
{
    using (var md5 = System.Security.Cryptography.MD5.Create()) 
    {
        var inputBytes = System.Text.Encoding.ASCII.GetBytes(input);
        var hash = md5.ComputeHash(inputBytes);

        var sb = new StringBuilder();

        for (var i = 0; i < hash.Length; i++)
        {
            sb.Append(hash[i].ToString("x2"));
        }

        return sb.ToString();    
    }
}
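
As a sanity check, this implementation returns the expected hashes for our earlier example inputs:

var hash = MD5("foo"); // "acbd18db4cc2f85cedef654fccc4a4d8"

Note that this implementation uses ASCII encoding; if your input can contain non-ASCII characters, Encoding.UTF8 would be the more likely choice.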

Let’s run all our tests to verify they all pass:

MD5WithSimilarInputsDoesNotReturnSimilarHashes [FAIL]
      FsCheck.Xunit.PropertyFailedException : 
      Falsifiable, after 13 tests (1 shrink) (StdGen (1520259769,296121702)):
      Original:
      (null, 'm')
      Shrunk:
      (null, 'a')
      
    ---- System.ArgumentNullException : String reference not set to an instance of a String.

MD5ReturnsStringWithOnlyAlphaNumericCharacters [FAIL]
      FsCheck.Xunit.PropertyFailedException : 
      Falsifiable, after 8 tests (0 shrinks) (StdGen (1521993499,296121702)):
      Original:
      <null>
      
    ---- System.ArgumentNullException : String reference not set to an instance of a String.

MD5ReturnsStringDifferentFromInput [FAIL]
      FsCheck.Xunit.PropertyFailedException : 
      Falsifiable, after 4 tests (0 shrinks) (StdGen (1522075879,296121702)):
      Original:
      <null>
      
    ---- System.ArgumentNullException : String reference not set to an instance of a String.

MD5ReturnsSameHashForSameInput [FAIL]
    FsCheck.Xunit.PropertyFailedException : 
    Falsifiable, after 2 tests (0 shrinks) (StdGen (1522087989,296121702)):
    Original:
    <null>
      
    ---- System.ArgumentNullException : String reference not set to an instance of a String.

MD5ReturnsStringOfCorrectLength [FAIL]
    FsCheck.Xunit.PropertyFailedException : 
    Falsifiable, after 28 tests (0 shrinks) (StdGen (1522097819,296121702)):
    Original:
    <null>
      
    ---- System.ArgumentNullException : String reference not set to an instance of a String.

Oh no, by correctly implementing the MD5 method, we made all our property-based tests fail! What happened? Well, previously we correctly handled null input values in our implementation, but we don’t anymore. If we think about it, null can be considered an invalid input value, and throwing an ArgumentNullException is thus perfectly reasonable. We could write an example-based test to verify this behavior:

[Fact]
public void MD5WithNullInputThrowsArgumentNullException()
{
    Assert.Throws<ArgumentNullException>(() => MD5(null));
}

To fix our failing property-based tests, we should rephrase our properties to state that they only hold for valid (non-null) input values. Our next step is to make FsCheck only generate valid, non-null input values.

Customizing input value generation

FsCheck generates values through a concept known as arbitraries. To fix our failing property tests, we’ll define a function that returns an Arbitrary<string>, which FsCheck will use to generate strings. Our custom Arbitrary<string> instance can generate any string except the null string.

To define our custom arbitrary, we create a class with a static method that returns an Arbitrary<string> instance:

public static class NonNullStringArbitrary
{
    public static Arbitrary<string> Strings()
    {
        return Arb.Default.String().Filter(x => x != null);
    }
} 

The Strings() method returns a modified Arbitrary<string> instance that filters out the null string. Note that the name of the method is arbitrary (pun intended); FsCheck ignores the name and only looks at the return type.

We can then use our custom arbitrary in our tests through the Arbitrary property of the [Property] attribute:

[Property(Arbitrary = new[] { typeof(NonNullStringArbitrary) })]
public void MD5ReturnsStringOfCorrectLength(string input)
{
    var hash = MD5(input);
    Assert.Equal(32, hash.Length);
}

If we now run our test, our custom arbitrary is used, which means the null input value will no longer be generated, and our implementation passes all property-based tests!

To make using our custom arbitrary easier, we can create an attribute that derives from PropertyAttribute that automatically sets the Arbitrary property to our custom arbitrary:

public class MD5PropertyAttribute : PropertyAttribute
{
    public MD5PropertyAttribute()
    {
        Arbitrary = new[] { typeof(NonNullStringArbitrary) };
    }
}

We can now use this attribute instead of using a [Property] attribute:

[MD5Property]
public void MD5ReturnsStringOfCorrectLength(string input)
{
    var hash = MD5(input);
    Assert.Equal(32, hash.Length);
}

Much better. The final step is to use this custom attribute on all our property-based tests. To verify we didn’t break anything, we run all tests one more time and thankfully, they all still pass!

Regular tests or property tests?

Now that we know how to write property-based tests, should we stop writing example-based tests? Well, probably not.

One problem with property-based testing is that it can be hard to identify a set of properties that completely covers the desired functionality. In those cases, it can be very useful to also define some example-based tests. This is what we did in our example: the five properties for which we wrote tests did not completely describe the MD5 algorithm, which is why we also needed an additional parameterized example-based test to ensure the correctness of our implementation.

Furthermore, as example-based tests are less abstract, they are usually easier to understand. Defining some example-based tests in addition to your property-based tests can thus help explain what you are testing.

Practicing

Property-based testing is not easy to start with. Especially in the beginning, you’ll probably struggle to identify properties. A good way to practice your property-based testing skills is to solve the Diamond kata using property-based testing.

If you’d like to learn more about FsCheck and property-based testing, the FsCheck documentation is a good place to start. FsCheck is the most popular property-based testing framework for the .NET platform, but other platforms have their own implementations. The most well-known are QuickCheck (Haskell) and ScalaCheck (JVM), both of which inspired FsCheck.

Conclusion

Although writing tests is important, the way you write your tests is equally important. To strengthen example-based tests, you should use as many inputs as possible. While multiple inputs are better than one input, property-based testing is even better as it works with any input value. It achieves this by randomly generating a large number of input values.

The random nature of property-based testing forces you to approach your tests differently. You don’t focus on specific use cases, but on the general properties of the code you want to test. This helps you better understand the requirements of that code. Testing with random inputs also makes hard-coding an implementation against pre-defined input values extremely hard. Furthermore, the clever way in which property-based testing frameworks provide minimal counter-examples really helps you identify and fix issues in your implementation.

The main problem with property-based testing is that it can be hard to get started with. In particular, figuring out what properties to test can be quite hard. Furthermore, property-based tests can fail to completely describe all aspects of the implementation. In those cases, you should augment property-based tests with regular example-based tests.

 

In C# 6, the nameof operator allows you to retrieve the name of a variable, type or member.

Example

The following example shows how you can use the nameof operator to retrieve the name of a namespace, class, method, parameter, property, field or variable:

using System;
                    
public class Program
{
    private static DateTime Today = DateTime.Now;

    public string Name { get; set; }
    
    public static void Main(string[] args)
    {
        var localTime = DateTime.Now.ToLocalTime();
        var åçéñøûß = true;
        
        Console.WriteLine(nameof(localTime));     // "localTime"
        Console.WriteLine(nameof(åçéñøûß));       // "åçéñøûß"
        Console.WriteLine(nameof(args));          // "args"
        Console.WriteLine(nameof(System.IO));     // "IO"
        Console.WriteLine(nameof(Main));          // "Main"
        Console.WriteLine(nameof(Program));       // "Program"
        Console.WriteLine(nameof(Program.Today)); // "Today"
        Console.WriteLine(nameof(Program.Name));  // "Name"
    }
}

Restrictions

Although the nameof operator works with most language constructs, there are some restrictions. For example, you cannot use the nameof operator on open generic types or method return values:

using System;
using System.Collections.Generic;
                    
public class Program
{
    public static int Main()
    {   
        Console.WriteLine(nameof(List<>)); // Compile-time error
        Console.WriteLine(nameof(Main())); // Compile-time error
        
        return 0;
    }
}

Furthermore, if you apply it to a closed generic type, the generic type argument is ignored:

using System;
using System.Collections.Generic;
                    
public class Program
{
    public static void Main()
    {   
        Console.WriteLine(nameof(List<int>));  // "List"
        Console.WriteLine(nameof(List<bool>)); // "List"
    }
}

When to use?

So when should you use the nameof operator? The prime example is exceptions that take a parameter name as an argument. Consider the following code:

public static void DoSomething(string val)
{
    if (val == null) 
    {
        throw new ArgumentNullException("val"); 
    }   
}

If the val parameter is null, an ArgumentNullException is thrown with the parameter name ("val") as a string argument. However, if we do a rename refactoring of the val parameter, the "val" string will not be modified. This results in the wrong parameter name being passed to the ArgumentNullException:

public static void DoSomething(string input)
{
    if (input == null) 
    {
        throw new ArgumentNullException("val"); 
    }   
}

Instead of hard-coding the parameter name as a string, we should use the nameof operator:

public static void DoSomething(string val)
{
    if (val == null) 
    {
        throw new ArgumentNullException(nameof(val));   
    }   
}

As nameof(val) returns the string "val", the functionality is unchanged. However, as the nameof operator references the val parameter, a rename refactoring of that parameter also changes the nameof operator’s argument:

public static void DoSomething(string input)
{
    if (input == null) 
    {
        throw new ArgumentNullException(nameof(input)); 
    }   
}

This time, the refactoring did not break anything - the correct parameter name is still passed to the ArgumentNullException constructor.

Now that we know how to use the nameof operator, let’s find out how it is implemented.

Under the hood

One way to find out how a C# feature is implemented, is by looking at the Common Intermediate Language (CIL) code the compiler generates. As a refresher, the following diagram describes how C# code is compiled:

Compilation of .NET code

There are basically two steps:

  1. Compiling C# code to CIL code. This is done by the C# compiler and happens at compile time.
  2. Compiling CIL code to native code. This is done by the CLR and happens at runtime.

As C# code is compiled to CIL code, examining the generated CIL code gives us insight into how C# features are implemented by the compiler. As an example, C# properties are compiled to plain getter and setter methods at the CIL level.
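
For instance, the compiler turns an auto-implemented property like this:

public string Name { get; set; }

into a compiler-generated backing field plus two plain methods, named get_Name and set_Name at the CIL level. In C# terms, the result is roughly equivalent to:

private string _name; // compiler-generated backing field
public string get_Name() { return _name; }
public void set_Name(string value) { _name = value; }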

Let’s examine the CIL code that is generated for the aforementioned examples.

CIL code for string-based example

To view the CIL code generated for our string-based example, we first have to compile it. Just as a reminder, this is the C# code for our string-based example:

public static void DoSomething(string val)
{
    if (val == null) 
    {
        throw new ArgumentNullException("val"); 
    }   
}

To examine the CIL code the compiler generates, we compile our code. As this code is part of a console application, the C# compiler writes the CIL code to an executable.

To view the CIL code in the compiled executable, we can use a disassembler. We’ll use ildasm, which is a disassembler that comes pre-installed with Visual Studio. ildasm can be used both as a command-line and GUI tool, but we’ll use the command-line functionality.

To use ildasm on our executable, we open a Visual Studio Command Prompt. Then, we navigate to the directory that contains the executable we just compiled. Finally, we call ildasm with our executable’s name as its argument: ildasm nameof.exe /text.

This command will output all CIL code in the nameof.exe executable to the console (note: omit the /text modifier to open the ildasm GUI). Amongst the CIL code written to the console is the CIL code for our DoSomething method:

.method public hidebysig static void  DoSomething(string val) cil managed
{
  // Code size       22 (0x16)
  .maxstack  2
  .locals init ([0] bool V_0)
  IL_0000:  nop
  IL_0001:  ldarg.0
  IL_0002:  ldnull
  IL_0003:  ceq
  IL_0005:  stloc.0
  IL_0006:  ldloc.0
  IL_0007:  brfalse.s  IL_0015
  IL_0009:  nop
  IL_000a:  ldstr      "val"
  IL_000f:  newobj     instance void [mscorlib]System.ArgumentNullException::.ctor(string)
  IL_0014:  throw
  IL_0015:  ret
} // end of method Program::DoSomething

These instructions are the CIL representation of our C# code. If you are not familiar with CIL code, this may be a bit daunting. However, you don’t have to know all CIL instructions to understand what is happening. Let’s walk through each instruction to see what is happening.

  • .maxstack: set the maximum number of items on the stack to 2.
  • .locals: create a local variable of type bool at index 0.
  • IL_0000: do nothing. This instruction allows a breakpoint to be placed on a non-executable piece of code; in this case, the method’s opening curly brace.
  • IL_0001: push the val parameter value (the argument at index 0) on the stack.
  • IL_0002: push the null value on the stack.
  • IL_0003: pop the top two values from the stack, compare them for equality, and push the comparison result on the stack.
  • IL_0005: pop the top stack value (the equality comparison result) and store it in the local variable with index 0.
  • IL_0006: push the local variable at index 0 (the equality comparison result) on the stack.
  • IL_0007: pop the top stack value (the equality comparison result) from the stack. If this value is equal to 0 (false), jump to statement IL_0015 (end of method). If not, do nothing.
  • IL_0009: do nothing (allows breakpoint at opening curly brace inside if statement).
  • IL_000a: push the string "val" on the stack.
  • IL_000f: pop the top stack value (the "val" string) from the stack, and push a new ArgumentNullException instance on the stack by calling its constructor that takes the popped "val" string as its argument.
  • IL_0014: pop the ArgumentNullException from the stack and throw it.
  • IL_0015: return from the method.

While there is certainly more CIL code than there was C# code, it is not that hard to grasp what is happening if you are familiar with stacks.

Optimizing

There is something odd about the generated CIL instructions though. For example, instructions IL_0005 and IL_0006 seem to be redundant, as the first instruction pops a value from the stack which the latter instruction then immediately pushes back on the stack. Removing these instructions would not change the behavior of the code and would improve its performance. So how can we instruct the compiler to omit these redundant instructions?

Well, note that we built our code using the default Debug build configuration. In a Debug build, the C# compiler does not apply optimizations to the CIL it outputs, which leads to valid, but often inefficient CIL code. However, if we switch to the Release build configuration and compile our code again, the compiler will optimize the CIL code.
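
The same effect can be achieved when compiling from the command line by passing the /optimize flag to the C# compiler:

csc /optimize nameof.cs

Let’s see what the optimized CIL code for our example looks like.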

CIL code for string-based example (optimized)

To see the optimized CIL code for our example, we select the Release build configuration and rebuild our code. This time, the generated CIL code looks remarkably different:

.method public hidebysig static void  DoSomething(string val) cil managed
{
  // Code size       15 (0xf)
  .maxstack  8
  IL_0000:  ldarg.0
  IL_0001:  brtrue.s   IL_000e
  IL_0003:  ldstr      "val"
  IL_0008:  newobj     instance void [mscorlib]System.ArgumentNullException::.ctor(string)
  IL_000d:  throw
  IL_000e:  ret
} // end of method Example::DoSomething

This optimized CIL code does the following things:

  • .maxstack: set the maximum number of items on the stack to 8.
  • IL_0000: push the val parameter value (the argument at index 0) on the stack.
  • IL_0001: pop the top stack value (the val parameter value) from the stack. If this value is true (which, for an object reference, means: not null), jump to statement IL_000e (end of method).
  • IL_0003: push the string "val" on the stack.
  • IL_0008: pop the top stack value (the "val" string) from the stack, and push a new ArgumentNullException instance on the stack by calling its constructor that takes the popped "val" string as its argument.
  • IL_000d: pop the ArgumentNullException from the stack and throw it.
  • IL_000e: return from the method.

As you can see, the functionality is still the same, but the number of CIL instructions has been drastically reduced. This not only leads to improved performance, but also makes it easier to see what is happening in the CIL code.

CIL code for nameof operator

Let’s see what CIL code is generated for our example that uses the nameof operator. The C# code that uses the nameof operator looks like this:

public static void DoSomething(string val)
{
    if (val == null) 
    {
        throw new ArgumentNullException(nameof(val));   
    }   
}

If we compile this code (using a Release build) and run ildasm again, the following optimized CIL is generated for the DoSomething method:

.method public hidebysig static void  DoSomething(string val) cil managed
{
  // Code size       15 (0xf)
  .maxstack  8
  IL_0000:  ldarg.0
  IL_0001:  brtrue.s   IL_000e
  IL_0003:  ldstr      "val"
  IL_0008:  newobj     instance void [mscorlib]System.ArgumentNullException::.ctor(string)
  IL_000d:  throw
  IL_000e:  ret
} // end of method Example::DoSomething

Interestingly, the CIL generated for the nameof version is exactly the same as the string-based version! The nameof operator is thus just syntactic sugar for a plain CIL string - there is no nameof operator at the CIL level. This means that there is no runtime overhead to using the nameof operator over plain strings.

CIL string optimizations

As the nameof operator compiles to a plain string in CIL, do string optimizations also apply when the nameof operator is used?

As an example of such an optimization, consider the following code:

public static void DoSomething(string val)
{
    Console.WriteLine ("Parameter name: " + "val");
}

Can you guess what CIL code will be generated for this C# code? If you figured there would be string concatenation, you’d be wrong. Let’s check the optimized CIL code the compiler outputs:

.method public static hidebysig default void DoSomething (string val)  cil managed 
{
    // Method begins at RVA 0x2058
    // Code size 11 (0xb)
    .maxstack 8
    IL_0000:  ldstr "Parameter name: val"
    IL_0005:  call void class [mscorlib]System.Console::WriteLine(string)
    IL_000a:  ret 
} //

You can clearly see that instead of using CIL instructions to concatenate the "Parameter name: " and "val" strings, the compiler just outputs the concatenated strings directly. Obviously, this compiler optimization saves both time and memory.

To find out if this optimization also applies when the nameof operator is used, we modify our example to use the nameof operator:

public static void DoSomething(string val)
{
    Console.WriteLine ("Parameter name: " + nameof(val));
}

We then recompile and inspect the generated CIL code:

.method public static hidebysig default void DoSomething (string val)  cil managed 
{
    // Method begins at RVA 0x2058
    // Code size 11 (0xb)
    .maxstack 8
    IL_0000:  ldstr "Parameter name: val"
    IL_0005:  call void class [mscorlib]System.Console::WriteLine(string)
    IL_000a:  ret 
} //

As the generated CIL is the same as that of our string-based example, we have verified that existing string optimizations also apply when the nameof operator is used.

Backwards compatibility

As we saw earlier, nameof operator calls are converted to strings in the generated CIL code; there is no nameof concept at the CIL level. This allows code that uses the nameof operator to run on older versions of the .NET framework, as far back as .NET framework 2.0 (and probably 1.0). The only restriction, of course, is that the compiler must be recent enough to know how to compile the nameof operator.

To verify this backwards compatibility, create a project that uses the nameof operator. Then, open the project properties page and change the Target framework to .NET Framework 2.0. At this point, you might get compile warnings for features not supported in .NET Framework 2.0. Remove all such features until the code compiles again. Note that the compiler did not complain about the nameof operator! Now recompile the application and run it. Everything should still work just fine.

Compiler

Previously we used a disassembler to see what CIL code was generated by the C# compiler. However, as the C# compiler (nicknamed Roslyn) is open-source, we can also examine its source code to find out how the nameof operator is compiled.

Preparing

To explore Roslyn’s source code, we first get a local copy of the Roslyn repository. We then follow the build instructions to build the master branch. Once the build script has finished (this can take a while), we open the Compilers.sln solution. In the CSharp solution folder, we then select the csc project as the startup project. This console application project builds csc.exe, the command-line version of the C# compiler. To see how csc.exe compiles C# code that uses the nameof operator, we specify the following command line arguments on the Debug tab of the csc project’s properties page:

"nameof.cs" /out:"nameof.exe" /optimize

The first argument is our C# source file that contains the nameof operator code. The compiler will compile this file and write the CIL output to the file specified in the "/out" argument. Finally, the "/optimize" flag will cause the compiler to emit optimized CIL code (note: this flag is set when doing a Release build).

Now, if we press F5 to debug the csc project, the C# compiler will compile the nameof.cs file and write the results to nameof.exe. At this point, we haven’t set any breakpoints, we just let the compiler finish. After finishing, the compiler has created the nameof.exe executable. Using ildasm, we can see that csc.exe has generated the exact same CIL code as we saw in our previous examples.

Compilation pipeline

Before we dive head-first into Roslyn’s source code to see how the nameof operator is compiled, let’s see what the Roslyn compilation pipeline looks like:

Compilation of .NET code in Roslyn

This diagram shows that Roslyn compiles C# source code in various stages. In the next sections, we’ll look at what happens to our nameof source code in each phase.

First steps

We’ll start exploring Roslyn’s internals by debugging the csc project while it tries to compile our C# code. This time though, we set a breakpoint in the project’s Main method (located in the Program.cs file). Once the breakpoint hits, we use Step Into (F11) to dig deeper and deeper into Roslyn’s source code.

At first, the compiler does some pretty mundane stuff, like parsing command-line arguments. However, things start to get interesting when we arrive at the RunCore() method in the CommonCompiler class. This method implements the aforementioned compilation pipeline. Let’s see how the nameof operator is processed in the various phases.

Parse phase

In this phase, the compiler uses the ParseFile() method to parse our source code into a syntax tree. The following is a simplified version of the syntax tree the compiler creates for our C# code:

NamespaceDeclaration   
└── ClassDeclaration   
    └── MethodDeclaration
        └── IfStatement
            └── ThrowStatement
                └── ObjectCreationExpression
                    └── ArgumentList

Clearly, the syntax tree hierarchy directly corresponds to the hierarchy in our C# code. For example, the root node is the namespace declaration, which has a class declaration as its child.

Each syntax element in the syntax tree has a number of properties, such as a reference to its parent. Some syntax elements also have an identifier property, which is a string that contains the name of that syntax element. For example, the ClassDeclaration describing the Program class has its identifier property set to "Program".

Let’s zoom in on the syntax tree for the C# code that uses the nameof operator: the new ArgumentNullException(nameof(val)) expression. This time, we’ll also show the syntax element’s identifier in the tree (if it has one):

ObjectCreationExpression
├── NewKeyword
├── IdentifierName ("ArgumentNullException")
└── ArgumentList
    └── Argument
        └── InvocationExpression
            ├── IdentifierName ("nameof")
            └── ArgumentList
                └── Argument
                    └── IdentifierName ("val")

At the root of this part of the syntax tree is the ObjectCreationExpression. The object creation expression has three parts:

  1. The new keyword.
  2. The identifier name ("ArgumentNullException").
  3. The argument list.

The argument list subtree has a single child: an Argument that describes an InvocationExpression, which represents the nameof(val) argument. The syntax tree for this element has two parts:

  1. The identifier of the method to be called ("nameof").
  2. The argument list, which contains a single argument that refers to the "val" identifier.

We can see that the nameof operator is treated as a regular method call with a single argument. That argument is not a string though, but an IdentifierName element, which is a reference to another element with a matching identifier.

We know that in our code the val argument of the nameof call refers to the DoSomething() method’s val parameter. Therefore, we expect the syntax tree of the DoSomething() method to have a parameter syntax element with its identifier set to "val", which it does:

MethodDeclaration ("DoSomething")
├── ParameterList
│   └── Parameter ("val")
└── Block
    └── IfStatement
        └── ...

In the next phase, the compiler will match these identifiers.

Declaration and bind phase

In the declaration and bind phase, identifiers are bound to symbols. In the parse phase, we saw that the nameof call was described by an InvocationExpression. The binding of this expression is done in the BindInvocationExpression() method.

The BindInvocationExpression() method starts by calling TryBindNameofOperator(). As our InvocationExpression indeed contains a nameof operator, the BindNameofOperatorInternal() method is called to handle the binding of the nameof operator.

The following, abbreviated code shows the BindNameofOperatorInternal() method:

private BoundExpression BindNameofOperatorInternal(InvocationExpressionSyntax node, DiagnosticBag diagnostics)
{
    var argument = node.ArgumentList.Arguments[0].Expression;
    
    string name = "";
    CheckSyntaxForNameofArgument(argument, out name, diagnostics);

    ...

    return new BoundNameOfOperator(node, boundArgument, ConstantValue.Create(name), Compilation.GetSpecialType(SpecialType.System_String));
}

First, the InvocationExpressionSyntax element’s single argument is retrieved, which is of type IdentifierNameSyntax. Then the CheckSyntaxForNameofArgument() method sets the name parameter to the identifier of the IdentifierNameSyntax element. In our example, the name variable is set to "val".

Finally, the method returns a BoundNameOfOperator instance, which is the bound symbol representation of the InvocationExpressionSyntax. Note that the third constructor argument is a call to ConstantValue.Create(), with the identifier name ("val") as its argument. This method is implemented as follows:

public static ConstantValue Create(string value)
{
    if (value == null)
    {
        return Null;
    }

    return new ConstantValueString(value);
}      

By passing a ConstantValue instance to the BoundNameOfOperator constructor, we indicate that the BoundNameOfOperator symbol can also be represented by a constant value. The importance of this will become clear later.

Lowering

After the binding has finished, there is one thing left to do: lowering. In the process of lowering, complex semantic constructs are rewritten in terms of simpler ones. For example, each lock call is lowered to a try/finally block that uses Monitor.Enter and Monitor.Exit.
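
For example, a lock statement like this:

lock (obj)
{
    DoWork();
}

is lowered to roughly the following (the exact shape varies between compiler versions; Monitor lives in System.Threading):

bool lockTaken = false;
try
{
    Monitor.Enter(obj, ref lockTaken);
    DoWork();
}
finally
{
    if (lockTaken)
    {
        Monitor.Exit(obj);
    }
}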

As it turns out, lowering also applies to the nameof operator. The VisitExpressionImpl() method (in the LocalRewriter class) is called for each BoundExpression instance, including derived classes such as the BoundNameOfOperator class. It looks like this:

private BoundExpression VisitExpressionImpl(BoundExpression node)
{
    ConstantValue constantValue = node.ConstantValue;
    if (constantValue != null)
    {
        ...
        
        return MakeLiteral(node.Syntax, constantValue, type);
    }

    ...
}

This method takes a BoundExpression and also returns a BoundExpression. However, if the ConstantValue property is not null, the returned BoundExpression is of type BoundLiteral, as returned by the MakeLiteral() method:

private BoundExpression MakeLiteral(CSharpSyntaxNode syntax, ConstantValue constantValue, TypeSymbol type, BoundLiteral oldNodeOpt = null)
{
    ...
    
    return new BoundLiteral(syntax, constantValue, type, hasErrors: constantValue.IsBad);
}

Therefore, when the VisitExpressionImpl() method is called to apply lowering to bound expressions, each BoundNameOfOperator instance is replaced by a BoundLiteral instance.

Emit phase

The last phase is the emit phase. Here, the bound symbols created in the previous phase are written to file as CIL code.

From our nameof viewpoint, things start getting interesting when the EmitExpression() method is called with the BoundLiteral instance (representing our nameof call) as its argument.

As we saw earlier, the BoundLiteral instance had its ConstantValue property set to an instance of ConstantValueString (containing the string "val"). This is important because normally, the compiler evaluates an expression to determine what code to emit. However, if the compiler finds that an expression’s ConstantValue property is not null, it skips evaluating the expression and instead emits the constant value.

In the EmitExpression() method, you can clearly see this behavior:

private void EmitExpression(BoundExpression expression, bool used)
{
    ...

    var constantValue = expression.ConstantValue;
    if (constantValue != null)
    {
        ...

        EmitConstantExpression(expression.Type, constantValue, used, expression.Syntax);
    }

    ...
}

As the ConstantValue property of our BoundLiteral instance is not null, the EmitConstantExpression() method is called, which in turn calls EmitConstantValue():

internal void EmitConstantValue(ConstantValue value)
{
    ConstantValueTypeDiscriminator discriminator = value.Discriminator;

    switch (discriminator)
    {
        ...
        
        case ConstantValueTypeDiscriminator.String:
            EmitStringConstant(value.StringValue);
            break;
        
        ...
    }
}

As the ConstantValue property of the BoundLiteral instance is of type ConstantValueString, the EmitStringConstant() method is called with the StringValue property as its argument. Note that for our ConstantValueString instance, the StringValue property returns the string "val".

Having arrived at the EmitStringConstant() method, we are finally able to see how the CIL code for the nameof operator is emitted:

internal void EmitStringConstant(string value)
{
    if (value == null)
    {
        EmitNullConstant();
    }
    else
    {
        EmitOpCode(ILOpCode.Ldstr);
        EmitToken(value);
    }
}

As value is not null, the else branch is executed. First, the CIL code for the ldstr opcode is emitted. Then, the "val" string token is emitted. This outputs the CIL code we were expecting:

IL_0003:  ldstr      "val"

Our tour through Roslyn's internals has shown how the nameof operator is parsed into syntax elements, bound to symbols and finally emitted as CIL code.

Conclusion

The nameof operator is easy to understand and use. While its use is limited, it does make your code more robust. As the nameof operator is just syntactic sugar, there is no runtime performance impact. Furthermore, existing string optimizations also apply to the nameof operator and the compiled code runs on older versions of the .NET framework.
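
To recap with the "val" example used in this post: because the operand's name is baked in at compile time, the following two statements compile to the exact same CIL (a single ldstr instruction):

public static void Example(int val)
{
    string viaNameof  = nameof(val); // compiles to: ldstr "val"
    string viaLiteral = "val";       // compiles to: ldstr "val"
}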

To find out how the nameof operator was implemented, we used ildasm to examine the generated CIL code and stepped through Roslyn’s internals to see how the compiler generated the CIL code for the nameof operator.

 

This is the seventh and last in a series of posts that discuss the steps taken to publish our library. In our previous post, we added TypeScript support to our library. This post will show how we added our library to a Content Delivery Network.

Content Delivery Network

A Content Delivery Network (CDN) is a network of servers that deliver content based on the geographic location of the user. In other words, when you request content from a CDN, the server geographically nearest to you will send the content. The main advantage of this is speed, but another is reliability: if one server goes down, another automatically takes over. A third advantage is that your own servers use less bandwidth, which is very useful for cutting down on bandwidth costs.

Some well-known CDN providers are Akamai, CloudFlare and Amazon CloudFront. While most CDN providers are paid services, some offer basic functionality for free.

Hosted JavaScript libraries

For developers, CDNs are often used to serve JavaScript libraries. For example, jQuery has its own CDN at code.jquery.com, which hosts jQuery, jQueryUI and several others. For a larger list of libraries, you can use the Microsoft Ajax Content Delivery Network or Google's hosted libraries. However, if they don't host the library you want to use, you're out of luck, right? Enter cdnjs.

cdnjs

cdnjs is a CDN that hosts many JavaScript libraries, far more than the aforementioned CDNs. The great thing about cdnjs is that if they don't already host the library you want, you can add it yourself! Let's do that for our library.

First we fork the cdnjs repository. In that fork, we create a new directory with our library’s name in the ajax/libs folder. Within the created folder, we add a package.json file using cdnjs’s custom package.json format:

{
  "filename": "knockout-paging.min.js",
  "name": "knockout-paging",
  "version": "0.3.0",
  "description": "Knockout paging",
  "keywords": ["knockout", "paging"],
  "homepage": "https://github.com/ErikSchierboom/knockout-paging",
  "dependencies": { 
    "knockout": "^3.2.0"
  }
}

Although the format is similar to the regular package.json format, the "filename" field is new and required.

At this point, we can start adding the files we want cdnjs to serve. To do so, we create a subfolder named after the version of our library whose files we want to host. Within that folder, we then put all files we want to be hosted. For our library, this gives us the following files and folders:

ajax
└── libs
    ├── ...  
    └── knockout-paging
        ├── 0.3.0
        |   ├── knockout-paging.js
        |   └── knockout-paging.min.js
        └── package.json

Note that we distribute both the regular and minified versions of our library.

The final step is to commit our changes and submit them in a pull request. Once the pull request is accepted, our library will be available on cdnjs. The full URL for version 0.3.0 of our library's knockout-paging.min.js file is: https://cdnjs.cloudflare.com/ajax/libs/knockout-paging/0.3.0/knockout-paging.min.js.
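
Consumers can then load the library directly from cdnjs in their HTML pages:

<script src="https://cdnjs.cloudflare.com/ajax/libs/knockout-paging/0.3.0/knockout-paging.min.js"></script>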

Note that most of the steps to create the correct folders and files can also be done automatically using the cdnjs-importer tool.

Updating versions

To add a new version of a library, you used to create a new subfolder with that version’s file(s). Then, you’d commit and send a new pull request. However, the preferred method nowadays is to enable auto-updating. There are two ways libraries can be updated automatically:

  1. Through NPM.
  2. Through Git.

For our library, we’ll use Git. To enable auto-updating from Git, we add the following to our cdnjs library’s package.json file:

"autoupdate": {
  "source": "git",
  "target": "git://github.com/ErikSchierboom/knockout-paging.git",
  "basePath": "/dist/",
  "files": [
    "knockout-paging.min.js",
    "knockout-paging.js"
  ]
}

This will instruct cdnjs to periodically check for new versions at the specified Git repository. It does this by checking the Git tags, which should use semantic versioning. Now, if we commit the updated package.json file and submit it to cdnjs, new versions of our library will automatically be added. Of course, old versions will remain available.
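
Publishing a new version then boils down to tagging a release and pushing the tag (the version number below is just an example):

git tag 0.4.0
git push origin 0.4.0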

Conclusion

We made our library available through a CDN by adding it to cdnjs, which was quite simple. Furthermore, we also configured cdnjs to automatically make new versions of our library available through the use of its auto-updating feature.

And that brings us to the end of our series of posts on how we published our knockout-paging plugin.

 

This is the sixth in a series of posts that discuss the steps taken to publish our library. In our previous post, we used build servers to automatically build and test our software. This post will show how we added TypeScript support to our library.

TypeScript

The TypeScript language is a typed superset of JavaScript that adds features like modules, classes and interfaces. The great thing about it being a JavaScript superset is that any JavaScript code is also valid TypeScript code! This allows you to gradually introduce TypeScript-specific features into your existing JavaScript code.

Let’s look at some TypeScript code:

class Greeter {
    greeting: string;
    constructor(message: string) {
        this.greeting = message;
    }
    greet() {
        return "Hello, " + this.greeting;
    }
}

var greeter = new Greeter("world");
alert(greeter.greet());

As can be seen, TypeScript allows us to use features like classes and constructors. However, this code is not valid JavaScript code (it might be in the future). Therefore, we'll use the TypeScript compiler to convert it to plain JavaScript. Our example compiles to the following JavaScript code:

var Greeter = (function () {
    function Greeter(message) {
        this.greeting = message;
    }
    Greeter.prototype.greet = function () {
        return "Hello, " + this.greeting;
    };
    return Greeter;
})();
var greeter = new Greeter("world");
alert(greeter.greet());

The compiled output is valid JavaScript code, which is quite similar to the TypeScript source. One thing that is lost completely in the translation, though, is the type annotations. So what's that about?

Static typing

One of TypeScript’s best features is that it allows you to add type annotations to your code. This makes TypeScript statically typed, as opposed to JavaScript being dynamically typed. As JavaScript does not support type annotations, the TypeScript compiler does not include them in the compiled JavaScript. This is known as type erasure.
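
A minimal sketch of both aspects: the annotation lets the compiler reject an invalid assignment, and the annotation itself is erased from the compiled output:

var greeting: string = "Hello";
greeting = 42; // compile-time error: a number is not assignable to a string

// After type erasure, the compiled JavaScript is simply:
// var greeting = "Hello";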

Regardless of your stance on dynamic vs. static typing, the latter has some benefits:

  1. Bugs can be found at compile-time instead of runtime.
  2. You can (more) safely refactor code.
  3. Tooling can easily support code-completion.

You might not miss these features in small projects, but in large projects they can greatly enhance productivity. That is why the Angular team chose to write Angular 2 completely in TypeScript.

Declaration files

So how does TypeScript interact with plain JavaScript code, which doesn’t have type annotations or classes? Well, thanks to declaration files, you can use them as if they were written in TypeScript.

A declaration file is a TypeScript file that contains only types and variable definitions; the actual implementation lives in another (JavaScript) file. Let's consider the following JavaScript code:

var Rectangle = (function () {
    function Rectangle(width, height) {
        this.width = width;
        this.height = height;
    }
    Rectangle.prototype.createSquare = function (size) {
        return new Rectangle(size, size);
    };
    return Rectangle;
})();

We could use this code from TypeScript as is, but we wouldn’t have any type information and could thus easily use it incorrectly. We can remedy this by specifying the types in a declaration file:

declare class Rectangle {
  width: number;
  height: number;

  constructor(width: number, height: number);

  createSquare(size: number): Rectangle;
}

If we reference this declaration file in our TypeScript code, we can then safely use the JavaScript code.
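
For example, assuming the declaration file is named rectangle.d.ts (a hypothetical name), the compiler now type-checks our usage of the JavaScript code:

/// <reference path="rectangle.d.ts" />

var rectangle = new Rectangle(10, 20);
var square = rectangle.createSquare(5); // OK: size is a number
// var invalid = new Rectangle("10", "20"); // compile-time error: strings are not numbers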

Creating a declaration file

Now that we know what declaration files are, let's create one for our library. Declaration files must have a .d.ts extension, so let's name our library's declaration file knockout-paging.d.ts. As our library extends the Knockout library, we start by downloading its declaration file. We'll reference this file in our declaration file to import its types.

We are now ready to define our declaration file. First, we’ll define an interface for our paged observable array:

/// <reference path="knockout.d.ts" />

interface KnockoutPagedObservableArray<T> extends KnockoutObservableArray<T> {
    pageSize: KnockoutObservable<number>;
    pageNumber: KnockoutObservable<number>;

    pageItems: KnockoutComputed<T[]>;
    pageCount: KnockoutComputed<number>;
    itemCount: KnockoutComputed<number>;
    firstItemOnPage: KnockoutComputed<number>;
    lastItemOnPage: KnockoutComputed<number>;
    hasPreviousPage: KnockoutComputed<boolean>;
    hasNextPage: KnockoutComputed<boolean>;
    isFirstPage: KnockoutComputed<boolean>;
    isLastPage: KnockoutComputed<boolean>;
    pages: KnockoutComputed<number[]>;

    toNextPage(): void;
    toPreviousPage(): void;
    toLastPage(): void;
    toFirstPage(): void;
}

As our paged observable array is a regular observable array with added properties and functions, our interface extends Knockout’s KnockoutObservableArray<T> type. This type is defined in the previously downloaded knockout.d.ts file. To use the types in this declaration file, we reference it in our own declaration using the /// <reference path="..." /> syntax.

To define our ko.pagedObservableArray() function, we’ll have to extend the existing ko instance’s type, which is the KnockoutStatic interface. Luckily, extending an interface is as simple as defining a new interface with the same name:

interface KnockoutStatic {
  pagedObservableArray<T>(value?: T[], options?: KnockoutPagedOptions): 
    KnockoutPagedObservableArray<T>;
}

interface KnockoutPagedOptions {
    pageSize?: number;
    pageNumber?: number;
    pageGenerator?: string;
}

Here, we specify that the ko instance has a pagedObservableArray() function that takes two optional parameters. As the second parameter is actually an object, we define its allowed properties in a separate interface.

Testing the declaration file

To test our declaration file, we can create a new TypeScript file that references our declaration file. We should then use our library in every supported way, checking to see if our declaration file allows it. For our library, this looks something like this:

/// <reference path="knockout-paging.d.ts" />

// Different option formats
var emptyOptions = {};
var allOptions   = { 
  pageNumber: 2, 
  pageSize: 10, 
  pageGenerator: 'sliding' 
};

function pagedObservableArray() {
  var simple           = ko.pagedObservableArray();
  // Renamed locals: reusing the names emptyOptions/allOptions here
  // would shadow the option objects defined above
  var withEmptyOptions = ko.pagedObservableArray([1, 2, 3], emptyOptions);
  var withAllOptions   = ko.pagedObservableArray([1, 2, 3], allOptions);
}

function observables() {
  var paged = ko.pagedObservableArray([]);
  var pageSize   = paged.pageSize();
  var pageNumber = paged.pageNumber();
}

function computed() {
  var paged = ko.pagedObservableArray([]);
  var firstItemOnPage = paged.firstItemOnPage();
  var hasPreviousPage = paged.hasPreviousPage();
  var pages = paged.pages();
}

function functions() {
  var paged = ko.pagedObservableArray([]);
  paged.toNextPage();
  paged.toLastPage();
}

We can then try to compile this test file using the TypeScript compiler. It should build without errors or warnings.
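
Assuming the test file is named knockout-paging-tests.ts (a name chosen here purely for illustration), that compilation step is simply:

tsc knockout-paging-tests.ts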

Note that for brevity, we left out some tests.

Including the declaration file

To make our declaration file available, we simply add it to our repository. People can then use it by referencing it from their TypeScript code.

Starting from version 1.6, the TypeScript compiler can automatically load declaration files (without explicitly referencing them). To support this, we’ll add a "typings" property to our package.json file:

"typings": "./knockout-paging.d.ts",

The declaration file will now automatically be picked up by the TypeScript compiler.

Publishing the declaration file

An alternative place where people look for declaration files is the DefinitelyTyped repository. This repository contains many declaration files, but mostly for libraries that don’t provide a declaration file themselves.

Although our library does provide a declaration file, it's not a bad idea to also submit it to DefinitelyTyped. To do so, we just follow the contribution guidelines:

  1. We fork the DefinitelyTyped repository.
  2. In the fork, we create a folder with our library’s name.
  3. We add the declaration and test files to that folder.
  4. We compile our tests file to see if everything is valid.
  5. We commit our changes and submit a pull request.

Once the pull request has been accepted, our declaration file will have been added to the DefinitelyTyped repository.

Note that if we update our declaration file, we should also update it in the DefinitelyTyped repository.

Installing declaration files

Although you could manually search and download declaration files from the DefinitelyTyped repository, you can also use the TSD tool. To install it, we use NPM:

npm install tsd -g

We can now use the tsd command to install declaration files. Here is how we’d install our library’s declaration file:

tsd install knockout-paging --save

Once this command has completed, our library’s declaration file will have been saved in typings/knockout-paging/knockout-paging.d.ts.

When TSD executes a command, it modifies the tsd.json file. This file contains metadata used by TSD:

{
  "version": "v4",
  "repo": "borisyankov/DefinitelyTyped",
  "ref": "master",
  "path": "typings",
  "bundle": "typings/tsd.d.ts",
  "installed": {
    "knockout-paging/knockout-paging.d.ts": {
      "commit": "001ca36ba58cef903c4c063555afb07bbc36bb58"
    }
  }
}

The most important part is the "installed" section, which lists all installed declaration files. This allows TSD to install all declaration files the project depends on just by examining the tsd.json file, similar to how NPM uses the dependencies section in a package.json file to install any dependencies.
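
For example, after cloning a project that contains a tsd.json file, the declaration files it lists can be restored with TSD's reinstall command:

tsd reinstall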

Conclusion

As TypeScript is becoming more popular, we created a declaration file for our library. Creating this file was fairly straightforward and gives users a type-safe way to interact with our library. Besides adding the declaration file to our repository, we also added it to the DefinitelyTyped repository.

In the next post we’ll add our library to a CDN.

 

This is the fifth in a series of posts that discuss the steps taken to publish our library. In our previous post, we added support for package managers. This post will show how we use build servers to automatically build and test our software.

Build servers

When creating a library, it is important to verify that your code also works as intended on other machines/configurations. One way to do this is by utilizing a build server, which is software designed to automatically build (and often test) software.

Most build servers can be linked to a source control repository, which allows them to run automatic builds whenever a change in the source control repository is detected. This automatic building is known as continuous integration.

For our library, we’ll look at two online build servers: Travis and AppVeyor. We’ll use both to do continuous integration of our library. Note that for open-source software (like our library), both Travis and AppVeyor are free of charge.

Travis

The first build server, Travis, is probably the most popular online build server. It supports a wide variety of build environments, which allows it to build software for many platforms and languages. Note that each build environment runs on Linux.

To work with Travis, the first step is to create an account. Registering is easy, as you can just use your existing GitHub account. Once registered, you’ll be redirected to your account’s page:

Travis account

Here, we click on the “+” next to “My Repositories” (highlighted in red). This will redirect us to a page that lists all our public GitHub repositories:

Travis add repository

To enable Travis builds for our library’s repository, we find it in the list and click on the button with the cross before the repository’s name. After some brief processing, the button will become green and Travis builds will have been enabled for that repository:

Travis added repository

Next, we click on the cogwheel icon adjacent to the enable button. This will show our project’s settings page:

Travis repository settings

We’ll ignore the various settings and open the “Current” tab:

Travis current build

At the moment, there are no builds for this repository. To allow Travis to build our repository, we need to create a YAML configuration file named .travis.yml in our repository’s root folder. The full list of configuration options is huge, but our project only needs a fairly minimal configuration file:

language: node_js

before_install:
  - npm install -g gulp

script: gulp ci

The first line specifies the language Travis should create a build environment for. As our library uses Node.js for building and testing, we’ll use that as our platform/language.

In the second line, we indicate that before running our actual build script, we want Travis to install Gulp in our build environment. Note that we can use NPM here, which Travis installs automatically for Node.js build environments.

Finally, we tell Travis what script to run to build our repository. For that, we use our custom gulp ci command, which uses Gulp (installed in the previous step) to build and test our library.
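
The ci task itself is defined in our repository's gulpfile. Its exact contents are specific to our library, but a minimal sketch (using the gulp-mocha plugin, assumed here for illustration) could look like this:

var gulp = require('gulp');
var mocha = require('gulp-mocha');

// Hypothetical 'ci' task: run the Mocha test suite; a failing test
// makes the task (and thus the build) exit with a non-zero code.
gulp.task('ci', function () {
  return gulp.src('spec/*.js', { read: false })
    .pipe(mocha({ reporter: 'spec' }));
});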

We then commit the .travis.yml file to version control and push it to GitHub. At this point, Travis will automatically detect the change in our repository and add it to the build queue. After a short while, Travis will run the build as specified by the .travis.yml file and output the build log to the screen:

Travis build

This build log shows that Travis does several things:

  • Create a worker environment.
  • Use Git to clone our repository.
  • Install Node.js.
  • Run the before_install script, which installs Gulp.
  • Run npm install to install any package dependencies.
  • Run the script script, which executes the gulp ci command.
  • Determine the build’s success by looking at the exit code (0 means success).

From now on, each change in our repository will trigger a new Travis build. This means that we now use Travis for continuous integration.

It is worth noting that you’ll receive an email with the build results each time Travis has run a build.

Badge

One last thing we can now do is to include a badge in our project’s readme. This badge is an image showing the last build’s status.

To find the badge settings, click on the build status button next to the GitHub logo:

Travis badge

The badge is available in many different output formats, including the Markdown format used in our readme file:

[![Build Status](https://travis-ci.org/ErikSchierboom/knockout-paging.svg?branch=master)](https://travis-ci.org/ErikSchierboom/knockout-paging)

After committing the updated readme file to the repository, our project’s readme now includes the latest Travis build status:

Travis badge status

AppVeyor

The second build server we’ll look at is AppVeyor, which is a build server that runs on Windows (whereas Travis runs on Linux).

Registering is simple, as you can either create a new account or use your existing GitHub, BitBucket or Visual Studio Online account.

Once registered, you are redirected to a page with an overview of your existing projects:

AppVeyor projects

Here, we click on the “NEW PROJECT” button, which redirects us to a page where all our projects are listed:

AppVeyor projects

Having selected GitHub as our project source, we then hover over our target project and click on the “ADD” button:

AppVeyor projects

After some processing, our project’s own build page is shown:

AppVeyor projects

By default, it shows the latest build, which we don’t yet have. To build our project, we once again create a YAML configuration file in our project’s root, this time named appveyor.yml:

environment:
  nodejs_version: "0.12"

install:
  - ps: Install-Product node $env:nodejs_version
  - npm install
  - npm install -g gulp

test_script:
  - gulp ci

build: off

This build file will cause AppVeyor to do the following things:

  • Create a worker environment.
  • Use Git to clone our repository.
  • Install Node.js.
  • Run npm install to install any package dependencies.
  • Install Gulp.
  • Run the gulp ci script.
  • Determine the build’s success by looking at the exit code.

Note that we set the build configuration option to off to disable AppVeyor’s automatic build command feature, as that feature only works for .NET projects.

We then commit the appveyor.yml file to our repository and push it to GitHub. AppVeyor will detect this change to our repository and add it to its build queue. After a short while, the build will be run and its output is written to the screen:

AppVeyor build log

Once again, the executed steps clearly reflect our YAML configuration file.

Badge

AppVeyor also supports badges, which you can find in the “Badges” section on the project’s “SETTINGS” tab:

AppVeyor badge

Once again, we’ll select the Markdown format and add it to our readme file:

[![Build status](https://ci.appveyor.com/api/projects/status/9odakh2g33mtpbm5?svg=true)](https://ci.appveyor.com/project/ErikSchierboom/knockout-paging)

Having pushed our modified readme to the repository, it now also shows the AppVeyor badge:

AppVeyor badge status

Code coverage

As our build servers also run our library’s tests, we can do one last cool thing: generate code coverage reports and automatically upload them to coveralls.io.

First, you need to create an account at coveralls.io. You can use your GitHub or Bitbucket credentials to sign up. After authentication finishes, you’ll be redirected to your projects page:

Coveralls projects

We now need to select the repository for which we want to store coverage. We do that by clicking on the “ADD REPOS” button. We will then be shown an overview of all our repositories:

Coveralls repositories

Using the search box, we search for our repository:

Coveralls repository

We then click on the “OFF” button, which will enable coverage data to be stored for that repository.

Coveralls filter repositories

When coverage has been enabled, the button becomes green and its text will change to “ON”. An additional button named “DETAILS” will also have appeared. Click on that button. This will show our repository’s page:

Coveralls active repository

As we don’t have any coverage at the moment, instructions are displayed on how to get started. The first step is to have our library generate code coverage reports.

Generating code coverage reports

Because code coverage is determined by running tests, we need to hook up a code coverage tool to our library’s test runner: Mocha. One of the best code coverage libraries is blanket.js, which supports QUnit, Jasmine and Mocha. Let’s install Blanket:

npm install blanket --save-dev

To have Blanket correctly calculate code coverage, we need to add a "config" section to the package.json file. For our library, this "config" section looks like this:

"config": {
  "blanket": {
    "pattern": [
      "index.js"
    ],
    "data-cover-never": [
      "node_modules"
    ]
  }
}

The most important part is the "pattern" key, which contains the filenames Blanket should be generating code coverage for. As our library is contained in a single index.js file, we only list that file. The "data-cover-never" key specifies the files or directories to ignore. We set it to "node_modules" to prevent any package dependencies from accidentally being analyzed.

Believe it or not, we are now ready to generate code coverage reports! Generating the actual coverage report is done through Mocha, which is a dependency of our library and thus already installed. To create a coverage report using Mocha, we add an entry to the "scripts" section of our package.json file:

"scripts": {
  "coverage": "mocha --require blanket -R html-cov > coverage.html"
}

This new "coverage" command runs Mocha with the blanket code coverage library loaded. The "-R" option specifies the type of report to output; in our case, an HTML report that is written to coverage.html.

We can now create the report by executing:

npm run coverage

This will create the coverage.html file, which you can view in a browser:

Coveralls HTML report

This report lists the code coverage percentage and highlights uncovered lines in red.

Unfortunately, this HTML coverage report cannot be used as input for coveralls.io, which requires code coverage reports in a JSON-based format:

{
  "name": "example.rb",
  "source_digest": "asdfasdf1234asfasdf2345",
  "coverage": [null, 1, null]
}

To have our library submit code coverage reports to coveralls.io in the correct format, we first install the mocha-lcov-reporter library:

npm install mocha-lcov-reporter --save-dev

This library adds a reporter to Mocha that outputs coverage data in a more structured format.

Let’s add another entry to the "scripts" section in our package.json file that uses this new reporter:

"scripts": {
  "coverage-lines": "mocha --require blanket -R mocha-lcov-reporter > coverage.lcov"
}

If we run this command, the resulting coverage.lcov file looks something like this:

SF:D:\Programmeren\knockout-paging\index.js
DA:9,1
DA:10,1
DA:11,0
DA:12,1
DA:13,1
DA:15,0
DA:22,1
DA:25,1

This file bears some resemblance to coveralls.io’s JSON format, but it is not quite the same. To convert this file to the coveralls.io format, we’ll need the coveralls package:

npm install coveralls --save-dev

This library can convert mocha-lcov-reporter coverage files to the coveralls.io format, which it then submits to coveralls.io.

Before we can use this library to submit our coverage results, we have to let the coveralls library know which coveralls.io repository it should send the coverage to. For this, you have to use the repository’s unique token, which can be found on the project’s page:

Coveralls token

So how do we provide the coveralls library with the correct repository token? There are actually two options:

  1. Create a .coveralls.yml file that has a repo_token key whose value is the repository token.
  2. Set a COVERALLS_REPO_TOKEN environment variable to the repository token.
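
The first option amounts to a one-line file in the repository root (the actual token value is elided here):

repo_token: <your-repo-token>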

You should only use the first option for private repositories, but as our library is open-source, we’ll use the second option.

As we want to use Travis to automatically submit our coverage reports, we first go to our repository’s settings page on Travis. There, you’ll find a section named “Environment Variables”:

Travis environment settings

At the moment, there are no environment variables, so we’ll click on the “Add” button. This will show two input fields: one for the environment variable’s name and one for its value:

Travis environment value

After we have entered the correct name and value, we click on the “Add” button to save the environment variable. From now on, any new build will have its COVERALLS_REPO_TOKEN environment variable set to the value we specified.

Note that for security reasons, you won’t see the actual value on the settings page:

Travis environment value

The final step is to modify our .travis.yml file to submit the coverage results to coveralls.io after each successful build. To do so, we add an after_success section which does the following things:

  • Use Mocha to run our tests.
  • Run Mocha using the blanket library to detect code coverage.
  • Use the mocha-lcov-reporter reporter to output the code coverage results.
  • Pipe the code coverage results to the coveralls.js file from the coveralls package.
  • coveralls.js converts the coverage results to the coveralls.io format.
  • coveralls.js submits the converted coverage results to coveralls.io.

These steps are all done in a single command, which we add to the .travis.yml file:

after_success:
  - NODE_ENV=test YOURPACKAGE_COVERAGE=1 ./node_modules/.bin/mocha --require blanket --reporter mocha-lcov-reporter | ./node_modules/coveralls/bin/coveralls.js

We then commit our modified .travis.yml file and push it to GitHub. This will cause Travis to do a new build of our repository, but this time it will also submit the coverage results to coveralls.io.

Once the build has completed, we can see the coverage results on our repository’s coveralls.io page:

Coveralls coverage

We can drill down on the coverage for an individual build by clicking on the build number. This will show the coverage details for that build:

Coveralls coverage details

One very handy feature is that by clicking on one of the covered files, that file will be displayed and each covered line will have a green background, whereas each uncovered line will have a red background:

Coveralls lines coverage

And with that, we have set up Travis to automatically calculate our library’s code coverage and submit it to coveralls.io.

Conclusion

For our library, we wanted to ensure maximum compatibility. To this end, we set up the Travis and AppVeyor build servers to automatically build and test our library whenever the repository is modified.

Both Travis and AppVeyor were easy to set up, requiring only a small number of clicks to enable automatic builds for our repository and a configuration file specifying how builds should be run. Furthermore, as Travis runs on Linux and AppVeyor on Windows, we automatically test on different platforms.

Setting up automatic code coverage reports was slightly more work, but also not that difficult using the free coveralls.io service.

In the next post we’ll add support for TypeScript.