Rewriting C# with Roslyn

Marcus • June 09, 2020


Imagine your source code, if only for a moment: It’s not perfect, but you have ideas on what to change to improve it, or at least make it less of a pain to work with.

What’s better than fixing all those issues by hand? Let’s look at how to leverage Roslyn to modify C# code entirely automatically.

Peeking into Code

The .NET Compiler Platform, also known as Roslyn, contains a varied set of tools to read, analyze and modify C# source files: In fact, one of the most common use cases is writing your own analyzers that identify code issues & suggest quick fixes.

And the first thing we’ll peek at is understanding how our existing code is actually structured. The following two tools are good for a first glance at understanding the problem:

That’s it for the tooling part, for the time being anyway: It’s enough for a quick start, and any time you’re wondering why a certain expression isn’t handled properly through the code you write, looking into either of them can vastly help with finding out why things fall apart.
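If you'd rather stay in code, a few lines are enough to dump a tree yourself. This is only a minimal sketch and not a replacement for proper tooling; the TreeDump class and its Dump method are names made up for illustration, and it assumes the Microsoft.CodeAnalysis.CSharp package is referenced.

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

// Hypothetical helper, not part of Roslyn: prints one line per syntax node.
public static class TreeDump
{
    public static string Dump(string source)
    {
        SyntaxNode root = CSharpSyntaxTree.ParseText(source).GetRoot();

        // Indent every node by its depth so the nesting becomes visible.
        var lines = root.DescendantNodesAndSelf()
            .Select(node => new string(' ', 2 * node.Ancestors().Count()) + node.Kind());

        return string.Join(Environment.NewLine, lines);
    }

    public static void Main()
    {
        Console.WriteLine(Dump("class Demo { void DoStuff() { } }"));
    }
}
```

The output starts with CompilationUnit and nests ClassDeclaration and MethodDeclaration beneath it, which is usually enough to figure out which Visit method you need to override later.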

Project Setup

Before we can work with our nodes, we need to set up a simple workspace. There’s the option of either an AdhocWorkspace that you can manually add files to, or an MSBuildWorkspace which can load project files & solutions.

Here’s a pretty straightforward way to set up your workspace:

MSBuildLocator.RegisterDefaults();
using var workspace = MSBuildWorkspace.Create();
workspace.WorkspaceFailed += (sender, e) =>
  Console.WriteLine($"[failed] {e.Diagnostic}");
var solution = await workspace.OpenSolutionAsync("path/to/your.sln");

There’s a reasonable chance that loading your solution will fail on the first, second and third attempt. The WorkspaceFailed event is triggered for each diagnostic message produced. These may just be warnings, but it’s likely that some dependencies are missing.

To analyze a pretty standard C# + .NET Core project, I’ve added the following references to my project file:

<!-- needed so any documents are loaded, vb would use diff. dep -->
<PackageReference
  Include="Microsoft.CodeAnalysis.CSharp.Workspaces"
  Version="3.6.0" />
<!-- needed so project files are loaded -->
<PackageReference Include="NuGet.Frameworks" Version="5.6.0" />
<PackageReference Include="NuGet.Packaging" Version="5.6.0" />
<PackageReference Include="NuGet.ProjectModel" Version="5.6.0" />
<PackageReference Include="NuGet.Versioning" Version="5.6.0" />

In fact, that may still not work. The project I was working with is dual-targeted, supporting both .NET Framework and .NET Core, and loading it with both targets enabled didn’t work out of the box. I’m not quite sure why, but I assume it’s because conditional options in *.csproj files and #if directives could build entirely different code for different platforms[1].

The workspace allows you to pass custom MSBuild properties though, so just limiting it to exactly one target framework works well enough, modifying the Create() call as follows:

using var workspace = MSBuildWorkspace.Create(
  new Dictionary<string, string>{
    {"TargetFramework", "net48"},
    {"TargetFrameworks", string.Empty}
});

And with that preliminary problem solved, we can look into what files are part of the solution we just loaded:

foreach (var document in solution.Projects
    .SelectMany(x => x.Documents)) {
  Console.WriteLine(document.FilePath);
}

And if your files are listed here, that’s presumably all that’s needed to work with them in the following steps.
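For quick experiments without a solution on disk, the AdhocWorkspace mentioned earlier can be fed source text directly. The following is a rough sketch under a few assumptions: the project name "Scratch", the file name and the AdhocDemo class are made up, and the Microsoft.CodeAnalysis.CSharp.Workspaces package from above needs to be referenced so the C# language services are found.

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.Text;

// Hypothetical helper for illustration: builds an in-memory solution.
public static class AdhocDemo
{
    public static Solution CreateSolution(string fileName, string source)
    {
        var workspace = new AdhocWorkspace();

        // "Scratch" is an arbitrary project name, nothing is read from disk.
        Project project = workspace.AddProject("Scratch", LanguageNames.CSharp);
        workspace.AddDocument(project.Id, fileName, SourceText.From(source));

        // The solution is an immutable snapshot including the new document.
        return workspace.CurrentSolution;
    }

    public static void Main()
    {
        var solution = CreateSolution("Demo.cs", "class Demo { }");
        foreach (var document in solution.Projects
            .SelectMany(x => x.Documents))
        {
            // In-memory documents have no FilePath, so print the name instead.
            Console.WriteLine(document.Name);
        }
    }
}
```

Everything in the following sections works the same way on such a solution, except that writing results back to document.FilePath obviously needs a real file.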

Analyzing syntax elements

Syntax nodes are a fundamental building block in analyzing code: There’s a tree of nodes per document, and there are tools to iterate over them. They are immutable, though: You cannot modify an existing node in place, so either you create entirely new nodes using the SyntaxFactory, or you create a new instance by calling .WithXXX() on an existing node.
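To make the immutability a bit more tangible, here is a small sketch that creates a class declaration with the SyntaxFactory and then "renames" it through WithIdentifier. The ImmutabilityDemo class and its Rename helper are made-up names for illustration.

```csharp
using System;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

public static class ImmutabilityDemo
{
    // Hypothetical helper: returns a copy of the class with a new name.
    public static ClassDeclarationSyntax Rename(
        ClassDeclarationSyntax node, string newName)
    {
        return node.WithIdentifier(SyntaxFactory.Identifier(newName));
    }

    public static void Main()
    {
        var original = SyntaxFactory.ClassDeclaration("Demo");
        var renamed = Rename(original, "Renamed");

        Console.WriteLine(original.Identifier.Text);           // Demo
        Console.WriteLine(renamed.Identifier.Text);            // Renamed
        Console.WriteLine(ReferenceEquals(original, renamed)); // False
    }
}
```

The original node is untouched, which is why a rewriter always has to return the node it wants to end up in the tree.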

There are two vaguely similar ways to walk through your forest of syntax trees: You could read it, or you could read and modify it, expressed by creating either a CSharpSyntaxWalker[2] or a CSharpSyntaxRewriter. I haven’t tried it, but I would assume that a Roslyn Analyzer implements the former to show you places where fixes can be applied, and then uses the latter if you press the button to fix it.
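As a sketch of the read-only variant, the following walker collects the names of all method declarations. MethodNameWalker and WalkerDemo are hypothetical names and not part of the project this article is based on.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical walker: records every method name it comes across.
public class MethodNameWalker : CSharpSyntaxWalker
{
    public List<string> Names { get; } = new List<string>();

    // Note the void return type: a walker can look, but not replace.
    public override void VisitMethodDeclaration(MethodDeclarationSyntax node)
    {
        Names.Add(node.Identifier.Text);

        // Keep descending into the method's child nodes.
        base.VisitMethodDeclaration(node);
    }
}

public static class WalkerDemo
{
    public static void Main()
    {
        var root = CSharpSyntaxTree
            .ParseText("class Demo { void A() { } void B() { } }")
            .GetRoot();

        var walker = new MethodNameWalker();
        walker.Visit(root);

        Console.WriteLine(string.Join(", ", walker.Names)); // A, B
    }
}
```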

Their interfaces are very similar. The walker’s visit methods just have no return values for the nodes you visit, so modifications are unsupported. If we had a class TestRewriter, we would modify our loop over all files above as follows:

foreach (var document in solution.Projects
    .SelectMany(x => x.Documents)) {
  var root = await document.GetSyntaxRootAsync()
    ?? throw new Exception($"No syntax root - {document.FilePath}");

  var result = new TestRewriter().Visit(root);
  if (!result.IsEquivalentTo(root)) {
    await File.WriteAllTextAsync(document.FilePath,
      result.ToFullString());
  }
}

That works just fine for demonstrating the basic principle. In practice you may want to adjust for the encoding you’re using (which is presumably UTF-8) - an example that I encountered is ensuring the UTF-8 byte order mark is kept.
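One possible way to handle the encoding, sketched under the assumption that the SourceText of a loaded document reports the encoding (including the byte order mark) the file was read with. I believe that is how it behaves, but it is worth verifying against your own files. EncodingPreservingWriter, PickEncoding and WriteAsync are made-up names.

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.Text;

// Hypothetical helper: writes a rewritten tree using the original encoding.
public static class EncodingPreservingWriter
{
    // Assumption: fall back to UTF-8 without a byte order mark
    // if the text does not know where it came from.
    public static Encoding PickEncoding(SourceText original)
    {
        return original.Encoding
            ?? new UTF8Encoding(encoderShouldEmitUTF8Identifier: false);
    }

    public static async Task WriteAsync(Document document, SyntaxNode newRoot)
    {
        var path = document.FilePath
            ?? throw new InvalidOperationException("Document has no file path");

        SourceText original = await document.GetTextAsync();

        // File.WriteAllTextAsync emits the preamble (BOM) of the given encoding.
        await File.WriteAllTextAsync(path, newRoot.ToFullString(),
            PickEncoding(original));
    }
}
```

In the loop above, the File.WriteAllTextAsync call would then be replaced by EncodingPreservingWriter.WriteAsync(document, result).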

There are a few snippets that I’ve written in the past, to give you an example:

Example: Modifying Comments

There’s dozens if not hundreds of different methods to implement in a rewriter, so here’s one I found at least slightly interesting.

Imagine, if you will, that it is common in your company that TODOs and FIXMEs include the name of whoever was writing a comment (or whoever assigned it to themselves). That’s a step up from not having either an author or assignee: You know who to ask if in doubt.

However, one of your developers has left, and you want to update the comments they left behind[4].

Here’s an example file that we wish to modify:

public class Demo
{
  public void DoStuff()
  {
    // TODO(first): Reduce to 50 once the rest of the logic is there
    Thread.Sleep(1000);

    /* TODO(second): This implementation is suboptimal.
     * I've looked into it in the past, but haven't had the time
     * to properly test a new solution. */
    throw new NotImplementedException();
  }
}

If we inspect the code above, we learn a few things: There’s plenty of tokens in there, and the comment we’re looking for is the single element inside a TriviaList. Trivia is, essentially, the part of the code that isn’t particularly significant for the compiler – needed for reproducing an exact source file, but not part of the syntax tree itself: Whitespace outside of strings, comments, new lines, and preprocessor directives[5].
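To see the trivia for yourself, here is a minimal sketch that lists every comment in a piece of source code along with its kind. TriviaDemo and FindComments are hypothetical names, and the sample source is a shortened variant of the file above.

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

// Hypothetical helper: lists all comment trivia in a source string.
public static class TriviaDemo
{
    public static string[] FindComments(string source)
    {
        return CSharpSyntaxTree.ParseText(source).GetRoot()
            .DescendantTrivia()
            // Whitespace and line breaks are trivia too, so filter by kind.
            .Where(t => t.IsKind(SyntaxKind.SingleLineCommentTrivia)
                     || t.IsKind(SyntaxKind.MultiLineCommentTrivia))
            .Select(t => $"{t.Kind()}: {t}")
            .ToArray();
    }

    public static void Main()
    {
        const string source = @"class Demo {
  void DoStuff() {
    // TODO(first): tune this
    System.Threading.Thread.Sleep(1000);
  }
}";
        foreach (var line in FindComments(source))
        {
            // Prints: SingleLineCommentTrivia: // TODO(first): tune this
            Console.WriteLine(line);
        }
    }
}
```

With that in mind, the rewriter for our comments looks as follows: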

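using System.Text.RegularExpressions;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

public class TestRewriter : CSharpSyntaxRewriter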
{
  private readonly string _newAuthor;
  private readonly Regex _originalAuthor;

  public TestRewriter(string originalAuthor, string newAuthor)
  {
    _originalAuthor = new Regex(
      $@"(TODO|FIXME)\({Regex.Escape(originalAuthor)}\)",
      RegexOptions.IgnoreCase);
    _newAuthor = newAuthor;
  }

  public override SyntaxTrivia VisitTrivia(SyntaxTrivia trivia)
  {
    if (trivia.IsKind(SyntaxKind.SingleLineCommentTrivia) ||
        trivia.IsKind(SyntaxKind.MultiLineCommentTrivia))
    {
      string text = trivia.ToFullString();
      if (_originalAuthor.IsMatch(text))
      {
        return SyntaxFactory.Comment(_originalAuthor.Replace(text,
            $"$1({_newAuthor})"));
      }
    }

    return base.VisitTrivia(trivia);
  }
}

You can view the full source code on GitHub.

Up next

In part 2, we’ll look at expanding our understanding of our code. Aside from the syntax tree, we also have a semantic model to cross-reference, which allows us to understand the (then fully qualified) calls.

Part 2 ->


  1. For my specific case, .NET 5.0 will unify the platform enough to no longer need two different platforms; but for apps supporting Android/iOS/Mac, it’ll stay a problem.
  2. There’s also a CSharpSyntaxVisitor that’s similar to a walker, but does not iterate recursively over the nodes. That’s suboptimal for the approach in this article. You can, however, use it with root.DescendantNodes and related methods.
  3. Incidentally, GetString() just returns an empty string if the resource can’t be found. From looking at the code, this clearly wasn’t intentional.
  4. Obviously, search & replace is a thing, but there’s more to this: You can search arbitrarily formatted comments (with regexes), and execute any C# code to edit it. Want to ask git blame for the author of a line of code? Include libgit2sharp and you’re good to go. Want to query the database? File system? Active Directory? It’s all available.
  5. Preprocessor directives are very relevant to creating syntax trees, since they allow for modifying how your code is compiled. You can use them to enable code conditionally based on platform, etc. That said, they are evaluated during compilation – the runtime will never know.