The Friendly Coder

On software development and technology

MSBuild Gotchas: Task Batching

Look at the following two statements: <Message Text="Value = '@(SourceFiles)'"/> and <Message Text="Value = '@(SourceFiles)'" Condition="%(SourceFiles.Identity)!=''"/>. If you don’t know why these two statements give different outputs, then you need to read on.

The What

Let’s take the following simple example project:

<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
    <Target Name="Sample1">
        <ItemGroup>
            <SourceFiles Include="file1"/>
            <SourceFiles Include="file2"/>
        </ItemGroup>

        <Message Text="File list is '@(SourceFiles)'" />
        <Message Text="File list is '@(SourceFiles)'" Condition="%(SourceFiles.Identity)!=''"/>
    </Target>
</Project>

To refresh your memory, the @ operator converts an item group into a string by concatenating the Identity metadata (the item specs) of all the items, separated by semicolons. The % operator is the MSBuild batching operator; it tells MSBuild that an iterative operation (i.e., a loop) needs to be performed.
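As a quick illustration of the two operators in isolation, here is a minimal sketch (the target name Operators is just a placeholder). It also uses the @(ItemType, 'separator') form of the @ operator, which replaces the default semicolon with a separator of your choice:

```xml
<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
    <Target Name="Operators">
        <ItemGroup>
            <SourceFiles Include="file1"/>
            <SourceFiles Include="file2"/>
        </ItemGroup>

        <!-- @ flattens the whole list into one string: file1;file2 -->
        <Message Text="@(SourceFiles)" />

        <!-- @ with a custom separator: file1, file2 -->
        <Message Text="@(SourceFiles, ', ')" />

        <!-- % batches: this Message runs once per distinct Identity -->
        <Message Text="%(SourceFiles.Identity)" />
    </Target>
</Project>
```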

Now, even if you’ve never seen MSBuild before, you are likely inclined to think that both calls to Message will show the same output – but you would be wrong. Go ahead, copy and paste this snippet into a text file and run MSBuild on it. No, really, try it. You’ll be surprised at what you find.

Fine – seeing as how you won’t try it yourself I’ll show you what the output is:

File list is 'file1;file2'
File list is 'file1'
File list is 'file2'

What! That can’t be right! That makes no sense at all! Welcome to the world of MSBuild, where the only way to keep yourself out of trouble is to actually run the script and see what magic MSBuild performs behind the curtain. I assure you this is expected behavior, as I will show.

The How

When MSBuild parses a statement for execution, one of the things it looks for is the batching operator. If it finds one within the statement, it performs the following steps:

  1. Compile a list of batches to loop over, grouping the items by the value of the metadata being batched on (with %(SourceFiles.Identity), each distinct file name gets its own batch)
  2. For each batch in the list do the following:
    1. Create a temporary item group containing only the items in that batch (here, the single item currently being processed)
    2. Call the task, substituting the appropriate item references with the values in the temporary item group
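Applied to the second Message call in Sample1, these steps amount to roughly the following hand-unrolled calls, placed inside the same target. This is a conceptual sketch of the effect only, not literally what MSBuild generates internally:

```xml
<!-- Batch 1: the temporary SourceFiles group holds only file1 -->
<Message Text="File list is 'file1'" Condition="'file1'!=''"/>

<!-- Batch 2: the temporary SourceFiles group holds only file2 -->
<Message Text="File list is 'file2'" Condition="'file2'!=''"/>
```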

This flow is so important that it’s worth restating the two key points you must remember about task batching:

  1. Item references are substituted with values from a generated list of items, not the original list
  2. If the batching operator is used anywhere in a task call, including its Condition, MSBuild substitutes every reference to the batched item type in that statement, not just the reference containing the %
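Both points show up clearly when batching on metadata other than Identity. With Identity every batch holds one item, but when batching on, say, the well-known Extension metadata, a batch holds every item sharing that value. In this sketch (the target and file names are made up for illustration), the % appears only in the Condition, yet it still changes what @(SourceFiles) expands to in Text:

```xml
<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
    <Target Name="FilterSample">
        <ItemGroup>
            <SourceFiles Include="a.cs"/>
            <SourceFiles Include="b.cs"/>
            <SourceFiles Include="readme.txt"/>
        </ItemGroup>

        <!-- The % in the Condition splits SourceFiles into one batch per
             Extension (.cs and .txt). Inside each batch @(SourceFiles) holds
             only that batch's items, so the .cs batch prints 'a.cs;b.cs'
             and the .txt batch is skipped by the Condition. -->
        <Message Text="C# files: '@(SourceFiles)'" Condition="'%(SourceFiles.Extension)'=='.cs'"/>
    </Target>
</Project>
```

This is also a handy idiom in its own right: it filters an item list by metadata without declaring a second item group.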

So, each time the second task is executed, it receives an item group with a single item. Then, when the @(SourceFiles) reference is replaced with its current value, it yields a single filename.

It is also worth noting here that the following two statements are functionally equivalent in MSBuild:

<Message Text="File list is '@(SourceFiles)'" Condition="%(SourceFiles.Identity)!=''"/>
<Message Text="File list is '%(SourceFiles.Identity)'" Condition="%(SourceFiles.Identity)!=''"/>
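If the intent of the second call in Sample1 was simply to skip the message when SourceFiles is empty, one way to get that without triggering batching is to test the whole list with @ instead of %. A minimal sketch, to be placed inside the Sample1 target:

```xml
<!-- No % anywhere in the statement, so no batching: the task runs once
     with the full list, and only when SourceFiles is not empty.
     Prints: File list is 'file1;file2' -->
<Message Text="File list is '@(SourceFiles)'" Condition="'@(SourceFiles)'!=''"/>
```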

The Why

So why on earth would Microsoft purposely do such a thing? Well, one can never be certain why things are the way they are, but we can make an educated guess. I suspect that to achieve the more intuitive result here, they would have had to treat each item reference in the statement as a separate entity. This would allow the @(SourceFiles) reference to be updated independently of the %(SourceFiles.Identity) reference.

However, if they were to treat each item reference independently you would get other unexpected results that would likely be even less intuitive than what we’ve seen above. Take the following project for example:

<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
    <Target Name="Sample1">
        <ItemGroup>
            <SourceFiles Include="file1">
                <TargetFolder>c:\foo</TargetFolder>
            </SourceFiles>
            <SourceFiles Include="file2">
                <TargetFolder>c:\bar</TargetFolder>
            </SourceFiles>
        </ItemGroup>

        <Message Text="Copying file %(SourceFiles.Identity) to %(SourceFiles.TargetFolder)" />
    </Target>
</Project>

Here I think the most intuitive output is something like this:

Copying file file1 to c:\foo
Copying file file2 to c:\bar

Which is, conveniently, exactly the output we get from this target. However, if each item reference were managed separately and substituted independently of the rest, we would end up with output that looks something like:

Copying file file1 to c:\foo
Copying file file1 to c:\bar
Copying file file2 to c:\foo
Copying file file2 to c:\bar

At least to me, this result is much less intuitive than the actual behavior.
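The same mechanism is what makes the common real-world pattern <Copy SourceFiles="@(SourceFiles)" DestinationFolder="%(SourceFiles.TargetFolder)"/> work. The sketch below uses Message instead of Copy so it runs without real files, and adds a hypothetical third file that shares a folder with file1. Because only TargetFolder is batched on, you should see one message per distinct folder, with file1 and file3 grouped into the c:\foo batch:

```xml
<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
    <Target Name="GroupSample">
        <ItemGroup>
            <SourceFiles Include="file1">
                <TargetFolder>c:\foo</TargetFolder>
            </SourceFiles>
            <SourceFiles Include="file2">
                <TargetFolder>c:\bar</TargetFolder>
            </SourceFiles>
            <SourceFiles Include="file3">
                <TargetFolder>c:\foo</TargetFolder>
            </SourceFiles>
        </ItemGroup>

        <!-- Batching on TargetFolder alone: one Message per distinct folder,
             and @(SourceFiles) holds every file bound for that folder -->
        <Message Text="Copying files @(SourceFiles) to %(SourceFiles.TargetFolder)" />
    </Target>
</Project>
```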

Call to action

By experimenting with small sample scripts such as these, we gain a clearer understanding of how MSBuild really works internally. I encourage you to try the snippets above on your own PC, experimenting with different combinations of items and batching patterns. If you discover particularly confusing or interesting cases where batching has caused you problems, post a comment below with the details. I’d love to hear from you.
