04 February 2008

Compiler Optimizations May Be Dangerous

Consider the following code:


static class Test
{
public static IEnumerable<int> Select<T>(this IEnumerable<T> list, Func<int, int> f)
{
foreach (var item in list)
yield return f(item.GetHashCode());
}

public static IEnumerable<bool> Where<T>(this IEnumerable<T> list, Func<int, bool> f)
{
foreach (var item in list)
yield return f(item.GetHashCode());
}

static void Foo(IList<string> list, IList<string> list2)
{
var lengths = from hashCode in list
where hashCode * 2 > 10
select hashCode;
}
}


This code doesn't use System.Linq extensions methods to do LINQ, and instead declares own Select and Where methods, which are rather crazy. This code compiles, but what is the type of "lengths" variable? It should be IEnumerable<int>, because that is what Select method returns. But if you look at actual compilation output, it turns out that compiler decided "select hashCode" to be very simple (actually, transforms into identity lambda hashCode => hashCode). And it removes the call to Select() method. And breaks my code. One can have some hard time understanding why their code doesn't work, or even doesn't compile.

I can force compiler to call my Select() method by surrounding hashCode in select clause with parenthesis:

var lengths = from hashCode in list
where hashCode * 2 > 10
select (hashCode);


It is not trivial to compiler anymore, and thus it emits correct code.

UPDATE: This behaviour is by design. See 7.15.2.5 Select clauses:

A query expression of the form
from x in e select v
is translated into
( e ) . Select ( x => v )
except when v is the identifier x, the translation is simply
( e )


I still find this very error prone...

3 comments:

Lucas said...

I think this is understandable. For LINQ comprehension syntax to work, he compiler must expect specific behavior from the extension methods it translates the syntax into, such as Select() and Where(). Where() should filter items, and Select() should project them. If you project x => x, you're not really projecting at all, so you can save yourself a delegate method call for each item. Think of it as: Select() is called upon to project data, if projection is required.

If your Select() and Where() are not doing what they're supposed to... well that's like having your Add() method remove items and writing this:
MyList tempList = new MyList { 1, 2, 3 };
which translates to 3 calls to Add()

in conclusion, work with the compiler, not againt it :P

Unknown said...

Totally agree with Ilya and disagree with lucas. Obviously the place to perform this "optimization" is in the extension method implementing Select, if necessary, not in the compiler. Then every implementation of Select gets to decide for itself what to do, rather than having the compiler force a particular behavior.

Lucas said...

> rather than having the compiler force a particular behavior

But you see, the behavior of Select() *is* already defined. It should transform (project) data without modifying anything else. That is, it should not leave any "side effects". If implemented correctly, transforming a value into itself will be equivalent to not transforming it. This is not even an optimization, you just told the compiler you don't want to project data, so Select() was not used.

This example could have been written like this (which also avoids calling GetHashCode() twice on every item):

var lengths = from item in list
let hashCode = item.GetHashCode()
where hashCode * 2 > 10
select hashCode;