How to Use Grouping in C# LINQ Syntax


When you started using LINQ, did you think it looked like SQL? I did.

The more I learned LINQ, the more I realized it wasn’t anything like SQL. Take grouping, for example. Because LINQ has a group by statement, and it looks like SQL, I assumed that the syntax for grouping in LINQ would be just like SQL. Ha ha! Wrong! As soon as I tried to use it, I discovered that the LINQ syntax is not only nothing like the SQL equivalent, but the whole grouping concept in LINQ is completely different too.

At first glance, the two syntaxes look slightly similar.

SQL:

select ReportsTo, count(LastName) as NameCount
from Employees
group by ReportsTo

LINQ (C#):

from employee in Employees
group employee by employee.ReportsTo

Ignoring the omission of the select statement from LINQ, and the requirement of a range variable, they do look similar. But looks can be deceiving.

Let me explain why.

Grouping in SQL

In SQL, you typically group a set of entities using the following syntax:

select ReportsTo, count(LastName) as NameCount
from Employees
group by ReportsTo

This kind of query results in a simple table with two columns: ReportsTo and NameCount. The count of the names was grouped into a single column using an aggregate function.

Of course, this is a simple example; you could do much more complex grouping by using more expressions in the aggregation or the group by statement. Whatever the grouping though, the result is a flat table.

LINQ is different. Completely different.

Grouping in LINQ

Before you can understand the LINQ syntax, you have to understand how grouping works in LINQ.

When you use the group by statement, the C# compiler translates it to the LINQ Enumerable’s GroupBy extension method. The GroupBy method returns an object of type

  IEnumerable<IGrouping<TKey, TSource>>

The GroupBy method takes a parameter of a delegate which specifies the key of each grouping. The key is defined to be of type TKey, and is usually inferred by the compiler depending on the expression.

The key to understanding the C# query syntax is that you don’t get back a flat structure from group by, as you would with SQL. You actually get an “IEnumerable” of “IGrouping” objects.

The IGrouping interface has only one property, “Key”, and implements IEnumerable<TElement>. Again, TElement is usually inferred. “Key” is typed as the generic type you define for TKey. For the minute, that’s not so important, so we’ll come back to the Key later.

One of the first indications that LINQ is different to SQL is that you discover it is possible to finish a LINQ statement with a group by statement without needing a Select. As IGrouping implements IEnumerable, group by actually returns an IEnumerable of IEnumerable. The result is a set of sets. It’s not a flat table. To use it, you have to iterate over it again or perform another operation on the set.

Let’s see if an example can make it clearer.

When grouping a set of employees, you might use the following code:

var emp1 = new { ReportsTo = 1, LastName = "Richard" };
var emp2 = new { ReportsTo = 1, LastName = "Christopher" };
var emp3 = new { ReportsTo = 2, LastName = "John" };
var emp4 = new { ReportsTo = 2, LastName = "Greg" };
var employees = new[] { emp1, emp2, emp3, emp4 };

var groups =
  from employee in employees
  group employee by employee.ReportsTo;

foreach (var group in groups)
  Console.WriteLine("{0} employees report to {1}.",
    group.Key, group.Count());

(Don’t worry about the anonymous class definitions for now. You just have to know that my employee objects are all treated as the same type internally, so this is valid C# code.)

The employees variable is of type IEnumerable, so it can be used in a LINQ statement. The groups variable is also an IEnumerable, as it is a set of results, as usually returned by a LINQ statement. Remember though, it’s not like the set you would get back in a SQL statement. It is actually an IEnumerable of IGrouping, or an IEnumerable of IEnumerable.

To do something with the grouping, you have to either treat it as another IEnumerable, and iterate over it, or use methods like Count() to do something with it.

foreach (var group in groups)
  foreach (var employee in group)
    Console.WriteLine("{0} reports to {1}",
      employee.LastName, employee.ReportsTo);

This is how LINQPad shows what is contained in the groups variable:

IEnumerable<IGrouping<Int32,>>

Key=

1

 

IGrouping<Int32,> (2 items)

ReportsTo

LastName

1

Richard

1

Christopher

Key=

2

IGrouping<Int32,> (2 items)

ReportsTo

LastName

2

John

2

Greg

Notice that each IGrouping has an associated Key property of type Int32.

Grouping Syntax in LINQ

So now it should be a little easier to understand how to use the LINQ group by statement.

In LINQ, you always need to declare the range variable between group and by.

from employee in Employees

group employee by employee.ReportsTo

After by, you give an expression which returns the value for the Key. The enumeration is then split into groups of IGroupings where the key for each item in that IGrouping is the same.

So here, we specify the employee.ReportsTo variable as the key. Each employee with a different value for ReportsTo creates a new IGrouping with a Key of that value. The employee is then added to that new IGrouping as an item. The IGrouping is then added to the returned IEnumerable<IGrouping<>>.

The next time an item in the employees variable has the same value for ReportsTo as the Key property of an already-existing IGrouping object, the item is simply added to that IGrouping.

Of course, that’s probably not the way it works under the covers. But I’m not trying to specify the algorithm, just what happens in the end result.

In the end, an object implementing IEnumerable<IGrouping<>> is returned by the LINQ statement. As each item in the IEnumerable is an IGrouping, you can either use it’s Key or treat it as an IEnumerable.

So there you have it. Just remember, group by is nothing like SQL, and it returns an IEnumerable of IGrouping, and you should stay happy.

At least until you meet SelectMany… :=)

If you want to learn more about grouping in LINQ, take a look at chapter 9 of the excellent C# 3.0 in a Nutshell: A Desktop Quick Reference (In a Nutshell (O’Reilly)), by Ben and Joseph Albahari. The chapter has some great explanations and examples. In particular, they have an example of grouping with multiple keys, which I found very interesting.

  • Do you know how to get these results in VB?

    I’m trying to use LINQ syntax but I get IEnumerable’s of anonymous types. I’m trying to translate this query:
    source.GroupBy(Of Boolean)(Function(x) x.MyBoolProperty).
    In C# it should be smth like
    source.GroupBy(x => x.MyBoolProperty).

    Perhaps you should mention that not all GroupBy overloads return IEnumerable of IGrouping..

    ulu

  • pretty good article … you should go on explain all these Linq stuff.
    Thanks, Chris

  • Thanks for writing it up. I was also initially very confused by expecting this to work in a similar way to grouping with T-SQL.

  • nice article..

    in sql we usually use “select * from table where field like ‘%%’”

    so, Could i use syntax “like” in linq?? thanks ^^

  • Excellent article. Thanks

  • brilliant article i was going nuts trying to understand the group by until i read this!

    cheers

  • Thanks for the article. Some of the other linq features frustrated me, so I decided to read the manual before using this one.

  • Thanks! Very useful!

  • Excellent! I’ve just started trying to do come more complex reporting in LINQ instead of T-SQL, and its actually working thanks to your article. ;D

  • evlin, like is actually a subquery that returns a boolean value. In LINQ you can use any function that does that in the where function/clause. To replicate LIKE %mr%, for example, you can do
    from l in list
    where l.Title.Contains(“mr”)
    select l.Title

    but it’s much more flexible than that.

  • Great explination, clear english ;-) butt the..

    Code is wrong )-; [group.Key, group.Count()]
    should be [group.Count(), group.Key]

    1 employees report to 2?

    var emp1 = new { ReportsTo = “Jim”, LastName = “Richard” };
    var emp2 = new { ReportsTo = “Jim”, LastName = “Christopher” };
    var emp3 = new { ReportsTo = “Bob”, LastName = “John” };
    var emp4 = new { ReportsTo = “Bob”, LastName = “Greg” };
    var employees = new[] { emp1, emp2, emp3, emp4 };

    var groups =
    from employee in employees
    group employee by employee.ReportsTo;

    foreach (var group in groups)
    Console.WriteLine(“{0} employees report to {1}.”, group.Count(), group.Key);

You can follow any responses to this entry through the RSS 2.0 feed.