Monday 26 October 2009

Parallel Extensions in .NET Part 2

In the last post I covered the Co-ordination Data Structures in .NET 4.00, so now I will examine the Task Parallel Library and Parallel LINQ (PLINQ).

As I mentioned in the last post, Mike Taulty explained that the thought behind the Parallel Extensions is to abstract the multi-processor jobs away from the concept of threads.  The Task Parallel Library contains objects that allow you to achieve this.

At the heart of this is the Task object which represents a unit of work to be carried out.  There is a factory version (as shown below) and a standard object version of this class.


Task myTask = Task.Factory.StartNew(() =>
{
    //code to be carried out
});


myTask.Wait(); //wait for task to complete

Which ever method you use to construct the object it is passed a predicate which contains the work to be carried out.  Tasks are atomic operations that can be carried out on any of the processors in the PC.  The Wait() command is only applicable if you need to wait for the result of the Task

An interesting feature of Tasks is that a Task can start off other Tasks within them.

Task parentTask = Task.Factory.StartNew(() =>
{
    Task childTask = Task.Factory.StartNew(() =>

    {
       //code to be carried out
    });
    //other code to be carried out

});

These child classes are automatically linked to the Parent class, so that if you call Wait() on the parent class then this will wait for any child Tasks to finish.  This default behavior can be turned off (if required) in an extra parameter passed into the constructor.

Another intriguing feature about these Tasks is that when an exception occurs within the main Task or one of it's children, the exception thrown is a AggregateException.  This exception contains a new property called InnerExceptions that contains any exceptions that are thrown by the children.  So, if three child task are spawned and all of these throw an exception you can view all of these in this AggregateException.  Nice.

Mike also introduced the Parallel static class which provides an alternative way of spawning multiple Actions.  It has handy methods such as Parallel.ForEach that allows a number of actions to be thrown a number of times.  I won't go into too much detail about this one, but if you look on the recently published MSDN for .NET 4.00 there are plenty of examples of it's use.

Finally, Mike introduced use to Parallel LINQ, otherwise known as PLINQ!  This is a command that can be used on LINQ queries to make them execute across multiple processors.  It's purpose is to make LINQ queries on any large data sources more efficient.

PLINQ is very easy to use, all you need to do is utilise the AsParallel() at the appropriate place in the LINQ query.  For example:

var results = myData.AsParallel().Where(x => x.Name == "World").Count();

It's that easy!

That concludes my whirlwind tour of the Parallel Extensions in .NET 4.00.  There are plenty of other resources on the MSDN and various microsoft blogs on Parallel Extensions.  Here are a few:

Introduction to PLINQ
Microsoft Parallel Computing Center
Microsoft Parallel Programming Team Blog
Mike Taulty's Blog

Thursday 22 October 2009

Parallel Extensions in .NET 4.00 Part 1

Last night I attended an NxtGen User Group in Manchester at which Microsoft guru Mike Taulty was giving an introduction to the new parallel programming components of the .NET framework.

Whether we like it or not we, as developers, are going to have to adapt to using techniques that will run code on separate processors. The new chips from Intel and AMD are no longer continuing to increasing in Ghz, they are just containing more processors. So, we need to be able to utilise this power.

The good news is that the .NET CLR and certain components, such as WCF and WF, already spread their workload across multiple processors. But the bad news is that our own programs do not.

So, how would you currently handle this? The answer is by using a Threads or a ThreadPool. But as Mike demonstrated this is quite long winded and makes the code quite difficult to read.

The idea behind the Parallel Extensions in .NET 4.00 is to abstract the concept of the work that can be carried out on multiple processors away from the underlying concept of threads. They have split the components up into the following three categories:
  1. Co-ordination Data Structures.
  2. Task Parallel Library
  3. Parallel LINQ (PLINQ)
Lets start with the Co-ordination Data Structures. This category contains all the replacement data structures that are required to support parallel programming. If you have ever performed any thread safe work that involves a collection, you will know that none of the collections in .NET are thread safe. The only solution is to lock the entire collection when carrying out any work. Which frankly is a bit rubbish.

The co-ordination data structures contain a selection of Concurrent Collections to get around this issue. These include ConcurrentLinkedList and ConcurrentQueue. It also includes some advanced Co-ordination helpers that include a Barrier (blocks work until a x number of threads reach) and a CountdownEvent (a concurrent countdown that can be signalled from multiple threads).

The last of the co-ordination data structures is the Data Synchronization classes. The main one of interest is the Lazy class. Great name, but what does it do? It allows you to create a thread safe Singleton class. For example, say we have a singleton class called MyClass which has a method called MyMethod(). Initialisation of the singleton can be complex as we may have several threads entering the initialisation code. The solution is to use the Lazy class to wrap the class like so:

Lazy myInstance = new Lazy<MyClass>();
myInstance.Value.MyMethod();

This ensures that ONLY one instance of this class will exist.

In the next part I will cover the Task Parallel Library.

Tuesday 20 October 2009

Creating comma separated strings with LINQ

Here is a really quick LINQ based solution to the conversion of a collection of strings into a comma separated string:

//Prepare list
List myList = new List();
myList.Add("aString");
myList.Add("anotherstring");

//Get comma separated string

var stringBuilder = myList.Aggregate(new StringBuilder(), (builder, stringValue) =>
builder.AppendFormat((builder.Length == 0) ? "{0}" : ", {0}", stringValue));
string commaString = stringBuilder.ToString();