Tuesday, August 31, 2010

Parallel Extensions


The new Task Parallel Library included in .NET 4 is an incredibly easy way to parallelise processing that would otherwise have to be done with devices such as SpinLock, Semaphore, Monitor and others.  Unfortunately those older devices were crazy easy to get wrong, and it was hard to remember how they worked once six months had elapsed.

One of my favourite sessions at TechEd this year showed in some detail how to parallelise various code fragments using the new constructs.

Parallel Loops

Loops are the easiest place to start (shortly followed by reconsidering all your LINQ statements).

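namespace TPLTest1
{
    using System;
    using System.Threading;
    using System.Threading.Tasks;

    public static class Program
    {
        public static void Main(string[] args)
        {
            // The bodies of these four actions were not preserved in this listing.
            Action a1 = () => { };
            Action a2 = () => { };
            Action a3 = () => { };
            Action a4 = () => { };
            // Runs the four actions, potentially in parallel, and waits for all of them.
            Parallel.Invoke(a1, a2, a3, a4);

            // Runs the body for index 0 to 3, potentially in parallel.
            Parallel.For(0, 4, index =>
                {
                    Console.WriteLine("Enumerator " + index);
                });
            Console.WriteLine("Finished For");
        }
    }
}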
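The handy thing about these new For constructs is that the call does not return until every iteration has finished, so the work is synchronised back into the calling thread after the loop. The iterations themselves can run in any order. Here's the output: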
Enumerator 2
Enumerator 1
Enumerator 0
Enumerator 3
Finished For
Press any key to continue . . .
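
To make the "synchronised back" point concrete, here is a minimal sketch. The class and method names (ParallelForSketch, SumOfSquares) are made up for illustration and are not from the TechEd session. It relies only on the fact that Parallel.For blocks until all iterations are done, so the total can be read safely straight after the loop.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelForSketch
{
    // Hypothetical helper: sums the squares of 0..count-1 using Parallel.For.
    public static long SumOfSquares(int count)
    {
        long total = 0;
        Parallel.For(0, count, index =>
        {
            // Iterations may run concurrently, so the shared total is updated atomically.
            Interlocked.Add(ref total, (long)index * index);
        });
        // Parallel.For has returned, so every iteration has completed.
        return total;
    }
}
```

Calling ParallelForSketch.SumOfSquares(4) gives 14 (0 + 1 + 4 + 9) regardless of the order the iterations happened to run in.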


To be able to effectively and safely filter a collection and copy results into a new collection you used to have to do something like this:

IEnumerable<RaceCarDriver> drivers = …;
var results = new List<RaceCarDriver>();
int partitionsCount = Environment.ProcessorCount;
int remainingCount = partitionsCount;
var enumerator = drivers.GetEnumerator();
try {
    using (var done = new ManualResetEvent(false)) {
        for(int i = 0; i < partitionsCount; i++) {
            ThreadPool.QueueUserWorkItem(delegate {
                while(true) {
                    RaceCarDriver driver;
                    // The enumerator is shared, so only one worker may advance it at a time.
                    lock (enumerator) {
                        if (!enumerator.MoveNext()) break;
                        driver = enumerator.Current;
                    }
                    if (driver.Name == queryName &&
                        driver.Wins.Count >= queryWinCount) {
                            lock(results) results.Add(driver);
                    }
                }
                // The last worker to finish signals the waiting thread.
                if (Interlocked.Decrement(ref remainingCount) == 0) done.Set();
            });
        }
        // Wait for every worker before sorting; without this the sort races the workers.
        done.WaitOne();
        results.Sort((b1, b2) => b1.Age.CompareTo(b2.Age));
    }
}
finally { if (enumerator is IDisposable) ((IDisposable)enumerator).Dispose(); }
Now you do this:

var results = from driver in drivers.AsParallel()
              where driver.Name == queryName &&
                    driver.Wins.Count >= queryWinCount
              orderby driver.Age ascending
              select driver;
Crazy easy.
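
For anyone who wants to run the query, here is a self-contained sketch. The RaceCarDriver shape below is an assumption (only the members the query touches), and PlinqSketch.Query is a hypothetical wrapper. The one thing that turns an ordinary LINQ query into a PLINQ one is the AsParallel() call on the source.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape for the driver type; only the members the query needs.
public class RaceCarDriver
{
    public string Name { get; set; }
    public int Age { get; set; }
    public List<string> Wins { get; set; }
}

public static class PlinqSketch
{
    public static List<RaceCarDriver> Query(
        IEnumerable<RaceCarDriver> drivers, string queryName, int queryWinCount)
    {
        // AsParallel() opts the query into PLINQ; the rest is unchanged LINQ.
        var results = from driver in drivers.AsParallel()
                      where driver.Name == queryName &&
                            driver.Wins.Count >= queryWinCount
                      orderby driver.Age ascending
                      select driver;
        // Materialise the results; the orderby clause determines their order.
        return results.ToList();
    }
}
```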
One of the few options you might need to consider when using PLINQ is the partitioning algorithm used.  Here's a great slide from Ivan Towlson's session at TechEd NZ showing the different algorithms in action:
I believe the Chunking algorithm is the default for sources that are plain IEnumerables (indexable sources such as arrays and lists can be split by range instead), and it is usually an OK choice for most things. The exception is when you are looking for a certain grouping of data, for example searching a list of people and processing the first one of a series of duplicates; that kind of query needs the Hash algorithm, which PLINQ applies for operators that compare keys, such as GroupBy, Join and Distinct.
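
If you want to pick the partitioning yourself rather than take the default, the Partitioner class in System.Collections.Concurrent is the place to look. The sketch below is not from the session; RangePartitionSketch and Sum are hypothetical names. It uses Partitioner.Create(from, to) to hand each worker a contiguous block of indices (range partitioning) instead of handing out items one chunk at a time.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class RangePartitionSketch
{
    // Hypothetical helper: sums an array using explicit range partitioning.
    public static long Sum(int[] values)
    {
        // Partitioner.Create rejects an empty range, so handle that case up front.
        if (values.Length == 0) return 0;

        long total = 0;
        // Each partition is a Tuple<int, int> holding (fromInclusive, toExclusive).
        var ranges = Partitioner.Create(0, values.Length);
        Parallel.ForEach(ranges, range =>
        {
            long local = 0;
            for (int i = range.Item1; i < range.Item2; i++) local += values[i];
            // One atomic update per block rather than one per element.
            Interlocked.Add(ref total, local);
        });
        return total;
    }
}
```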


Best described by two more slides from the same session:

Sunday, August 15, 2010

Event De-registration and Memory Leaks


I think it's pretty safe to say we would all recommend deregistering/unsubscribing from events when we are finished with them.  I know I was told when learning about events years ago that if you do not, memory leaks will result.  After looking into the GC recently I thought it might be interesting to get some specifics around this.

So, what happens when you don't unsubscribe from events and just pray the GC can figure it out?  Unsurprisingly some objects stay alive a lot longer than they should.  

The problem can be summarised as follows:
Events work by giving the Source object a delegate to call when an event occurs.  The delegate holds a strong reference to the listener. Most of the time this works fine.  A sloppy developer might never unsubscribe from the events they register for; this isn't good, but it's not necessarily catastrophic.  The listener object cannot be garbage collected until the source object is also a candidate for collection (again assuming no unsubscription; had the listener unsubscribed, it could be collected earlier).  Eventually the source object will be garbage collected and then so will the listener.  Not a timely release of memory, but at least not technically a memory leak.

A memory leak, at least IMHO, is when objects are never removed from memory and the memory is not reclaimed until the process is terminated. So not technically a memory leak, right?  Well, it can be if your source object is a singleton or static, which effectively makes it immune to garbage collection, along with any object that subscribed to its events. Fortunately, memory leaks in .NET are far less frequent than in unmanaged code.
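
The problem described above can be reproduced in a few lines. This is a minimal sketch, not the demo application: Source, Listener and EventLeakSketch are hypothetical names, and a WeakReference is used to watch the listener without keeping it alive. The listener is created in a separate non-inlined method so that no local variable keeps it reachable.

```csharp
using System;
using System.Runtime.CompilerServices;

public class Source
{
    public event EventHandler Ping;
}

public class Listener
{
    public void OnPing(object sender, EventArgs e) { }
}

public static class EventLeakSketch
{
    // Creates a listener, optionally subscribes it, and returns only a weak reference to it.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static WeakReference CreateListener(Source source, bool subscribe)
    {
        var listener = new Listener();
        if (subscribe) source.Ping += listener.OnPing;
        return new WeakReference(listener);
    }

    public static bool IsListenerAliveAfterGc(bool subscribe)
    {
        var source = new Source();
        WeakReference weak = CreateListener(source, subscribe);
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        bool alive = weak.IsAlive;
        // Keep the source reachable up to this point, as a long-lived source would be.
        GC.KeepAlive(source);
        return alive;
    }
}
```

With subscribe set to true the listener is still alive after the forced collection, because the source's delegate holds a strong reference to it. With subscribe set to false nothing references the listener, so it would normally be collected, although the exact timing is up to the runtime.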

Let's take a look at a demo of good and bad event handling.
The application is made up of a Shell Controller and up to four child panel controllers.  The Shell Controller is obviously going to be long lived even though it is not static; it controls the root of the application and therefore will not be garbage collected until the process closes. Clicking "Add" will add a new Panel to the shell controller and the UI.  When the Panel is added, it subscribes to the Shell Controller's Ping event.

The Panel starts to receive the Ping event from the Shell Controller and displays the event as text in a list box.
When the "X" button is clicked on a panel, the panel is removed from the UI and all references to it are destroyed.  This effectively means it should be a candidate for collection, right? No, because I did not unsubscribe each Panel from the Ping event.  Some developers think that adding code to the Finalizer (destructor) will deregister the event when the GC collects the object.  No, wrong again. The GC only calls the Finalizer once the object is unreachable, and the object isn't a candidate for collection until the event is unsubscribed, so that code never runs.

As you can see in the above image, the number of alive objects is currently 3.  This is incremented when a new Panel is created and is only decremented when a Panel Finalizer is called, meaning the panel is being collected.

Here I have removed two panels and clicked the "ForceGC" button to force an immediate garbage collection.  As you can see, the Alive Objects counter still reads 3, meaning no Finalizer has been called to decrement it.

The test application runs in two modes, "Sloppy mode" or "Best Practice mode", selected by the toggle button.  When the toggle button is pressed in, the application will unsubscribe from the Ping event when a panel is closed.
3 Panels created, the counter reads 3.  Let's close 2.

After removing 2, and clicking force GC, the counter immediately changes to 1. Excellent.  Incidentally, if you wait 1-2 minutes it will collect by itself without having to force a collection.

Hope this clarifies events best practice, with good evidence.

More information:
There are some handy new classes in .NET 4 that allow weak events. See http://msdn.microsoft.com/en-us/library/aa970850.aspx.  This is definitely my recommendation for use with WPF and an Event Aggregator, Attached Property, or Decorator, since these are normally implemented as statics or singletons. I.e., a short-lived listener subscribing to a long-lived singleton equals a tight-rope walk.

Sunday, August 8, 2010

IDisposable, what is it, when to use it, and how to use it

Common Misconceptions
  • It means objects are Garbage Collected more quickly. (Incorrect).
  • It's a good way to make sure all your references are set to null. (Not necessary).
  • It's good practice to implement all the time. (Incorrect).
  • The Garbage Collector calls IDisposable.Dispose() automatically. (Incorrect).
  • It's a good idea when you are creating lots of objects in a short period of time.  (There is a better way).

What is it?
The IDisposable interface seems to be a commonly misunderstood and misused interface.  Some developers like to implement it as a matter of course, thinking it is good practice.  But it is absolutely not necessary when dealing only with managed objects (native .NET objects).  In .NET the Garbage Collector (GC) is more than capable of finding and reclaiming all objects quickly and efficiently.

To quote Andrew Troelsen: "Allocate an object onto the heap using the new keyword and forget about it".

The GC stores lists of objects in generations; "young" objects have a lower generation number than longer-lived objects.  The lists of young-generation objects are examined more frequently than the lists of higher-generation objects.  Commonly there are many more short-lived objects than long-lived ones.

The IDisposable interface is more than an interface; you could argue it's more of a design pattern. Using it correctly requires more than merely implementing the interface and satisfying the compiler.

When to use it?
The interface is only necessary to properly clean up resources and release memory when a class references unmanaged code, either directly or through another IDisposable that it owns.  Commonly this boils down to:
  • When a class references or calls a COM object.
  • When a class references or calls a C++ object, (or Win32).
  • When a class has an OS File Handle to a file resource that needs to be closed when the object is destroyed.
  • You inherit from a class that implements IDisposable.
  • You have a long lived field reference to an IDisposable class.
Just to reiterate, you do not need to implement it when only referencing .NET native types (managed types) or types that you have created yourself that are based on other .NET types.  You also do not need to implement it just because one of the classes you reference has a reference to unmanaged code; only worry about the code in the current class (OO design principle "encapsulation").

You also do not need to set objects to null; you may have once upon a time in COM and preceding technologies, but not in .NET.  By doing so you are merely adding an unnecessary assignment to your code, which if used everywhere will ultimately slow down your application.

So, how to use it properly?
REMEMBER: If you do choose to implement IDisposable you are declaring to any consumers of your class that its life cycle needs to be carefully managed. 99% of the time it should be used within a using block.  It should never be returned as a result of a function.  If you consume a class that implements IDisposable, then ideally you need to enclose it in a using block and not keep a field reference.  If you need to keep a field reference, then your class should also implement IDisposable. In addition, if you inherit from a class that implements IDisposable then your new class also needs to consider reimplementing IDisposable.
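
Here is a minimal sketch of why the using block matters. TrackedResource and UsingSketch are hypothetical names invented for the illustration. The point is that Dispose runs when control leaves the block, even when the code inside throws.

```csharp
using System;

// Hypothetical disposable type that simply records whether Dispose was called.
public sealed class TrackedResource : IDisposable
{
    public bool IsDisposed { get; private set; }
    public void Dispose() { IsDisposed = true; }
}

public static class UsingSketch
{
    public static bool DisposedEvenWhenWorkThrows()
    {
        var resource = new TrackedResource();
        try
        {
            using (resource)
            {
                throw new InvalidOperationException("simulated failure");
            }
        }
        catch (InvalidOperationException)
        {
            // By the time the exception arrives here, the using block has already called Dispose.
        }
        return resource.IsDisposed;
    }
}
```

UsingSketch.DisposedEvenWhenWorkThrows() returns true. The variable is declared outside the using statement here only so the result can be inspected afterwards; in normal code you would declare it inside the parentheses.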

This is best shown with an example of how to implement the interface correctly.

// Design pattern for the base class. 
// By implementing IDisposable, you are announcing that instances 
// of this type allocate scarce resources. 
// Requires: using System; using System.ComponentModel;
public class BaseResource: IDisposable { 
    // Pointer to an external unmanaged resource. 
    private IntPtr handle; 
    // Other managed resource this class uses. 
    private Component Components; 
    // Track whether Dispose has been called. 
    private bool disposed = false; 
    // Constructor for the BaseResource object. 
    public BaseResource() { 
        // Insert appropriate constructor code here. 
    }
    /// <summary>
    /// Implement IDisposable. 
    /// Do not make this method virtual. 
    /// A derived class should not be able to override this method
    /// </summary> 
    public void Dispose() { 
        Dispose(true);
        // Take yourself off the Finalization queue 
        // to prevent finalization code for this object 
        // from executing a second time. 
        GC.SuppressFinalize(this);
    }
    /// <summary>
    /// Dispose(bool disposing) executes in two distinct scenarios. 
    /// If disposing equals true, the method has been called directly 
    /// or indirectly by a user's code. Managed and unmanaged resources 
    /// can be disposed. 
    /// If disposing equals false, the method has been called by the 
    /// runtime from inside the finalizer and you should not reference 
    /// other objects. Only unmanaged resources can be disposed.
    /// </summary>
    /// <param name="disposing"><c>true</c> to release both managed and unmanaged
    /// resources; <c>false</c> to release only unmanaged resources.</param>
    protected virtual void Dispose(bool disposing) { 
        // Check to see if Dispose has already been called. 
        if(!this.disposed) { 
            // If disposing equals true, dispose all managed 
            // and unmanaged resources. 
            if(disposing) { 
                // Dispose managed resources. 
                if(Components != null) Components.Dispose();
            }
            // Release unmanaged resources. If disposing is false, 
            // only the following code is executed. 
            // (The native call that closes the handle goes here.)
            handle = IntPtr.Zero; 
            // Note that this is not thread safe. 
            // Another thread could start disposing the object 
            // after the managed resources are disposed, 
            // but before the disposed flag is set to true. 
            // If thread safety is necessary, it must be 
            // implemented by the client. 
        }
        disposed = true; 
    }
    /// <summary>
    /// Finalizes an instance of the <see cref="BaseResource"/> class.
    /// Use C# destructor syntax for finalization code. 
    /// This destructor will run only if the Dispose method 
    /// does not get called. 
    /// It gives your base class the opportunity to finalize. 
    /// Do not provide destructors in types derived from this class.
    /// </summary> 
    ~BaseResource() { 
        // Do not re-create Dispose clean-up code here. 
        // Calling Dispose(false) is optimal in terms of 
        // readability and maintainability. 
        Dispose(false);
    }
    // Allow your Dispose method to be called multiple times, 
    // but throw an exception if the object has been disposed. 
    // Whenever you do something with this class, 
    // check to see if it has been disposed. 
    public void DoSomething() { 
        if(this.disposed) { 
            // ObjectDisposedException needs the name of the disposed object.
            throw new ObjectDisposedException(GetType().Name); 
        }
    }
}
// Design pattern for a derived class. 
// Note that this derived class inherently implements the 
// IDisposable interface because it is implemented in the base class. 
// ManagedResource and NativeResource are placeholder types.
public class MyResourceWrapper: BaseResource { 
    // A managed resource that you add in this derived class. 
    private ManagedResource addedManaged; 
    // A native unmanaged resource that you add in this derived class. 
    private NativeResource addedNative; 
    private bool disposed = false; 
    // Constructor for this object. 
    public MyResourceWrapper() { 
        // Insert appropriate constructor code here. 
    }
    protected override void Dispose(bool disposing) { 
        if(!this.disposed) { 
            try { 
                if(disposing) { 
                    // Release the managed resources you added in 
                    // this derived class here. 
                }
                // Release the native unmanaged resources you added 
                // in this derived class here. 
                this.disposed = true; 
            } finally { 
                // Call Dispose on your base class. 
                base.Dispose(disposing);
            }
        }
    }
}

// This derived class does not have a Finalize method 
// or a Dispose method without parameters because it inherits 
// them from the base class.

It's also important to note that there are a few best practice guidelines for IDisposable (from the Framework Design Guidelines book).
When a class implements IDisposable, it is declaring to the outside world that it must be used with care.  If you use it, it is your responsibility to call Dispose; it doesn't auto-magically get called.  Most experienced senior developers I know will agree that you should always wrap an IDisposable object in a using block.

Secondly, avoid returning an IDisposable object as the result of a function; the consumer more than likely will not check to see whether it needs to be disposed, leading to IDisposable.Dispose() not being called. Hello memory leak.  This follows from the first rule above: you cannot wrap an IDisposable object in a using block and also return it from a function.

What if you are creating lots of objects in a short period of time and only need them very briefly?
If doing nothing means a lot of memory will be tied up until the GC collects the objects, then simply call GC.Collect() periodically to kick the collector into freeing resources immediately.  Alternatively you can use the GC.AddMemoryPressure() and GC.RemoveMemoryPressure() functions to tell the GC there is a higher demand on memory than usual and that it should consider collecting more frequently. (These two calls are documented as a way of reporting large unmanaged allocations to the runtime, and every AddMemoryPressure call needs a matching RemoveMemoryPressure call.) These two options are better than implementing IDisposable because they are less work; by implementing IDisposable your code has more to do, because you are running another method on each of the large number of objects you have created, and it will more than likely be slower.

Be careful with these methods, however. As with anything like this, you should only use them if you have already proved with testing that it is better to use them than not.
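
As a rough illustration of the memory pressure calls, here is a sketch of a small wrapper around a block of unmanaged memory. NativeBuffer is a hypothetical class, not something from the framework, and the design (pairing the pressure calls with allocation and release) is just one reasonable assumption about how you might use them. Note that AddMemoryPressure throws if it is given zero or a negative number of bytes.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical wrapper that reports its unmanaged allocation to the GC.
public sealed class NativeBuffer : IDisposable
{
    private IntPtr handle;
    private readonly long size;

    public NativeBuffer(int bytes)
    {
        if (bytes <= 0) throw new ArgumentOutOfRangeException("bytes");
        size = bytes;
        handle = Marshal.AllocHGlobal(bytes);
        // Tell the GC about memory it cannot see on the managed heap.
        GC.AddMemoryPressure(size);
    }

    public bool IsReleased { get { return handle == IntPtr.Zero; } }

    public void Dispose()
    {
        Release();
        GC.SuppressFinalize(this);
    }

    ~NativeBuffer() { Release(); }

    private void Release()
    {
        if (handle == IntPtr.Zero) return;
        Marshal.FreeHGlobal(handle);
        handle = IntPtr.Zero;
        // Every AddMemoryPressure call is balanced by a RemoveMemoryPressure call.
        GC.RemoveMemoryPressure(size);
    }
}
```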

As a side issue, when calling GC.Collect() you should also call GC.WaitForPendingFinalizers().
When you manually force a collection, calling WaitForPendingFinalizers() causes the current thread to wait until all finalizers have finished running. This prevents you from calling any code on an object while it is being destroyed. Once this method returns (synchronously) you can check for a null reference as per normal and then safely invoke a method if the reference is non-null.
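
The Collect/WaitForPendingFinalizers pairing looks like this in practice. Finalizable and ForcedCollectionSketch are hypothetical names, and the counter is there only so you can observe that finalizers have run by the time the wait returns. How many objects are actually finalized on a given run is up to the runtime, so treat the count as an observation rather than a guarantee.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading;

// Hypothetical type whose finalizer bumps a shared counter.
public class Finalizable
{
    public static int FinalizedCount;
    ~Finalizable() { Interlocked.Increment(ref FinalizedCount); }
}

public static class ForcedCollectionSketch
{
    // Allocates short-lived objects in a separate method so nothing keeps them reachable.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static void Allocate(int count)
    {
        for (int i = 0; i < count; i++) new Finalizable();
    }

    public static int CollectAndCount(int count)
    {
        Allocate(count);
        GC.Collect();
        // Block until the finalizer thread has worked through its queue.
        GC.WaitForPendingFinalizers();
        // Atomic read of the counter.
        return Interlocked.CompareExchange(ref Finalizable.FinalizedCount, 0, 0);
    }
}
```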

So how do you stop IDisposable from spreading through your code like a cancer?
There's a good discussion on Stack Overflow about this very topic. The short answer is it's not easy, but it boils down to:

  • Only implement IDisposable if you are using unmanaged resources or keeping a reference to an IDisposable and your class is responsible for creating it. (The class creating it should be the class responsible for destroying it).
  • Prefer to consume IDisposable classes inside a using block.
  • Avoid returning an IDisposable class as a return object.
  • Keep expensive resource management in one place.
  • Prefer a create-on-demand pattern over singletons or long lived IDisposables.


Wednesday, August 4, 2010

Self Hosting in WCF

Self hosting in WCF can pose some small issues when you would like to host many services in one process.  Here's some template code I have used before to solve this:

Code Download

Why host in a Console?  Usually the intention is to host inside another application or a Windows service, and testing it inside a console is simpler than testing it in the real application.  Why not IIS/WAS? If you need to run long-lived background threads, hosting in IIS or WAS will have issues.

The main idea is to safely host each service inside a using block and use a controlling thread to signal hosting threads to trigger shut-down.

First I put all metadata for a service into an array of container classes. Then I can loop through and initialise them, then wait for them all to come online.

            var services = new[]
                               {
                                   new SingleServiceHostContainer(typeof(TestService1), testMode) { Fake = typeof(TestService1Fake) }, 
                                   new SingleServiceHostContainer(typeof(TestService2), testMode) { Fake = typeof(TestService2Fake) }
                               };

            // Have used Threads instead of the task factory to ensure they start
            var hostThreads = new List<Thread>();
            Array.ForEach(
                services,
                s =>
                    {
                        s.ReadyToExitResetEvent = ReadyToExitResetEvent;
                        var thread = new Thread(s.Run);
                        // Reconstructed: these two lines were lost from the listing
                        // and are assumed from the Join call further down.
                        thread.Start();
                        hostThreads.Add(thread);
                    });

            // Wait for initialisation to complete
            WaitHandle.WaitAll(services.Select(s => s.InitializationComplete).ToArray());
            Array.ForEach(services, s => s.WriteOutputToConsole());

            Console.ForegroundColor = ConsoleColor.Gray;
            Console.WriteLine("Press any key to quit");
            // Reconstructed (assumed): wait for the key press, then signal the hosts to shut down.
            Console.ReadKey();
            ReadyToExitResetEvent.Set();

            hostThreads.ForEach(t => t.Join());

Once they are all signalling they have completed their initialisation, then the main thread can simply wait until the user is ready to close the console and thereby shut down all the services.
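
If you don't have the code download to hand, the signalling pattern on its own can be sketched without any WCF at all. Everything below is a stand-in: HostContainerSketch and HostingSketch are hypothetical names, and the Run method only marks where a real container would open a ServiceHost inside a using block. It assumes between 1 and 64 hosts (the limit for WaitHandle.WaitAll) and a calling thread that is not STA.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

// Hypothetical stand-in for the container class.
public class HostContainerSketch
{
    public readonly ManualResetEvent InitializationComplete = new ManualResetEvent(false);
    public ManualResetEvent ReadyToExit;
    public bool ShutDownCleanly;

    public void Run()
    {
        // A real implementation would open a ServiceHost inside a using block here.
        InitializationComplete.Set();   // tell the controlling thread this host is online
        ReadyToExit.WaitOne();          // block until asked to shut down
        ShutDownCleanly = true;         // leaving the using block would close the host here
    }
}

public static class HostingSketch
{
    public static bool StartAndStop(int hostCount)
    {
        var readyToExit = new ManualResetEvent(false);
        var hosts = Enumerable.Range(0, hostCount)
                              .Select(i => new HostContainerSketch { ReadyToExit = readyToExit })
                              .ToArray();
        var threads = hosts.Select(h => new Thread(h.Run)).ToList();
        threads.ForEach(t => t.Start());

        // Wait until every host has signalled that it is initialised.
        WaitHandle.WaitAll(hosts.Select(h => (WaitHandle)h.InitializationComplete).ToArray());

        readyToExit.Set();              // stands in for the user pressing a key
        threads.ForEach(t => t.Join());
        return hosts.All(h => h.ShutDownCleanly);
    }
}
```

A single shared ManualResetEvent is enough for shut-down because, once set, it releases every waiting host thread at the same time.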

There's a little noise in the code download, for things like dependency injection and WCF Data Services (OData services).

Tuesday, August 3, 2010

The .NET Garbage Collector

Excerpt from Pro C# by Andrew Troelsen
When you are building your C# applications you are correct to assume that the managed heap will take care of itself without your direct intervention. In fact, the golden rule of .NET memory management is very simple:

RULE: Allocate an object onto the managed heap using the new keyword and forget about it.

Once "new-ed", the garbage collector (GC) will destroy the object when it is no longer needed. The next obvious question is, of course, "How does the GC know when an object is no longer needed"? An excellent question.  The short answer is that the GC removes an object from the heap when it is unreachable by any part of your code.  

When an object goes out of scope it becomes a "candidate" for garbage collection. Understand, however, that you cannot guarantee that this object will be reclaimed from memory immediately after it goes out of scope and is no longer reachable. All you can assume at this point is that when the CLR performs the next garbage collection the object will then be safely destroyed.

As you will most certainly discover, programming in a garbage collected environment will greatly simplify your application development.  In stark contrast, C++ programmers will be painfully aware that if they fail to manually delete heap-allocated objects, memory leaks are not far behind.  In fact, tracking down memory leaks is one of the most time consuming (and tedious) aspects of programming with unmanaged languages.  By allowing the GC to be in charge of destroying objects, the burden of memory management has been taken from your shoulders and placed onto the CLR, effectively making us developers much more productive, writing more business logic in less time.  In fact you may never need to use a memory profiler tool to track down memory leaks ever again!

NOTE: If you have any background in COM development, do know that .NET objects do not maintain an internal reference counter, and therefore managed objects do not expose methods such as AddRef() and Release().

If the managed heap does not contain sufficient space to host a requested new object, then a garbage collection run will occur immediately.

When garbage collection takes place the runtime will temporarily suspend all active threads within the current process.  The GC process has received considerable attention over the years and is highly optimised, and you will seldom (if ever) notice this brief interruption in your application.

The GC maintains two distinct heaps: one for very large objects and one for all others.  The heap for very large objects is less frequently consulted for collection, a good reason to follow good Object Oriented design principles and avoid very large objects.
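
You can see the generations and the large object heap for yourself with GC.GetGeneration. This is a small sketch rather than anything from the book; GenerationSketch and Observe are hypothetical names, and the 200,000 byte array is simply chosen to be comfortably above the large object threshold (about 85,000 bytes).

```csharp
using System;

public static class GenerationSketch
{
    // Returns { generation of a small object before a collection,
    //           its generation after a collection,
    //           generation reported for a large array }.
    public static int[] Observe()
    {
        var small = new object();
        var large = new byte[200000];

        int before = GC.GetGeneration(small);
        GC.Collect();
        int after = GC.GetGeneration(small);
        int largeGeneration = GC.GetGeneration(large);

        // Keep both objects reachable until the measurements are done.
        GC.KeepAlive(small);
        GC.KeepAlive(large);
        return new[] { before, after, largeGeneration };
    }
}
```

A small object that survives a collection is promoted to an older generation, while an array on the large object heap is reported as belonging to the oldest generation (GC.MaxGeneration), which is why it is examined less often.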

So will objects that have unmanaged resources be GC'ed? Yes, but you need to be sure to release and clean up any unmanaged resources, because if you don't, the objects will be destroyed by the GC but your unmanaged resources may not be cleaned up, leaving things in an inconsistent state.  To clean up unmanaged resources use a finalizer (destructor). 
RULE: The only reason to override Finalize() is if your C# class is making use of unmanaged resources via PInvoke or complex COM interoperability tasks (typically via the System.Runtime.InteropServices.Marshal type).
Remember however, you cannot predict when an object is going to be destroyed by the GC and Finalize() called.  So if you have expensive resources that you want to free up sooner consider implementing IDisposable.

Monday, August 2, 2010

WPF Custom Fonts


Include the *.ttf file into your WPF project as a Resource.

Reference it into a ResourceDictionary as follows:
    <FontFamily x:Key="WeirdFontFamily">/TestProject;Component/Fonts/#Weird</FontFamily>
    <FontFamily x:Key="WeirdBoldFontFamily">/TestProject;Component/Fonts/#Weird Bold</FontFamily>
The # separates the location of the font files from the font family name, so in this case it is followed by the font name, not the file name.  The font name can be found by double clicking the TTF file, which opens it in the Windows Font Viewer.

Here's a usage example:

          FontFamily="{DynamicResource WeirdFontFamily}"