Spread the Word

I was over at Bil Simser’s Blog the other day reading his post on extending SharePoint through Extension Methods where he describes adding a “ListExists()” extension onto the SPWeb object.  The goal of this method is to return a boolean value representing the existence of the list name passed to the extension.

Extension Methods = cool, dangerous and sometimes obfuscated, but cool non-the-less.

Using Extension Methods to enhance a developer’s experience when working with the SharePoint Object Model?  +3.  Anything helps there right?

What really got me thinking was his method for determining the existence of the list and a discussion that ensued in the comments of his blog.  Just like everywhere else in the programming universe there are about 10 different ways to do things:

 

- Two or Three ways will #FAIL hard.

- Three will get you there at the cost of either performance or break every single pattern you swore you’d adhere too when you started the job.

- Three will work but nobody but you can understand it.

..And One or Two will be ideal and readable.

In his blog Bil talks about a common SharePoint practice of attempting to open an item in order to see if that object exists.  If it doesn’t exist the SharePoint Object Model will toss you up an exception, which you can catch. 

-3 to Microsoft for forcing developers to have to do this in the first place.  Let’s hope they’ve learned their lessons and address some of these issues in SP2010, or Office 14.  Whatever they’re calling it this week.

Bil goes on to state:

“However I'm still not happy with the try/catch statement in our extension method. Sure it works but it's expensive always throwing an exception when a list doesn't exist. Inside our extension method, we have access to all public methods and properties of the instance of the object we're extending so why not iterate through the list, checking to see if the name matches up and returning our boolean value instead?”

I immediately nodded and moved on, sounds reasonable right?

What really piqued my interest was a comment by ‘Fire Snake’ saying that he would bet his “lower back” that the Bil’s method of iterating through the collection was far slower than catching the exception even when working within a site that had a relatively low number of lists.

So is this true?  You know I’ve never actually tested it.  What is the best way to check for the existence or open a list using the SharePoint Object Model?

Well lets find out shall we?

The Problem:  Determining the Best Way to access a list using the SharePoint Object Model

Let’s start with the basics and look at the different ways Bil discussed in his Blog.

1. Iterate though and do a “==” string equality test on the names returning true if the list is found and false if you reach the end of the iteration without finding a match.

2. Just try a and open the item in the collection by name and let the Object Model bubble you up an exception which you catch.  If you don’t get an exception then the result is positive.

3. Bil’s method of Iterating through the collection and performing a SPList.Title.Equals(listName) comparison.  This looks cleaner in code, which is a bonus, but does it perform any differently?

In addition to his three methods I’ll throw in one just because I like to monkey with the process:

4. Directly accessing the list via the SPWeb.GetList() method.

 

Somewhere in the depths of my hard drive I had a little test stub that I pull out from time to time when I want to do load and performance testing during a build.  I pulled this up modified it to run some performance tests on the four methods described above.

In order to test these methods fairly I’m doing some pre-test setup tasks.  These tasks happen before EACH test iteration so we can make sure that the SharePoint Site Collection we’re working with is in exactly the same same state prior to running the tests.  We want to avoid any anomalies due to the SQL server Caching data, etc.

In addition to these setup tasks I’m running each method 100 times and averaging the results.  This should give us a better glimpse into the performance costs of each.

 

The Setup Tasks:

To setup each test we’ll destroy and re-create the site collection.  This gives us not only a fresh and clean site but creates all the data on the SQL Sever anew so we can roughly assume that the state of the SQL server cache should be about the same when the test begins.

I’m not going to dwell on this portion of the code as the tests are the actual goal here.

Here’s the code that tears down and re-creates the Site Collection:

   1: private static void TearDownSite(string appAddr,string siteAddr)
   2:  {
   3:      var  application = SPWebApplication.Lookup(new Uri(appAddr));
   4:      try
   5:      {
   6:          application.Sites.Delete(siteAddr);
   7:      }
   8:      catch (Exception){}
   9:  }
  10:  private static  Guid ProvisionSite(string appAddr, string siteAddr)
  11:  {
  12:      var application = SPWebApplication.Lookup(new Uri(appAddr));
  13:      var site = application.Sites.Add(siteAddr, "Test Title", "Test Description", 
  14:                                      1033, "STS#0", "********", "scott",
  15:                                      "********");
  16:      return site != null ? site.ID : Guid.Empty;
  17:  }

After We tear down the site we’re going to provision a new Site Collection and number of Lists and ListItems for those lists.  I want to try several different data load combinations during the testing so I’ve moved this code out to methods that will provision the number of lists and list items that I specify and return a strongly typed list of the data we created so we can access it later:

   1: private static List<ListStruct> ProvisionLists(Guid siteID,int iterations, int itemcount)
   2:         {
   3:             var lists = new List<ListStruct>();
   4:             const SPListTemplateType listTemplate = SPListTemplateType.Announcements;
   5:             using (var site=new SPSite(siteID))
   6:             {
   7:                 using(var web=site.RootWeb)
   8:                 {
   9:                     for (var i = 0; i < iterations; i++)
  10:                     {
  11:                         var listName = Utils.GetRandomNumber(25, Utils.ALPHA);
  12:                         var thisID=web.Lists.Add(listName,"Description",listTemplate);
  13:                         lists.Add( new ListStruct {ListID=thisID,ListName=listName});
  14:                         var list = web.Lists[thisID];
  15:                         for (var x=0; x < itemcount;x++)
  16:                         {
  17:                             var item = list.Items.Add();
  18:                             item["Title"] = Utils.GetRandomNumber(100, Utils.ALPHA_NUMERIC);
  19:                             item["Body"] = Utils.GetRandomNumber(255, Utils.ALPHA_NUMERIC);
  20:                             item.Update();
  21:                         }
  22:                     }
  23:                 }
  24:             }
  25:             return lists;

In the ProvisionLists method we create a number of lists with random character names.  In this case I choose the Announcement list template for its simplicity.  While more complex list templates have more Metadata and would require more processing time I figured if we only used this one template type for all the tests we could get some reasonably accurate results.

In this method we also create a number of list items with random data.  I decided to do this at the last minute just to make sure we were truly representing a production server that not only has lists but <GAWSH> has actual data in it!

How The Tests Work

We’ll run a two sets of test for each of the four methods we discussed earlier.  In the first set of tests we’ll lookup a KNOWN list.  That’s a list that we know we created.    The second test will look for a list that know DOESN’T exist.  This will show us the performance hit for each method when we end up iterating through the complete collection of lists just to realize that the list we’re looking for doesn’t exist.

In order to run the test I’ve created a delegate and a method that executes the delegate any number of times while timing the results:

   1: private delegate bool TestToExecute();
   2: public static Stopwatch Timer;
   1: private static double ExecuteAndAverage(TestToExecute test, int iterations)
   2:      {
   3:          var execTimes = new List<double>();
   4:          for (var i = 0; i < iterations;i++ )
   5:          {
   6:              Timer.Reset();
   7:              Timer.Start();
   8:              test();
   9:              Timer.Stop();
  10:              execTimes.Add(Convert.ToDouble(Timer.ElapsedMilliseconds));
  11:          }
  12:          double total = 0;
  13:          foreach(var value in execTimes) total += value;
  14:          return total/execTimes.Count;
  15:      }

The ExecuteAndAverage method accepts a TestToExecute delegate and the number of iterations to run.  It simply initializes a System.Diagnostics.Stopwatch timer, executes the test, and returns the average time to run the tests in milliseconds.

 

Test #1 – Full Iteration using the “==” operator on the List’s name

Here’s the full method that runs the test.  for all of the remaining tests we’ll just look at the actual part of the code that runs the test. i.e. the anonymous delegate we pass:

   1: private static void IterationTestSuccess(int listCount,int itemcount, int runCount)
   2:        {
   3:            Guid siteID;
   4:            var lists = SetupTestSite(AppAddr, SiteAddr, listCount, itemcount, out siteID);
   5:            if (siteID == Guid.Empty) return;
   6:            Console.WriteLine("[GO!]");
   7:            var avgTestTime = ExecuteAndAverage(
   8:                delegate
   9:                    {
  10:                        var rnd = new Random();
  11:                        var list = lists[rnd.Next(listCount - 1)];
  12:                        using (var site = new SPSite(siteID))
  13:                        {
  14:                            using (var web = site.RootWeb)
  15:                            {
  16:                                for (var i = 0; i < web.Lists.Count; i++)
  17:                                {
  18:                                    if (web.Lists[i].Title == list.ListName)
  19:                                        return true;
  20:                                }
  21:                                return false;
  22:                            }
  23:                        }
  24:                    }, runCount);
  25:            Console.WriteLine("IterationTestSuccess [x{0}] Lists:{1} ItemsPer:{2} Avg Exec Time ms:{3}",
  26:                               runCount,listCount,itemcount,avgTestTime);
  27:        }

You can see that we setup our fresh site and execute our tests reporting the results out to the console. 

Let’s give it a whirl shall we?

   1: IterationTestSuccess [x100] Lists:10   ItemsPer:10   Avg Exec Time ms:8.8
   2: IterationTestSuccess [x100] Lists:10   ItemsPer:100  Avg Exec Time ms:8.56
   3: IterationTestSuccess [x100] Lists:10   ItemsPer:1000 Avg Exec Time ms:8.39
   4: IterationTestSuccess [x100] Lists:100  ItemsPer:10   Avg Exec Time ms:17.77
   5: IterationTestSuccess [x100] Lists:100  ItemsPer:100  Avg Exec Time ms:17.54
   6: IterationTestSuccess [x100] Lists:1000 ItemsPer:10   Avg Exec Time ms:159.97

We’ve run the test with 6 different set of characteristics.  Each test ran 100 times and returned the average time it took to execute each test within that test set.

The results look pretty standard. Since we’re iterating through a collection it’s easy to deduct that the more lists we create in the site the longer the operation is going to take.   But remember this test operates by randomly selecting one of the lists we created during the setup.  What happens when we run the following test:

 

   1: var avgTestTime = ExecuteAndAverage(
   2:                 delegate
   3:                     {
   4:                         using (var site = new SPSite(siteID))
   5:                         {
   6:                             using (var web = site.RootWeb)
   7:                             {
   8:                                 for (var i = 0; i < web.Lists.Count; i++)
   9:                                 {
  10:                                     if (web.Lists[i].Title == "Please Dont Find Me!")
  11:                                         return true;
  12:                                 }
  13:                                 return false;
  14:                             }
  15:                         }
  16:                     }, runcount);

This test looks for a list named “Please Dont Find Me!” which we know doesn’t exist.  That means that the operation is forced to iterate through the complete collection. 

Curious?  Think it will make a difference? 

   1: IterationTestFailure [x100] Lists:10   ItemsPer:10   Avg Exec Time ms:8.26
   2: IterationTestFailure [x100] Lists:10   ItemsPer:100  Avg Exec Time ms:8.35
   3: IterationTestFailure [x100] Lists:10   ItemsPer:1000 Avg Exec Time ms:8.29
   4: IterationTestFailure [x100] Lists:100  ItemsPer:10   Avg Exec Time ms:17.56
   5: IterationTestFailure [x100] Lists:100  ItemsPer:100  Avg Exec Time ms:17.66
   6: IterationTestFailure [x100] Lists:1000 ItemsPer:10   Avg Exec Time ms:168.46

I didn’t think so.  Well with the exception of the site that contained 1000 lists containing 10 items each there isn’t that much of a difference – and even then it’s pretty minor considering.

Why is that you ask?  It’s because as soon as you call SPWeb.Lists, regardless of how you work with the data, the object model is loading the entire list collection.  While I haven’t taken the time to verify this yet I’m pretty sure the performance hit is simply the collection populating from the database.  Once you get the collection into memory doing anything but iterating through a couple of million items is trivial at that point.

Let’s digest that for a bit and move on to the next Test:

 

Test #2 – Attempting to directly open the list via it’s name and catching an exception if it fails.

Right to it this time.  Here’s the test that we know will succeed:

 

   1: ar avgTestTime = ExecuteAndAverage(
   2:                 delegate
   3:                     {
   4:                         var rnd = new Random();
   5:                         var list = lists[rnd.Next(listcount - 1)];
   6:                         using (var site = new SPSite(siteID))
   7:                         {
   8:                             using (var web = site.RootWeb)
   9:                             {
  10:                                 try
  11:                                 {
  12:                                     var foundlist = web.Lists[list.ListName];
  13:                                     return foundlist != null;
  14:                                 }
  15:                                 catch{return false;}
  16:                             }
  17:                         }
  18:                     }, runcount);

and here’s the test we know will fail and throw up an exception for us to catch:

   1: var avgTestTime = ExecuteAndAverage(
   2:              delegate
   3:                  {
   4:                      using (var site = new SPSite(siteID))
   5:                      {
   6:                          using (var web = site.RootWeb)
   7:                          {
   8:                              try
   9:                              {
  10:                                  var foundlist = web.Lists["Please Don't Find Me"];
  11:                                  return foundlist != null;
  12:                              }
  13:                              catch { return false; }
  14:                          }
  15:                      }
  16:                  }, runcount);

 

Think we’ll see a performance difference here over the previous test? Let’s see:

   1: ExceptionSuccess [x100] Lists:10   ItemsPer:10   Avg Exec Time ms:8.22
   2: ExceptionSuccess [x100] Lists:10   ItemsPer:100  Avg Exec Time ms:8.25
   3: ExceptionSuccess [x100] Lists:10   ItemsPer:1000 Avg Exec Time ms:8.38
   4: ExceptionSuccess [x100] Lists:100  ItemsPer:10   Avg Exec Time ms:17.37
   5: ExceptionSuccess [x100] Lists:100  ItemsPer:100  Avg Exec Time ms:17.79
   6: ExceptionSuccess [x100] Lists:1000 ItemsPer:10   Avg Exec Time ms:177.76
   7:  
   8: ExceptionFailure [x100] Lists:10   ItemsPer:10   Avg Exec Time ms:13.98
   9: ExceptionFailure [x100] Lists:10   ItemsPer:100  Avg Exec Time ms:13.47
  10: ExceptionFailure [x100] Lists:10   ItemsPer:1000 Avg Exec Time ms:13.59
  11: ExceptionFailure [x100] Lists:100  ItemsPer:10   Avg Exec Time ms:23.14
  12: ExceptionFailure [x100] Lists:100  ItemsPer:100  Avg Exec Time ms:22.42
  13: ExceptionFailure [x100] Lists:1000 ItemsPer:10   Avg Exec Time ms:169.93

Interesting.  the time to directly access a known list via it’s name is roughly the same as the previous test using the iteration. 

The big surprise here is that the test designed to fail showed a significant performance decrease.

Why?  Well as we learned earlier we take an initial performance hit as soon as we touch the SPWeb.Lists collection -- then we are relying on the object model to instantiate an exception object and bubble it up to us. 

This was more apparent in the sites with relatively few lists.  While I realize there’s not a drastic difference between 8 and 13 milliseconds you also have to consider the additional memory utilized by the exception object and all the metadata baggage it carries along with it.  

If you’ve got 10,000 users on your SharePoint Web Front Ends using a Web Part that you’ve created using this method you’ll definitely realize this performance hit.

 

Ok, on to test #3:

Test #3 – Using the Equals method of the title string

I threw this in specifically because Bil mentions it in his blog – so we’ll run the tests just to see if there’s any difference doing it this way.  My guess is that it just makes the code easier to read. 

The two tests:

   1: delegate
   2:      {
   3:          var rnd = new Random();
   4:          var list = lists[rnd.Next(listcount - 1)];
   5:          using (var site = new SPSite(siteID))
   6:          {
   7:              using (var web = site.RootWeb)
   8:              {
   9:                  try
  10:                  {
  11:                      foreach (SPList ilist in web.Lists)
  12:                          if (ilist.Title.Equals(list.ListName))
  13:                              return true;
  14:                      return false;
  15:                  }
  16:                  catch { return false; }
  17:              }
  18:          }
  19:      }, runcount);

The test designed to fail:

   1: var avgTestTime = ExecuteAndAverage(
   2:            delegate
   3:                {
   4:                    using (var site = new SPSite(siteID))
   5:                    {
   6:                        using (var web = site.RootWeb)
   7:                        {
   8:                            try
   9:                            {
  10:                                foreach (SPList ilist in web.Lists)
  11:                                    if (ilist.Title.Equals("Please Dont Find Me"))
  12:                                        return true;
  13:                                return false;
  14:                            }
  15:                            catch { return false; }
  16:                        }
  17:                    }
  18:                }, runcount);

and the results:

   1: ByCompareTestSuccess [x100] Lists:10   ItemsPer:10   Avg Exec Time ms:7.96
   2: ByCompareTestSuccess [x100] Lists:10   ItemsPer:100  Avg Exec Time ms:8.08
   3: ByCompareTestSuccess [x100] Lists:10   ItemsPer:1000 Avg Exec Time ms:8.41
   4: ByCompareTestSuccess [x100] Lists:100  ItemsPer:10   Avg Exec Time ms:17.35
   5: ByCompareTestSuccess [x100] Lists:100  ItemsPer:100  Avg Exec Time ms:17.49
   6: ByCompareTestSuccess [x100] Lists:1000 ItemsPer:10   Avg Exec Time ms:164.7
   7:  
   8: ByCompareTestFailure [x100] Lists:10 ItemsPer:10 Avg Exec Time ms:9.12
   9: ByCompareTestFailure [x100] Lists:10 ItemsPer:100 Avg Exec Time ms:8.13
  10: ByCompareTestFailure [x100] Lists:10 ItemsPer:1000 Avg Exec Time ms:8.36
  11: ByCompareTestFailure [x100] Lists:100 ItemsPer:10 Avg Exec Time ms:17.13
  12: ByCompareTestFailure [x100] Lists:100 ItemsPer:100 Avg Exec Time ms:16.75
  13: ByCompareTestFailure [x100] Lists:1000 ItemsPer:10 Avg Exec Time ms:168.82

Yup, just as we guessed. Just about on par with Test #1.  But +1 for readability right?

Test #4 Using the SPWeb.Lists.GetList Method

This method is a bit different in that it requires you to know the GUID of the list.  But what makes it special is that it does NOT initialize the collection.  I know that Bil was specifically discussing looking up a list by it’s name but I have to throw this in here to demonstrate how poorly the SharePoint Object Model deals with data.

The Success Test:

   1: var avgTestTime = ExecuteAndAverage(
   2:              delegate
   3:                  {
   4:                      var rnd = new Random();
   5:                      var list = lists[rnd.Next(listcount - 1)];
   6:                      using (var site = new SPSite(siteID))
   7:                      {
   8:                          using (var web = site.RootWeb)
   9:                          {
  10:                              try
  11:                              {
  12:                                  var foundlist = web.Lists.GetList(list.ListID,false);
  13:                                  return foundlist != null;
  14:                              }
  15:                              catch { return false; }
  16:                          }
  17:                      }
  18:                  }, runcount);

And the Failure Test:

   1: var avgTestTime = ExecuteAndAverage(
   2:     delegate
   3:         {
   4:             using (var site = new SPSite(siteID))
   5:             {
   6:                 using (var web = site.RootWeb)
   7:                 {
   8:                     try
   9:                     {
  10:                         //Should Fail!
  11:                         var foundlist = web.Lists.GetList(Guid.NewGuid(), false);
  12:                         return foundlist != null;
  13:                     }
  14:                     catch { return false; }
  15:                 }
  16:             }
  17:         }, runcount);

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Curious?  I Bet.

 

   1: ByIDTestSuccess [x100] Lists:10   ItemsPer:10   Avg Exec Time ms:4.73
   2: ByIDTestSuccess [x100] Lists:10   ItemsPer:100  Avg Exec Time ms:4.69
   3: ByIDTestSuccess [x100] Lists:10   ItemsPer:1000 Avg Exec Time ms:4.44
   4: ByIDTestSuccess [x100] Lists:100  ItemsPer:10   Avg Exec Time ms:4.87
   5: ByIDTestSuccess [x100] Lists:100  ItemsPer:100  Avg Exec Time ms:4.65
   6: ByIDTestSuccess [x100] Lists:1000 ItemsPer:10   Avg Exec Time ms:4.34
   7:  
   8: ByIDTestFailure [x100] Lists:10   ItemsPer:10   Avg Exec Time ms:17.4
   9: ByIDTestFailure [x100] Lists:10   ItemsPer:100  Avg Exec Time ms:17.45
  10: ByIDTestFailure [x100] Lists:10   ItemsPer:1000 Avg Exec Time ms:17.38
  11: ByIDTestFailure [x100] Lists:100  ItemsPer:10   Avg Exec Time ms:17.52
  12: ByIDTestFailure [x100] Lists:100  ItemsPer:100  Avg Exec Time ms:17.25
  13: ByIDTestFailure [x100] Lists:1000 ItemsPer:10   Avg Exec Time ms:17.45

Mother of ALL things holy. Did you see that?

It’s true Dear Reader.  There is ZERO performance degradation on sites with 1000 lists!  And here you though I threw that 1000 list provisioning work in the other 3 tests just because I wanted my processor to overheat and my hard drive to make screeching noises.

This basically tells us something that we should already know or at least have deduced by developing in SharePoint – that Microsoft has the ID field (uniqueidentifier) set as the primary key in the database. 

I know most SharePoint developers think it taboo to even look into content databases but I’m here to tell you that for you to truly understand how SharePoint works under the covers you need to understand how it stores data.  That’s why we have Development VMs right?  So we can break things – Because only by looking under the hood can we begin to truly understand this behemoth.

I’m not saying you should program directly against the database, I’m just saying if you don’t know where to find lists and list items in the database you should spend some time learning the data structure.  It will pay dividends in the long run when you’re debugging why your code isn’t doing whatever it should be doing.

 

Using our new Knowledge

All in all my tests ran 48,000 times during the course of the day to ensure all the results were the about the same.  So we’ve done our fair share of measuring.  Now how do we ‘Cut’?

We now know that doing a simple string comparison while iterating through the collection does in fact perform better than catching an exception.  But is it the best way to hook up an SPList object?  Probably not.

What’s that?  I hear you complaining! Trust me it’s coming through my speakers.  You don’t know the ID of the list you’re looking for? 

Ok, let’s look at some (But not all by any means) ways of dealing with this.

#1 – You have a nifty do-something-special-that-CQWP-can’t-do WebPart that you want to drop on a page and read a list containing “Tasks'”.  Simple – Use the WebPart’s shared properties and and the WebPart Manager to present the user with a drop down list of available lists and store the GUID of the list as the property.  Then your actual day-to-day access only uses the ID.

 

#2 – Your Nifty Workflow Foundation code that fires off when a user adds a task is supposed to drop a new ListItem in some other tracking list.  Again – use the Associate form to pick the list and store the ID.

 

#3 – You’re dealing with some process that absolutely HAS to open a bunch of lists based on the name and there’s just no other avenue for you to take.   How about you use the application cache to store the list names and IDs?  That way if you have TONS of users executing this code in some WebPart you’ll only take that one initial hit while the cache is constructed or re-constructed after a timeout.  Subsequent calls to lookup the ID in the application cache wouldn’t need to access the data.

 

I’m sure there are actually lots of reasons out there that why accessing a list by it’s ID just isn’t feasible – But the point I’m trying to get across here is that just a little up-front work will pay off in spades when your code gets deployed to a production server that has 100,000 users gleefully consuming your content.

Especially when there’s that one dude at that client (every client has one or five) who insist on creating 1000 lists in his department’s Site Collection :)

 

P.S.  Specs of the machine used to run the test:

 

MOSS 64-bit SP2 Single Server Install running in the Host OS.

SQL 2005 Developer version (64-bit) running in the Host OS.

Host OS: Vista 64-bit.

Intel Q9300 (Quad Core 2.5Ghz)

8GB ram

 

Contact me via DM in Twitter, LinkedIn, or Facebook if you’d like the C# file.

© Copyright 2010 G. Scott Singleton | Disclaimer: The opinions expressed herein are my own personal opinions and do not represent my employer's view in anyway.