FullerData.com - News


General News:

BBC News
Guardian News
Telegraph News
BBC UK
BBC Technology
BBC Business
World Press
Telegraph Opinion
Scotsman Opinion
Yahoo Opinion
BBC In Depth
BBC Magazine

Techie News:

Slashdot
Kuro5hin
Slashdot Developers
Slashdot Games
Slashdot Science
PhysicsWeb
Wired Technology
PCWorld
DevMaster
GamaSutra Articles
The Register
TheServerSide .NET
TheServerSide J2EE

Sports News:

Sport Headlines
Football
Motor Sport
Cricket
US Sport

Microsoft:

MSDN Architecture
MSDN Patterns
MSDN
MSDN Magazine
MSDN Web Services
MSDN C#
MSDN .Net Framework
MSDN ASP.Net
C# FAQ

Database:

MSDN SQL Server
Oracle Ask Tom - Popular
Oracle Ask Tom - Recent
Oracle Blogs

Other Techie:

Code Project
C# Stuff
Live @ Sax.net
Help .Net
SQL Junkies
DotNet Junkies
4GuysFromRolla.com
Netcraft

Blogs:

Chris Brumme
Martin Fowler
Chris Sells
Scott Watermasysk
Sam Gentile
Eric J. Smith
Herb Sutter
The Old New Thing
Sam Ruby
Tim Bray
Tom Miller (MDX)
Rico Mariani
causticTech
Johns Perf Blog


JohnsPerformance:

  • Why is my laptop so sluggish? Or Damn You Facebook and Twitter! Or All Hail Chrome!

  • In the past three weeks, I've noticed that my laptop (dual core 2.1GHz, 2GB RAM) has become amazingly sluggish.  I only use it for communication and data lookup workflows, so the slowness was tolerable.  But today I finally got fed up with the suckiness and decided to get to the root of the problem (I do have strong performance roots, after all).

    It actually didn't take all that long to figure out.  About a year ago I switched to Google Chrome (from Firefox).  One of the great tools Chrome has is a "Task Manager" (Shift + Esc) that gives you Windows Task Manager-like details for all the tabs open in the browser.  Since every tab runs in its own process, it's easy from either Task Manager (Windows or Chrome) to identify and kill a single offending tab.  This is unlike IE, where you only get aggregate data about all open tabs.

    Anyway, I digress.  Today my laptop sucked.  Windows Task Manager told me that I had two memory-hogging Chrome tabs, but couldn't tell me which web pages those tabs were showing.  Enter Chrome Task Manager, which tells you the page title, along with the CPU, memory and network utilization of each tab.

    Enter my amazement.  Turns out Facebook was using just shy of half a GB of RAM.  Half a Gigabyte!  That's 512 Megabytes!  524,288 Kilobytes!  536,870,912 Bytes!  Or 4,294,967,296 Bits!  In other words, that's a frackin' boatload of memory.
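    A quick sanity check of that unit chain, as a minimal C# sketch assuming binary (1024-based) units, which is what the numbers above use. `MemoryUnits` and `Breakdown` are illustrative names:

```csharp
using System;

public static class MemoryUnits
{
    // Half a (binary) gigabyte: 512 * 1024 * 1024 bytes.
    public static readonly long HalfGigabyte = 512L * 1024 * 1024;

    // Breaks a byte count into binary units: 1 MB = 1024 KB, 1 KB = 1024 bytes, 1 byte = 8 bits.
    public static (long Megabytes, long Kilobytes, long Bytes, long Bits) Breakdown(long bytes) =>
        (bytes / (1024 * 1024), bytes / 1024, bytes, bytes * 8);
}
```

`MemoryUnits.Breakdown(MemoryUnits.HalfGigabyte)` yields 512 MB, 524,288 KB, 536,870,912 bytes and 4,294,967,296 bits, matching the figures above.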

    Now consider that Facebook is running on pretty much 96.3% (statistics based on absolutely nothing) of every household desktop, laptop, netbook, and mobile device in America.  That is pretty horrific!

    And I wasn't playing any Facebook games like FarmWars or MafiaVille.  I just had my normal, default home page up showing me who just had breakfast, or just got finished with their morning run.

    I'm sorry...let me say that again...HALF A GIG OF RAM!  That is just unforgivable.

    I can just see my mom calling me up: 
    Mom: "John...I think I need a new computer.  Mine is really slow these days"
    John: "What do you have running?"
    Mom: "Oh, just Facebook"
    John: "Ok, close Facebook and tell me how fast your computer feels"
    Mom: "Well...I don't know how fast it is.  All I do is use Facebook"
    John: "Ok Mom, I'll send you a new computer by Tuesday"

    Oh yeah...and the other offending web page?  It was Twitter, using a quarter of a Gigabyte.

    God I love social networks!


  • Win7 is not a tablet OS, no matter what the boys in Redmond think.

  • Despite what execs at Microsoft think, Windows 7 is NOT a tablet OS.  Just because you can install some software (or OS) on a device doesn't mean that device is meant to run that software.  This seems to be the point that the non-engineer execs at Microsoft have not understood.

    In order to seamlessly work with a device, software needs to be designed with that device in mind.  That has been the problem with the Windows PDA platform, the Windows Mobile platform, and now with trying to force-fit Windows 7 onto a tablet.  It's just not designed for that style of interaction.

    Windows is designed to be interacted with via a mouse and keyboard.  In fact, it is brilliant at that.  But it is NOT designed to be interacted with by your fingers.  And that is why the Windows tablet failed 10 years ago, and why it will fail today.  It's not the hardware's fault, as Microsoft claimed 10 years ago.  It's the user interaction design that failed.

    And this is why the iPhone and Android OSs work wonderfully on a tablet.  The user interaction was designed for small screens, navigated by big fat fingers.  I love these OSs and how I interact with them.  And when I play with a touchscreen Windows 7 device, I feel like I'm playing with a brittle wannabe.  And it's not the hardware's fault.  The touchscreen is very responsive.  I actually like the hardware.  But the OS and the software are just not designed to be interacted with by my big fat fingers.

    In order to be successful, Microsoft needs to start from scratch, and build a platform AND SOFTWARE specifically for use by fingers.  That's why everyone was so excited when they thought Microsoft was going to release the Courier tablet: it looked like a totally different platform.  Something that might actually work.  But Windows 7...I hate to burst your bubble, but you are not a touch platform.


  • Create Pivot collections much faster than DeepZoomTools CollectionCreator class

  • I've been playing with Microsoft Live Labs Pivot to create a hierarchy of collections all linked together to allow someone to explore a hierarchy of data visually. The problem has been the generation time of the entire hierarchy. I end up creating 500 - 600 collections total and it takes hours and hours using the CollectionCreator class that comes with the DeepZoomTools. 

    So, digging around, I found a way to make the actual DeepZoom collection creation wicked fast. Don't use the CollectionCreator!

    Turns out Pivot doesn't actually use the image pyramid generated by the CollectionCreator. Or if it does, it's only when you open a new collection and it shows all the images zooming in. Once the zoom-in is complete, Pivot uses the individual DeepZoom images. What Pivot does need is the XML generated by the CollectionCreator, which is in a very simple format.

    So what I did was manually generate the XML for the collection image pyramid, create the required folder structure (one folder per level of the pyramid), and put a single-pixel PNG file in each folder.

    Now, I can create the required files and folders for 500 collections in about 10 seconds. Sweet!

    Now you still have to use the ImageCreator to create a DeepZoom image for each image in the collection and that still takes some time, but at least the total processing time is way better.
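    The post doesn't include its code, so here is a minimal C# sketch of the idea under assumptions that are mine, not the author's: the collection XML follows the DeepZoom collection (.dzc) layout as I understand it (namespace `http://schemas.microsoft.com/deepzoom/2008`, a `Collection` root with `MaxLevel`, `TileSize`, `Format` and `NextItemId` attributes, and one `I` element per item pointing at that item's DeepZoom image), and the placeholder tile in each level folder is named `0_0.png`. `ManualCollection`, `BuildDzc` and `PlaceholderTilePaths` are hypothetical names; compare the output with what CollectionCreator produces before relying on it.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml.Linq;

public static class ManualCollection
{
    // Assumed DeepZoom collection namespace; verify against a CollectionCreator-generated .dzc.
    static readonly XNamespace Dz = "http://schemas.microsoft.com/deepzoom/2008";

    // Builds the collection XML by hand: one <I> per item, each pointing at an
    // existing per-image DeepZoom file produced by ImageCreator.
    public static XDocument BuildDzc(IList<(string Source, int Width, int Height)> items,
                                     int maxLevel = 8, int tileSize = 256)
    {
        var itemElements = items.Select((item, i) =>
            new XElement(Dz + "I",
                new XAttribute("Id", i),
                new XAttribute("N", i),
                new XAttribute("Source", item.Source),
                new XElement(Dz + "Size",
                    new XAttribute("Width", item.Width),
                    new XAttribute("Height", item.Height))));

        return new XDocument(
            new XElement(Dz + "Collection",
                new XAttribute("MaxLevel", maxLevel),
                new XAttribute("TileSize", tileSize),
                new XAttribute("Format", "png"),
                new XAttribute("NextItemId", items.Count),
                new XElement(Dz + "Items", itemElements)));
    }

    // One folder per pyramid level (0..maxLevel); each gets a single placeholder tile
    // instead of the real composited pyramid.
    public static List<string> PlaceholderTilePaths(string collectionName, int maxLevel = 8) =>
        Enumerable.Range(0, maxLevel + 1)
                  .Select(level => Path.Combine(collectionName + "_files", level.ToString(), "0_0.png"))
                  .ToList();
}
```

Writing the files is then `doc.Save(...)` plus copying one 1x1 PNG to each returned path; the per-image ImageCreator step is unchanged.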


  • Seeking questions about creating Microsoft Live Labs Pivot collections

  • I've spent the past 3 weeks working a lot with Pivot from Microsoft Live Labs (http://getpivot.com/).  Pivot is a tool that allows you to visually explore data. It's an interesting take on visual data mining.

    Anyway, I've been writing a lot of code that creates a hierarchy of Pivot collections, where one item in the collection drills down into an entirely new collection.

    The dev community around Pivot is still very young, so there isn't much tribal knowledge built up yet.  I've spent a lot of time trying to get things to work through trial and error, as well as digging around in Reflector.  But I've finally got a framework built for programmatically creating DeepZoom images, Pivot collections, Sparse Images, etc.

    If anyone has any questions, or suggestions on a post topic, leave a comment and I'll try and answer your question. 


  • Did Microsoft designers get their butts kicked 3 years ago?

  • This is something I've been wondering about for about a year now.  Microsoft has a history of creating very useful products, with lots of useful features.  But useful does not mean usable.  A lot of stuff coming out of Redmond in the past 10 years doesn't really seem to have been well thought out from a user design point of view.  Lots of extra steps, lots of popup windows...very little innovative thinking going on about the user experience of these products.

    But about a year ago I started seeing changes in the new products coming out of Microsoft.  Windows 7 is a good example of a big change.  They really got their asses handed to them on Vista, so they had to make a change.  But it looks like this change in philosophy has bled over to other areas.  The new Office (2010) lineup has a lot of changes in it to make it way more usable. 

    Given that big changes like this take about 3 years to go from start to actually shipping product, I'm curious what happened internally at Microsoft that really drove this change in product design.  I think that Microsoft got so focused on just adding new functionality for so long, they forgot about the little things that can really make or break a product.  Office 2010 is full of these little things that make it much nicer to use.  I just hope it's not too late for them.


  • Change in Job Title and Responsibilities

  • I've spent the past 7 years focused primarily on code and database performance.  It's an area that I have a passion for, as well as a propensity.  But what I've found is that it's very hard to change the culture of a development environment.  You can teach performance, you can encourage performance, and you might see a slight shift in how devs think about performance.  But without full management backing and support you won't get long-lasting changes in the development culture.  And in the end, you are back to being the "Perf Guy", fixing performance design flaws after the fact, one by one by one.

    Which is why last year I asked my boss to change my title and responsibilities to more naturally align with the team I was working for.  So now I'm a Computing Research Engineer (vague, I know), researching in the field of Big Data analytics and visualization.

    I've found this change revitalizing and a lot of fun.  And given the nature of Big Data (it's, um…big), the performance aspects are ever present.


  • MDbg: a managed wrapper around ICorDebug!

  • Recently a performance bug came my way.  A highly multithreaded application, that can run for hours depending on the amount of data its processing, was observed having all its CPUs ramping up to 100% utilization, and the amount of data processed per second dropped down to nothing.

    Ok, no big deal here.  I've most likely got a state where all threads are stuck in a tight loop (most likely the same loop), and each thread is waiting on the other to set a flag that will allow them to exit the loop.    Your basic deadlock issue.  Pretty easy to fix, if I can reproduce the problem on my dev machine and use the debugger to tell me what the offending function is.

    The problem is that I wasn’t able to reproduce it.  Crap.

    Ok…on to step two…

    Looks like I'll have to find or build a tool that can give me the call stack of all the threads in a managed app.  I started out trying to use System.Diagnostics.StackTrace and StackFrame.  From a System.Threading.Thread instance I can create a StackTrace object and see what function the thread is in.  But I can't get a list of all the Thread objects in an app.  I have access to the Process.Threads collection, but that gives me a list of System.Diagnostics.ProcessThread objects, not System.Threading.Thread.  Shoot…that's not going to work.
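    The dead end above is easy to see in code.  In this minimal C# sketch (the `ThreadInventory` class is illustrative, not from the post), `Process.Threads` returns OS-level `ProcessThread` objects that carry an ID but no managed call stack, while `new StackTrace()` only captures the stack of the thread that calls it:

```csharp
using System;
using System.Diagnostics;
using System.Linq;

public static class ThreadInventory
{
    // OS-level view: every native thread in the process, but no managed call stacks.
    public static int[] OsThreadIds() =>
        Process.GetCurrentProcess().Threads
               .Cast<ProcessThread>()
               .Select(t => t.Id)
               .ToArray();

    // Managed view: a stack trace is easy to get, but only for the calling thread.
    public static string[] CurrentThreadFrames() =>
        new StackTrace().GetFrames()
                        .Select(f => f.GetMethod()?.Name ?? "<unknown>")
                        .ToArray();
}
```

Bridging the two views (managed stacks for every thread) is what needs a debugger API such as ICorDebug.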

    Ok, next step is to look at creating a really lightweight debugger on top of .Net's ICorDebug API, to basically break into the app and dump out the call stack of all the managed threads.  I found a couple of examples and it didn't look too bad; the only issue is that ICorDebug is a COM API.  So I'd have to do all that fun C++ COM stuff…Ick.  And I need the tool yesterday.

    After digging around a bit more I found out that the Visual Studio debugger team wrote a very nice managed wrapper around ICorDebug, called MDbg.  Sweet!

    There is a bunch of info about it here.

    After digging a bit further, I found that someone wrote a handy little tool called Managed Stack Explorer.  Oh geez!  The gods are smiling at me!  That's exactly what I need.

    This little tool shows all managed apps running on your server.  When you pick an app, it shows all threads in the process.  When you click the thread, it shows you the call stack for that thread.  Simple and nice.

    With this tool, I was able to find the offending non-threadsafe function in about 5 minutes.  Fixed, done, yippee.

    But this post isn't about someone's tool, or my bug-fixing adventures.  No, it's about coming across one of the most useful APIs I've seen in a long time!  A simple and well-designed .Net wrapper around ICorDebug, giving .Net developers full access to the CLR debugger.  I'm very excited about the idea of a managed wrapper around ICorDebug.  There are so many diagnostic tools that could be created with this.  I'm looking forward to digging around in the API!


  • Many core processors and parallel processing

  • Although most of the topics I've written about are pretty random, I'll try to focus in on a much narrower (yet incredibly broad) topic: multi-core vs. many-core processing, parallel processing, and the paradigm shift that we software engineers are on the leading edge of having to face.

    In short, Intel, AMD, and other hardware manufacturers are telling anyone who listens that programmers need to change the way they think about designing end-user software.  End-user software needs to take advantage of multiple cores.  And this doesn't mean spinning up a background thread to do some compute-intensive request so that our UI remains responsive.  It means designing all compute-intensive algorithms to scale to multiple processors.

    Intel goes on to say that designing for 2, 4, or 8 processors is way too short-sighted.  We need to design our software to scale out to N processors, where N could be 16, 64, or 512.
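    To make "scale out to N processors" concrete, here is a minimal C# sketch using the Task Parallel Library (.NET 4 and later) of a compute-intensive loop whose worker count follows `Environment.ProcessorCount` instead of being fixed at 2 or 4.  `ScalableSum` and the sum-of-squares workload are illustrative stand-ins, not from the post:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class ScalableSum
{
    // Sums data[i]^2 using as many workers as the machine has cores.
    public static long SumOfSquares(int[] data)
    {
        if (data.Length == 0) return 0; // Partitioner.Create requires a non-empty range.

        long total = 0;
        // Range partitioning hands each worker contiguous chunks, keeping per-item overhead low.
        var ranges = Partitioner.Create(0, data.Length);

        Parallel.ForEach(
            ranges,
            new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount },
            () => 0L,                                  // each worker starts its own partial sum
            (range, loopState, partial) =>
            {
                for (int i = range.Item1; i < range.Item2; i++)
                    partial += (long)data[i] * data[i];
                return partial;
            },
            partial => Interlocked.Add(ref total, partial)); // merge once per worker

        return total;
    }
}
```

Each worker accumulates into its own local total and touches the shared total only once, so adding cores adds throughput rather than lock contention.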

    Coding Horror has a great post from last year that demonstrates how well common end-user software takes advantage of multi-core processors.  The results are sad, to say the least.

    We can no longer just expect our software to get faster with the next chip release by Intel or AMD.  What is worse, our software will most likely run slower on newer desktop and mobile chips.

    The trend in processor manufacturing is to have slower, cooler, more efficient individual cores, and to pack more and more of them on a single chip.  This means that end-user software that only uses 1 or 2 threads will actually run slower on newer processors.

    This can be seen with Intel's new quad-core mobile processor: the QX9300.  It has 4 cores, but runs at 2.53 GHz.  This is an amazing chip, but only for software that is actually designed to run across multiple cores.

    To boil it down to a simplified problem statement: Software outlives hardware, and hardware ain't getting any faster.  (more on that later)


  • Complexity and Usability

  • Generally I'm not one to write a post that does nothing but highlight someone else's blog post...BUT...this one was important enough (IMHO) that I decided to break my own rule. 

    Are you building a Leatherman or a Samurai sword?  (stupid linker isn't working)

    http://petewarden.typepad.com/searchbrowser/2008/07/are-you-buildin.html 

    As programmers we always want to write new functionality...neat, new, COOL functionality.  That's just what we do, and we love it. 

    But it's hard to keep in mind what our added functionality does to user efficiency.  No matter what we think our job is all about, it's really about making the lives of our users easier and more efficient.  That's it…done…it's that simple.

    This is easy to understand when writing a UI application.  If a new feature causes the user to perform 5 extra steps with the UI, but those 5 extra steps only give a small return on efficiency (so small it wasn't worth the time to perform the 5 new steps), then drop the feature; it's not worth it.  If the feature is complex or confusing, and will cause the user to misuse it or skip it altogether, then drop the feature; it's not worth it.

    Where this becomes harder to evaluate is in writing an SDK API.  As the post above states, we all want to write the ultimate architecture.  The one that can do anything and everything.  But "anything and everything" can quickly become a directionless mess, where you have several hundred classes with no obvious direction on how to weave them together into the next "Wonder Bread".  What you end up with is a big mess that your users (other developers) will most likely just pass off as too complex before looking for a simpler API.

    The last part of the above post states it perfectly. 

    "You end up with a million features, which makes it very time-consuming to build, and even when it's done, the number of different gizmos on your Leatherman scare off potential users. You need to have a strong connection to your actual customers, and be hearing about exactly what they need to do. Then you need to design around that, ruthlessly jettisoning anything that distracts from them achieving their goals."


  • Creating an instance of a generic parameter is slooooo: part deux

  • For grins I looked at my code that calls:

    T tmp = new T();

    in Reflector, to see if it could shed any light on the T instance creation badness.  Well, it turns out that the C# compiler spits out a call to Activator.CreateInstance:

    T tmp = Activator.CreateInstance<T>();

    I kind of get why the C# compiler does this, because it doesn't know what T is at compile time.  But at run time the JIT compiler DOES know.  I'm surprised that the C# team didn't build in the smarts to have the JIT emit a direct call to the default constructor of whatever type T is.
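    A common workaround, sketched here in C# as an assumption rather than something the post tested, is to pay the reflection cost once per type by caching a compiled delegate that calls T's parameterless constructor directly.  `Factory<T>` is a hypothetical name, and whether it beats `Activator.CreateInstance<T>()` on your runtime is something to measure with a profiler:

```csharp
using System;
using System.Linq.Expressions;

// One compiled constructor delegate per closed type T, built the first time Factory<T> is used.
public static class Factory<T> where T : new()
{
    // Equivalent to () => new T(), but compiled from an expression tree so the
    // call site is a plain delegate invocation rather than Activator.CreateInstance<T>().
    public static readonly Func<T> Create =
        Expression.Lambda<Func<T>>(Expression.New(typeof(T))).Compile();
}
```

In a generic class constrained with `new()`, `Factory<T>.Create()` would replace `new T()` in the hot path.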


  • Creating an instance of a generic parameter is slooooo

  • I recently needed to change how an array lookup worked to make it more efficient, and decided to use List<T>.BinarySearch to do the lookup.  The class that contained this lookup had a generic parameter, and was constrained like so:

    public class SortedNameList<T> where T : class, INameValueItem, new()
    {...}

    where the T of List<T> was the same as the class generic parameter.

    In order to do the BinarySearch, List<T> required an input of type T to search against.  Since I only had the value of the property that will be compared against (an int), I needed to create a new temp instance of T, set the value, and then pass it into BinarySearch().

    My unit tests passed, all the functionality was good, and I was happy.  Then I ran my app under a profiler to see how much faster my fancy BinarySearch was.

    To my surprise, the time spent doing the binary search calls was almost exactly the same as a linear lookup (over 1.2 million searches)!  What the heck?  I know that creating a new temp object each lookup isn't very efficient, but it shouldn't make that much of a difference.

    So after looking a bit deeper and doing some more performance tests, I found out that creating a new instance of a generic ("T tmp = new T()") is sloooooo.  How slow?  How about 30X slower!  WOW...I had no idea!

    And it's not that it takes the CLR some time to figure out how to create a new T, with most of the time spent on the first instance and the rest speeding up.  Nope, the duration to create a new T is consistent, from the first instance to the millionth instance.

    Good to know...don't do that in a high-volume area.
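    For the BinarySearch case specifically, another way out is not to create a temp T at all and to search on the int key directly.  A minimal C# sketch, assuming the compared value is exposed as an int property; the post doesn't show INameValueItem's members, so `IKeyed`, `Key` and `Item` below are hypothetical stand-ins:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the post's INameValueItem.
public interface IKeyed { int Key { get; } }

public sealed class Item : IKeyed
{
    public int Key { get; }
    public string Name { get; }
    public Item(int key, string name) { Key = key; Name = name; }
}

public static class KeySearch
{
    // Binary search over a list sorted ascending by Key, comparing ints directly,
    // so no temporary T is constructed per lookup.
    public static int BinarySearchByKey<T>(IReadOnlyList<T> sorted, int key) where T : IKeyed
    {
        int lo = 0, hi = sorted.Count - 1;
        while (lo <= hi)
        {
            int mid = lo + ((hi - lo) >> 1);
            int cmp = sorted[mid].Key.CompareTo(key);
            if (cmp == 0) return mid;
            if (cmp < 0) lo = mid + 1; else hi = mid - 1;
        }
        // Same convention as List<T>.BinarySearch: bitwise complement of the insertion point.
        return ~lo;
    }
}
```

Returning `~lo` on a miss keeps callers that already handle `List<T>.BinarySearch` results working unchanged.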


  • No more null checking on your IEnumerables before you iterate over them

  • I get a bit sick of checking for null on my IEnumerable objects before doing a foreach over them.  In my opinion, the CLR should check if the list is null, and if it is, just exit out of the foreach iteration as if there were no items in it.

    Well, I was goofing around with Extension Methods a bit and figured out how to get this kind of functionality (sort of).

    Now, unfortunately, extension methods can't override an existing method on a type, so I can't just create a new GetEnumerator extension method (well, actually I can make one, but it won't get called).  But I can create a new method that returns IEnumerable, and just call foreach on it.

    To do this, first add this class to your code:

    using System.Collections.Generic;

    public static class MyExtensionMethods
    {
        public static IEnumerable<T> Enum<T>(this IEnumerable<T> input)
        {
            if (input != null)
            {
                foreach (var t in input)
                {
                    yield return t;
                }
            }
            else
            {
                yield break;
            }
        }
    }

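    Now, anything that implements IEnumerable<T> will have the Enum method.  Then all you have to do is call foreach on someClass.Enum(), even if someClass is null (an extension method call compiles to a static method call, so invoking it on a null reference doesn't throw).  Below is an example of how this works.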

    static void Main(string[] args)
    {
        List<string> names = new List<string>()
            {"john", "kim", "jean", "brent"};

        //iterate names using stock enumerator
        foreach (string name in names)               
            Console.WriteLine(name);
       
        //iterate names using extension method
        foreach (string name in names.Enum())
            Console.WriteLine(name);

        names = null;

        //oh man!  I have to check for null...I hate that
        if (names != null)
            foreach (string name in names)
                Console.WriteLine(name);

        //Yea!  I dont have to check for null anymore!
        foreach (string name in names.Enum())
            Console.WriteLine(name);
    }

    The extension method uses the "yield return" and "yield break" iterator syntax to let the foreach either spin over the IEnumerable if it's not null, or, if it is null, "yield break" makes IEnumerator<T>.MoveNext() return false, which tells the foreach that there are no more items in the list so it should break out of the loop.

    So, no more null checks!

    <Update>
    A reader commented that this could be optimized by using the static method Enumerable.Empty<T>().  This would save an object instance from being created by the yield return functionality.  The new and improved Extension Method is as follows:

    // Enumerable.Empty<T>() lives in System.Linq, so this version needs "using System.Linq;"
    public static IEnumerable<T> Enum<T>(this IEnumerable<T> input)
    {
        return input ?? Enumerable.Empty<T>();
    }


  • NTEXT vs NVARCHAR(MAX) in SQL 2005

  • I recently profiled a sproc that makes heavy use of the TSQL SUBSTRING function (hundreds of thousands of times) to see how it performs on a SQL 2005 database compared to a SQL 2000 database.  Much to my surprise the SQL 2005 database performed worse...dramatically worse than SQL 2000.

    After much researching it turns out the problem is that the column the text was stored in was an NTEXT, but SQL 2005 has deprecated the NTEXT in favor of NVARCHAR(MAX).  Now, you'd think that string functions on NTEXT would have the same performance on 2005 as they did on 2000, but that's not the case.

    Ok, so NTEXT is old badness, and NVARCHAR(MAX) is new goodness.  Then the next logical step would be to convert the column to be a NVARCHAR(MAX) data type, but here lies a little but very important gotcha.

    By default NTEXT stores the text value in the LOB structure and the table structure just holds a pointer to the location in the LOB where the text lives. 

    Conversely, the default setting for NVARCHAR(MAX) is to store its text value in the table structure, unless the text is over 8,000 bytes, at which point it behaves like an NTEXT: it stores the text value in the LOB and stores a pointer to the text in the table.

    So, just to recap, the default settings for NTEXT and NVARCHAR(MAX) are completely opposite.

    Now, what do you think will happen when you execute an ALTER COLUMN on a NTEXT column that changes the data type to a NVARCHAR(MAX)?  Where do you think the data will be stored?  In the LOB structure or the table structure?

    Well, let's walk through an example.  First create a table with one NTEXT column:

    CREATE TABLE [dbo].[testTable](
        [testText] [ntext] NULL
    ) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

    Next, put 20 rows in the table (in SSMS or sqlcmd, "GO 20" runs the preceding batch 20 times):

    INSERT INTO testTable SELECT 'hmmm...i wonder if this will work'
    GO 20

    Then run a select query with STATISTICS IO:

    SET STATISTICS IO ON
    SELECT * FROM testTable
    SET STATISTICS IO OFF

    Now, looking at the IO stats, we see there was only 1 logical read, but 60 LOB logical reads.  This is pretty much as expected, since NTEXT stores its text value in the LOB, not the table:

    Table 'testTable'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 60, lob physical reads 0, lob read-ahead reads 0.

    Now, let's alter the table to be an NVARCHAR(MAX):

    ALTER TABLE testTable ALTER COLUMN testText NVARCHAR(MAX) null

    Now when we run the select query again with STATISTICS IO we still get a lot of LOB reads (though fewer than we did with NTEXT).  So it's obvious that when SQL Server did the alter table, it didn't use the default NVARCHAR(MAX) setting of text in row, but kept the text in the LOB and still uses pointer lookups to get the text out of the LOB.

    Table 'testTable'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 40, lob physical reads 0, lob read-ahead reads 0.

    This is not as expected and can be devastating for performance if you don't catch it, since NVARCHAR(MAX) with text not in row actually performs WORSE than NTEXT when doing SUBSTRING calls.

    So how do we fix this problem?  It's actually fairly easy.  After running your alter table, run an update statement setting the column value to itself, like so:

    UPDATE testTable SET testText = testText

    SQL Server moves the text from the LOB structure to the table (if less than 8,000 bytes).  So when we run the select again with STATISTICS IO we get 0 LOB reads.

    Table 'testTable'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    YEA!  This is what we want.

    Now, just for grins, what do you think happens if we change the NVARCHAR(MAX) back to NTEXT?  Well it turns out that SQL Server moves the text back to the LOB structure.  Completely backwards from what it did when converting NTEXT to NVARCHAR(MAX).




  • Easily write Reflection.Emit code

  • I was looking at Reflector add-ins the other day and ran across one that would be an amazing time saver.

    It's an add-in that generates the Reflection.Emit code!

    Anyone who has ever spent any time with the Reflection.Emit namespace should immediately realize how wonderful this tool has the potential to be (as long as the generated code is of good quality of course).
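    For anyone who hasn't used the namespace, here is the flavor of hand-written Reflection.Emit code, as a minimal C# sketch (`EmitDemo` and `BuildAdder` are illustrative names, and this is not output from the add-in): even a method as trivial as `(a, b) => a + b` has to be spelled out one IL instruction at a time.

```csharp
using System;
using System.Reflection.Emit;

public static class EmitDemo
{
    // Builds, at run time, the equivalent of: static int Add(int a, int b) => a + b;
    public static Func<int, int, int> BuildAdder()
    {
        var method = new DynamicMethod("Add", typeof(int), new[] { typeof(int), typeof(int) });
        ILGenerator il = method.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0); // push a
        il.Emit(OpCodes.Ldarg_1); // push b
        il.Emit(OpCodes.Add);     // pop both, push a + b
        il.Emit(OpCodes.Ret);     // return the value on top of the stack
        return (Func<int, int, int>)method.CreateDelegate(typeof(Func<int, int, int>));
    }
}
```

A single wrong opcode here fails only at run time, typically with an InvalidProgramException, which is a big part of why generating this code from a decompiled example is so attractive.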

    Also, the way it integrates with Reflector is pretty slick.  It adds a "Reflection.Emit" choice to the list of languages you want Reflector to display the code in.  Then, in the left pane, when you click a module, class, method, property, whatever, it displays in the right pane the Reflection.Emit code you would have to write to generate the thing you clicked on.

    Simple...and amazing!

    I recently spent 6 days writing Reflection.Emit code to generate two fairly complex methods.  2 days each for writing the code, and 1 day each for debugging it and making it actually work.  I probably could have cut that down to 1 to 2 days using this tool.

    I haven't yet compared the add-in's generated Reflection.Emit code to the code I've written manually to validate its quality, but just playing around with it, the generated code looks pretty good.

    It can be found here:
    http://www.codeplex.com/reflectoraddins/Wiki/View.aspx?title=ReflectionEmitLanguage&referringTitle=Home


  • The Instrumentation Model

  • I've spent a lot of time lately thinking about instrumentation and how to integrate it into software projects.

    As a performance engineer I tend to think about instrumentation from the point of view of someone who wants to record the details of what a system is doing, and then dig through the data and use it to figure out what is wrong.

    But as I’ve been talking to people about instrumentation over the past few months, I’ve come to realize that instrumentation means different things to different people.  Some people think of instrumentation as a high-level, lightweight set of metrics that are easy to consume, understand, and extrapolate performance deltas from; a management point of view.  Other people, like me, think of it as recording low-level details of what’s going on in the call stacks and SQL engine; a troubleshooter point of view.  And then others think it’s somewhere in between; everyone else.

    Well, I think everyone is correct.  There are different levels of instrumentation that are useful at different points in validating system health.  There should be easy-to-consume, easy-to-understand metrics for day-to-day health checks.  There is medium-level instrumentation that is used to figure out where a problem is, but takes a bit more effort to analyze.  And if that isn’t enough to find and fix the problem, there is the dump-everything-to-file model that gives you all the data you need to understand what is going on in the system, but requires internal knowledge of the system and time to analyze the data.  Also, each level builds upon the previous one, so there is as little duplicated effort as possible.

    So I’ve tried to create an instrumentation model that demonstrates these different levels, the questions each level tries to answer, and when to move on to the next level.

    The first level will provide you with the most early bang for your buck, and it’s an easy way to tell if you have a problem, with as little dev effort as possible.  Then, as you get the high-level metrics in, you can start building in the mid-level metrics, and so on.  The main thing is not to try to build the entire instrumentation framework up front before you put anything in.  Start putting high-level metrics in early and use them in your automated testing.


    Instrumentation Model Image
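    The model itself was presented as an image, but the "each level builds on the previous one" idea can be sketched in C#.  Everything here is hypothetical (the level names, `Instrumentation`, `Measure`): level 1 only counts calls, level 2 adds timings, and level 3 adds a raw detail record, all driven from one call site so the levels share their plumbing.

```csharp
using System;
using System.Collections.Concurrent;
using System.Diagnostics;

public enum InstrumentationLevel { Off = 0, Health = 1, Diagnostic = 2, Verbose = 3 }

public sealed class Instrumentation
{
    public InstrumentationLevel Level { get; set; } = InstrumentationLevel.Health;

    // Level 1: cheap counters that answer "is something wrong?"
    public ConcurrentDictionary<string, long> Counts { get; } = new ConcurrentDictionary<string, long>();
    // Level 2: cumulative timings that answer "where is it wrong?"
    public ConcurrentDictionary<string, TimeSpan> Elapsed { get; } = new ConcurrentDictionary<string, TimeSpan>();
    // Level 3: raw detail records that answer "why is it wrong?"
    public ConcurrentQueue<string> Detail { get; } = new ConcurrentQueue<string>();

    public IDisposable Measure(string operation, string detail = null) => new Scope(this, operation, detail);

    private sealed class Scope : IDisposable
    {
        private readonly Instrumentation _owner;
        private readonly string _operation;
        private readonly string _detail;
        private readonly Stopwatch _watch;

        public Scope(Instrumentation owner, string operation, string detail)
        {
            _owner = owner;
            _operation = operation;
            _detail = detail;
            // Only pay for timing when a level that uses it is enabled.
            _watch = owner.Level >= InstrumentationLevel.Diagnostic ? Stopwatch.StartNew() : null;
        }

        public void Dispose()
        {
            var level = _owner.Level;
            if (level >= InstrumentationLevel.Health)
                _owner.Counts.AddOrUpdate(_operation, 1, (_, n) => n + 1);
            if (level >= InstrumentationLevel.Diagnostic && _watch != null)
            {
                TimeSpan elapsed = _watch.Elapsed;
                _owner.Elapsed.AddOrUpdate(_operation, elapsed, (_, total) => total + elapsed);
            }
            if (level >= InstrumentationLevel.Verbose)
                _owner.Detail.Enqueue($"{DateTime.UtcNow:O} {_operation} {_detail}");
        }
    }
}
```

Usage is `using (instrumentation.Measure("LoadOrders")) { ... }`; raising `Level` turns on the deeper data without touching the call sites.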



  • Code Analysis Tools

  • I've started working on a collection of code analysis tools, that are open source and available for anyone to use.  I've got a descriptive article located at CodeProject.com (http://www.codeproject.com/cs/algorithms/Not_Used_Analysis.asp), which includes the source code and binaries.

    The three main tools that I have so far are the following:

    • A “Not Used Finder”: searches through a list of assemblies and looks for any type, method or field that isn't ever used.  Points out code that you should be able to remove.
    • A visibility analysis: searches through a list of assemblies and looks at the visibility of methods and types and shows those that have a visibility higher than is required based on current usage.
    • A Duplicate / Near Duplicate code analysis


  • Introduction to Creating Dynamic Types with Reflection.Emit parts 1 and 2

  • I've recently finished part two of a series of articles on creating dynamic types with System.Reflection.Emit.  Dynamic types are types, or classes, manually generated and inserted into an AppDomain at runtime, from within the program.  The two articles are linked below.

    Part 1:
    http://www.codeproject.com/dotnet/Creating_Dynamic_Types.asp

    Part 2:
    http://www.codeproject.com/useritems/Creating_Dynamic_Types2.asp

    I'm curious, though. I've got ideas for two more articles, but don't know if people would be interested in them. The third article in this series would cover how to create an Aspect Oriented Programming framework with Reflection.Emit, and the fourth would go over how to debug dynamic types. Would anyone be interested in these topics?


  • Generic IComparer<T> is a good thing. And null comparisons

  • There is something I've been wanting to post about for a while, but I just didn’t have enough info to make it worth my while, or the while of anyone who reads this.

     

    We all know that .Net 2.0 came out with something wonderful called Generics. I've been profiling a lot of my 1.1 code, comparing it to my 2.0 code that uses Generics, and the one change that has given me the greatest, most dramatic performance benefit is switching from IComparer to IComparer<T>.

     

    I use IComparer a LOT. But there is a problem with it, and it's the Compare() method. The following is the pattern I use for all my compare functions:

     

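    // Non-generic (IComparer) version: both arguments arrive as object.
    public int Compare(object x, object y)
    {
        if (x == null || y == null)
            throw new ApplicationException("Invalid NGram Compare");

        // Cast to NGram so the Score properties can be compared.
        // ('as' yields null for a non-NGram argument, so a wrong type fails on the next line.)
        NGram n1 = x as NGram;
        NGram n2 = y as NGram;

        return n1.Score.CompareTo(n2.Score);
    }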

     

    It takes an object for both of its parameters, so I have to cast each one to NGram before I can compare their Score properties. This extra step may not be much, but if you are sorting an array that holds 1.5 million NGram objects, it adds up to a LOT of operations.

     

    Enter IComparer<T>! This is the generic version of IComparer, and its Compare function looks like this for IComparer<NGram>:

     

    // Generic (IComparer<NGram>) version: the arguments are already NGrams, so no casts.
    public int Compare(NGram x, NGram y)
    {
        if (x == null || y == null)
            throw new ApplicationException("Invalid NGram Compare");

        return x.Score.CompareTo(y.Score);
    }

     

    Notice that two NGram instances are passed into the Compare function, instead of two objects.  This lets me save the casting operations.  YEAH!  It must be faster, right?

     

    So like any good performance guy (or gal, for that matter), I revved up my favorite code profiler and ran two tests, one with the generic comparer and one with the object comparer, each sorting an array of 1.5 million NGram objects a few times. And guess what I saw?

     

    The generic comparer was slower than the object comparer. And not just a little bit slower either, but 61% slower!!! Oh my gosh… what the hell is going on here? Generics have to be faster! It's less code!
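A rough way to repeat that kind of experiment without a profiler is to time Array.Sort with each comparer using Stopwatch. This is a sketch, not the author's test. The NGram class here is a hypothetical stand-in with only a Score field, and it does not overload ==. The names ObjectScoreComparer, GenericScoreComparer, and ComparerBenchmark are illustrative. Timings depend on runtime, build configuration, and machine, so no particular result is implied.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical stand-in for NGram: only the Score matters for sorting.
public class NGram
{
    public double Score;
    public NGram(double score) { Score = score; }
}

// 1.1-style comparer: object parameters, a cast on every call.
public class ObjectScoreComparer : IComparer
{
    public int Compare(object x, object y)
    {
        if (x == null || y == null)
            throw new ApplicationException("Invalid NGram Compare");
        NGram n1 = (NGram)x;
        NGram n2 = (NGram)y;
        return n1.Score.CompareTo(n2.Score);
    }
}

// 2.0-style comparer: strongly typed parameters, no casts.
public class GenericScoreComparer : IComparer<NGram>
{
    public int Compare(NGram x, NGram y)
    {
        if (x == null || y == null)
            throw new ApplicationException("Invalid NGram Compare");
        return x.Score.CompareTo(y.Score);
    }
}

public static class ComparerBenchmark
{
    // Fixed seed, so every run sorts the same data.
    public static NGram[] MakeData(int count, int seed)
    {
        var rng = new Random(seed);
        var data = new NGram[count];
        for (int i = 0; i < count; i++) data[i] = new NGram(rng.NextDouble());
        return data;
    }

    // Sorts identical copies with each comparer and returns the elapsed milliseconds.
    public static (long objectMs, long genericMs) Run(int count, int seed)
    {
        NGram[] a = MakeData(count, seed);
        NGram[] b = (NGram[])a.Clone();

        var sw = Stopwatch.StartNew();
        Array.Sort(a, new ObjectScoreComparer());   // binds to Array.Sort(Array, IComparer)
        long objectMs = sw.ElapsedMilliseconds;

        sw.Restart();
        Array.Sort(b, new GenericScoreComparer());  // binds to Array.Sort<T>(T[], IComparer<T>)
        long genericMs = sw.ElapsedMilliseconds;

        return (objectMs, genericMs);
    }
}
```

`ComparerBenchmark.Run(1500000, 1)` sorts 1.5 million items with each comparer. Run it in a Release build and repeat it a few times before trusting the numbers, since the first sort also pays JIT and warm-up costs.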

     

    So, to get to the bottom of this mystery, I opened my assembly in Reflector and took a peek at the IL code for each function.  I know IL doesn’t lie to me.  What I saw was very interesting, and it had to do with checking an object to see if it is null. 

     

    This is a section of IL from the generics Compare function.

     

    L_0000: ldarg.1
    L_0001: ldnull
    L_0002: call bool NGram::op_Equality(NGram, NGram)
    L_0007: brtrue.s L_0012
    L_0009: ldarg.2
    L_000a: ldnull
    L_000b: call bool NGram::op_Equality(NGram, NGram)
    L_0010: brfalse.s L_001d
    L_0012: // throw exception stuff

     

    This is pretty much what I expected to see. Argument 1 is loaded onto the stack, then a null is loaded onto the stack. Then NGram's equality operator (op_Equality) is called to see whether the two are the same (is the NGram null?). If it is, execution branches down to the throw new exception code. Otherwise argument 2 and another null are loaded onto the stack, and there's a second op_Equality check. Like I said before, nothing really amazing here, and pretty much what I'd expect: each null check goes through the == operator that my NGram class defines.

     

    OK, now let's take a look at how nulls were checked in the object comparer's IL code, to see what made it so much faster. This is what I saw:

     

    L_000e: ldarg.1
    L_000f: brfalse.s L_0014
    L_0011: ldarg.2
    L_0012: brtrue.s L_001f
    L_0014: // throw exception stuff

    Damn… that’s a lot less code! What's going on here? Well, the compiler is emitting IL that does something the C# language itself won't let you get away with. In C++, NULL, 0, and FALSE are all the same thing: they are all 0. C# doesn't allow you to make this leap; null, 0, and false are three distinctly different things. IL has no such restriction. brfalse branches when the value on the stack is false, zero, or a null reference, and brtrue branches when it is non-null. So the IL just loads the first argument onto the stack and branches on it directly, with no method call at all. Since a null reference tests as false in IL, this works just fine.

     

    When x and y are typed as Object, the C# compiler knows that x == null can only mean a reference comparison, so it can emit that short brfalse/brtrue sequence. So why didn’t it do the same for the NGram null comparison? Because my NGram class overloads the == operator, that’s why. C# binds operators at compile time from the static types of the operands, so with NGram parameters, x == null resolves to NGram's op_Equality, and the compiler has to emit a call to it even though all I wanted was a null check. If NGram didn't overload ==, the generic version would have compiled to a plain reference comparison as well.
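If an overloaded == on NGram is what the compiler ends up calling, one way to keep the null check without paying for that call is to test references directly. The sketch below assumes a hypothetical NGram that overloads == (and counts how often that operator runs, purely for illustration). It contrasts a comparer written with x == null against one written with x is null, which C# guarantees never invokes a user-defined == operator. On compilers older than C# 7, (object)x == null or ReferenceEquals(x, null) gives the same reference-only check.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical NGram that overloads ==, so `x == null` binds to NGram::op_Equality.
public class NGram
{
    public static int EqualityCalls; // illustration only: counts op_Equality invocations

    public double Score { get; }
    public NGram(double score) { Score = score; }

    public static bool operator ==(NGram a, NGram b)
    {
        EqualityCalls++;
        if (ReferenceEquals(a, b)) return true;
        if (a is null || b is null) return false; // reference checks, no recursion into ==
        return a.Score == b.Score;
    }

    public static bool operator !=(NGram a, NGram b) => !(a == b);

    public override bool Equals(object obj) => obj is NGram other && Score == other.Score;
    public override int GetHashCode() => Score.GetHashCode();
}

// Null check written with ==: calls the overloaded operator twice per comparison.
public class OverloadedNullCheckComparer : IComparer<NGram>
{
    public int Compare(NGram x, NGram y)
    {
        if (x == null || y == null)
            throw new ApplicationException("Invalid NGram Compare");
        return x.Score.CompareTo(y.Score);
    }
}

// Null check written with `is null`: a plain reference test, no operator call.
public class ReferenceNullCheckComparer : IComparer<NGram>
{
    public int Compare(NGram x, NGram y)
    {
        if (x is null || y is null)
            throw new ApplicationException("Invalid NGram Compare");
        return x.Score.CompareTo(y.Score);
    }
}
```

Both comparers order NGrams the same way. The only difference is whether each comparison also runs NGram's == operator, which is the cost the IL above exposes.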

     

    So to test this theory, I took the null check out of my IComparer<NGram>.Compare function and re-ran my test code under the profiler. This time the generic comparer, without the null checks, was 66% faster than the object comparer. Ahhhh, satisfaction at last.

     

    This exercise reinforced something in my head: even if you KNOW that thing A is faster than thing B, always profile it just to be sure. Yes, a generic typed comparer is much faster than a normal object comparer. But if I had just made the code change and called it good, I would actually have slowed my app down. Which is a bad thing.


  • Color Picker Visual Studio Macro

  • Every now and then I need to create a color object in my code, but don’t know exactly what color I want. So I created this little macro to pop up the ColorDialog, then insert a line of code for the color you picked.

    Nothing magic or special here, just another useful macro.  The only problem with it is the color dialog comes up behind Visual Studio so you have to Alt-Tab to see it.  A bit annoying, I know.  If anyone figures that one out I'll post the fix.

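    ' The macro module needs these at its top:
    '   Imports System.Windows.Forms   (ColorDialog, DialogResult)
    '   Imports EnvDTE                 (DTE, TextSelection, EditPoint)

    ' Shows the color dialog and inserts a C# line that recreates the chosen color.
    Public Sub ColorPicker()
        Dim colorDlg As New ColorDialog
        colorDlg.AllowFullOpen = True
        colorDlg.AnyColor = True
        colorDlg.FullOpen = True
        colorDlg.SolidColorOnly = False

        Dim ret As DialogResult = colorDlg.ShowDialog()
        If ret = DialogResult.Cancel Then
            Return
        End If

        Dim color As System.Drawing.Color = colorDlg.Color
        Dim code As String
        If color.IsNamedColor Then
            ' e.g. Color color = Color.Red;
            code = "Color color = Color." & color.Name & ";"
        Else
            ' ToArgb() packs alpha, red, green and blue into one Integer
            code = "Color color = Color.FromArgb(" & color.ToArgb().ToString() & ");"
        End If

        ' Insert the generated line at the top of the current selection in the active document.
        Dim textSelection As TextSelection = CType(DTE.ActiveDocument.Selection, TextSelection)
        Dim edit As EditPoint = textSelection.TopPoint.CreateEditPoint()
        edit.Insert(code)
    End Sub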


  • Nice VS 2005 Snippet Collection

  • Ever since I discovered snippets in Visual Studio 2005, I've been using them like crazy. 

    Microsoft has a good list of pre-canned snippets (http://msdn.microsoft.com/vstudio/downloads/codesnippets/) in 13 different categories.

    But the best category of snippets I've found thoroughly covers NUnit code templates.  It's located here: http://www.codeproject.com/dotnet/UnitTestCodeSnips.asp.

    Combine these NUnit code snippets with TestDriven.Net (http://www.testdriven.net/) and it really becomes easy to practice test-driven development.


  • Invaluable tool!

  • Yesterday I stumbled across a totally invaluable tool to help with unit testing your code in Visual Studio: TestDriven.Net (formerly known as NUnitAddIn). It's a Visual Studio add-in that lets you run your NUnit, MbUnit, Team System, and soon Zanebug unit tests by just right-clicking on the test method, class, or namespace and clicking the “Run Test(s)” menu item. You can run just one test, all the tests in a class, or all the tests in a namespace. This is cool and all, but the best part is that it will let you run the tests under the debugger. So you set your breakpoint in the unit test, right-click, pick “Test With...Debugger”, and boom! You've now got the process caught on your breakpoint. No more attaching the debugger to a running NUnit process. This is especially nice if you need native code support in the debugger, because when you detached the debugger from NUnit, NUnit would get closed.

    Now, this isn't a total replacement for using NUnit when TDD'ing. You still want to run the entire suite of tests fairly often. It's when writing individual tests that this tool really shines.

    And did I mention the best part? It's free!


  • NMock or NUnit.Mocks

  • Does anybody have any experience with either of these two mock object frameworks? Over the past few months I have gone back and taken a new look at Test Driven Development, and I'm starting to change the way I think about writing code.

    In writing some of my unit tests I've needed a mock object framework. Looking around, I've noticed that these two seem to be the brightest stars in the .Net universe with respect to mock objects.

    Can anyone compare and contrast them? Give some insight into using them?
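For anyone weighing the two, it may help to see what a mock framework automates. Below is a hand-written mock with no framework at all. It implements the dependency's interface, records each call, returns a canned result, and checks an expectation. IMailSender, OrderNotifier, and MockMailSender are hypothetical names. NMock and NUnit.Mocks generate this kind of object for you, each with its own expectation syntax, which isn't shown here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical dependency of the class under test.
public interface IMailSender
{
    bool Send(string to, string body);
}

// Hypothetical class under test: it only talks to IMailSender.
public class OrderNotifier
{
    private readonly IMailSender mail;
    public OrderNotifier(IMailSender mail) { this.mail = mail; }

    public bool NotifyShipped(string customer, int orderId) =>
        mail.Send(customer, "Order " + orderId + " has shipped.");
}

// Hand-written mock: records calls and returns a canned result instead of sending mail.
public class MockMailSender : IMailSender
{
    public readonly List<(string To, string Body)> Calls = new List<(string, string)>();
    public bool ResultToReturn = true;

    public bool Send(string to, string body)
    {
        Calls.Add((to, body));
        return ResultToReturn;
    }

    // Expectation check, similar in spirit to a mock framework's verify step.
    public void VerifySentOnceTo(string to)
    {
        if (Calls.Count != 1 || Calls[0].To != to)
            throw new InvalidOperationException($"Expected one Send to {to}, saw {Calls.Count} call(s).");
    }
}
```

A test hands `new MockMailSender()` to `OrderNotifier`, calls `NotifyShipped`, and then calls `VerifySentOnceTo`. Writing one of these per interface gets tedious quickly, which is the work a mock framework takes off your hands.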


  • Top 3 hard rock / metal cover songs

  • So I've recently been looking for fun rock / metal covers of songs from the '60s / '70s / '80s.

    My top 3 favorites are:

    1. Deadsy's cover of Rush's “Tom Sawyer”
    2. Korn's cover of Pink Floyd's “Another Brick in the Wall”
    3. Metallica's cover of Bob Seger's “Turn the Page”

    What are your favorites?


  • Interfaces or abstract classes, or There is no silver bullet

  • A few weeks ago I attended an AOP workshop at Microsoft. One of the AOP flavors that was presented requires you to implement an interface for every class that you want to apply aspects to. I find this fairly annoying and constricting. When I voiced my concern about having to create one interface for every class in my 600-class architecture, I was told by the majority of the people there, from both academia and the CLR team, that this is how you should design your framework anyway, because interfaces allow for the greatest extensibility.

    Interestingly enough, I just started reading a book called Framework Design Guidelines, written by Krzysztof Cwalina and Brad Abrams, both heavy hitters at Microsoft. In chapter 4, Type Design Guidelines, they state that when designing a polymorphic hierarchy of reference types, you should in general opt for abstract classes over interfaces. The book states that when applying an “Is A” relationship you should use an abstract class, and when applying a “Can Do” relationship to a class you should use an interface (IDisposable, IEnumerable, IComparable). The main argument is that interfaces should be immutable from version to version, while base classes can evolve with much greater ease. The only major downside to using abstract classes instead of interfaces is that .Net allows only single class inheritance, while a class can implement as many interfaces as it likes.
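The book's distinction can be sketched roughly as follows. Shape, Square, IExportable, and Describe are hypothetical names chosen for illustration. The point is the versioning story. Imagine Describe being added to the abstract base in a later version: because it ships with a default body, Square keeps compiling unchanged. Adding a member to an already-shipped interface would break every implementer, at least before C# 8 introduced default interface members.

```csharp
using System.Globalization;

// "Can Do": a capability, in the spirit of IDisposable or IComparable.
public interface IExportable
{
    string Export();
}

// "Is A": the root of the hierarchy is an abstract class.
public abstract class Shape
{
    public abstract double Area();

    // Imagine this was added in version 2: the default body means
    // existing subclasses need no changes.
    public virtual string Describe() =>
        GetType().Name + " with area " + Area().ToString("F2", CultureInfo.InvariantCulture);
}

// A Square "is a" Shape and "can do" exporting.
public sealed class Square : Shape, IExportable
{
    public double Side { get; }
    public Square(double side) { Side = side; }

    public override double Area() => Side * Side;
    public string Export() => "square:" + Side.ToString(CultureInfo.InvariantCulture);
}
```

Note that Square could implement any number of other "Can Do" interfaces, but it gets exactly one base class, which is the single-inheritance trade-off mentioned above.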

    My main programming mantra states that there is no silver bullet. There is no “One” tool. Each tool has a purpose, and should only be used for that purpose and that purpose alone. When designing a system, look at your requirements and use the tools that fit the situation. Don't try to force the use of a tool just because you used it before. Try to understand what the tool is for. But it seems people are looking for the Matrix version of a programming tool. The One...

    When I hear people saying “Every class should have an explicitly defined interface”, it really makes me wonder where this comes from. I have to think these guys are throwbacks from the COM days who didn't really understand why COM did this. They just understood that if it was a class, it had an interface, and they didn't need to know why.

    Now, the really interesting thing is that both authors of the Framework Design Guidelines book are program managers on the CLR team, yet there were people on their team spouting the interface mantra.


  • The Eight Fallacies of Distributed Computing

  • Someone on GeeksWithBlogs posted this link, but I think it's so important that it deserves a second showing. It's the eight fallacies of distributed computing. As anyone who has looked through my blog knows, I'm not a big fan of the Web Service storm that's blowing through the programming world. It just doesn't make sense to blindly build massive distributed architectures inside your own firewalls. But companies are doing it.