Wednesday, December 18, 2013

I Can Write That Method with 1 Line of Code

I've worked with developers who prided themselves with terseness -- writing a function with the absolutely fewest lines of code possible.

Are fewer lines actually better?
I'll stick with my standard answer: "It depends." Unfortunately, many times, fewer lines of code comes at the cost of readability and maintainability -- and as you know, these are very big concerns of mine.

Let's take a look at some code to see if we can find a good balance. The code is taken from "IEnumerable, ISaveable, IDontGetIt: Interfaces in .NET". Now, I have a tendency to be a bit verbose when I'm starting an application. This helps with debugging, especially in the early stages.

Here's how the method stands in the downloaded code (from the RepositoryFactory.cs file). We'll refer to this as the 5-line version:


This method dynamically loads a type from an assembly based on configuration. In this case, it returns a concrete repository class that implements the IPersonRepository interface. Let's do a quick step through this method to see what each line does:
  1. We pull a value out of the configuration file. This value is the assembly-qualified name of the type we want to load.
  2. Based on the assembly-qualified name (from #1), we use the GetType method to generate a CLR Type object.
  3. With a Type object, we can use the Activator class to create an actual instance of that Type. The CreateInstance method returns an "object" that we stick into the "repoInstance" variable.
  4. Since we need something that implements the IPersonRepository interface, we cast our "object" to "IPersonRepository".
  5. Finally, we return the repository, which is an IPersonRepository.
That's a lot of steps, and it's okay if you don't understand exactly what's going on. What we want to focus on is the number of intermediate variables that we have in this method. We actually have 4 variables: repoTypeName, repoType, repoInstance, and repo.

Getting Rid of Intermediate Variables
We can eliminate some of these intermediate variables by inlining them -- basically, we just replace the variable usage with its assignment.

The easiest thing to do is to combine lines 3 and 4 to get rid of the intermediate "repoInstance" variable:


So, we immediately cast the return value from the CreateInstance method to an "IPersonRepository".

But then, do we really need the "repo" variable? All we do is return it in the next line. So, let's combine lines 3 and 4:


This gives us a fairly compact method.

But Why Stop There?
Let's keep going. We can get rid of the "repoTypeName" variable by combining lines 1 and 2:


We just take the "AppSettings" statement and use it directly as a parameter for the "GetType" method. Now we're down to 2 lines of code, and we only have 1 intermediate variable.

Can we get this down to just 1 line?


Of course we can. But it's a little difficult to read since it's stretched out. Let's add a few line breaks:


This is still a single line of code (at least in the source -- what it gets compiled to is a different issue). Now, it's easier to see everything. And the terseness-obsessed developer would be proud.

Is This a Good Idea?
Now that we've whittled things down to 1 line of code, we need to stop and ask ourselves if this is a good idea. Let's look at this code from a couple different perspectives.

Readability
The number one problem I have with the terse code is readability. The degree of readability will depend on the experience of our developers. If our developers have not worked much with reflection, then this code is nearly indecipherable. The advantage with the intermediate variables is that we get some clues as to what is going on in each step based on the variable names (even if we disregard the variable types themselves).

"repoTypeName" lets us know that the value coming out of configuration is the name of a Type. "repoType" lets us know that this is a CLR Type object. "repoInstance" lets us know that we now have an instance of a particular type. And so on...

Maintainability
What if something goes wrong with this code? Let's set a breakpoint:


Uh oh. It doesn't look like this breakpoint is going to do us much good. At least with the intermediate variables, we'll be able to set good breakpoints, and we can see the values that are produced during each step.

Performance
The 1 line version has a small performance benefit. We can see this by looking at the IL that is generated.

Here's the IL for the 1-line version:


And here's the IL for the 5-line version:


If we look at the "meat", we see that the same instructions are run in each version:
  1. call to "get_AppSettings". This gets the AppSettings property from the ConfigurationManager.
  2. ldstr for "RepositoryType". This is the string literal for the setting we want to load.
  3. callvirt to "get_Item(string)". This gets the value that we're looking for from configuration.
  4. call to "GetType". This is the Type.GetType call from our code.
  5. call to "CreateInstance". This is the Activator.CreateInstance call from our code.
  6. isinst for "IPersonRepository". This is the cast to the IPersonRepository interface.
  7. ret. This returns the final value.
The difference between the IL output is in that the 5-line version has the intermediate variables. These are the "stloc" and "ldloc" calls that are interspersed in the code.

So, technically, the 1-line version will be a little faster because it does not deal with these intermediate variables.

But (and this is a BIG BUT), the slowest parts of this method are the reflection calls ("GetType" and "CreateInstance"). These are orders of magnitude slower that the variable code. So, in this case, we really should not worry about the differences in performance. Any slight gain we might get will be overshadowed by the slowness of the reflection code.

Finding the Balance
So, how do we find the balance?

Here's my approach: I start out with the verbose (5-line) method. During the development and early testing process, I want to make sure that the intermediate values are what I expect. So, I like having the extra variables (to put in the Watch window) and extra lines (to add breakpoints).

But once I'm past this, I'd like to refactor things down a bit. My balance point for this method is the 3-line version:


I like this for a couple of reasons. First, I can easily verify (with a breakpoint or watch) that the value I pull out of configuration is what I expect it to be. This is the brittlest part of this method -- configuration is just text in an XML file, and this is very easy to typo or just enter wrong values.

Next, I can verify that the dynamically-loaded assembly is actually available. The "GetType" method uses reflection to load up the specified assembly and get the type information out. If the assembly is not available (or the assembly is available but does not contain the expected type), this step will fail. Again, this is easy to breakpoint, and if there is an exception, it will point right to this line of code.

I'm good with combining the rest of the method into a single line. In my opinion, this is still readable. And we can easily break this into separate lines if we do happen to have problems in this section.

Know Your Team
A big part of this is to understand the skill levels of the people you work with (or the people who will be working with the code). Whenever I'm given a choice between dumbing-down code and making developers better, I will always choose to make the developers better. But sometimes we do not have control over that.

We need to code at a level that the developers understand. So, if I am in an environment where reflection is an unfamiliar topic, then I might just stick with the original 5-line version.

So, know yourself, know your team, and strike the balance that's appropriate for your environment.

Happy Coding!

2 comments:

  1. Very good recommendations. The debug breakpoint issue should always be somewhere on the back of your mind. For that reason, sometimes I might choose to still put the return value in its own variable. Another recommendation (though not what this post is about) would be to cache the result in a static variable to avoid taking the reflection hit more than once. But great article as always. Thanks!

    ReplyDelete
    Replies
    1. I often have return variables as well; it does help with debugging.

      Using a static variable is a good way to minimize the reflection hit. I use this same code in my Pluralisight course (shameless plug) "Practical Reflection in C#" (http://pluralsight.com/training/Courses/TableOfContents/practical-reflection-dotnet). In order to minimize the reflection hit, we make sure that we only call this particular method one time in our code.

      Delete