Oftentimes, applications are created with an emphasis on short development time; performance is not examined until fairly late in the game, and at that point rectifying the situation can, and often does, prove costly in terms of resources.
Even more importantly, little thought is given to the underlying data structures used by the application, which can have considerable performance ramifications down the road.
In later blog postings I’ll explore some of these items and issues in greater detail; in this posting, however, I’m going to give some overall tips and insight into developing scalable, high-performance software.
Is it a Race Car or a Cadillac?
Most programmers want something that is as fast as a race car, but what they drive is more like a Cadillac. It’s designed to be easy to drive and relatively convenient. It may even be cushy.
A race car, though, is generally anything but cushy: few have doors, many have no windshields, few have radio or iPod connections, and very few have a back seat. And a place to put the groceries? Fuggedabouddit!
But what it lacks in convenience, it more than makes up in speed.
Most programmers want to drive something that is like a Cadillac; whether it’s PHP, Ruby, Java, C#, or whatnot, what they are driving is primarily built for convenience; it ain’t built for speed, baby. (C# comes closest, but even it isn’t as fast as C++, although it is faster than the other alternatives listed.)
They want automatic memory management, they want fancy string handling, they want object-oriented programming, and they want it to be fast.
And they can have it fast, as long as they don’t load up with too many Layers, Tiers, Frameworks, and Data Transformations in their code.
What do I mean by all this?
Let’s say you want to display a web page that contains every employee’s first and last name and phone number. Sounds simple, right?
In this case, the source data comes from a database, and the output is an HTML stream.
It should therefore stand to reason that the fewer steps taken between reading the data from the database and writing it to the HTML stream, the faster, and therefore more scalable, the application will be.
This basically means reading directly from the database driver or result set and writing directly to the HTML output with as few transformations as possible (converting ints to strings, strings to ints, concatenating strings, and so on).
Every additional piece of middleware and every data transformation between the source and the output hinders performance, and thus scalability.
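To make the idea concrete, here is a minimal sketch in Python (the same principle applies in any of the languages mentioned above). The table name `employees` and its columns are assumptions made for illustration; the point is that each row flows from the result set straight into the HTML string, touched only once.

```python
import sqlite3
from html import escape

def render_employee_page(conn: sqlite3.Connection) -> str:
    """Stream rows straight from the database cursor into HTML.

    No entity objects, no DTOs, no intermediate collections --
    each value is escaped and concatenated exactly once.
    """
    parts = ["<table><tr><th>First</th><th>Last</th><th>Phone</th></tr>"]
    # Read directly from the result set and write directly to the output.
    for first, last, phone in conn.execute(
        "SELECT first_name, last_name, phone FROM employees"
    ):
        parts.append(
            f"<tr><td>{escape(first)}</td>"
            f"<td>{escape(last)}</td>"
            f"<td>{escape(phone)}</td></tr>"
        )
    parts.append("</table>")
    return "".join(parts)
```

Compare this with the typical layered version, where each row is first materialized into an entity object, perhaps copied into a DTO, and only then rendered: every one of those hops is a transformation the direct version simply never pays for.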
Big performance sappers in this category are often things like Typed Data Sets, ORM layers, BLL layers, entity mapping, XML transformations, and of course, Web Services.
With the exception of Web Services, everything in the above paragraph is merely a notational convenience that makes the software easier to understand, albeit at the expense of performance.
Does this mean we have to stop using all those tools and throw them out with the back seat?
Not necessarily, but one needs to understand the performance implications of all the data transformations wrought by the middleware and be prepared for the combined performance hit of all the layers working together, which can be significant.
Many applications will never need to scale beyond a departmental level; those that do need to pay careful attention to their internal data structures, lest they spend more time transforming the data than actually processing it.
Additionally, it is possible to operate in a mixed mode: using the BLL/ORM for some functions and bypassing them for higher-performance functions as necessary.
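A mixed-mode design might look something like the following sketch. The `EmployeeRepository` class, table, and column names are hypothetical; the idea is simply that the convenient object-building path and the raw fast path live side by side, and each caller picks the one it needs.

```python
import sqlite3

class EmployeeRepository:
    """Hypothetical mixed-mode data layer: a convenient object path
    for ordinary screens, plus a raw fast path for hot report loops."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn

    def get_employee(self, emp_id: int) -> dict:
        # Convenient path: build a full "entity" dict per row.
        # Fine for low-volume screens where clarity beats speed.
        row = self.conn.execute(
            "SELECT id, first_name, last_name, phone "
            "FROM employees WHERE id = ?",
            (emp_id,),
        ).fetchone()
        return {"id": row[0], "first_name": row[1],
                "last_name": row[2], "phone": row[3]}

    def iter_phone_list(self):
        # Fast path: skip entity construction entirely and yield the
        # raw tuples, so the hot loop does no extra transformations.
        yield from self.conn.execute(
            "SELECT first_name, last_name, phone FROM employees"
        )
```

An edit form for a single employee would call `get_employee`, while the company-wide phone-list page would iterate `iter_phone_list` straight into its output stream.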