I hope this article will provide some general guidelines for programmers who are faced with a program that is 'too slow'. I'll look at trying to decide how much optimising is needed, some of the alternative techniques available, measuring the improvement and briefly talk about a common measure of algorithm speed.
In any programming task there are a large number of possible goals; common ones, other than performance, include:
But programmers the world over seem convinced, often against the evidence, that they can and should spend time optimising programs without justifying this decision.
Sometimes though you don't need to make the program faster - it is simply a matter of user perception! People complain, reasonably enough, that programs are slow when they have to wait for them. There are at least three things you may be able to do to improve this. Firstly you may be able to make the slow part happen asynchronously, perhaps by executing it as a background process, which allows the user to carry on working while the activity completes. Secondly doing the tasks in a different order may help. For example, moving the slowest part to the end of the chain may mean the user can leave the program to complete while they get on with another activity. Thirdly giving the user an indication of progress. As anyone who has waited for a train late at night knows, doing nothing makes time drag. If the user gets an indication of progress (percentage complete and/or time to finish) the delay is more manageable.
When you do need to optimise a program you need to decide on the priority of performance compared with other goals for the program. Setting this priority can be hard. Sometimes targets for performance are provided up front when the initial specification is written, but I find that usually performance is not specified at all - it is just assumed that the program will be 'fast enough'. (Even when specified the actual phrasing is sometimes subject to interpretation. As an example, suppose a specification states "the communications module must be able to process an average of 35 incoming requests per second". In a system where message lengths could range from 32 to 8192 bytes there could be a lot of arguing over whether or not the specification had been met!)
If you are trying to set the priority you must use your judgement. In some cases performance is vital - if you're too slow you fail. This is typically true for many parts of 'real time' programs where failing to keep up with incoming events can produce catastrophic failure - or if an interactive game is perceived as 'too slow' the users may simply reject it.
When the situation is less clear cut a common approach is to let the users set the priority - you might say 'I have one week of development time, what would you like me to concentrate on?' Their top requirement might be fixing a bug, adding a feature, etc. rather than making it faster.
An approximate measurement that can help is the combined performance of both development and deployment. What do I mean by this?
Optimising code costs time. How much time will be saved by the speed up? A rough calculation of this figure can help quantify how much time it is worth spending making the program faster.
Based on the expected use of the program I can estimate how much time would be saved (and for whom) if I optimised it. Then I can balance this time saving against the time lost by optimising - include both the development time and any effect on the completion date.
Spending half a day optimising an end of month report, run by one person, to take four minutes rather than five is likely to be a net waste of time!
Firstly, perhaps just improving the platform! Check the machine has enough memory. I once had a program which ran several hundred times slower when moved from a 50MHz 486 machine to a 90MHz Pentium one; simply because the Pentium machine had less memory and spent most of its time swapping data between main memory and the hard disk. "Moore's Law" states that chip density, and hence performance, doubles every 18 months. It is really an observation rather than a law, but it has been true for a while and looks likely to remain so for some years to come. So upgrading your hardware (or adding more memory) might be a good start. There's even an equivalent software 'law' - "Proebsting's Law" [1] - which states that compiler optimising technology doubles every 18 years (!) So getting a newer compiler, and checking optimisation is enabled, may help too.
Then there could be lots of different ways to change the way the program runs to reduce the impact of it being too slow. For example:
- Share the output of one report among a number of users
- Look at the task which is causing the complaints - perhaps you can restructure the business process to remove or simplify this task, or to batch up several tasks into one.
- Sometimes many users are doing very similar analysis of the same data in the same way. Perhaps some analysis, or even partial reports, could be generated and saved beforehand.
- Specialised hardware could be used if you're doing a lot of intensive numeric computations.
- Depending on the languages you are using in the project perhaps the slow part of the program could be re-written in a faster language.
- Split the task between several machines. You might be able to make use of another machine to run a slow running report offline, or overnight.
- Make the program multi-threaded. Be careful though - making a program multi-threaded late in the development cycle can cause more problems than it solves.
- Enable unattended operation, perhaps by allowing data to be provided in a file as well as interactively, so the user can start something running and then go off and do other tasks.
Be creative! Sometimes you can find a way to drastically improve the effective performance of the overall process without making big changes to the internals of the program.
Before you start you need to decide two things.
Firstly define the actual task you are trying to optimise - this includes the sizes of data sets, the target hardware, etc.
It may be worth creating a test environment, possibly including additional programs that can drive the one under test. For example, when trying to measure the performance of a Web server it is common to use a load generator to simulate a lot of users accessing your pages. The hard part of any test environment is trying to ensure that the test data and load is as similar as possible to the real one. In some cases this can be achieved by capturing data from the live system and replaying it for testing purposes. It is a good idea, where you can, to have several different data sets or you may end up accidentally optimising your code to match specific features of your test environment not often occurring 'in the wild'.
Remember one golden rule, to optimise the actual code you are going to deploy. Many environments follow the debug/release build model where most of the development cycle is performed on a 'debug' build (with debugging symbols, extra trace statements and assert statements) but the actual released code is built differently (optimised code, trace statements optimised away, etc). This is not the article to talk about the pros and cons of this common practice, just be aware that you will probably waste some (or even all) your time if you optimise the debug build and ship the release build!
Secondly you must decide how big an improvement you need. This measurement provides you with a target so you know when your optimisation task has finished. The so called 'Speed-up' theorem from the theory of computability states, among other things, that for nearly all functions there is no 'best' program - you can always find another algorithm which is faster. So without a target to aim for you may find you never stop optimising!
Now you are ready to identify which parts of the task actually take the time.
There are a number of techniques to measure this. The most comprehensive is to use a tool designed to provide performance measures, such as Rational's Quantify or Compuware's TrueTime. These tools are powerful and when all works well they provide you with various ways to view the collected data.
The most useful report for optimisation purposes is probably the 'top 10' of functions by execution time. This gives you a hit list to start with. Keep the data from this initial analysis for comparison purposes against the optimisations you make.
The big advantage of the tools is that they are written by experienced people who avoid the large number of traps for the unwary. They are designed to be easy to use but also contain powerful features which can help in particular cases. They are often expensive - but don't forget so is your time!
However sometimes you can't use tools like these. They may not be available for your target platform, or cost too much, or their impact on the program being measured may be excessive. I would still strongly encourage you to try and measure before you optimise. Depending on the size of the job this could include:
Perhaps you can reduce the number of times 'high scoring' functions are called, by caching common values. You might find the disk is a bottleneck and look at changing the data format, or the hardware. In a distributed world the network is often a bottleneck and you may be able to 'batch up' packets to produce a smaller number of bigger messages.
One common result is the whole program is limited solely by CPU speed, so I'll now focus on processor bound functions.
You may have come across this notation already - for example the C++ standard defines the complexity of operations for vectors, maps, etc. There is a more formal notation, called the 'Big-Oh order notation', for measuring complexity but this is outside the scope this article.
The complexity notation usually given is the 'worst case' figure. For example sorting is typically much faster when the input is already partly sorted data but the complexity measure is for randomly ordered data.
'Constant time' means the function is approximately as fast for a small or large number of data points. An example of a constant time operation is accessing an element of an array (in most programming languages) - it doesn't really matter how big the array is since it is simply a matter of calculating an offset.
'Linear time' means that the time for the function is roughly proportional to the number of points. An example of a linear time algorithm is finding a random element in a linked list since on average half the points must be tested.
There are other complexity values but both of these scale fairly intuitively - people expect programs to run more slowly if they're working on more pieces of data. If you double the number of items the program can run anything up to twice as slowly without serious complaints.
The real problem is functions that increase in time more than linearly. One common example is 'quadratic time' for functions where the time increases in proportion to the square of the number of items. As the size of the data increases eventually these functions 'take over' the performance of the overall program.
Suppose you have used several linear time functions and one quadratic time function in your program. Perhaps the quadratic function is only 1 millisecond out of an overall time of 100 milliseconds when you have only 10 data points. Consider what happens when you have 10 times as many data points. The quadratic function increases to about 100 milliseconds and the rest of the program increases to about 1 second. So the single quadratic function now takes 10% of the overall program time. What if you have 10,000 data points? This one function will dwarf the contribution of the rest of the program.
No matter how fast the code is, with enough data points the higher complexity functions will eventually 'take over' the performance of your whole program.
This is why it is so important, as I said earlier, to measure performance of the program as close as possible to the way it is used in practice - including the number of data points - because otherwise you might fail to detect such functions.
What can you do when you find a function that is of too high an order? Try to search for another algorithm that is of lower order - such as linear time. This is likely to have much more impact on the performance of your program than if you leave the algorithm unchanged and simply try to optimise the individual lines of code.
The order of many standard algorithms, such as sorting, and operations on common data structures - such as arrays, maps and lists - is well documented and it should be relatively easy to see whether a better algorithm exists.
Armed with this information you can often transform higher order functions into something better.
As a simple example, take this pseudo-code:
int N = collection.size();
for ( int index = 0; index < N; index++ )
{
element e = collection.elementAt( index );
...
}
How is 'elementAt' implemented? If the collection is implemented as a linked list then typically elementAt will be linear time and so the whole code fragment is quadratic.
What can you do? Many things - but complexity notation can help you analyse them quickly.
As an example you might change the collection from a list into an array. Then the whole iteration process becomes linear time - but you need to investigate the effect this might have on inserting items into the collection in the first place.
Modify, or rewrite, the slow functions. Test the changed functions and then measure the speed of the new code to see whether you have made the improvement you hope for. If the change has not improved the speed simply back it out (you are using source code control software aren't you?) and re-examine both the performance data and your analysis of it.
This is a particular example of refactoring, where the function is being changed solely for performance. A couple of the general principles of refactoring are particularly relevant here.[3]
Firstly, it is almost always a good idea to ensure you keep this refactoring separate from any other changes you may wish to make to the code; this will help to isolate the changes solely made for performance and ensure you are comparing like with like when measuring the effect on program speed.
Secondly, ensure you have a test harness or test code to verify that the new function has the same results as the old one. Rewriting a working, but slow, function into a blindingly fast one that gives the wrong answer is not usually useful!
One way to achieve the testing is to initially leave in the old function and execute both it and the new function. Then add a verification check to compare the outputs and ensure that your optimised function does in fact produce the same answer as the original code. Then switch off the old code and the verification check just before measuring the performance improvement. I can't recall how many bugs this simple technique has saved me from over the years!
Finally the apocryphal 'Jackson's rules of program optimisation' state:
(Many thanks to Mark Green and Richard Bridgman who reviewed drafts of this article!)
[2] Fred Fish's debug library - http://sourceforge.net/projects/dbug/
[3] "Refactoring: Improving the Design of Existing Code" Martin Fowler, Addison Wesley.