Tips for writing quality software

Tips for Writing Quality Software summarizes principles and techniques an experienced programmer comes to appreciate.

Description

The four criteria by which the quality of a program is judged are correctness, performance, maintainability, and usability. The extent to which these points can even be judged depends primarily on the readability of the code, where "readability" is mostly about how clear the interactions between the moving parts are. The tips below describe techniques that contribute to the development of one or more of these qualities.

Structure the data, create an interface to the structure, and only manipulate the data through that interface: This tip is at the top because it is the most important point. This is the way to maintain sanity as the footprint of the program grows. Larger structures can be composed from smaller structures, and the interfaces of those larger structures can be implemented in terms of the interfaces to their component structures. For programs written using object-oriented systems, use composition instead of inheritance. For more details on this point, see DKF's comments further down.
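As a minimal sketch of this tip in Tcl, the stack below keeps its representation (a plain list) private to a handful of procedures grouped in a namespace ensemble. The stack namespace and its subcommand names are hypothetical, chosen only for illustration.

```tcl
# A stack whose representation (a Tcl list) is known only to these procedures.
namespace eval stack {
    namespace export create push pop size
    namespace ensemble create

    # Return a new, empty stack value.
    proc create {} {
        return [list]
    }

    # Push an item onto the stack held in the caller's variable.
    proc push {stackVar item} {
        upvar 1 $stackVar s
        lappend s $item
        return
    }

    # Remove and return the most recently pushed item.
    proc pop {stackVar} {
        upvar 1 $stackVar s
        if {[llength $s] == 0} {
            error "stack is empty"
        }
        set item [lindex $s end]
        set s [lrange $s 0 end-1]
        return $item
    }

    # Return the number of items on the stack.
    proc size {stackValue} {
        return [llength $stackValue]
    }
}

set pending [stack create]
stack push pending alpha
stack push pending beta
puts [stack pop pending]    ;# prints "beta"
puts [stack size $pending]  ;# prints "1"
```

Because callers only ever use stack push, stack pop, and stack size, the list could later be swapped for another representation without touching any calling code.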

Design each program in the anticipation that it will become a component of a larger system whose characteristics, operation, and lifetime are unknown: How can the program's functionality be exposed so as to be maximally convenient and safe for the client? Can the client effect proper cleanup? Is the interface flexible enough to facilitate creative use cases and the internals robust enough to withstand such use?

Encapsulate discrete bits of functionality: Each block of code is a component in the larger system. Short blocks of code are easier to think about during development and easier to audit for errors later. Each component has its own interface, which reduces the number of interactions between moving parts and thereby reduces the "attack surface" of unwarranted assumptions. To keep a block of code short is to articulate its essential functionality and move non-essential pieces out. Separating different concerns into their own components is one of the fundamental activities a computer programmer engages in. Concern about the cost of function/command calls is often a harbinger of premature optimisation. When assemblers came around, machine coders objected to functions on the grounds that function calls ate into performance. At the time, the argument that computer time was more expensive than programmer time carried more weight than it does now.

Rely on interfaces, not knowledge of the implementation: This is as true for the internals of a project as it is for operating with other systems or libraries. Knowledge of the implementation can be used in ways that are tricky to spot. For example, a bit of code may not test for a certain condition because it's known that the condition doesn't arise given the implementation, but then when the implementation changes, making it possible for the condition to arise, even careful review of the code might not turn up the logical chain which led to the earlier decision not to test for that condition. It's a good habit to review code immediately after writing it, determine what assumptions were made based on knowledge of the implementation, and add comments about those assumptions. (See also: Law of Demeter.)

Organize the functionality and code into layers, and forbid lower layers from depending on higher layers: Just as it should be possible to use a component in different contexts, a layer is a set of components that serves as a platform for another layer. If a lower layer depends on a higher layer, then there is something wrong with the design, and it will eventually make the program overly complex and unmaintainable.

Use descriptive names for variables and procedures: Try to describe what you're storing rather than how it interacts with something. Be consistent in the names you use. If you want nTimes to mean "number of times" don't later use numTimes or nt. Being lazy just wastes your time. Also keep in mind that if you use variable names that are too short and use long comments to describe them you are wasting your time. Descriptive naming can also be taken too far, and then the variable names clutter the pattern of the program. A healthy balance between names like j and jumpPointStoresAPointerThatDoesX is needed.

Use each variable to mean exactly one thing: It can be tempting to "hijack" a variable to make decisions based on what its value indirectly implies given the current structure of the code. The problem here is that the structure of code changes as development continues, causing the implications to change. If a variable is used early in a block of code to mean one thing, and several lines later hijacked for its implications in order to make some other decision, the code in between the two sites may eventually change, changing the implied meaning and introducing a bug.

Consider the permutations: In a block of code, each variable represents some characteristic of the thing being modeled. When you introduce a new variable, scan the existing variables and ask how the characteristics they represent might affect each other. Pay particular attention to any variables that are being modified in loops, watching for edge-cases where assumptions don't hold.

Keep a consistent pattern of object/memory management: Decide ahead of time if the parent of a procedure should manage a certain object, or if the child or called procedure should, and try to be consistent about how things should be managed.

Use comments that describe how and why you are doing something: At the moment a line of code is written, the author is holding in their mind a set of principles, insights, and assumptions about the program. Well-placed comments bring the reader up to speed on what those principles, insights, and assumptions were. You may have been the author, and if you read the code months later, you might find that comments help you as much as anyone else to remember those things and understand the code. If a comment answers the question "Why does this code work?", it's probably very apt.

Use comments when the pattern of what happens next may not be expected

Look for patterns in your code, and minimize repeats by creating a procedure or an interp alias (see 1): The equivalent tools in C are functions and macros.
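A small sketch of both techniques: a repeated formatting pattern is first moved into a procedure, and interp alias then pre-binds its first argument to create shorter commands. The names logLine, warning, and notice are hypothetical.

```tcl
# The repeated pattern, captured once in a procedure.
proc logLine {level message} {
    return "\[$level\] $message"
}

# Each alias calls logLine with the level already filled in.
interp alias {} warning {} logLine WARNING
interp alias {} notice {} logLine NOTICE

puts [warning "disk almost full"]   ;# prints "[WARNING] disk almost full"
puts [notice "backup finished"]     ;# prints "[NOTICE] backup finished"
```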

Don't make assumptions that a command or function won't return an error: Check the results and catch if absolutely needed.
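As one possible illustration, assuming Tcl 8.6 or later for the try command, the hypothetical helper below handles only the one failure it knows how to deal with (a missing file) and lets every other error propagate to the caller.

```tcl
# Hypothetical helper: return the contents of a file, or a fallback value
# when the file does not exist. Any other error is passed on to the caller.
proc readFileOr {path fallback} {
    try {
        set channel [open $path r]
    } trap {POSIX ENOENT} {} {
        # The only failure handled here: the file is missing.
        return $fallback
    }
    try {
        return [read $channel]
    } finally {
        # Runs whether or not read succeeded.
        close $channel
    }
}

puts [readFileOr /no/such/dir/settings.conf "default settings"]
```

Trapping a specific error code rather than catching everything keeps unexpected failures, such as a permissions problem, visible instead of silently replacing them with the fallback.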

Have a plan for each file before you begin coding: It's all too easy to fall into the trap of staying up late coding massive amounts of code. Often if you really think about the problem you can reduce the amount of code substantially. By having a plan you also can work out potential problems before you invest time. This is what separates a software-engineer from a programmer. PYK 2018-01-07: So how does one work this plan? Start with a problem. Break it down into a list of problems, each one not being further reducible. Once that is done, it's time to choose one and start coding.

Use the interactive tclsh/wish shell if you aren't sure about how something will work: Don't assume.

Use a consistent pattern of capitalization or _ or - for classes of keywords: If you decide that you want all classes to be like BoxClass don't later change to ball_class type naming. You will only confuse yourself and make it more difficult for your mind to parse your own code later on. Obviously in the world of packages this may not be possible, but strive to keep your own code consistent.

Foresee and avert future bugs: As code continues to be developed, it changes shape. Some patterns are more robust in the face of constant change, and some more brittle. Sometimes it's worth writing slightly more code at the outset to reduce the possibility that a future change will inadvertently introduce a bug. For example, consider if something then {return this} else {return that}. Although else could be removed, and return that could be the subsequent command that is only reached if something isn't true, doing that might increase the chances of those two lines of code being inadvertently separated in the future. As another example, if a variable is re-used for multiple purposes within a script, then future restructuring might inadvertently lead to a variable being in the wrong state at some point.
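The if/else example above might look like this in Tcl. The procedure signName is hypothetical; the point is only that both outcomes live inside one command, so a later edit cannot slip code between them.

```tcl
# Hypothetical example: classify a number by its sign.
proc signName {n} {
    # Both return commands belong to the same if command. Dropping the
    # else and placing the second return after the if would behave the
    # same today, but the two lines could later drift apart.
    if {$n < 0} {
        return negative
    } else {
        return non-negative
    }
}

puts [signName -4]   ;# prints "negative"
puts [signName 7]    ;# prints "non-negative"
```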

Don't rely on the interpreter/compiler to find bugs for you: If you find yourself fixing bugs that the interpreter/compiler tells you about too often then you probably haven't planned it out well. The same applies to excessive use of a debugger. (See 8)

When coding in C be careful with == (equality testing): Consider if (var = NULL) -- usually the intended usage was if (var == NULL). The compiler accepts var = NULL, and you may wonder what is causing var to become NULL later on during runtime. An easy solution is to use if (NULL == var). This doesn't have the same problem, because the compiler will report it as an error if you try to assign to NULL/0 as in if (NULL = var). Tcl/Tk could learn from this.

Design your code to be tested: Use a low degree of coupling between procedures and the rest of the software so that procedures can be tested in isolation. If using object-oriented extensions, try to follow the Law of Demeter.

Implement unit tests for your code: In addition to confirming that a program operates as expected, the act of writing tests is indispensable to articulating the design and behaviour of the program in the first place. It isn't uncommon to write the first test before writing a single line of the program. The program and the test suite should grow up together. Expect to spend as much time writing tests as writing the program. If at the end of any particular day the program code base is larger than the test suite code base, it might be time to write more tests. As the program grows more complex, it's quite comforting to run a well-developed test suite after making some changes and find that all tests pass. Once this has become a habit, when you don't have a good test suite you'll feel like you do in one of those dreams where you went to school in your underwear. Check out Tcl's own test suite for inspiration. tcltest is the de-facto standard tool for the job.
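For readers who have not used tcltest, here is a minimal sketch of a test file. The procedure clamp and the test names are hypothetical; in a real project the procedure would live in its own file and be sourced by the test file.

```tcl
package require tcltest
namespace import ::tcltest::test

# Hypothetical procedure under test: limit a number to the range lo..hi.
proc clamp {value lo hi} {
    if {$value < $lo} {return $lo}
    if {$value > $hi} {return $hi}
    return $value
}

test clamp-1.1 {value inside the range is unchanged} -body {
    clamp 5 0 10
} -result 5

test clamp-1.2 {value below the range is raised to the lower bound} -body {
    clamp -3 0 10
} -result 0

test clamp-1.3 {value above the range is lowered to the upper bound} -body {
    clamp 42 0 10
} -result 10

# Print the pass/fail summary for this file.
::tcltest::cleanupTests
```

Each test names one behaviour, which doubles as a short statement of the design: reading the three descriptions tells you what clamp is supposed to do without reading its body.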

Be Positive: Code is easier to follow when conditions are expressed in positive terms. Of course this is only a guideline. There are always places where testing for the negation of something is the best approach, but minimization of negative conditions is one of the marks of higher-quality code.

Review code for readability: If you can easily read and understand the code you wrote a day, a week, a month after writing it, you may have some quality code on your hands.

Balance activities: In a well-managed project one might spend about 25% of one's time on developing tests, 25% on refactoring, 25% on fixing reported issues, and 25% on new features. Monitoring these ratios and taking steps to conform to them is a relatively easy way for an individual or team to improve the quality of the system. These ratios are also useful for estimating the cost of a project: Let a "dev unit" be the time needed to implement a system meeting the specified requirements. Add three more dev units to account for testing, refactoring, and fixing issues. Then, to account for unexpected difficulties add an additional four dev units (one for each activity category). Next, add four dev units (one for each activity category) to account for running changes to the requirements. Assuming reasonable competency and productivity, one has arrived at what might conceivably be a semi-realistic estimate of the cost to release each first version of the system: 16 dev units. If fewer than four dev units were spent to specify the requirements, the project is underspecified. If a GUI is required, estimate it as a separate project requiring an additional 16 dev units. Start work on the GUI only after the first version of the system is released. To increase the chances of success, keep each project as small as possible, and break out functionality into separate projects whenever it's feasible. Notice that no time is allocated for documentation. Ain't nobody got time for that.

Please append your own tips.

DKF - Here's a few that are related to the ones above.

Your ultimate goal is to write clear and correct code. Remember: if your code is clear, it is easier to make it correct.
It is better to have a function do a conceptually consistent action than it is to keep the function short.
It is better to keep the number of places that know about a data-structure's implementation layout very small. This is another variation on the Law of Demeter but it really bears repeating. Note that macros and inline functions do not really conceal the knowledge of a data-structure; they just hide it from the programmer and not from the code itself.
If you're allocating structures, initialise all the fields at the same time. Better yet, write a function to do allocation and basic initialisation (either to valid null values or to obvious marker/guard values) and use that function everywhere else.
Do not use the result of assignment (assuming you're using a language that defines it.) Sure it's defined, but it leads to really murky code. Do the assignment on a separate line (yes, you can afford it.)
KPV A generalization of the above tip is to avoid horizontalizing code. I admit I'm often guilty of this in the quest for compactness (see tip #1).
wdb Avoid making an object more powerful than necessary. If you have written a method, and later you see that this method is not really used, then don't hesitate to delete it. An object shouldn't be almighty. Less is more.

Duoas: I think that the names we give things in our code often express how well we actually understand that code. A few additional thoughts:

It is worth every programmer's time to read and appreciate Ottinger's Rules for Variable and Class Naming. It is an excellent, straight-forward, peer-reviewed (that is, by our peers --i.e. programmers) treatise. Some of the following reiterates stuff in the paper (but is not an excuse not to read it ;-)

Really Good Naming Conventions:

Use nouns (and noun phrases) for variables. A variable is a thing to be acted upon; it exists and nothing else.
Use predicates (verbs and verb phrases) for procedures. A procedure does something. It acts upon something else.
PascalCase/camelCase/CamelCaps/etc. Use them. (Tcl core conventions encourage both forms, where UpperCaps is used for function and type names and lowerCaps is used for variables.)
Underscore_delimited_names. Also good. (Personally, I prefer them.) They make for natural reading, while the old space and visibility concerns are a thing of the past. Mixture with CamelCase is also fine, so long as it is consistent.
Use names that you can actually speak in natural language. Words from your local language are OK, but for code with international scope just stick to English.
Short or one-letter names iff (1) they have universally recognized meaning, and (2) are local to small, bounded contexts (such as a function or loop fitting 10-15 lines or fewer). For example, x and y are universally understood to refer to some horizontal and vertical tabulation, typically in graphics, as row and col are oft used for textual tabulations. Likewise, src and dest are instantly understood to be 'source' and 'destination'. The simple n or i are fine for one- or two-line loops where the actual index count is inconsequential or immaterial to the algorithm. An s is fine for a string in the middle of some transformation. Functional programmers will use x and xs for temporary list constructs; they are instantly recognized as car/element and cdr/(remaining) elements. But, though an fft is a fast Fourier transform, a better name would be FastFourierTransform.

Horribly Wrong Naming Conventions (or 'Corollary')

sTuDlyCapS and other l33t forms are for weenie wannabes; not professionals. Likewise SUIT_friendlyFORMs are to be despised.
Hungarian notation [L1 ] is evil. Unfortunately it has seduced countless masses for many years, meaning it is often inescapable when handling existing code. Only use it if you are required to do so. (Yes, this one is still a hot religious issue for many. I found an interesting paper advocating Hungarian notation [L2 ]. Personally, I don't believe a Microsoft employee is really an impartial commentator, and I think it succinctly demonstrates some of the warped thinking that Hungarian notation inculcates... but that's all I will say about that here.)
Positional notation, PRIME-MODIFIER-CLASS notation, and other silly 'one size fits all' schemes are likewise dead-weight traps that focus more attention on form and deciphering than meaning.
There is no longer any excuse for short, vowel-less, scrunched-up variable names (like ctmrnm and lclwtt --can you guess what they mean? [They do mean something]). Tcl can handle names of any (non-negative) size, and ANSI-C can handle 31 characters at minimum. That is plenty of space for legible names. If you find that your names are exceeding 30 or so characters, then you are killing your keyboard. Here is an example of just how much space that is:

        this_name_has_exactly_30_chars

Scrunchedupnames are nearly as unreadable. English script delineates on lexemes, and people's brains are trained to read it that way.
Even highly-specialized, domain-specific applications should avoid abbreviations and symbols that require considerable domain knowledge. Often, there is more than one symbol to express the same thing in mathematics, physics, and other sciences. Don't take shortcuts. Just because you are an expert in the domain doesn't mean that anyone else will recognize the notation.

Well, that's enough from me. :-P

NEM See also the Tcl Style Guide and the Tcl/Tk Engineering Manual (TIP 247).

I'd add that while the above advice is mostly good and useful, real quality software comes from a firm grasp of the fundamentals of computer science and software engineering, rather than just coding style guidelines. Learn about the underlying theories (mathematics, logic, set theory, number theory, data structures, algorithm analysis, etc, etc) and how to apply them. Learn as many programming languages as possible (including Logic Programming, Functional Programming, OOP, etc). Learn about concurrency, networking, databases, and so on. Then worry about whether your variable names are stylishly capitalized.

RS 2013-11-07: My rules for well-readable code include using habitual names instead of being very creative. For instance, my stand-alone scripts always have the pattern (borrowed from C):

proc main argv {
    ...
}
...
main $argv

Or, the frequent operation of reading lines from a file:

set f [open $filename]
while {[gets $f line] >= 0} {
    ...
}
close $f

LES 2013-11-07: Although I really appreciate the content of this page, I don't see much in the way of "tips for writing quality software" in it. It has, instead, plenty of tips for writing quality CODE and strictly from the point of view of maintenance, which is too narrow to say the least or a whole different topic to say the most. You can always write top-notch maintainable code and end up with a crappy application, so unrelated the two things are.

DKF 2016-07-09: The key principles for writing quality software are to design for clear purpose, with low coupling and lots of testing. There are many sorts of testing. There are many notions of clear purpose. There are many levels at which it is often a good idea to reduce coupling. They are all relevant.

Oh, and approach everything with good taste. Any specific technique can be overdone.

Knowledge about Implementation

In Tcl 8.6.8 the % operator returned incorrect results. In the Tcl chatroom, 2018-02-14, dgp offered this analysis:

    03:51 <@ijchain> <dgp> .
    03:55 <@ijchain> <dgp> For anyone around and curious, the TIP 484 issue that led to the bad modulo operator is this...
    03:56 <@ijchain> <dgp> Since the last really big reform in Tcl 8.5, integer values are stored in the smallest representation that can hold them.
    03:58 <@ijchain> <dgp> In particular, every "0" value was stored as the "int" Tcl_ObjType...
    03:58 <@ijchain> <dgp> (which of course stores in a struct field of type "long" because we can never make anything simple or clear)
    03:59 <@ijchain> <dgp> In turn, if you pass any "0" value to TclGetNumberFromObj() it would be reported back as a TCL_NUMBER_LONG
    04:01 <@ijchain> <dgp> so when we were coding up all the calculation implementations in tclExecute.c, if there was any special case handling required for the value "0", you would find it in the stanzas of code that handle the cases when the argument(s) are in the TCL_NUMBER_LONG category.
    04:01 <@ijchain> <dgp> TIP 484 more or less made the "int" Tcl_ObjType disappear, and made TclGetNumberFromObj() never return TCL_NUMBER_LONG
    04:02 <@ijchain> <dgp> The value "0" now is in the category TCL_NUMBER_WIDE
    04:02 <@ijchain> <dgp> As promised by TIP 484 this means lots of code could be shortened or eliminated, which is good.
    04:03 <@ijchain> <dgp> But at the same time you have to be sure any special case stuff gets moved to the new place it will matter, and that's where we have at least this one item falling through the cracks.
    04:03 <@ijchain> <dgp> When you have the case of $wide % $big, the algorithm gets all the answers right except when $wide == 0.
    04:04 <@ijchain> <dgp> And that never mattered until now because you knew a $wide would never store a 0
    04:04 <@ijchain> <dgp> You always get caught by the unstated assumptions because why state what you know will "always" be true? :D
    04:05 <@ijchain> <dgp> so patching up INST_MOD to correct the issue isn't hard.
    04:05 <@ijchain> <dgp> What worries me is the similar errors we just haven't found yet.
    04:05 <@ijchain> <dgp> fin

See commit 85e6dc0a02 for the fix.
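The case dgp describes, a zero on the left of % with a bignum on the right, can be pinned down with a small check. This is only a sketch written for this page, not the test that accompanied the fix; the variable names are made up, and 2**64 is used simply because it is too large for a 64-bit integer.

```tcl
# 2**64 does not fit in a 64-bit integer, so Tcl stores it as a bignum.
set big [expr {2**64}]

# Pairs of: left operand, expected value of "left % big".
# Tcl's % result takes the sign of the divisor, so -1 % big is big - 1.
set cases [list 0 0  1 1  -1 [expr {$big - 1}]]

foreach {left expected} $cases {
    set actual [expr {$left % $big}]
    if {$actual != $expected} {
        error "$left % $big returned $actual, expected $expected"
    }
}
puts "modulo checks passed"
```

Writing the previously unstated assumption down as a test is one way to make sure that it gets noticed the next time the underlying representation changes.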