Thursday, February 23, 2012

Coding Standards

Why have code conventions?

Code conventions are important to programmers for a number of reasons:
  • 80% of the lifetime cost of a piece of software goes to maintenance.
  • Hardly any software is maintained for its whole life by the original author.
  • Code conventions improve the readability of the software, allowing engineers to understand new code more quickly and thoroughly.
It doesn't matter what your guidelines are, so long as everyone understands and sticks to them. These PHP coding guidelines are based largely on Sun's Code Conventions for the Java Programming Language. Deviations from the Sun code conventions are largely a result of the interaction of PHP with other web server applications, primarily databases.

Enforcing the code conventions

If you are the lead developer on a project, you will encounter programmers who do not like your guidelines and refuse to abide by them. Coping with this is one of the small obstacles that leading a development team inevitably entails. You have two options. You might decide that the value of the errant programmer's contributions outweighs the inconsistency introduced by their refusal to abide by your coding conventions. However, you are not doing yourself or your team any favors by permitting one programmer to be "above the rules". This will only lead to disruptions in the team down the road. The better option is two-fold. First, try to achieve consensus among the team members. Not everyone will agree on everything, but it's better to minimize disagreements when possible. Second, once a code convention has been established, do not accept code into the project unless it adheres to those guidelines. This will cause some initial friction, but the long-term cost of inconsistent coding conventions is much greater than the small amount of effort it will take new team members to adapt to those conventions. At the end of the day, professional programmers know that they need to be adaptable and to abide by the rules of the current project.

Indentation (tabs vs. spaces)

  • Do not use spaces to indent: use tabs.
  • Indent as much as needed, but no more. There are no arbitrary rules as to the maximum indenting level. If the indenting level is more than 4 or 5 levels you may need to think about factoring out code.

Justification

  • A tab is a single character. While file size may not be the issue it once was, it is still wasteful to use four (or eight!) characters when a single character will do. When most lines in a script are indented at least once, this waste can quickly add up and become significant.
  • Most editors can set how many "spaces" are displayed for each tab, so you can set your editor up to display tabs however you like (2 spaces, 4 spaces, whatever), and the other programmers on the team may do the same.
  • If you need to convert a script to use spaces for indentation, it is a simple matter to replace each of the tabs with four spaces, or six spaces, or whatever is needed (the PEAR project, for example, demands four spaces for each level of indentation).
  • As much as people would like to limit the maximum indentation levels, it never seems to work in general. We'll trust that programmers will choose wisely how deeply to nest code.

Example

function func()
{
	if (something bad)
	{
		if (another thing bad)
		{
			while (more input)
			{
				...
			}
		}
	}
}
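
When nesting grows past four or five levels, factoring the inner work into its own function usually flattens it. The following is only a sketch of that idea: processLine() and processLines() are hypothetical names, and the "work" (trimming and upper-casing) is a placeholder.

```php
<?php
// Hypothetical helper: the work that would otherwise sit at the deepest level.
function processLine($line)
{
	return strtoupper(trim($line));
}

// The caller stays shallow: skip bad input early, delegate the rest.
function processLines($lines)
{
	$results = array();
	foreach ($lines as $line)
	{
		if (trim($line) === '')
		{
			continue; // nothing to do for blank lines
		}
		$results[] = processLine($line);
	}
	return $results;
}

$cleaned_lines = processLines(array(" a ", "", "b"));
```

The early continue replaces one level of if nesting, and the extracted function replaces another.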
 

Line endings: 

The three major operating systems (Unix, Windows, and Mac OS) have historically used different ways to represent the end of a line. Unix systems use a line feed (\n), classic Mac OS (before Mac OS X) used a carriage return (\r), and Windows systems are terribly wasteful in that they use a carriage return followed by a line feed (\r\n). Mac OS X is Unix-based and uses \n. Ensure that your editor is saving files in the Unix format. This means lines are terminated with a line feed (\n), not with a carriage return/line feed (\r\n) as they are on Win32, or a carriage return (\r) as they were on the classic Mac. Any decent editor (such as Notepad++) is able to do this, but it might not be the default. If you develop on Windows (and many people do), either set up your editor to save files in Unix format or run a utility that converts between the two file formats.
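
If no conversion utility is at hand, the conversion itself is small enough to sketch in PHP. This is a minimal illustration, not a replacement for a proper tool; normalizeLineEndings() is a hypothetical name.

```php
<?php
// Convert Windows (\r\n) and classic Mac (\r) line endings to Unix (\n).
// The order matters: \r\n must be replaced first, otherwise each Windows
// line ending would turn into two line feeds.
function normalizeLineEndings($text)
{
	return str_replace(array("\r\n", "\r"), "\n", $text);
}

$mixed_text = "first\r\nsecond\rthird\n";
$unix_text = normalizeLineEndings($mixed_text);
```

To convert a file, the same function could be applied to the result of file_get_contents() and written back with file_put_contents().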

Make names fit

Names are the heart of programming. In the past people believed knowing someone's true name gave them magical power over that person. If you can think up the true name for something, you give yourself and the people coming after power over the code.
A name is the result of a long deep thought process about the ecology it lives in. Only a programmer who understands the system as a whole can create a name that "fits" with the system. If the name is appropriate everything fits together naturally, relationships are clear, meaning is derivable, and reasoning from common human expectations works as expected.
If you find all your names could be Thing and DoIt then you should probably revisit your design.

Hungarian notation

Hungarian notation is the practice of embedding metadata about a variable into the variable's name. For example, a variable holding a long integer might be prefixed with "l" (lower case L), while an unsigned 32-bit integer might be prefixed with "u32". There are several problems with using this type of notation. First, PHP is a dynamically typed language: a variable's type follows the value it currently holds and can change at run time, so a type prefix cannot be relied on. Second, Hungarian notation can quickly render a variable name into an unrecognizable mess (Wikipedia uses "a_crszkvc30LastNameCol" as an example: a constant reference argument, holding the contents of a database column LastName of type varchar(30) which is part of the table's primary key).
Having variables which are not human-readable will lead to errors when the code is revised or maintained. Do not use Hungarian notation.

Justification

  • Hungarian notation is directly counter to encapsulation, one of the basic principles of good programming.
  • Hungarian notation is essentially a method of encoding comments into a variable name. There are two reasons this is a bad idea. First, it is redundant: the proper place to comment on the type and usage of a variable is where the variable is declared. Second, it leads to having incorrect information embedded in the application. What happens when the requirements change, and $i_postal_code must support not only US zip codes (which are numbers), but also Canadian postal codes (which have letters)? Now $i_postal_code is a string. Will the programmer go through every file and change the name of this variable everywhere it appears? When deadlines are looming, tasks like this are postponed... and deadlines are always looming.
  • Hungarian notation is the Weapon Of Mass Destruction ™ of obfuscation techniques, and code obfuscation is both counterproductive and immoral.
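
The postal code scenario above can be sketched in a few lines. The values are made up for illustration.

```php
<?php
// Hungarian-style name: the "i_" prefix promises an integer.
$i_postal_code = 90210;

// Requirements change: Canadian postal codes contain letters.
// PHP accepts the assignment without complaint, and the prefix is now wrong.
$i_postal_code = 'K1A 0B1';

// A plain descriptive name stays correct whatever type the value has.
$postal_code = 'K1A 0B1';
```

Nothing in the language keeps the prefix honest, so the name can only be trusted for as long as someone remembers to maintain it.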

Class names:

Class names should be nouns, in mixed case with the first letter of each internal word capitalized. Try to keep your class names simple and descriptive.
  • Use upper case letters as word separators, lower case for the rest of a word
  • First character in a name is upper case
  • No underscores ('_').
  • Name the class after what it is. If you can't think of what it is, that is a clue you have not thought through the design well enough.
  • Compound names of over three words are a clue your design may be confusing various entities in your system. Revisit your design. Try a CRC card session to see if your objects have more responsibilities than they should.
  • Avoid the temptation of bringing the name of the class a class derives from into the derived class's name. A class should stand on its own. It doesn't matter what it derives from.
  • Suffixes are sometimes helpful. For example, if your system uses agents then naming something DownloadAgent conveys real information.

Example

class NameOneTwo
class Name

Class library names

Now that namespaces are becoming more widely implemented, they should be used to prevent class name conflicts among libraries from different vendors and groups. When not using namespaces, it's common to prevent class name clashes by prefixing class names with a unique string. Two characters is often sufficient, but a longer length is fine. For example, the xTS project used "Xts" (don't let the shift in capitalization bother you: see Class names).

Example

Jo Johanssen's data structure library could use JJ as a prefix, so classes could be:
class JjNameOneTwo
{
} 
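
For comparison, here is a rough sketch of the same library using a namespace (available since PHP 5.3) instead of a prefix. The namespace name JoJohanssen\DataStructures and the describe() method are assumptions made up for this example.

```php
<?php
// Hypothetical vendor namespace; it takes over the job of the "Jj" prefix.
namespace JoJohanssen\DataStructures;

class NameOneTwo
{
	public function describe()
	{
		return 'NameOneTwo from the JoJohanssen library';
	}
}

// Code elsewhere refers to the class by its fully qualified name,
// so another vendor's NameOneTwo cannot clash with it.
$structure = new \JoJohanssen\DataStructures\NameOneTwo();
$description = $structure->describe();
```

The class itself keeps its short, descriptive name; only the namespace carries the vendor identity.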
 

Method and function names

Methods should be verbs, in mixed case with the first letter lowercase and the first letter of each internal word capitalized. Most methods and functions perform actions, so the name should make clear what it does, as concisely as possible: checkForErrors() instead of errorCheck(), dumpDataToFile() instead of dumpDataFileToDiskAfterAParticularlyHorrendousCrash(). This will also make functions and data objects more distinguishable.
  • Suffixes are sometimes useful:
    • Max - to mean the maximum value something can have.
    • Cnt - the current count of a running count variable.
    • Key - key value.
    For example: RetryMax to mean the maximum number of retries, RetryCnt to mean the current retry count.
  • Prefixes are sometimes useful (lowercase, since they begin the method name):
    • is - to ask a question about something. Whenever someone sees is they will know it's a question.
    • get - get a value.
    • set - set a value.
    For example: isHitRetryLimit().

Example

class JjNameOneTwo
{
 function setSomething()
 {
  ...
 }
 function handleError()
 {
  ...
 }
}
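
The prefixes and suffixes listed above can be combined with the lowercase-first rule for methods. The class below is a hypothetical sketch; RetryPolicy and all of its members are invented for illustration.

```php
<?php
class RetryPolicy
{
	private $retry_max = 3; // "Max" suffix: the largest allowed value
	private $retry_cnt = 0; // "Cnt" suffix: the running count

	// "set" prefix: sets a value.
	public function setRetryMax($retry_max)
	{
		$this->retry_max = $retry_max;
	}

	// "get" prefix: gets a value.
	public function getRetryCnt()
	{
		return $this->retry_cnt;
	}

	// A plain verb for an action.
	public function recordRetry()
	{
		$this->retry_cnt++;
	}

	// "is" prefix: asks a question and answers with a boolean.
	public function isRetryLimitHit()
	{
		return $this->retry_cnt >= $this->retry_max;
	}
}

$policy = new RetryPolicy();
$policy->setRetryMax(2);
$policy->recordRetry();
$policy->recordRetry();
$limit_hit = $policy->isRetryLimitHit();
```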

Variable names

Variable names should be all lowercase, with words separated by underscores. For example, $current_user is correct, but $currentuser, $currentUser and $CurrentUser are not. Variable names should be short yet meaningful. The choice of a variable name should indicate to the casual observer the intent of its use. One-character variable names should be avoided except for temporary variables and loop indices. Common names for temporary variables are i, j, k, m, and n for integers; c, d, e for strings.
  • use all lower case letters
  • use '_' as the word separator
  • do not use 'l' (lowercase 'L') as a temporary variable
  • do not use '-' as the word separator

Justification

  • This allows a variable name to have the same name as a database column name, which is a very common practice in PHP.
  • 'l' (lowercase 'L') is easily confused with 1 (the number 'one')
  • '-' is not a legal character in a PHP variable name, and an array key containing '-' cannot be used unquoted inside a double-quoted string.

Example

function handleError($error_number)
{
 $error = new OsError;
 $time_of_error = $error->getTimeOfError();
 $error_processor = $error->getErrorProcessor($error_number);
}
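
The database justification can be illustrated with a small sketch. The row below is a hand-written stand-in for whatever a database fetch call would return, and the column names are hypothetical.

```php
<?php
// Hypothetical row, shaped like an associative array fetched from a database.
$row = array(
	'user_id' => 42,
	'first_name' => 'Ada',
	'last_name' => 'Lovelace',
);

// Each variable carries exactly the same name as its column,
// so there is nothing to translate when reading the code.
$user_id = $row['user_id'];
$first_name = $row['first_name'];
$last_name = $row['last_name'];

$full_name = $first_name . ' ' . $last_name;
```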

Example

$myarr['foo_bar'] = 'Hello';
print "$myarr[foo_bar] world"; // will output: Hello world
$myarr['foo-bar'] = 'Hello';
print "$myarr[foo-bar] world"; // parse error: '-' is not allowed in an unquoted key

Method argument names

Since function arguments are just variables used in a specific context, they should follow the same guidelines as variable names. It should be possible to tell the purpose of a method just by looking at the first line, e.g. getUserData($username). By examination, you can make a good guess that this function gets the user data of a user with the username passed in the $username argument. Method arguments should be separated by a comma followed by a single space, both when the function is defined and when it is called. However, there should not be any spaces between the arguments and the opening/closing parentheses.

Example

class NameOneTwo
{
 function startYourEngines(&$some_engine, &$another_engine)
 {
  $this->some_engine = $some_engine;
  $this->another_engine = $another_engine;
 }
 var $some_engine;
 var $another_engine;
}

Example

getUserData( $username, $password ); // NOT correct: spaces next to parentheses
getUserData($username,$password); // NOT correct: no spaces between arguments
getUserData($a, $b); // ambiguous: what do variables $a and $b hold?
getUserData($username, $password); // correct

Array elements

Since array elements are just variables used in a specific context, they should follow the same guidelines as variable names.
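  • Quote string keys (with single or double quotes) when accessing an array's elements.
  • Don't quote the key when the element is interpolated directly inside a double-quoted string.

Justification

  • Outside a string, PHP treats an unquoted key as a constant name and, depending on version and configuration, reports a notice or worse. Inside a double-quoted string, the simple interpolation syntax expects the key without quotes.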

Example

$myarr['foo_bar'] = 'Hello';
$element_name = 'foo_bar';
print "$myarr[foo_bar] world"; // will output: Hello world
print "$myarr[$element_name] world"; // will output: Hello world
print "$myarr['$element_name'] world"; // parse error
print "$myarr["$element_name"] world"; // parse error
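
PHP also offers a curly-brace interpolation syntax in which keys are quoted exactly as they are outside a string. It is not part of the guideline above, and is shown here only as a sketch of an alternative; the 'foo-bar' key and its value are made up for illustration.

```php
<?php
$myarr = array('foo_bar' => 'Hello', 'foo-bar' => 'Goodbye');
$element_name = 'foo_bar';

// Inside {$...}, the key is written as in ordinary code: quoted.
$greeting = "{$myarr['foo_bar']} world";
$same_greeting = "{$myarr[$element_name]} world";

// This form also copes with keys that contain '-'.
$farewell = "{$myarr['foo-bar']} world";

// Plain concatenation avoids interpolation rules altogether.
$concatenated = $myarr['foo_bar'] . ' world';
```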

Constant (define) names

Constants should be all uppercase with words separated by underscores ('_').
  • use all upper case letters
  • use '_' as the word separator

Justification

It's tradition for global constants to be named this way. You must be careful not to conflict with other predefined globals.
This capitalization method for constant values (particularly for language-specific constants) provides the greatest amount of flexibility.

Example

define("A_CONSTANT", "Hello world!");

Constant values

  • Text fragments should have the first letter capitalized, with the rest in lower case. This allows capitalization of text fragments to be handled with a style, based on how that sentence fragment is used. See also: http://www.w3.org/TR/REC-CSS2/text.html#propdef-text-transform, http://www.devguru.com/Technologies/css/quickref/css_texttransform.html.
  • Complete sentences should be punctuated as sentences. This means capitalizing the first word and including the end punctuation. Note: a text fragment with a colon at the end is not a complete sentence.
  • Text fragments should not have punctuation at the end unless it is a question, in which case the use of a question mark is acceptable (because this actually makes the text fragment a complete sentence with an implied subject/predicate/whatever).
  • It is very easy to overuse colons and exclamation points, and having them as part of the constant limits that constant's reusability, so these should not be included unless it is absolutely necessary. It rarely is.

Example

define("TITLE_CONSTANT", "User profile administration");
define("ERROR_CONSTANT", "This is an error message.");
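
Keeping punctuation and capitalization out of a text fragment is what makes it reusable. A minimal sketch, assuming a hypothetical LABEL_USER_NAME constant:

```php
<?php
// The fragment itself: first letter capitalized, no trailing punctuation.
define("LABEL_USER_NAME", "User name");

// Each context adds its own punctuation or capitalization.
$form_label = LABEL_USER_NAME . ":";
$table_heading = strtoupper(LABEL_USER_NAME);
$prompt = "Enter " . strtolower(LABEL_USER_NAME);
```

Had the constant been defined as "User name:", only the form label could have used it as-is. In an HTML page, the capitalization step could equally be left to a stylesheet, as the first bullet above suggests.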
 
Vtiger Coding Guidelines: 
http://wiki.vtiger.com/index.php/CodingGuidelines