Optimizing the Performance of a Node.js Function

Introduction

After letting it stagnate for a while, I decided to rework Street.js to use the things I have been working with in Node.js over the last year.  My main goals are as follows:

  • ES6ify the code base
  • Replace nasty callback code with Promises
  • Pass ESLint using JavaScript Standard Style config
  • Annotate types with Flow
  • Simplify the implementation

Node.js v4 has a lot of new ES6 features that are extremely helpful, fun, and performant.  I will be refactoring Street to use these new features.  Of course, like a good semver citizen, I will update my major version (to v1.0) when I publish the rewrite.

Setup

I am using Babel as a transpiler to paper over ES6 features not yet in Node.js, but blacklisting the transforms that are already present.  Many ES6 features (e.g. generators, symbols, maps, sets, arrow functions) are more performant natively than via transpilation and I do not care about supporting Node.js before version 4.  The following is my .babelrc configuration file showing the blacklist I am using:

{
  "blacklist": [
    "es3.memberExpressionLiterals",
    "es3.propertyLiterals",
    "es5.properties.mutators",
    "es6.blockScoping",
    "es6.classes",
    "es6.constants",
    "es6.arrowFunctions",
    "es6.spec.symbols",
    "es6.templateLiterals",
    "es6.literals",
    "regenerator"
  ],
  "optional": [
    "asyncToGenerator"
  ]
}
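
For reference, a build script along these lines in package.json runs the transpile step and picks up the .babelrc automatically (a sketch; your source and output directories may differ):

{
  "scripts": {
    "build": "babel src --out-dir lib",
    "prepublish": "npm run build"
  }
}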

Case Study: Walking a Directory

In Street, I need to walk a directory of files so I can generate a manifest of file paths and their hashes for comparison to a previous manifest.  The directory walking code was hairy; most of it was from Stack Overflow.  Here’s the current state (cleaned up):

var fs = require('fs')
var path = require('path')

function oldFindFilePaths (dir, done) {
  var filePaths = []
  fs.readdir(dir, function(err, filelist) {
    if (err) return done(err)
    var i = 0

    ;(function next() {
      var file = filelist[i++]
      if (!file) return done(null, filePaths)

      file = path.join(dir, file)

      fs.stat(file, function(err, stat) {
        if (err) return done(err)
        if (stat && stat.isDirectory()) {
          oldFindFilePaths(file, function(err, res) {
            if (err) return done(err)
            filePaths = filePaths.concat(res)
            next()
          })
        } else {
          filePaths.push(file)
          next()
        }
      })
    })()
  })
}

I never really liked this because it is not intuitive to me. The function is an unwieldy tangle of nested and recursive callbacks that makes me feel gross.  Once I got it working, I was wary of touching it again.

There must be a better way to do this! I can either spend some time refactoring this to make it nicer, or see if a rewrite is more elegant and perhaps performant. I am willing to take a small performance hit.

The following is my first iteration:

import fs from 'fs'
import path from 'path'

async function findFilePaths (dir: string): Promise<Array<string>> {
  var foundPaths = []
  var files = fs.readdirSync(dir)

  while (files.length > 0) {
    let file = files.pop()
    if (!file) break

    let filePath = path.join(dir, file)
    let stat = fs.statSync(filePath)

    if (stat.isDirectory()) {
      foundPaths = foundPaths.concat(await findFilePaths(filePath))
    } else {
      foundPaths.push(filePath)
    }
  }

  return foundPaths
}

Do not be thrown off by the type annotations.  I really enjoy Flow and find it useful for catching many kinds of bugs.  All those annotations get stripped during Babel transpilation.

This function is much clearer. I love ES7 async functions. They wrap a function’s logic in a Promise that resolves with the returned value or rejects if an error is thrown. Inside an async function, you can await on Promises: if the Promise resolves, the resolved value comes back as if returned; if it rejects, the rejected value (ideally an Error instance) is thrown.
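
As a quick illustration (not code from Street), here is roughly how those semantics play out:

function delay (ms, value) {
  return new Promise(resolve => setTimeout(() => resolve(value), ms))
}

async function demo () {
  let value = await delay(10, 'done')        // resolved value comes back like a return value
  try {
    await Promise.reject(new Error('boom'))  // a rejection surfaces as a thrown error
  } catch (err) {
    console.log(err.message)                 // 'boom'
  }
  return value                               // resolves the Promise that demo() returns
}

demo().then(console.log)                     // logs 'boom', then 'done'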

Notice that I replaced my asynchronous fs calls with synchronous ones. The callbacks were just too nasty, and since this is a CLI application, they were not buying much performance the way I was using them anyway.

This was a big improvement, but still not ideal to me. I am not a fan of while loops and instead prefer a more functional approach using map, filter, and reduce when possible. Also, calling fs.readdirSync was fine in this usage, but those fs.statSync calls seemed inefficient because they block on each call to a file descriptor. Perhaps I could make them async again but parallelize them.

This led me to my next iteration:

async function newFindFilePaths2 (dir: string): Promise<Array<string>> {
  var files = await new Promise((resolve, reject) => {
    fs.readdir(dir, (err, files) => err ? reject(err) : resolve(files))
  })

  var statResultPromises = files.map(file => new Promise((resolve, reject) => {
    var filepath = path.join(dir, file)
    fs.stat(filepath, (err, stat) => err ? reject(err) : resolve({filepath, stat}))
  }))

  var results = await Promise.all(statResultPromises)
  var {subDirs, foundPaths} = results.reduce((memo, result) => {
    if (result.stat.isDirectory()) {
      memo.subDirs.push(result.filepath)
    } else {
      memo.foundPaths.push(result.filepath)
    }
    return memo
  }, {subDirs: [], foundPaths: []})

  var subDirPaths = await Promise.all(subDirs.map(newFindFilePaths2))
  return foundPaths.concat(...subDirPaths)
}

Notice the while loop is gone, replaced with map and reduce, and the fs.stat calls now happen in parallel for a list of files. The fs.readdir call is also async because I will make recursive calls to this function in parallel for all the subdirectories I find.

I am also a fan of destructuring and spreading. It makes for more concise and elegant code. My favorite example here is taking the results of recursive calls to newFindFilePaths2, which are arrays of strings, and then spreading them into arguments to the foundPaths.concat function call to join all the paths into a single array.
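
For example (illustrative values only), this is what that spread-into-concat step is doing:

var subDirPaths = [['a/1.txt', 'a/2.txt'], ['b/3.txt']]  // results of the recursive calls
var foundPaths = ['root.txt']

foundPaths.concat(...subDirPaths)
// => ['root.txt', 'a/1.txt', 'a/2.txt', 'b/3.txt']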

This is excellent, but can be cleaned up and broken into a few different functions. This brings me to my last iteration:

function listFiles (dir) {
  return new Promise((resolve, reject) => {
    fs.readdir(dir,
               (err, files) => err ? reject(err) : resolve(files))
  })
}

function getStatMapFn (dir) {
  return file => new Promise((resolve, reject) => {
    var filepath = path.join(dir, file)
    fs.stat(filepath,
            (err, stat) => err ? reject(err) : resolve({filepath, stat}))
  })
}

function partitionByType (memo, result) {
  if (result.stat.isDirectory()) {
    memo.subDirs.push(result.filepath)
  } else {
    memo.foundPaths.push(result.filepath)
  }
  return memo
}

async function newFindFilePaths3 (dir: string): Promise<Array<string>> {
  var files = await listFiles(dir)
  var results = await Promise.all(files.map(getStatMapFn(dir)))
  var {subDirs, foundPaths} = results.reduce(partitionByType,
                                             {subDirs: [], foundPaths: []})

  var subDirPaths = await Promise.all(subDirs.map(newFindFilePaths3))
  return foundPaths.concat(...subDirPaths)
}

Even though it is more lines of code, I prefer this to the previous version. A few pure helper functions and one function that puts them all together concisely and elegantly. So beautiful!
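
Calling it is then like calling any other Promise-returning function (the directory here is just an example):

newFindFilePaths3('./public')
  .then(paths => console.log(`Found ${paths.length} files`))
  .catch(err => console.error(err))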

Running Times Compared

Let’s check the execution times to see if we did any better.  This is just meant as a quick-and-dirty comparison, not a scientific benchmark.
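
For the curious, a rough timing harness along these lines is all it takes (a sketch, not necessarily the exact code I used):

async function time (label, fn) {
  var start = process.hrtime()
  await fn()
  var [sec, ns] = process.hrtime(start)
  console.log(label, (sec * 1e3 + ns / 1e6).toFixed(1), 'ms')
}

time('newFindFilePaths3', () => newFindFilePaths3('./fixtures'))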

Function                               Execution Time (11 files, 2 dirs)
oldFindFilePaths   (callback hell)     1.8 ms
newFindFilePaths   (while loop)        12.1 ms
newFindFilePaths2  (map/reduce)        13.3 ms
newFindFilePaths3  (final map/reduce)  13.4 ms

Darn! The old function appears to be the most performant with a small number of files.  The difference between my last two iterations is negligible, which makes sense because they are really the same thing, just refactored slightly.

But what happens when there are more files and subdirectories?

Function                               Execution Time
                                       11 files, 2 dirs   1300 files, 200 dirs   10800 files, 2400 dirs
oldFindFilePaths   (callback hell)     1.8 ms             41.8 ms                269.6 ms
newFindFilePaths   (while loop)        12.1 ms            36.9 ms                182.6 ms
newFindFilePaths2  (map/reduce)        13.3 ms            60.8 ms                413.8 ms
newFindFilePaths3  (final map/reduce)  13.4 ms            61.5 ms                416.1 ms

Interesting!  The synchronous while loop version started beating everything else once the file counts reached the thousands, spread over hundreds of subdirectories.

Conclusion

I think I will probably end up going with the while loop function because it is the simplest and has better performance at scale.  And in the end, I mainly just wanted something with a simple API that I could hide behind a promise.

My theory of its superior performance over large directories is that the synchronous file system queries act like a kind of naive throttling system; it stops the VM from making thousands of concurrent function calls and file system queries which would bog it down.  That’s just my intuition though.


Header Files, Compilers, and Static Type Checks

Have you ever thought to yourself, “Why does C++ have header files?”  I had never thought about it much until recently, when I decided to do some research into why some languages (C, C++, Objective-C, etc.) use header files while other languages do not (e.g. C# and Java).

Header files, in case you do not have much experience with them, are where you put declarations and definitions.  You declare constants, function signatures, type definitions (like structs) etc.  In C, all these declarations go into a .h file and then you put the implementation of your functions in .c files.

Here’s an example of a header file called mainproj.h:

#ifndef MAINPROJ_H__
#define MAINPROJ_H__

extern const char const *one_hit_wonder;

void MyFN( int left, int back, int right );

#endif /* MAINPROJ_H__ */

Here is a corresponding source file mainproj.c:

#include "mainproj.h"

const char const *one_hit_wonder = "Yazz";

void MyFN( int left, int back, int right )
{
    printf( "The only way is up, baby\n" );
}

Notice that the header only has the function declaration for MyFN and it also does not specify what one_hit_wonder is set to. But why do we do this in C but not in Java?  Both are compiled and statically typed.  Ask Google!

A great MSDN blog post by Eric Lippert called “How Many Passes” was very helpful.  The main idea I got out of the article is that header files are necessary because of Static Typing.  To enforce type checks, the compiler needs to know things like function signatures to guarantee functions never get called with the wrong argument types.

Eric lists two reasons for header files:

  1. Compilers can be designed to do a single pass over the source code instead of multiple passes.
  2. Programmers can compile a single source file instead of all the files.

Single Pass Compilation

In a language like C#, which is statically typed but has no header files, the compiler needs to run over all the source code once to collect declarations and function signatures and then a second time to actually compile the function bodies (where all the real work of a program happens) using the declarations it knows about to do type checks.

It makes sense to me that C and C++ would have header files because they are quite old languages and the CPU and Memory resources required to do multiple passes in this way would be very expensive on computers of that era.  Nowadays, computers have more resources and the process is less of a problem.

Single File Compilation

One other interesting benefit of header files is that a programmer can compile a single file.  Java and C# cannot do that: compilation occurs at the project level, not the file level.  So if a single file is changed, all files must be re-compiled.  That makes sense because the compiler needs to check every file in order to get the declarations.  In languages with header files, you can compile only the file that changed because the headers guarantee type checks between files.

Relevance Today

Interesting as this may be, is it relevant today if you only do Java, C#, or a dynamic language?  Actually, it is!

For instance, consider TypeScript and Flow, which both bring gradual typing to JavaScript.  Both systems have a concept of declaration files.  What do they do?  You guessed it!  Type declarations, function signatures, etc.

TypeScript Declaration file:

module Zoo {
  function fooFn(bar: string): void;
}

Flow Declaration file:

declare module Zoo {
  declare function fooFn(bar: string): void;
}

To me, these look an awful lot like header files!
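
As a rough sketch of how a checker puts such a declaration to work (using the hypothetical Zoo module and fooFn from above), a consumer file can be type checked against the declaration without the checker ever seeing Zoo’s implementation:

// @flow
var Zoo = require('Zoo')

Zoo.fooFn('lion')  // ok: matches (bar: string) => void
Zoo.fooFn(42)      // type error: number is incompatible with string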

As we see, header files are not dead!  They are alive and well in many strategies for Type Checking.


Why you should be using Fig and Docker

This is an introductory article to convince and prepare you to try setting up your web app development environment with Fig and Docker.

The snowflake Problem

Let me take a moment to lay some foundation by rambling about dev environments.  They take weeks to build, seconds to destroy, and a lifetime to organize.  As you configure your machine to handle all the projects you deal with, it becomes a unique snowflake and increasingly difficult to duplicate (short of full image backups).  The worst part is that as you take on more projects, you configure your laptop more, and it becomes more costly to replace.

I develop on Linux and Mac and primarily do web development.  Websites have the worst effect on your dev environment because they often (read: always) need to connect to a number of other services like databases, background queues, caching services, web servers, etc.  At any given moment, I probably have half a dozen of those services running on my local machine to test things.  It is worse when I am working on Linux, because it is so easy to locally install all the services an app runs in production.  I routinely have MongoDB, PostgreSQL, MySQL (MariaDB), Nginx, and Redis running on my machine.  And let’s not even talk about all the Python virtualenvs or vendorized Rails projects I have lying around my file system.

Docker Steps In

Docker is such an intriguing tool.  If you have not heard, Docker builds on Linux container features (cgroups and namespace isolation) to create lightweight images capable of running processes completely isolated from the host system.  It is similar to running a VM, but much smaller and faster.  Instead of emulating hardware virtually, you access the host system’s hardware.  Instead of running an entire OS virtually, you run a single process.  The concept has many potential use cases.

With Docker, you can start and stop processes easily without cluttering your machine with any of that drama.  You can have one Docker image that runs Postgres and another that runs Nginx without having either actually installed on your host.  You can even have multiple language runtimes of different versions and with different dependencies: for example, several Python apps running different versions of Django on different (or the same) versions of CPython.  Another interesting side effect: if you have multiple apps using the same kind of database, their data will not live on the same running instance of that database.  The databases, like the processes, are isolated.

Docker images are created with Dockerfiles.  They are simple text files that start from some base image and build up the environment necessary to run the process you want.  The following is a simple Dockerfile that I use on a small Django site:

FROM python:3.4
MAINTAINER tgroshon@gmail.com

ENV PYTHONUNBUFFERED 1
RUN mkdir /code
WORKDIR /code
ADD requirements.txt /code/
RUN pip install -r requirements.txt
ADD . /code/

Simple, right?  For popular platforms like Python, Ruby, and Node.js, prebuilt Docker images already exist.  The first line of my Dockerfile specifies that it builds on the Python 3.4 image.  Everything after that is configuring the environment.  You could even start with a basic Ubuntu image and apt-get all the things:

FROM ubuntu:14.04

# Install.
RUN \
  apt-get update && \
  apt-get -y upgrade && \
  apt-get install -y build-essential && \
  apt-get install -y software-properties-common && \
  apt-get install -y byobu curl git htop man unzip vim wget

From there you can build virtually any system you want. Just remember, the container only runs a single process. If you want to run more than one process, you will need to install and run some kind of manager like upstart, supervisord, or systemd. Personally, I do not think that is a good idea.  It is better to have a container do a single job and then compose multiple containers together.

Enter Fig

The problem is that Docker requires quite a bit of know-how to get configured in this kind of useful way.  So, let’s talk about Fig.  It was created specifically to use Docker for the dev environment use case.  The idea is to specify which Docker images your app uses and how they connect.  Then, once you build the images, you can start and stop them together at your leisure with simple commands.

You configure Fig with a simple YAML file that looks like this for a Python application:

web:
  build: .
  command: python app.py
  links:
   - db
  ports:
   - "8000:8000"
db:
  image: postgres

This simple configuration specifies two Docker containers: a Postgres container called db and a custom container built from a Dockerfile in the directory specified by the web.build key (the current directory in this case).  Normally, a Dockerfile will end with the command (CMD) that the container should run.  The web.command key is another way to specify that command.  web.links is how you indicate that a process needs to be able to discover another one (the database in this example).  And web.ports simply maps a host port to the container port so you can visit the running container in your browser.

Once you have the Dockerfile and fig.yml in your project directory, simply run fig up to start all of your containers and press ctrl-c to stop them.  When they are not running, you can also remove them from Fig by running fig rm, although it seems to me that the Docker images still exist, so you might also want to remove those for a completely clean slate.

Conclusion

Since learning about Docker and Fig, setting them up is one of the first things I do on new web projects.  The initial configuration can take some time, but once you have it configured it pays for itself almost immediately, especially when you add other developers to a project.  All they need to have installed are Docker and Fig, and they are up and running with a simple fig up command.  Configure once, run everywhere.  Harness all that effort spent configuring your personal machine and channel it into something that benefits the whole team!


Why I did not like AngularJS in 2014

Edited March 2015:  Previously titled “Why I Do Not Recommend Learning AngularJS”.  In retrospect, my arguments are superficial and likely apply only to the specific situation I was in.  In addition, I was wrong that learning a new tech is wasteful.  Learning anything makes you better at learning, and that is what we should all be trying to do.  Learn what you’re excited about!

tl;dr

Despite its good qualities, I did not enjoy learning AngularJS.  With all the available options of web frameworks (e.g. Ember, React, Backbone, etc.), Angular fell behind in the following three areas:

  1. Performance
  2. Complexity
  3. Non-transference of Skills

Introduction

A lot of people ask me what I think about AngularJS, so I wanted to take some time to collect my thoughts and try to explain it clearly and rationally.  The following is the result.

I would like to start by saying AngularJS has a lot of good qualities, or else not so many people would use it so happily.  It makes developers excited to do web development again and that is hugely important.

With that being said, I did not like learning AngularJS.  With all the available options of web frameworks (e.g. React, Ember, Backbone, etc.), Angular falls behind in the following three areas:

  1. Performance
  2. Complexity
  3. Non-transference of Skills

Performance

I normally do not like picking on performance flaws, especially when a conscious decision has been made to trade performance for productivity.  I can understand that trade-off.  I do Ruby on Rails 😉

However, Angular’s performance has such serious problems that it becomes almost unusable for certain features or whole applications.  The threshold of how much work you can make Angular do on a page before performance tanks is scarily low!  Once you have a couple thousand watchers/bindings/directives doing work on a page, you notice the performance problems.  And it is not actually that hard to get that much binding happening on a page.  Just have a long list or table with several components per row, each with a healthy number of directives and scope bindings, and then add more elements as you scroll.  Sound like a familiar feature?

Again, I would like to say that performance is not that terrible a problem to have, because new versions of a framework can (and almost always will) optimize around common performance problems.  I do not think performance will be a long-term problem in Angular, but it is a problem right now.

Complexity

Of the most popular front-end frameworks (Ember, React, and Backbone), Angular is the most complex.  Angular has the most new terms and concepts to learn of any JavaScript framework: scopes, directives, providers, dependency injection, and so on.  Each of these concepts is vital to using Angular effectively for anything beyond a trivial use case.

Ember is also quite complex, but the framework itself gives direction for project organization, which mitigates some of the complexity.  Ember is also better at mapping its concepts to commonly used paradigms, which I will talk about in the next section.

With React, you can be productive after learning a few function calls (e.g. createClass() and renderComponent()), creating components with objects that implement a render() method, and setting your component state to trigger re-renders.  Once you wrap your head around what React is doing, it is all very simple.  My experience was that after weeks with Ember and Angular, I still did not grok all the complexity or feel like a useful contributor to the project.  After a day with React, I was writing production-quality UI with ease.
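
For a taste of what I mean, here is a minimal sketch using roughly the React API of that era (createClass, setState, renderComponent); the Counter component is just an illustration:

var Counter = React.createClass({
  getInitialState: function () {
    return {clicks: 0}
  },
  handleClick: function () {
    this.setState({clicks: this.state.clicks + 1})  // state change triggers a re-render
  },
  render: function () {
    return React.DOM.button({onClick: this.handleClick},
                            'Clicked ' + this.state.clicks + ' times')
  }
})

React.renderComponent(Counter(), document.getElementById('app'))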

Non-transference of Skills

I have been a web developer for years now.  Not a lot of years, but a few.  My first dev job was in college building UI with jQuery, which I learned very well.  Then I remember my first job interview outside of school, with a company that built web applications in vanilla JavaScript with no jQuery.  I got destroyed in the JavaScript portion of the interview because my jQuery knowledge mapped very poorly to vanilla JavaScript.  In fact, I would go so far as to say that I knew next to nothing about JavaScript even after a year of extensive web development with jQuery.

Why didn’t my jQuery skills transfer?  Because my development with jQuery taught me a Domain Specific Language (DSL).  While DSLs can improve productivity, knowledge of them seldom transfers to other areas.  The reverse can also be true: knowledge from other areas may not transfer into the DSL.  You could call this inbound and outbound transference.

Angular is like jQuery.  It has transference problems.  The most serious problem in my mind is that Angular suffers from both inbound and outbound transference problems.  Knowing JavaScript, MVC, or other frameworks was less helpful while learning Angular.  What I learned from doing Angular has not helped me learn other things.  But maybe that’s just me.

Conclusion

If you know Angular and are productive with it, great!  Use it.  Enjoy it.  Be productive with it.  I tried Angular, and it just didn’t do it for me.

If you are looking for a framework that is both scalable and flexible, look into React.  In my experience, it is the easiest to learn and plays the nicest with legacy code.  Integrating React into almost any project is quite easy.  Of all the frameworks, React is probably the easiest to get out of because all your application logic is in pure JavaScript instead of a DSL.  The strongest benefit I have seen when using React is the ability to reason about your app’s state and data flow.  If you want a high-performance and transferable application, I highly recommend React.

If you want the experience of a framework that does a lot for you, go for Ember.  It will arguably do more for you than even Angular.  From what I have seen, the Ember team is also more responsible/devoted to supporting large-scale applications and corporate clients that require stability and longevity.  They are the clients who do not want to be rewriting their apps every other year.  The one drawback I have seen is that Ember prefers to control every part of your app and does not play nice with other technologies.  If you have substantial legacy code, Ember will be a problem.

AngularJS will be releasing 2.0 soon, and it will be completely different from Angular 1.x.  Controllers, Scopes, and Modules are all going away.  To me, that seems like a realization by the Angular core team that some of those neologisms did not work out.


Caps Lock is Stupid

Caps Lock is stupid.  How often is it actually useful?  It capitalizes your letters, but you still have to use the shift key to make exclamation points, question marks, and most other symbols.  Might as well just hold down the shift key for everything.

So instead, change your Caps Lock to another input action.  On a Mac, go to System Preferences -> Keyboard and then click “Modifier Keys”.  Use the dropdown list of actions to change the “Caps Lock Key” to do something else.  I like setting it to “Control”.  This is particularly useful on MacBook keyboards, where the “Fn” key sits where the “Control” key should be.

You can always change it to something else.  Give it a shot.  Make the Caps Lock key work for you!


Site Migration

Some interesting stuff going on!  I recently ported my sites off of HostGator.  I realized that it just did not make sense to keep paying $6.95 a month for what I was actually using: one WordPress blog (this site) and one static site.  Not to mention, the particular shared hosting plan I was using limited me to one domain name associated with the services/applications of that plan.

No more!  I have now migrated the Codehabitude blog to WordPress.com and set up the custom domain name Codehabitude.com (a cheap process at only $13 per year).  The transition was also very easy.  WordPress has a pretty good export/import feature that brought over all my data really well.  I do not get all the same themes that were available to me before, but there are other benefits, like stronger security with WordPress.com 2-factor authentication.  I also help my dad manage his WordPress site, which I also just ported over to WordPress.com, and it is so much easier to be an administrator on multiple blogs hosted in one place than spread across different hosting plans.

My static website Tommygroshong.com is now hosted on Amazon S3.  I am now in my second month of hosting, and the cost has been between $0.50 and $1.00 a month.  The setup was not that hard; it just took some configuration of an S3 bucket and a Route 53 DNS zone.  The most expensive piece is Route 53 at $0.50/month.

My total cost per month hovers around $1.65 (S3 + Route 53 + WordPress custom domain), which is roughly a 76% drop in monthly cost from $6.95.  On top of all that, I learned a lot from the deploy experience and both sites appear to perform better than before.

I plan on writing a more in-depth blog post about launching a static website on Amazon S3, including a tutorial on a new deploy tool I wrote in Node.js called Street.js that makes deploying and updating sites in S3 buckets incredibly simple.


Understand Agile by Understanding How It’s Not Waterfall

Introduction

Many engineers and managers have a really hard time with the Agile project management methodology (the “new way” of building things). They desperately want to have Agile projects, but implementing the methodology is a struggle. One way that I have found to be useful in understanding and implementing Agile is by doing a compare and contrast with the Waterfall methodology (the “old way” of building things).

However, most people do not understand the Waterfall methodology, the very thing that Agile is attempting to fix in order to make developing software more effective and manageable. If you do not understand what the ineffective way is, how can you successfully avoid it?

The Waterfall Straw Man

A common symptom of misunderstanding Waterfall is the ritual building and burning of the Waterfall Straw Man. If you work in software long enough, you will hear all about the Waterfall Straw Man. You’ll hear a collection of arguments about how some action/process/etc. is bad because it is “too waterfall”!

  • We can’t predict the end date; that’s waterfall.
  • We can’t plan out all the features or tasks; that’s waterfall.
  • We can’t have meetings; that’s waterfall.
  • We can’t make documents; that’s waterfall.

Waterfall becomes the culmination of every ridiculous or bad practice we’ve experienced or heard about in a software project. It’s the boogeyman of managing software projects. On some teams even the accusation of “waterfall” is enough to kill any process.

The Facts about Waterfall

What is Waterfall really? The funny thing is that Waterfall is actually a very successful project management method throughout the world in manufacturing and construction. The following is the simplified waterfall approach:

  1. Identify a Project
  2. Plan the project
  3. Implement the Plan
  4. Deliver Tangible Output to the Customer

That’s it! That is waterfall! It actually has several benefits. One is that comprehensive planning up front allows effective resource planning. Also, discovering and fixing problems early on is less costly than dealing with them later. You might argue,

“Well obviously the problem is that Waterfall is not Customer centric.”

Wrong. Including the customer in the planning of a waterfall project is not uncommon. You might instead say something like the following,

“Well obviously the problem is that Waterfall doesn’t have Sprints/Standup Meetings/Backlogs/etc. !”

And you would be wrong. Waterfall can have any of those things. Those processes and components are popular in Agile but not specific to it.

The Waterfall method successfully builds furniture, cars, bridges, and skyscrapers every day throughout the world. Sometimes, a customer is unhappy with the result but Waterfall gets it right many times.

The Case for Agile

So then, why use Agile if Waterfall is so great? Waterfall is very poor at managing change, which we have learned is exceptionally common in software development (nobody really knows why). A central tenet of Waterfall is that plans are made upfront, complete, and comprehensive. Changing a plan later is rare and exceptionally costly because a Waterfall plan is so large in scale.

You can read the Agile Manifesto to see what the creators intend Agile to be. Knowing their intent, I suggest these two concrete improvements that Agile makes to Waterfall:

  1. Iterative Processes
  2. Frequent Delivery of Tangible Output

The Agile method would be applied in this manner:

    1. Identify a project.
    2. Plan what you know as best you can; Include the customer in the planning.
    3. Execute the project plan to deliver some tangible output to the customer quickly for validation.
    4. Discover new information from (a) your development and (b) the customer upon your delivery.

… Iterate over steps 2-4 until the desired output is delivered.

Wait! All that planning sounds an awful lot like Waterfall. Of course it does! Making plans was never the problem with Waterfall. Inability to change plans based on new information is the problem with Waterfall.

Scrum, Kanban, Xtreme Programming, and other Agile methodologies have a lot of additional behaviors and processes, but they are all built on this frame: (1) Iterative process and (2) frequent delivery of tangible output.

Conclusion

Do not fall into the trap of burning the Waterfall Straw Man. It benefits no one. In fact, I have seen some software teams implement Agile so poorly that they run into more problems than if they just implemented a straightforward Waterfall plan. At least then they would have planned before coding, interacted with the customer at least once, and worked to deliver what they promised in their plan. Better than nothing!

Agile is absolutely the way to go when building Software. It also works in other projects where change management is important (which is a surprising number of projects). Take the time to understand Project Management so that you can benefit from these proven practices.
