Pattern matching in JavaScript

Pat­tern match­ing is a form of con­di­tional branch­ing which al­lows you to con­cisely match on data struc­ture pat­terns and bind vari­ables at the same time (Wikipedia, Con­di­tional State­ments, Pat­tern Match­ing.) Pat­tern match­ing is sup­ported in some func­tional lan­guages such as ML, Haskell, OCaml, and Er­lang. The fac­to­r­ial func­tion can be im­ple­mented in ML like so (note that is not an ef­fi­cient way to cal­cu­late fac­to­ri­als, but it serves as a good in­tro­duc­tion to pat­tern match­ing):

fun fact 0 = 1
  | fact n = n * fact(n - 1);

Pat­tern match­ing might seem like an es­o­teric func­tional pro­gram­ming fea­ture, but any­one who has writ­ten an XSLT stylesheet has used pat­tern match­ing be­fore, through the <xsl:template match="mynode"> el­e­ment. If you look hard enough how­ever, you’ll no­tice XSLT is ba­si­cally a thinly dis­guised func­tional (de­clar­a­tive) pro­gram­ming lan­guage. Not a pretty one though.

When you call fact(10) the value 10 is first matched against the first pat­tern 0. This match fails and the next pat­tern is eval­u­ated. The n in the next pat­tern is an ex­am­ple of a vari­able. Match­ing an in­te­ger against a vari­able suc­ceeds, so the sys­tem binds 10 to the vari­able n. The state­ments fol­low­ing the pat­tern are then ex­e­cuted. Since this is a re­cur­sive func­tion it will match the sec­ond pat­tern un­til the ar­gu­ment to the func­tion reaches zero and then ter­mi­nates. Haskell, OCaml, and Er­lang have sim­i­lar syn­tax for pat­tern match­ing. In this ar­ti­cle we will in­tro­duce pat­tern match­ing to the JavaScript lan­guage us­ing the JU­nify uni­fi­ca­tion li­brary.

Implementation

At its core, pat­tern match­ing is sim­ply a con­di­tional branch with pat­tern match­ing as the ex­pres­sions. Be­low is a sim­ple im­ple­men­ta­tion of the fac­to­r­ial func­tion in JavaScript, us­ing the JU­nify li­brary.

var $ = unification.variable;
var unify = unification.unify;

var fact = function (n) {
  var r;

  if (unify(0, n)) {
    return 1;
  }
  else if (r = unify($('n'), n)) {
    return r.n * fact(r.n - 1);
  }
};

fact(10); // 3628800

We can im­prove the syn­tax a bit by cre­at­ing a mod­ule that does the match­ing for us and ex­e­cutes a clo­sure when the match is made. The match mod­ule re­turns a func­tion that takes two pa­ra­me­ters: an ar­ray of pat­terns and clo­sures, and a value to match against. The ar­ray of pat­terns and clo­sures is it­self an ar­ray con­tain­ing the pat­tern to match on and a clo­sure to ex­e­cute once the pat­tern is matched. If no match is made the mod­ule re­turns un­de­fined. If how­ever a match is made, the clo­sure is ex­e­cuted and passed the re­sult of the uni­fi­ca­tion (the vari­able bind­ings, if any).

match = function () {
  var unify = unification.unify;

  function match_aux(patterns, value) {
    var i, result;

    for (i = 0; i < patterns.length; i += 1) {
      result = unify(patterns[i][0], value);
      if (result) {
        return patterns[i][1](result);
      }
    }
    return undefined;
  }

  return function(patterns, value) {
    return match_aux(patterns, value);
  };
}();

var fact = function (n) {
  return match([
    [0, function() { return 1; }],
    [$('n'), function(result) {
      return result.n * fact(result.n - 1);
     }]
  ], n);
};

The above code list­ing shows the fac­to­r­ial func­tion us­ing the match mod­ule. No­tice how­ever that we limit the num­ber of ar­gu­ments we can match on to one. This also means that func­tions us­ing the match mod­ule are lim­ited to only one ar­gu­ment. If they need more than one ar­gu­ment they would have to con­struct an ob­ject or ar­ray and pass that to the match method. We can make this a bit more flex­i­ble by al­low­ing an ar­bi­trary num­ber of ar­gu­ments and re­serve the last item in the match ar­ray for the clo­sure. We only need to change the match_aux func­tion to ac­com­plish this. In­stead of pass­ing the first value of the match ar­ray to unify we pass an ar­ray of length - 1 to it.

We also mod­ify the fac­to­r­ial func­tion to pass its ar­gu­ments ar­ray as the sec­ond ar­gu­ment (value) to the match mod­ule. The ar­gu­ments pseudo-ar­ray is turned into a real ar­ray by the Array.prototype.slice method be­fore be­ing passed to the match mod­ule.

match = function () {
  var unify = unification.unify;

  function match_aux(patterns, value) {
    var i, result, length;

    for (i = 0; i < patterns.length; i += 1) {
      length = patterns[i].length;

      // we only try to match if the match array contains at
      // least two items and the last item is a function
      if (length >= 2 &&
        typeof patterns[i][length - 1] === 'function') {
        result = unify(patterns[i].slice(0, length - 1),
                 value);
        if (result) {
          return patterns[i][length - 1](result);
        }
      }
    }
    return undefined;
  }
  …
}();

var fact = function () {
  return match([
    [0, function() { return 1; }],
    [$('n'), function(result) {
      return result.n * fact(result.n - 1);
     }]
  ], Array.prototype.slice.apply(arguments));
};

The new fac­to­r­ial func­tion is how­ever longer than the pre­vi­ous ver­sion. We can im­prove it by re­mov­ing the func­tion lit­eral and man­u­ally pass­ing the ar­gu­ments of the func­tion. We do this by wrap­ping the re­turned clo­sure in an­other clo­sure and pass­ing the in­ner clo­sure’s ar­gu­ments to the match func­tion.

match = function () {
  …
  return function(patterns) {
    return function () {
      return match_aux(patterns,
          Array.prototype.slice.apply(arguments));
    };
  };
}();

The match mod­ule can now be called with only one ar­gu­ment. This re­turns a clo­sure (the func­tion we’re build­ing) whose ar­gu­ments, to­gether with the pat­terns, are then passed to the match_aux func­tion. Again, the ar­gu­ments pseudo-ar­ray is turned into a real ar­ray. This small change sim­pli­fies our fac­to­r­ial func­tion tremen­dously.

var fact = match([
  [0, function() { return 1; }],
  [$('n'), function(result) {
    return result.n * fact(result.n - 1);
   }]
]);

No­tice that the ar­ray brack­ets around the match ar­rays are now re­dun­dant, since the ar­gu­ments to a func­tion are al­ready an ar­ray (al­beit a pseudo-ar­ray, but that can be fixed.) So let’s get rid of those too. We mod­ify the re­turn state­ment to pass the ar­gu­ments ar­ray of the outer clo­sure as the first value of the in­ner clo­sure. We have to in­tro­duce an ex­tra vari­able here be­cause the ar­gu­ments vari­able is lo­cal to a func­tion. By sav­ing it as a vari­able we can ac­cess the ar­gu­ments of the outer clo­sure from in­side the in­ner clo­sure.

match = function () {
  var slice = Array.prototype.slice;
  …
  return function () {
    var args = slice.apply(arguments);
    return function () {
      return match_aux(args, slice.apply(arguments));
    };
  };
}();


var fact = match(
  [0, function() { return 1; }],
  [$('n'), function(r) { return r.n * fact(r.n - 1); }]
);

Conclusion

This con­cludes our im­ple­men­ta­tion of ba­sic pat­tern match­ing in JavaScript. In the next ar­ti­cle we’ll dis­cuss more ad­vanced pat­tern match­ing tech­niques and sim­plify the syn­tax even fur­ther. You can down­load the fi­nal match.js code for your con­ve­nience.

We can also use JU­ni­fy’s sup­port for types in our pat­tern matcher to — for ex­am­ple — im­ple­ment DOM tra­ver­sal. The fol­low­ing func­tion tra­verses the DOM and re­turns the num­ber of char­ac­ters in the el­e­ment given to it as a pa­ra­me­ter.

var count_characters = match(
  [$('node', Text), function(result) {
    return result.node.length;
  }],
  [$('node'), function(result) {
    var count = 0;
    var n = result.node.firstChild;

    while (n) {
      count += count_characters(n);
      n = n.nextSibling;
    }
    return count;
  }]
);

// display the number of characters in the document body
alert(count_characters(document.body));

It does this by first match­ing on the Text type and re­turn­ing the num­ber of char­ac­ters. The other match pat­tern matches any other type of node in the DOM (e.g. Element) and it­er­ates through its chil­dren and calls the count_characters func­tion re­cur­sively.