why I like regular expressions + who says they aren’t readable

Scott and I are working on our second book (OCP 8). I’m excited that I get to write the part about regular expressions as that is one of my favorite programming topics. Why, you ask? Because it lets you write clear and efficient code.

The scenario

For the book, I wrote an example showing how validating a simplified phone number is so much easier with a regular expression. The rules for the example were:

  1. a phone number is exactly 10 digits
  2. a phone number may contain dashes to separate the first three digits and next three digits, but not anywhere else
  3. no other characters are allowed (no parens around the area code in this example)

For example, 123-456-7890, 123-4567890 and 123456-7890 are valid. In real life, the third one wouldn’t be; we allow this typo here to be nice.  However dashes aren’t allowed in random positions. 12-45-67-890 is not a phone number.

Without regular expressions

This isn’t in the book, but i tired to write the code “the long way” to ensure it was annoying long. It was. I tried to write the code in a readable way and the best I could think of was:

private static boolean validateLong(String original) {
   String phone = original;
   // remove first dash (if present)
   if (phone.charAt(3) == '-') {
     phone = phone.substring(0, 3) + phone.substring(4);
   }
   // remove second dash (if present)
   if (phone.charAt(6) == '-') {
      phone = phone.substring(0, 6) + phone.substring(7);
   }

   // validate 10 characters left
   if (phone.length() != 10) {
      return false;
   }

   // validate only numbers left
   Set<Character> digits = new HashSet<>(Arrays.asList('0', '1', '2', '3', '4', '5', '6', '7', '8', '9'));
   for (int i = 0; i < phone.length(); i++) {
      if (!digits.contains(phone.charAt(i))) {
         return false;
      }
   }
   return true;
   }

This is a lot of code. And to those who think regular expressions are unreadable, what do you think of the above? I don’t find it easy to see what is going on even though I wrote it. There’s just too much logic and too much detail to ensure is correct. (And no, it didn’t work on my first attempt.)

With regular expressions

Re-writing to use regular expression gives me this:

private static boolean validate(String phone) {
   String threeDigits = "\\d{3}";
   String fourDigits = "\\d{4}";
   String optionalDash = "-?";
   String regEx = threeDigits + optionalDash + threeDigits + optionalDash + fourDigits;
   return phone.matches(regEx);
}

Even if you don’t know the regular expression syntax, it should be obvious what is going on here. We look for three digits, an optional dash, three more digits, another optional dash and a final four digits.

It’s a tiny bit longer in the book version because {3} isn’t on the exam so that part is:

String threeDigits = "\\d\\d\\d";
String fourDigits = "\\d\\d\\d\\d";

Still. Way easier to read and faster to write than the original code without regular expressions. I consider regular expressions like a hammer. They aren’t the right tool for every job, but they are quite helpful when they are the right tool.