
A Regex Saved Me Enough Time To Write A Blog Post About It

2022-04-26, written by Tom Steavenson

When you find multiple occurrences of the same error in your code, the thought of correcting every instance by hand is daunting. This is where a regular expression (regex) saves me time.

So today I’ve been adding OpenAPI Spec linting to our CI pipeline and fixing any errors I found... What could possibly go wrong? 😊

After trying out a few linters (there are several), I settled on Spectral: it reports more than one error at once, gives line numbers, and lets you endlessly tweak the linting config if we ever need any “special exemptions”.

So I had a go at running it locally:

❌ 607 problems (375 errors, 232 warnings, 0 infos, 0 hints)

whoops... 😬

OK, so it turns out that 300 or so of these errors are due to multiple instances of the same issue.

2781:9    error  parser
  Mapping key must be a string scalar rather than number

In English: it’s a YAML type error. Where we have something like this:

    200:
      description: OK

We want it to be this instead:

    "200":
      description: OK

That way the key is read as a string rather than a number.

This still leaves me with one issue. I need to go and make this change 300 times... 🤔

Not to fear, sed is here.

sed -i 's/\([0-9]\{3,3\}\):/"\1":/g' api-spec-v2.tmpl.yml


In English:

Make these changes in place (sed -i) in the file api-spec-v2.tmpl.yml.

Wherever you find a 3-digit number followed by a colon, put quotes around that number.
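To see the effect without touching a real file, the same expression can be run on a small sample through a pipe instead of with -i. This is a minimal sketch; the sample below is a made-up fragment of a responses map, not taken from our actual spec:

```shell
#!/bin/sh
# Made-up fragment of an OpenAPI "responses" map with numeric status-code keys
input='responses:
  200:
    description: OK
  404:
    description: Not Found'

# Same substitution as above, but printing to stdout instead of editing in place (no -i)
output=$(printf '%s\n' "$input" | sed 's/\([0-9]\{3,3\}\):/"\1":/g')
printf '%s\n' "$output"
```

The 200: and 404: keys come out quoted, while the description lines contain no digits and pass through untouched.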

But how?

The leading s/ tells sed to perform a substitution, and the final /g tells it to “do it everywhere”: replace every match on each line, not just the first one.

The whole command is of the form s/<matching regex>/<substitution>/g.
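A quick way to see what the trailing g changes: sed visits every line either way, but without g it replaces only the first match on each line. A sketch on a made-up line holding three keys:

```shell
#!/bin/sh
# Made-up line with three 3-digit keys on it
line='200: 201: 404:'

# Without g: only the first match on the line is replaced
first_only=$(printf '%s\n' "$line" | sed 's/\([0-9]\{3,3\}\):/"\1":/')
# With g: every match on the line is replaced
every_match=$(printf '%s\n' "$line" | sed 's/\([0-9]\{3,3\}\):/"\1":/g')

printf '%s\n' "$first_only"   # "200": 201: 404:
printf '%s\n' "$every_match"  # "200": "201": "404":
```

In a YAML file with one key per line, both forms give the same result.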

Looking at the matching regex:

[0-9] matches a single digit (any character from 0 to 9). \{3,3\} says to match it a minimum of 3 and a maximum of 3 times, i.e. a 3-digit number. The \( and \) surrounding this tell sed that the match inside is a “group” (I’ll come back to that).

The : that follows requires the 3-digit number to be followed directly by a colon.
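To check the pieces of the pattern in isolation, here is the same expression, wrapped in a small illustrative function (quote_keys is a name made up for this sketch), applied to a few made-up keys. Three digits directly followed by a colon get quoted; anything else passes through unchanged:

```shell
#!/bin/sh
# The substitution from the post, wrapped in a function that filters stdin
quote_keys() {
  sed 's/\([0-9]\{3,3\}\):/"\1":/g'
}

# 200:  three digits then a colon            -> quoted
# 42:   only two digits                      -> unchanged
# 404 : space between the number and colon   -> unchanged
for key in '200:' '42:' '404 :'; do
  printf '%s -> %s\n' "$key" "$(printf '%s\n' "$key" | quote_keys)"
done
```

As an aside, \{3\} is the shorter POSIX spelling of “exactly 3 times” and behaves the same as \{3,3\}.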

Looking at the substitution:

We’ve got the quotes and the colon, and in between the quotes \1 says to copy in the first match group (the thing surrounded by \( & \)).
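The group captures only the digits, not the colon, which is why the colon has to be written back in the substitution. A small comparison on a made-up, indented key:

```shell
#!/bin/sh
key='  200:'

# \1 holds the captured digits only; leaving ":" out of the substitution drops it
without_colon=$(printf '%s\n' "$key" | sed 's/\([0-9]\{3,3\}\):/"\1"/')
# Writing ":" back after the closing quote keeps the mapping key intact
with_colon=$(printf '%s\n' "$key" | sed 's/\([0-9]\{3,3\}\):/"\1":/')

printf '[%s]\n[%s]\n' "$without_colon" "$with_colon"
```

Text outside the match (here, the leading indentation) is left alone in both cases.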

Of course, this matching regex isn’t all that robust... Something like response_was_200: true might suddenly find itself turned into response_was_"200": true, which is no good. Before trying to develop the regex any further, we can check what it actually matches in the file.
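If the simple pattern had matched too much, one way to tighten it would be to quote only a 3-digit key at the start of a line, after optional indentation. This is a hypothetical variant (quote_status_keys is an illustrative name), not the command used in this post. It also avoids a quirk of the unanchored pattern, which quotes the last three digits of a longer number (1234: becomes 1"234":). It is still not YAML-aware, so a line inside a multi-line string that happens to start with 200: would match too.

```shell
#!/bin/sh
# Original, unanchored pattern from the post
quote_keys() {
  sed 's/\([0-9]\{3,3\}\):/"\1":/g'
}

# Hypothetical tighter variant: ^ anchors to the start of the line,
# \1 keeps the indentation (spaces/tabs) and \2 is the 3-digit key
quote_status_keys() {
  sed 's/^\([[:space:]]*\)\([0-9]\{3\}\):/\1"\2":/'
}

# Columns: input | original pattern | tighter variant
for line in '  200:' 'response_was_200: true' '1234:'; do
  printf '%-24s | %-24s | %s\n' "$line" \
    "$(printf '%s\n' "$line" | quote_keys)" \
    "$(printf '%s\n' "$line" | quote_status_keys)"
done
```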

grep '\([0-9]\{3,3\}\):' api-spec-v2.tmpl.yml

The output was far too verbose to paste in here, but take it from me, the only matches were ones that we cared to change!
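Before running with -i, a dry run can show both where the matches are and exactly how each line would change. A sketch using a throwaway temp file as a stand-in for api-spec-v2.tmpl.yml: grep -n prefixes each match with its line number, and piping sed’s output into diff compares it against the untouched file.

```shell
#!/bin/sh
# Throwaway stand-in for api-spec-v2.tmpl.yml, removed when the script exits
spec=$(mktemp)
trap 'rm -f "$spec"' EXIT
cat > "$spec" <<'EOF'
responses:
  200:
    description: OK
EOF

# Where are the matches? -n prefixes each matching line with its line number
matches=$(grep -n '\([0-9]\{3,3\}\):' "$spec")
printf '%s\n' "$matches"

# What would change? sed without -i writes to stdout; diff reads that from "-"
changes=$(sed 's/\([0-9]\{3,3\}\):/"\1":/g' "$spec" | diff "$spec" -)
printf '%s\n' "$changes"
```

Because sed runs without -i here, the file itself is never modified; once the diff looks right, the same expression can be run with -i.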

So there we go! 300 errors fixed in a few seconds, and... A Regex Saved Me Enough Time To Write A Blog Post About It.
