Why should you care about regexp?

RegExp: a piece of tech jargon that brings back memories of cryptic character combinations and despair. Is it worth it?

nsquar3d
4 min read · Mar 7, 2023

Almost everyone working in the tech industry has heard of regular expressions (aka RegExp or Regex) at least once, either while studying or at work. Few have the patience to learn them, and even fewer use them correctly and only when they are actually needed.

Image: some cursed regexp symbols

To be fair, that is to be expected, not only because of the steep learning curve and the abundance of possibilities, but also because of where regexp comes from. In principle, a regular expression describes a regular language: essentially the set of all strings that ‘fit’ the expression. Diving deeper into computation theory, regular languages are the languages generated by Type-3 grammars in the Chomsky hierarchy, and they are exactly the languages that can be recognized by a finite automaton, for example a nondeterministic finite automaton (NFA).

Of course, none of the above matters much to an engineer who just wants to write a regexp for a specific use case. Still, this background does not help regexp come across as easy to use or beginner friendly. Luckily, there are many resources for people who want to get started with regexp, including sites where you can test your own regexp against actual text and examine the matches you get.

For example, Regex101 is a great website for this purpose and it’s the origin of the example screenshot below.

Here we have part of a JSON file, and we want to find all lines where the key contains only letters and the value is a (positive or negative) number with exactly two decimals.
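Since the screenshot is not reproduced here, the following is a minimal Java sketch of the same idea. The pattern, the class name `JsonLineFinder` and the sample JSON are my own assumptions, not the expression from the original Regex101 session.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, written only to illustrate the example above.
class JsonLineFinder {
    // Assumed pattern: a quoted key made only of letters, a colon, then an
    // optionally negative number with exactly two decimals, an optional
    // trailing comma, and nothing else on the line.
    static final Pattern LINE = Pattern.compile(
            "^[ \\t]*\"([A-Za-z]+)\"[ \\t]*:[ \\t]*(-?\\d+\\.\\d{2})[ \\t]*,?[ \\t]*$",
            Pattern.MULTILINE); // ^ and $ match at every line boundary

    // Returns the keys of all lines that match the pattern.
    static List<String> matchingKeys(String json) {
        List<String> keys = new ArrayList<>();
        Matcher m = LINE.matcher(json);
        while (m.find()) {
            keys.add(m.group(1));
        }
        return keys;
    }

    public static void main(String[] args) {
        String json = String.join("\n",
                "{",
                "  \"price\": 12.50,",
                "  \"discount\": -3.25,",
                "  \"item_id\": 42.00,",   // key contains an underscore
                "  \"weight\": 1.234,",    // three decimals
                "  \"name\": \"widget\"",  // value is not a number
                "}");
        System.out.println(matchingKeys(json)); // [price, discount]
    }
}
```

Using `[ \t]*` instead of `\s*` keeps each match on a single line, which makes the result easier to reason about.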

Common use cases for regexp

There are plenty of posts and guides on regexp usage, and they typically mention use cases like:

  1. Searching for patterns in log files with less, grep, vim etc.
  2. Field validation, including email format checks, password policy enforcement, length requirements and checks for non-numeric characters in fields.
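As a rough illustration of the validation use case, here is a small Java sketch. The class `FieldValidation` and both rules (the password policy and the numeric field length) are hypothetical and chosen only for the example.

```java
import java.util.regex.Pattern;

// Hypothetical validators; the policies below are assumptions for illustration.
class FieldValidation {
    // Assumed policy: at least 8 characters, with at least one letter and one digit.
    static final Pattern PASSWORD = Pattern.compile("(?=.*[A-Za-z])(?=.*\\d).{8,}");

    // Assumed rule: a numeric field holds 1 to 10 digits and nothing else.
    static final Pattern NUMERIC_FIELD = Pattern.compile("\\d{1,10}");

    static boolean isValidPassword(String s) {
        // matches() requires the whole input to match, not just a part of it
        return PASSWORD.matcher(s).matches();
    }

    static boolean isNumeric(String s) {
        return NUMERIC_FIELD.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidPassword("abc12345")); // true
        System.out.println(isValidPassword("abcdefgh")); // false, no digit
        System.out.println(isNumeric("12a45"));          // false, contains a letter
    }
}
```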

Recently, I happened to use regexp at work in an unusual situation; of course, one could say that this case could also be categorized as one of the above, but hopefully it might give ideas for other interesting applications of regexp to the reader.

Using regexp to identify & fix bugs

A codebase, as large as it may be, is still a collection of files, which, in turn, are collections of characters. If a specific type of bug can be generalized into a pattern and a global find can be run over the codebase, then all bugs of that type can be identified (and with a nicely written regexp replacement string they may be fixed as well!).

This case revolves around some Java code snippets saved as CLOBs in a DB. These code snippets are mainly used as small, modular parts of calculations, so most of the time we need them to be as accurate as possible. Consequently, the BigDecimal class is heavily used, but the tricky part is how you construct your BigDecimal.

BigDecimal d = new BigDecimal(3.7); // 3.70000000000000017763568394002504646778106689453125
BigDecimal b = new BigDecimal("3.7"); // 3.7
boolean areEqual = d.equals(b); // this is false!

The different results come from the argument type of the constructor, and the behavior is well defined in the Javadoc of the BigDecimal constructors. To make a long story short, it boils down to the inability of binary floating point to represent most decimal fractions exactly: the double literal 3.7 is already an approximation before BigDecimal ever sees it. To get exactly the decimal value you wrote, pass it as text to the BigDecimal(String) constructor (BigDecimal.valueOf(double) is an alternative, since it goes through the double's canonical string representation).
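To make the difference concrete, here is a small Java sketch comparing the three ways of building the value. The class `BigDecimalDemo` and its `sameValue` helper are hypothetical names introduced for this example.

```java
import java.math.BigDecimal;

// Hypothetical demo class; only standard java.math.BigDecimal calls are used.
class BigDecimalDemo {
    // Compares numeric value only. equals() would also compare the scale,
    // so 3.7 and 3.70 are "the same value" here but not equal.
    static boolean sameValue(BigDecimal a, BigDecimal b) {
        return a.compareTo(b) == 0;
    }

    public static void main(String[] args) {
        BigDecimal fromDouble = new BigDecimal(3.7);      // exact value of the binary double
        BigDecimal fromString = new BigDecimal("3.7");    // exactly 3.7
        BigDecimal viaValueOf = BigDecimal.valueOf(3.7);  // built from Double.toString(3.7)

        System.out.println(fromDouble.equals(fromString));        // false
        System.out.println(viaValueOf.equals(fromString));        // true
        System.out.println(fromDouble.compareTo(fromString) > 0); // true, slightly above 3.7
        System.out.println(sameValue(fromString, new BigDecimal("3.70"))); // true
    }
}
```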

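Unfortunately, this rule was not followed everywhere, resulting in the usage of other BigDecimal constructors as well, which in turn often produced precision errors. So the task was simple: find every BigDecimal constructed from a numeric literal and pass that literal as a string instead.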

Luckily, SQL lets us find and replace using regexp. We were using Oracle DB, so I worked with REGEXP_REPLACE (specifically its simple form, which takes a source, a pattern and a replacement).

Essentially, I was looking for every place in the text that:

  1. May have other text before
  2. Contains BigDecimal(
  3. May contain a negative sign
  4. Contains at least one digit and a dot after the digits found
  5. Contains at least one digit afterwards
  6. Ends with )
  7. May have other text afterwards

I used three capture groups, covering items 1, 3 to 5, and 7 of the list above. The second group is the number passed to the constructor, which I wrap in quotes; the first and third groups let me keep the text before and after the constructor call unchanged.

The final statement in PL/SQL was the following:

UPDATE MYTABLE SET ALGORITHM =
  REGEXP_REPLACE(ALGORITHM,
    '(.*)BigDecimal\((-?[[:digit:]]+\.[[:digit:]]+)\)(.*)',
    '\1BigDecimal("\2")\3')
WHERE REGEXP_LIKE(ALGORITHM,
    '.*BigDecimal\(-?[[:digit:]]+\.[[:digit:]]+\).*');

I also replaced the constructors in actual files using a similar expression and utilizing global regexp find/replace in my IDE.
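For readers who want to try the same replacement outside the database, here is a hedged Java sketch. It is not the exact expression I used in the IDE: the class `BigDecimalLiteralFixer` is a hypothetical name, `[[:digit:]]` is written as `\d` because Java's regex syntax has no `[[:digit:]]` class, and backreferences are `$1` instead of `\1`. It also includes a shorter variant without the outer `(.*)` groups. In Java's engine the greedy leading `(.*)` makes the three-group version rewrite only the last constructor on a line, so the shorter variant is worth considering when a line can contain several.

```java
import java.util.regex.Pattern;

// Hypothetical helper illustrating the replacement in Java.
class BigDecimalLiteralFixer {
    // The three-group pattern from the SQL statement, translated to Java syntax.
    static final Pattern THREE_GROUPS =
            Pattern.compile("(.*)BigDecimal\\((-?\\d+\\.\\d+)\\)(.*)");

    // Assumed alternative: match only the constructor call itself.
    static final Pattern LITERAL_ONLY =
            Pattern.compile("BigDecimal\\((-?\\d+\\.\\d+)\\)");

    static String fixWithThreeGroups(String code) {
        return THREE_GROUPS.matcher(code).replaceAll("$1BigDecimal(\"$2\")$3");
    }

    static String fixEveryLiteral(String code) {
        return LITERAL_ONLY.matcher(code).replaceAll("BigDecimal(\"$1\")");
    }

    public static void main(String[] args) {
        String one = "BigDecimal d = new BigDecimal(3.7);";
        String two = "a = new BigDecimal(1.5).add(new BigDecimal(-2.25));";

        System.out.println(fixWithThreeGroups(one));
        // BigDecimal d = new BigDecimal("3.7");
        System.out.println(fixWithThreeGroups(two));
        // a = new BigDecimal(1.5).add(new BigDecimal("-2.25"));
        System.out.println(fixEveryLiteral(two));
        // a = new BigDecimal("1.5").add(new BigDecimal("-2.25"));
    }
}
```

Both variants leave `new BigDecimal("3.7")` and integer arguments such as `new BigDecimal(5)` untouched, because the pattern requires an unquoted number with a decimal point.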

These actions eradicated a large category of rounding errors in our codebase (and it was also a fun quest).

However, keep in mind that if all you have is a hammer, everything looks like a nail (obligatory reference: xkcd, "Regular Expressions").
