Tripping with Shell Expansion and Regexes

When working directly with the shell, whether on the console or in a GUI terminal, you will quickly realise that finding and manipulating files feels almost like working with regular expressions. This is exhilarating at times, until you hit the rough edges that trip you up.

But first things first: what are they?

Regular expressions are a language of text patterns that is useful for quickly finding text. Shell expansion, on the other hand, refers to how a shell such as bash or zsh rewrites the commands you type before running them. The part of it that matches patterns against file names is known as file globbing.

The similarity between the two has to do with patterns for locating things: regex is for text, while globs are for files. Then you realise that the two overlap, because file names are just text as well.

How the shell handles special characters

Shell expansion only takes place when you are entering commands in, well, the shell. A shell is just a program that runs commands on your behalf. You type a command, along with any arguments, and the shell first scans the text you typed for special characters.

The shell uses these special characters to expand your command so that the program being run receives a complete, unambiguous list of arguments. For example, when I type

ls *.txt

while in a Documents/ directory, I expect to see a listing of all the files with the .txt extension.

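But the ls command never sees the star. What it receives is something like ls aims.txt guide.txt plan.txt schedule.txt and so on, because the shell replaces the pattern with the matching names, sorted, before ls runs. The * matches any run of characters, so *.txt means “every file in this directory whose name ends in .txt”.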
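To see what a command actually receives, it helps to swap ls for echo, which simply prints its arguments. The following is a small sketch, assuming bash or a similar POSIX-style shell; the demo_dir variable and the notes.md file are made up for illustration, and everything happens in a throwaway directory.

```shell
# Work in a throwaway directory so nothing real is touched.
demo_dir=$(mktemp -d) || exit 1
cd "$demo_dir" || exit 1

# Illustrative files: the four names from the text plus one that should not match.
touch plan.txt aims.txt guide.txt schedule.txt notes.md

# Unquoted: the shell expands *.txt into the matching names before echo runs.
expanded=$(echo *.txt)
echo "$expanded"   # aims.txt guide.txt plan.txt schedule.txt

# Quoted: expansion is switched off and echo receives the literal pattern.
literal=$(echo '*.txt')
echo "$literal"    # *.txt

# Tidy up.
cd "$OLDPWD" && rm -rf "$demo_dir"
```

Quoting the pattern is the usual way to hand a star to a program that wants to interpret it itself, which is exactly the situation with regex tools.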

The question mark is another tripper

So from the previous section, one gets the view that the star (or asterisk) stands for “any characters” in the shell. What else?

Another interesting one is the question mark. Whereas in regex this symbol means “zero or one of the preceding item”, in shell grammar it is a placeholder for exactly one character. So if you type 0?.txt, you are effectively saying “look for a file name with 0, followed by any single character, followed by .txt”.

This will find files such as “01.txt”, “02.txt” and “0a.txt”, but not “012.txt”, because that last name has two characters between the 0 and the .txt. To match it you would have to use two question marks. So in shell expansion, the number of question marks sets exactly how many characters are matched.
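The two meanings of the question mark can be put side by side. This is only a sketch, assuming bash or a similar shell and a grep that supports -E; the demo_dir variable and the sample names fed to grep are invented for the comparison.

```shell
# Work in a throwaway directory so nothing real is touched.
demo_dir=$(mktemp -d) || exit 1
cd "$demo_dir" || exit 1
touch 01.txt 02.txt 0a.txt 012.txt

# Glob: each ? stands for exactly one character.
one=$(echo 0?.txt)
echo "$one"    # 01.txt 02.txt 0a.txt

two=$(echo 0??.txt)
echo "$two"    # 012.txt

# Regex: ? makes the preceding 0 optional (zero or one times).
# The pattern is quoted so the shell leaves it alone.
regex=$(printf '%s\n' .txt 0.txt 01.txt | grep -E '^0?\.txt$')
echo "$regex"  # prints .txt and 0.txt, one per line

# Tidy up.
cd "$OLDPWD" && rm -rf "$demo_dir"
```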

The vertical bar (|)

Finally, let us look at the vertical bar (|), which is also known as the pipe character. In regex, this is the alternation symbol. So if you want to say “this or that”, you type something like this|that. But that is not the case in the shell.

In the shell, this is the character used for building powerful tools on the command line. It is known as a pipe because it takes the output of one command and feeds it to the next utility as its input. Thus, two or more commands can work together to produce something bigger without being aware of each other’s presence.

Let us see how in the next example.

I usually want to find out how many files are in a directory. I make use of two utilities: ls and wc. The first lists files and directories, while the second counts the words, characters and lines in a file. I have no file here, so I give wc the output of ls instead and ask it to count lines with -l. Because ls prints one name per line when its output goes to a pipe, the line count is the number of entries:

ls | wc -l

So a vertical bar in the shell works like a real pipe and not as an alternation symbol.
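Both roles of the vertical bar can appear in a single command line, and quoting is what separates them. The sketch below assumes bash or a similar shell and a grep that supports -E; the demo_dir variable and the three file names are made up for illustration.

```shell
# Work in a throwaway directory so nothing real is touched.
demo_dir=$(mktemp -d) || exit 1
cd "$demo_dir" || exit 1
touch this.txt that.txt other.txt

# Unquoted bars are pipes: ls feeds wc, and tr strips the padding
# that some versions of wc put around the number.
count=$(ls | wc -l | tr -d ' ')
echo "$count"    # 3

# The quoted bar is passed to grep untouched, where it means "this or that".
matched=$(ls | grep -E 'this|that')
echo "$matched"  # prints that.txt and this.txt, one per line

# Tidy up.
cd "$OLDPWD" && rm -rf "$demo_dir"
```

Leaving the quotes off the grep pattern would make the shell try to run a command called that, which is the kind of trip this post is about.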

Still there are similarities

Still, there are similarities that are hard to ignore. Although the star, the question mark and the vertical bar have different meanings, some regex syntax carries over to bash, for instance.

For example, if you’re not sure how a given file name is capitalised, you can include both the uppercase and lowercase variations inside a character class using square brackets.

Let’s say I am trying to find all files with Python in their names. I can type:

ls *[Pp]ython*
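Here is a quick way to try this out safely, sketched under the assumption of bash or a similar shell with its default, case-sensitive globbing. The demo_dir variable and the file names are invented for the example.

```shell
# Work in a throwaway directory so nothing real is touched.
demo_dir=$(mktemp -d) || exit 1
cd "$demo_dir" || exit 1
touch learn-Python.txt python-notes.md PYTHON.txt ruby.txt

# [Pp] matches exactly one character, either P or p.
found=$(echo *[Pp]ython*)
echo "$found"    # learn-Python.txt python-notes.md

# Tidy up.
cd "$OLDPWD" && rm -rf "$demo_dir"
```

PYTHON.txt is left out because the class only covers the first letter; the remaining letters of the pattern are still lowercase.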

Conclusion

Well, in this post I wanted to show some common trip-ups I used to have because of the false friendship between shell expansion characters and those of regular expressions. There are a lot more than these, however. Depending on the shell you use, and which other special characters it supports, you may come across more.