Text patterns are some of the most fascinating and illuminating things
one comes across: the ability to decode them makes it possible to
find whatever you are looking for. In this post, I am going
to show the difference between globbing and regular expressions (regex).
The reason we often mix the two up is that they use the same set of
characters, sometimes with differing meanings.
Text patterns are also known as templates. Thus, a text pattern makes it
possible to:
- Think of what you want to look for;
- Generalise it in terms of its spelling.
This way, it becomes possible to classify your search in terms of
This process is known as searching for a needle in a haystack, and is
employed by both the shell and regular expressions.
Wildcard characters and Text Patterns
To make sense of this discussion, we need to talk about
wildcard characters. These are symbols (such as punctuation marks)
that have a special meaning to the shell or to a regular expression
engine.
- By default, whichever engine is carrying out the search uses the
ordinary alphanumeric characters entered to match literally against
whatever it is going through. For the shell, this is the
filesystem; for the regex engine, it is the text being searched.
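- Other than the alphanumeric characters, the following characters
are special for both globbing and regex:
*, ?, [, ], \.
The + character is special in regex, but in globbing only when a
shell's extended patterns are enabled.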
- As you will see below, the meaning of these characters is largely
the same, except for a few cases that this post looks at.
Let us start by looking at what file globbing is.
The way your shell interprets wildcard characters to match
filenames may be thought of, simply, as globbing. Thus,
If you type a filename as part of a command, usually when
passing it to a utility that expects a file, the shell sees the
characters in that filename argument and tries to match them against
the files in the directory.
Each character of your string is tested in turn, and if all of
them match exactly, the file is found.
However, there are some characters that mean something else:
think of these characters creating a particular class of your
With file globbing, the following wildcard characters create
special character classes. A character class in this case means
that one character (the wildcard) can stand for any of the
characters in the class. Thus,
|Wildcard|Meaning|
|---|---|
|*|Zero or more characters. Every character falls in this class.|
|?|One missing character. Exactly one character of all the available characters.|
|[ and ]|Creates a character class: look for exactly one character as long as it is in this class.|
For example, ls *.txt means that the ls utility is to list only
those files whose names consist of any characters, of whatever
length, and end in .txt.
On the other hand, the pattern meeting-??.txt
means that we are looking for a file whose name has the characters
meeting- followed by exactly two characters, then
.txt at the end.
So a file like “meeting-01.txt” or “meeting-bt.txt” will match, but
neither “meeting01.txt” nor “meeting-abc.txt” will.
So you can think of each question mark as a placeholder for one
missing character to be filled in by the shell. If there is one
question mark, exactly one character will be filled in; if there are
two, exactly two.
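The behaviour of * and ? described above can be tried out without touching a real directory. The sketch below assumes Python's standard fnmatch module as a stand-in for the shell; it follows the same basic wildcard rules, although a real shell adds details of its own (hidden files and path separators, for example). The file names and the *.txt pattern are made up for the illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical file names, chosen to mirror the examples in the text.
FILES = [
    "meeting-01.txt",
    "meeting-bt.txt",
    "meeting01.txt",
    "meeting-abc.txt",
    "notes.md",
]


def glob_matches(pattern, names=FILES):
    """Return the names that match a shell-style pattern (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, pattern)]


# '*' stands for zero or more characters: every .txt name is kept.
print(glob_matches("*.txt"))

# Each '?' stands for exactly one character: only the names with two
# characters between "meeting-" and ".txt" are kept.
print(glob_matches("meeting-??.txt"))
```

The second call returns only meeting-01.txt and meeting-bt.txt, because the other names have either no hyphen or three characters where the pattern allows two.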
Of interest are the
[ and ] square brackets: these behave almost the
same as they do under regular expressions. They create a class.
So let us say we are looking for a file called “cats.txt” or
“rats.txt”. The difference is in the first letter, either c or r.
We need to create a character class of two letters, c and r. To do
that, we must place them inside the square brackets like this: [cr]
So we have to type: ls [cr]ats.txt
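Since square brackets mean almost the same thing in both worlds, the class can be checked side by side. This is a minimal sketch, assuming Python's fnmatch module for the glob side and its re module for the regex side; the list of names is invented, and the regex escapes the period because a bare period is a wildcard there.

```python
import re
from fnmatch import fnmatchcase

# Hypothetical file names for the illustration.
NAMES = ["cats.txt", "rats.txt", "bats.txt", "crats.txt"]


def glob_class(names=NAMES):
    """Names matching the glob [cr]ats.txt: one character, either c or r."""
    return [n for n in names if fnmatchcase(n, "[cr]ats.txt")]


def regex_class(names=NAMES):
    """Names fully matching the equivalent regex [cr]ats\\.txt."""
    pattern = re.compile(r"[cr]ats\.txt")
    return [n for n in names if pattern.fullmatch(n)]


print(glob_class())
print(regex_class())
```

Both calls keep cats.txt and rats.txt only: the class stands for exactly one character, so crats.txt is rejected just like bats.txt.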
While globbing works with filenames and file paths, regular expressions
work with any text, usually the text inside files.
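To make the filenames-versus-contents distinction concrete, here is a small sketch that assumes Python's standard glob and re modules as stand-ins for the shell and a regex engine. The sample file names, their contents and the search word are all made up for the illustration.

```python
import glob
import os
import re
import tempfile


def names_and_lines(directory):
    """Return (.txt file names, lines in any file that contain 'meeting')."""
    # Globbing: the pattern is compared against file names only.
    pattern = os.path.join(glob.escape(directory), "*.txt")
    txt_names = sorted(os.path.basename(p) for p in glob.glob(pattern))

    # Regex: the pattern is compared against text, here each line of each file.
    word = re.compile(r"meeting")
    hits = []
    for name in sorted(os.listdir(directory)):
        with open(os.path.join(directory, name), encoding="utf-8") as fh:
            hits += [(name, line.strip()) for line in fh if word.search(line)]
    return txt_names, hits


def demo():
    # Hypothetical sample files, created in a throwaway directory.
    samples = {
        "cats.txt": "feed the cats\n",
        "agenda.md": "meeting at noon\n",
    }
    with tempfile.TemporaryDirectory() as tmp:
        for name, body in samples.items():
            with open(os.path.join(tmp, name), "w", encoding="utf-8") as fh:
                fh.write(body)
        return names_and_lines(tmp)


print(demo())
```

The glob finds cats.txt by its name without ever opening it, while the regex finds the line in agenda.md by its text regardless of what the file is called.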
Most text editors support regular expressions. A regular expression
also uses a pattern, but what matters in this post is to point out
how it differs from globbing:
- Support for more wildcard characters: regular expressions have more
wildcard characters than globbing. For instance, while the period
character, ., has no special meaning in shell globbing, it stands for
any character (except the newline) in regex.
- Different meaning for the question mark: in regex, a question mark
has a different meaning from that used in shell globbing. In file
globbing, the question mark is simply a placeholder: it just means
“exactly one missing character goes here”, whereas in regular
expressions it means zero or one occurrence of the character before
it. Thus, if we are looking for the spelling of colour spelt either
in American or British English, we would type colou?r
which means, find colour with or without a
u in it.
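Both differences listed above can be seen by running the same pattern through each kind of matcher. The sketch below assumes Python's re module for the regex side and its fnmatch module as an approximation of shell globbing; the helper names and test strings are invented for the illustration.

```python
import re
from fnmatch import fnmatchcase


def regex_full(pattern, text):
    """True if the whole text matches the regular expression."""
    return re.fullmatch(pattern, text) is not None


def glob_full(pattern, text):
    """True if the whole text matches the shell-style pattern."""
    return fnmatchcase(text, pattern)


# The period: a wildcard in regex, an ordinary character in globbing.
print(regex_full(r"a.c", "abc"))   # True: '.' stands for any one character
print(glob_full("a.c", "abc"))     # False: '.' only matches a literal period
print(glob_full("a.c", "a.c"))     # True

# The question mark: "optional u" in regex, "one more character" in globbing.
for word in ["color", "colour", "colouur"]:
    print(word, regex_full(r"colou?r", word), glob_full("colou?r", word))
```

With the regex, color and colour match and colouur does not. With the glob it is the other way round, because the question mark there demands one extra character between colou and r.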
Although globbing is different from regexes, it is important, in my
opinion, to think of their similarities rather than their differences:
- Both are intended as building a text pattern to use during a search;
- Both use some wildcard characters to denote a class of characters.
- Their differences lie in the fact that globbing is more for files
than their contents. As a result, some characters (such as the
period) that often form part of filenames do not take on a new
meaning as they do with regexes.
- At the end of the day, it is important to master the syntax of both
shell globbing and that of regular expressions as a way to maximise
Thanks for reading through this post, and happy searching!