Cheaper by the dozen
After getting used to the command line, you will start looking for ways to do more in less time. One of the easiest ways to achieve that is to work on multiple files at the same time, so that instead of:
$ rm this
$ rm that
$ rm here
$ rm there
you just remove all those files with one command. Many commands, including rm, let you simply specify all the files you want to delete as arguments in one go:
$ rm this that here there
Still, there has to be a better way!
Globbing
File globbing is the shell's way of dealing with multiple files with the fewest characters possible. The shell treats certain characters as codes that you can use to specify groups of things you want the commands to affect. These characters are commonly called "wildcards" because they're like a card in a game that the players have designated to represent anything you want.
The "*" Wildcard
Imagine a directory of files:
$ ls
here that there this
that you want to delete. A tedious job can be turned simple by using the * or asterisk wildcard.
$ rm *
When used by itself, the asterisk wildcard refers to all the files in the directory. We say that the shell expands the wildcard. Knowing what's in the directory, the shell substitutes those filenames for the asterisk and effectively executes the following command:
$ rm here that there this
You can combine * with other characters, however, to make it selective.
$ rm t*
$ ls
here
What happened here? The shell looked at "t" first and then expanded the asterisk to cover all the files that begin with "t". If you had requested "h*" instead, the shell would have removed any filename that started with an "h". Let's restore the original files and see what happens:
$ rm h*
$ ls
that there this
The asterisk wildcard can be placed anywhere within a word. Let's switch to an ls command because it's easier to see what's happening with wildcards:
$ ls th*re
there
By switching from rm to ls we see an important aspect of wildcards: you can use them with any command, because the shell interprets them before it even invokes the command. In fact, you can't issue a command without taking into account the behavior of wildcards, because they're a feature of the shell. (Luckily, you're not likely to ever have to deal with a filename that contains a real asterisk.)
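Because the shell does the expanding, you can preview what a command would receive by putting echo in front of it. The following is a minimal sketch, assuming a POSIX-style shell such as bash; the scratch directory created with mktemp is only there so that nothing real is touched.

```shell
# Scratch directory (hypothetical) holding the four example files.
demo=$(mktemp -d)
touch "$demo/here" "$demo/that" "$demo/there" "$demo/this"

# echo receives the already-expanded names, exactly as rm would.
preview=$(cd "$demo" && echo rm t*)
echo "$preview"

rm -r "$demo"
```

If the preview lists only the files you meant, you can run the real command with more confidence.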
Multiple asterisks can also be used together. For instance, in this way you can find filenames where the middle of a series is the same, but they start and end differently. Let's try it on the original four files:
$ ls *i*
this
People often use the asterisk to remove files that are all of one type. For instance, if you've been working with a lot of photos and want to clean up files ending in .jpg when you're finished, you can remove all the ones in the current directory as follows:
$ rm *.jpg
Suppose you have some files ending in .jpg and some ending in .jpeg. The asterisk still makes clean-up easy:
$ rm *.jp*g
And suppose the JPEG files are scattered among several subdirectories. You have directories named photos1, photos2, photos3, and so forth, each containing JPEGs you want to remove. A wildcard can help you list all the contents of those subdirectories:
$ ls photos*
photos1:
centraal_station.jpg nieuwe_kerk.jpg
photos2:
ica.jpeg sanders_theater.jpeg
photos3:
bayeux_cathedral.jpeg rouen_cathedral.jpeg travel.odt
And you can specify a directory along with the filenames you remove:
$ rm photos*/*.jp*g
$ ls photos*
photos1:
photos2:
photos3:
travel.odt
Only the travel.odt file remains (because it doesn't match ".jp*g") as a record of all the trips you've taken.
There is, however, one limit to the asterisk wildcard. By default, it will not match hidden files (those with filenames that start with a dot; you need ls -a to see these).
$ ls -a
.
..
.hidden
this
that
here
there
$ rm *
$ ls -a
.
..
.hidden
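If you want a wildcard to match those hidden files, you need to put a dot in front of it. Note that normal files (i.e., those that do not start with a dot) will not be deleted when you do this. Depending on your shell, the pattern may also expand to the special entries . and .., which rm refuses to remove, so you may see a harmless complaint about them:
$ ls -a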
.
..
.hidden
here
$ rm .*
$ ls -a
.
..
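If you would rather not hand . and .. to a command at all, one common idiom is the pattern .[!.]*, which requires a dot followed by a character that is not a dot. The sketch below assumes a POSIX-style shell and uses made-up filenames in a scratch directory; note that the pattern would skip an unusual name such as "..data".

```shell
# Scratch directory (hypothetical) with two hidden files and one normal file.
demo=$(mktemp -d)
touch "$demo/.hidden" "$demo/.config" "$demo/here"

# A dot, then one character that is not a dot, then anything.
hidden=$(cd "$demo" && echo .[!.]*)
echo "$hidden"

rm -r "$demo"
```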
Finally, it's important to note that the asterisk can also match when nothing is there. In the following listing, task is listed along with files that have something to match the asterisk:
$ ls task*
task taskA taskB taskXY
The "?" Wildcard
The "?" or question mark wildcard is very similar to the asterisk wildcard. The crucial difference is that the question mark wildcard takes the place of only one character.
$ ls task*
task taskA taskB taskXY
$ ls task?
taskA taskB
$ ls task??
taskXY
As we've already seen, the asterisk matches all the files beginning with "task". A single question mark matches files that have a single character after "task". The double question mark requires exactly two characters in that position.
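The difference between the two wildcards is easy to check for yourself. Here is a small sketch, assuming a POSIX-style shell, that recreates the four task files in a scratch directory and captures what each pattern expands to.

```shell
# Scratch directory (hypothetical) with the files from the example.
demo=$(mktemp -d)
touch "$demo/task" "$demo/taskA" "$demo/taskB" "$demo/taskXY"

star=$(cd "$demo" && echo task*)    # zero or more characters after "task"
one=$(cd "$demo" && echo task?)     # exactly one character after "task"
two=$(cd "$demo" && echo task??)    # exactly two characters after "task"

echo "task*  -> $star"
echo "task?  -> $one"
echo "task?? -> $two"

rm -r "$demo"
```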
The "[ ]" Wildcards
The square bracket wildcards can get even more specific, denoting a set or range of characters. The following ls command includes a -1 option, which means "list one entry on each line." This makes it easier to see how the files in this example differ.
$ ls -1
file_1
file_2
file_3
file_a
file_b
file_c
By using the square brackets, you can remove specific files without typing every name completely.
$ rm file_[1,3,a,c]
$ ls -1
file_2
file_b
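The commas are optional. In fact, inside square brackets a comma is just one more character to match, so the pattern above would also match a file named "file_,". You could get the same effect, without that side effect, through: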
$ rm file_[13ac]
Furthermore, within the square brackets, the order of the characters doesn't matter.
Combining square brackets with a hyphen, you can also do ranges of files. Let's start with a directory containing lots of files ending in numbers:
$ ls -1
file_1
file_2
file_3
...
file_78
At first it might be tempting to use the asterisk wildcard here. However, what if we need to remove only files 11-34? A pair of square brackets always stands for exactly one character, so a pattern such as file_[11-34] would not do what it appears to: it matches only file_1 through file_4. Fortunately, there is still an easy way: use one pair of brackets per digit.
$ rm file_1[1-9] file_2[0-9] file_3[0-4]
Now the only files left are files 1-10 and 35-78. By using the dash between two characters in the square brackets, you make the shell match any single character from the starting value to the left of the dash through the end value to the right. Three short patterns cover 11-19, 20-29, and 30-34.
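Since each pair of square brackets matches a single character, it is worth counting what a numeric pattern really covers before you hand it to rm. The sketch below assumes a POSIX-style shell; it creates file_1 through file_78 in a scratch directory and uses set -- to count how many names each pattern expands to.

```shell
# Scratch directory (hypothetical) with file_1 ... file_78.
demo=$(mktemp -d)
i=1
while [ "$i" -le 78 ]; do
    touch "$demo/file_$i"
    i=$((i + 1))
done

# set -- stores the expanded names as positional parameters; $# counts them.
# One bracket per digit: 11-19, 20-29 and 30-34.
per_digit=$(cd "$demo" && set -- file_1[1-9] file_2[0-9] file_3[0-4] && echo "$#")

# A single bracket is one character: "1", the range "1-3", and "4".
single=$(cd "$demo" && set -- file_[11-34] && echo "$#")

echo "one bracket per digit: $per_digit files"
echo "single bracket:        $single files"

rm -r "$demo"
```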
Ranges aren't just for numbers. They can also use letters.
$ ls -1
file_a
file_b
file_c
file_d
$ rm file_[a-c]
$ ls -1
file_d
Individual characters and ranges can be combined in the same pair of square brackets.
$ ls
file_a
file_b
file_c
file_1
file_2
$ rm file_[a-c12]
$ ls
Globbing When No File Matches
Suppose you specify a wildcard and the shell can't find any matching filename:
$ ls -1
file_a
file_b
file_c
file_d
$ rm file?
rm: cannot remove `file?': No such file or directory
When there is no file to match a pattern, the shell passes the wildcard to the program unexpanded. That's why you get an error message from the rm program, not from the shell.
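You can watch the unexpanded pattern arrive by using echo, and a script can protect itself by checking that each name really exists. This is a sketch under the assumption of a POSIX-style shell with its default settings (bash, for example, has options that change this behavior); the scratch directory is hypothetical.

```shell
# Scratch directory (hypothetical) with names that "file?" cannot match.
demo=$(mktemp -d)
touch "$demo/file_a" "$demo/file_b"

# No name is "file" plus exactly one character, so the pattern stays as typed.
unmatched=$(cd "$demo" && echo file?)
echo "$unmatched"

# Guard: skip the leftover pattern instead of passing it on to rm.
removed=0
for name in "$demo"/file?; do
    [ -e "$name" ] || continue
    rm "$name"
    removed=$((removed + 1))
done
echo "removed: $removed"

rm -r "$demo"
```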
Disabling A Wildcard
Okay, we know the shell will pass a wildcard to a program as an argument when it can't find a matching file, but what do we do when we want to send a character that also happens to be a wildcard to our program? Here's an example: we want to list a file whose name contains an asterisk.
$ ls
2file
*file
*?****[a-b]
Now we happen to want only *file, but when we ask for it we get:
$ ls *file
*file 2file
Why? Because the asterisk is a wildcard, the shell expanded it before sending it to ls. So after expansion, the command would look like:
$ ls *file 2file
If we want ls to find an asterisk, something different is in order.
The "\", or backslash, tells the shell to treat the following character as a normal character and do no expansion.
$ ls \*file
*file
Because the asterisk is the next character after the backslash, the shell sends the asterisk to ls unmodified. In other words: the backslash escapes the asterisk.
The backslash modifier works well when we have only one wildcard character that we want to pass to a program, but what if we wanted to pass a string like *?****[a-b] with lots of characters that would normally be interpreted as wildcards? If we used backslashes to escape them, we'd have to mark every single character. A short string would end up turning into: \*\?\*\*\*\*\[a-b\]. Instead of doubling our amount of typing, we can use a pair of single quotes.
$ ls '*?****[a-b]'
*?****[a-b]
Any string encased in single quotes will not be modified by the shell, even when it's filled with wildcards.
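To compare the three spellings side by side, here is a short sketch, assuming a POSIX-style shell and a scratch directory containing the two troublesome names. It uses echo so that you see exactly what the program would receive.

```shell
# Scratch directory (hypothetical) with one ordinary name and one containing "*".
demo=$(mktemp -d)
touch "$demo/2file" "$demo/*file"

# Unquoted: the shell expands the pattern to every name ending in "file".
unquoted_count=$(cd "$demo" && set -- *file && echo "$#")

# Escaped or quoted: the asterisk reaches the program untouched.
escaped=$(cd "$demo" && echo \*file)
quoted=$(cd "$demo" && echo '*file')

echo "unquoted matched $unquoted_count names"
echo "escaped: $escaped"
echo "quoted:  $quoted"

rm -r "$demo"
```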
Inverting a Wildcard
Sometimes you want every file in a directory except the ones that match a pattern. For example, you might have a directory with two hundred files: about fifty of them follow no pattern in naming, but the other hundred and fifty do. If you could just invert what a wildcard looks for, you would have just what you need. That's where the "^", or caret modifier, and a set of square brackets come in.
$ rm [^file_]*
The caret turns the square brackets over: placed right after the opening bracket, it tells the shell to match any single character that is not in the list. So [^file_]* matches every filename whose first character is not "f", "i", "l", "e", or "_". There is one drawback to this trick, however: the inversion applies to a set of single characters, not to a whole word, so a file whose name merely begins with one of those letters (say, "lonely_file") is left alone too. (The portable spelling uses "!" instead of the caret, as in [!file_]*; bash accepts both.)
$ ls -1
file_1
file_2
...
file_100
random_file
more_random_file
file_xyz
$ rm [^file_]*
$ ls -1
file_1
file_2
...
file_100
file_xyz
In this case, since file_xyz begins with "f", one of the characters excluded by the brackets, the pattern doesn't match it, so rm doesn't affect it.
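When you need to spare files by a whole prefix rather than by their first character, a small loop with a case statement is one option. The sketch below assumes a POSIX-style shell and uses the portable "!" form of inversion; the filenames, including lonely_file, are made up for illustration.

```shell
# Scratch directory (hypothetical); lonely_file begins with an excluded letter.
demo=$(mktemp -d)
touch "$demo/file_1" "$demo/file_xyz" "$demo/random_file" "$demo/lonely_file"

# Inverted brackets test only the first character, so lonely_file is not matched.
by_char=$(cd "$demo" && echo [!file_]*)
echo "bracket pattern matches: $by_char"

# A case statement can test the whole prefix instead.
for name in "$demo"/*; do
    base=${name##*/}            # strip the directory part
    case $base in
        file_*) ;;              # keep anything starting with "file_"
        *) rm "$name" ;;        # remove everything else
    esac
done

remaining=$(cd "$demo" && echo *)
echo "remaining: $remaining"

rm -r "$demo"
```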
Multiple files in the ls command
We've already seen that you can list subdirectories with an ls command:
$ ls photos*/*
To list all files under a particular directory, use the -R option.
$ ls -R
photos1 photos2 photos3
./photos1:
centraal_station.jpg nieuwe_kerk.jpg
./photos2:
ica.jpeg sanders_theater.jpeg
./photos3:
bayeux_cathedral.jpeg rouen_cathedral.jpeg travel.odt
We saw a similar option earlier with the rm command. The "R" stands for "recursive." Note that the recursive option is uppercase -R in the ls command (-r is used to reverse the ordering of the directory contents), but can be either uppercase or lowercase in the rm command.
The Find Command
When you first get a computer, you tend to place files in just a couple folders or directories. But as your list of files grows, you have to create some subdirectories and spread the files around in order to keep your sanity. Eventually, you forget where files are. "Where did I store those photos I took in Normandy?"
You could run ls -R, as in the previous section, and start running your finger down the screen, but why? Computers are supposed to be about automation. Let the computer figure out where the file is.
If you know your file is named "somefile", telling the computer what to do is pretty easy.
$ find . -name somefile -print
./files/somefile
The find command takes more arguments than the other commands we've seen so far, but if you use it for a while you'll find it becomes natural. Its first argument (the '.') tells find where to start looking: the directory at the top of everything you're searching through. In this case, we're telling find to start looking in whatever directory we're in right now.
The -name argument tells it to look for a file named somefile. Finally, the -print option tells the command to print out on our screen the location of any file that matches the name it was given.
Wildcards with Find
What if you don't remember the name of the file you're looking for? You might only remember that it starts with "some". Luckily, find can handle that too.
$ find . -name 'some*' -print
./dir1/subdir2/files/somefile_other
./some_other_file
./files/somefile
This time it found a few more files than you were after, but it still found the one you wanted. As you can see, the find command can process wildcards in much the same way the shell can. Here it was told to look for anything that starts with the letters "some".
The "*", "?", and "[ ]" wildcards can all be used just as they would be in the shell. However, since find is using the wildcards you have to make sure they remain unaltered by the shell. To do this you can surround the name you're searching for, and the wildcards it contains, in single quotes.
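To see why the quotes matter, compare a quoted and an unquoted pattern in a directory that itself contains a match. This is a sketch assuming a POSIX-style shell; the directory tree mirrors the example above and lives in a scratch directory.

```shell
# Scratch tree (hypothetical) mirroring the example above.
demo=$(mktemp -d)
mkdir -p "$demo/files" "$demo/dir1/subdir2/files"
touch "$demo/files/somefile" "$demo/some_other_file" \
      "$demo/dir1/subdir2/files/somefile_other"

# Quoted: find receives some* and matches it at every level.
quoted=$(cd "$demo" && find . -name 'some*' -print | LC_ALL=C sort)

# Unquoted: the shell first expands some* to some_other_file,
# so find looks only for that exact name.
unquoted=$(cd "$demo" && find . -name some* -print)

echo "quoted:"
echo "$quoted"
echo "unquoted:"
echo "$unquoted"

rm -r "$demo"
```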
Trimming The Search Path
With just a name and a location, find will begin searching through every directory below its starting point, looking for matches. Depending on how many subdirectories you have where you're searching, find can take a lot of time to look in places you know don't contain the file.
It is possible, however, to control how far find sinks in the directory tree.
$ find . -maxdepth 1 -name 'some*' -print
./some_other_file
By using the '-maxdepth' argument we can tell find to go no lower than the number of directories we specify. A maxdepth of 1 says: don't leave the starting directory. A maxdepth of 3 would allow find to descend 3 directories from where it started, and so on. It's important to note that '-maxdepth' should immediately follow the start location, or find will complain.
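The effect of the depth limit can be checked on a small tree. The following sketch assumes a find that supports -maxdepth (GNU and BSD versions do) and uses a scratch directory with made-up contents.

```shell
# Scratch tree (hypothetical): one match at the top, one a level down.
demo=$(mktemp -d)
mkdir "$demo/files"
touch "$demo/some_other_file" "$demo/files/somefile"

# Depth 1: only the starting directory itself is searched.
shallow=$(cd "$demo" && find . -maxdepth 1 -name 'some*' -print)

# Depth 2: one level of subdirectories is included as well.
deeper=$(cd "$demo" && find . -maxdepth 2 -name 'some*' -print | LC_ALL=C sort)

echo "maxdepth 1:"
echo "$shallow"
echo "maxdepth 2:"
echo "$deeper"

rm -r "$demo"
```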
Searching for Files Using Criteria Besides Names
The find command can search for files based on any criteria the filesystem knows about them. For instance, you can search for files based on:
- When they were last modified or accessed (somebody read them)
- How big they are
- Who owns them, or what group they are in
- What permissions (read, write, execute) they have
- What type of file (directory, regular file) they are
and other criteria described in the manual page. Here we'll just show a couple of popular options.
The -mtime option tests a file's latest modification time. Suppose you just can't remember anything about a file's name, but know that you created or modified it within the past three days. You can find all the files in your home directory that were created or modified within the past three days through:
$ find ~ -mtime -3 -print
Notice the minus sign before the 3, for "less than." If you know you created the file yesterday (between 24 and 48 hours ago), you can search for an exact day:
$ find ~ -mtime 1 -print
To find files that are more than 30 days old (caution: there will be a lot of these), use a plus sign:
$ find ~ -mtime +30 -print
Perhaps you want to remove old files that are large, before backing up a directory. Combine -mtime with -size to find these files. The file has to match all the criteria you specify in order to be printed.
$ find directory_to_backup -mtime +30 -size +500k -print
We've specified +500k as our -size option. The plus sign means "greater than" and "500k" means "500 kilobytes in size".
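You can try the combined test without waiting a month by giving a file an old timestamp with touch -t. This is a rough sketch, assuming a find that accepts the k suffix (as GNU and BSD find do); the filenames are invented and everything happens in a scratch directory.

```shell
# Scratch directory (hypothetical) with three test files.
demo=$(mktemp -d)
dd if=/dev/zero of="$demo/old_big" bs=1024 count=600 2>/dev/null   # 600 KB
dd if=/dev/zero of="$demo/new_big" bs=1024 count=600 2>/dev/null   # 600 KB
touch "$demo/old_small"                                            # empty

# Pretend two of the files were last modified on 1 January 2000.
touch -t 200001010000 "$demo/old_big" "$demo/old_small"

# Only a file that is both old and large passes both tests.
matches=$(find "$demo" -type f -mtime +30 -size +500k -print)
echo "$matches"

rm -r "$demo"
```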
Using Find To Run a Command on Multiple Files
The find command can do much more powerful things than print filenames. You can combine it with any other command you want, so that you can remove files, move them around, look for text in them, and so on. On those occasions, the find command with its '-exec' option is just what you'll need.
Because the next example is long, it is divided onto two lines, with a backslash at the end of the first so the shell keeps reading and keeps the two lines as one command. The first line is the same as the command to find old, large files in the previous section.
$ find directory_to_backup -mtime +30 -size +500k -print \
-exec rm {} \;
The -exec option is followed by an rm command, but there are two odd items after it:
- {} is a special convention in the -exec option that means "the current file that was found"
- \; is necessary to tell find what the end of the command is. A command can have any number of arguments. Think of -exec and \; as surrounding the command you want to execute.
So we find each file, print the name through -print (which we don't have to do, but we're curious to see what's being removed), and then remove it in the -exec option.
Clearly, a tiny mistake in a find command could lead to major losses of data when used with -exec. Test your commands on throw-away files first!
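One cautious habit is to run the command with echo in front of rm first, so find only prints what it would do. Here is a minimal sketch of that dry run followed by the real removal, assuming a POSIX-style shell; the files are throw-away ones in a scratch directory.

```shell
# Scratch directory (hypothetical) with throw-away files.
demo=$(mktemp -d)
touch "$demo/file_a" "$demo/file_b" "$demo/notes"

# Dry run: echo prints each rm command instead of running it.
preview=$(find "$demo" -name 'file*' -exec echo rm {} \; | LC_ALL=C sort)
echo "$preview"

# The preview looks right, so run it for real.
find "$demo" -name 'file*' -exec rm {} \;
remaining=$(ls "$demo")
echo "remaining: $remaining"

rm -r "$demo"
```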
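Using cp, you can see how the {} pair can be specified multiple times, allowing the file's name to be easily duplicated. (GNU find substitutes the filename even when {} is part of a longer argument such as {}.backup; other versions of find may not.)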
$ find . -name 'file*' -exec cp {} {}.backup \;
How to Split Files
Suppose you have a 600MB ISO file you'd like to split into many pieces for easier storage. You can do so with:
$ split -b 200m image.iso image.iso_
This example generates three files named image.iso_aa, image.iso_ab, and image.iso_ac, each 200MB in size. If you want to join them again, use the command:
$ cat image.iso_* > new-image.iso
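Before deleting the original, it is worth confirming that the pieces really join back into an identical file. The sketch below, assuming a POSIX-style shell, does a scaled-down round trip: a 600-byte stand-in file takes the place of the 600MB image, and cmp checks the result.

```shell
# Scratch directory (hypothetical) with a 600-byte stand-in for the ISO.
demo=$(mktemp -d)
i=0
while [ "$i" -lt 150 ]; do
    printf '%04d' "$i"          # 150 numbers x 4 characters = 600 bytes
    i=$((i + 1))
done > "$demo/image.iso"

# Split into 200-byte pieces; split adds the suffixes aa, ab, ac.
(cd "$demo" && split -b 200 image.iso image.iso_)
pieces=$(cd "$demo" && echo image.iso_*)
echo "$pieces"

# The wildcard expands in sorted order, so cat joins the pieces correctly.
(cd "$demo" && cat image.iso_* > new-image.iso)

# cmp -s is silent and succeeds only if the two files are identical.
if cmp -s "$demo/image.iso" "$demo/new-image.iso"; then same=yes; else same=no; fi
echo "identical: $same"

rm -r "$demo"
```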
Experiment with these commands. The only way to get better at using them is practice!