Essential commands

Whoa... hold your horses, we need to cover some basics about commands. A command is a process run by a POSIX (Portable Operating System Interface) compliant OS (Operating System). The Open Group maintains the standard, which is also ratified as an IEEE standard (http://pubs.opengroup.org/onlinepubs/9699919799/). In a POSIX environment, the process being run will have an environment, a current working directory, a command line (the path name that invoked the command and any arguments), and a series of file descriptors, with stdin, stdout, and stderr (referred to by the integer numbers 0, 1, and 2, respectively) connected before control is handed off to your command.
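As a minimal sketch of those three file descriptors, the following sends stdout and stderr to separate files. The scratch directory and the file names out.txt and err.txt are made up for illustration.

```shell
# Work in a throwaway directory so nothing else is touched (illustrative only).
cd "$(mktemp -d)"

# The group writes one line to stdout (descriptor 1) and one to stderr
# (descriptor 2). "> out.txt" captures descriptor 1 and "2> err.txt"
# captures descriptor 2.
{ echo "to stdout"; echo "to stderr" >&2; } > out.txt 2> err.txt

cat out.txt   # to stdout
cat err.txt   # to stderr
```

Nothing here reads from stdin (descriptor 0); it stays connected to whatever started the shell.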

Now with a little background and an installed command line, you are ready to go and we can actually start running commands. We will be going over some basic everyday commands. For those that are ready to delve in, let's discuss how we locate the commands we can run.

Locating commands is akin to searching through a filing cabinet, what we call a filesystem. Commands are just files stored in the filing cabinet, and folders or directories are used to organize the files into a hierarchy. Each directory may contain many files or other directories, and has a single parent directory. To open our filing cabinet, we need to start at the top of the hierarchy, the root directory, /. The first commands you need to know are the ones that let you traverse the filesystem and get your bearings.

When you log into the command line, it's likely that you will be in your home directory. What this directory is varies by system. To see where you are, try the pwd (print working directory) command:

pwd

The following is what you should see on running the preceding command:

/home/ubuntu

Here, ubuntu is your username. This means you are currently in the ubuntu directory, which is inside the home directory, which in turn sits directly under the root directory, /. From here, if you try to open a file with a relative path name, that is, one that doesn't start with a /, the command line will look for that file in your current directory (you can do things with files in other directories without changing your current one; we will talk about that in a bit).

You might want to create your own directories. To do this, we can try the following command:

mkdir foo

This makes the directory foo inside your current directory. If the command completes successfully, it won't print anything. To see the directory we just made, we use the list command:

ls

The following is what you should see on running the preceding command:

foo

It should be on a line by itself. We might want to print a little bit more information about the directory. In this case, we can pass some flags to the ls command to alter what it's doing. For example, type the following:

ls -l

Running this prints a long-format listing with one entry for foo. It’s not too important right now to understand everything printed there, but we can see that foo is a directory, not a data file (from the d code at the front of its entry), along with the date and time it was last modified. This is a common pattern among UNIX commands: the default version of the command does one thing, and passing in flags like -l changes what it does.
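To make the flag pattern concrete, here is a small sketch that runs ls a few different ways on the same directory. It assumes a throwaway scratch directory created just for the example; the -a and -d flags are not used elsewhere in this section.

```shell
# Throwaway scratch directory (illustrative only).
cd "$(mktemp -d)"
mkdir foo

ls        # default: names only
ls -l     # long format: type and permissions, owner, size, timestamp, name
ls -a     # also lists hidden entries such as . and ..

# -d lists the directory itself instead of its contents.
listing=$(ls -ld foo)
echo "$listing"   # the first character is d, marking a directory
```

The exact owner, size, and timestamp in the long listing will differ from system to system.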

Sometimes, commands have arguments, and sometimes flags of commands will have arguments, too. A general form of a command might appear as follows:

<command> -a <argument> -b -c -d <argument> <command arguments>

Here, a, b, c, and d are flags of the command. What exactly these flags are, and what they do, depends on the command.
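As one concrete instance of that general form, here is a sketch using sort. The file names nums.txt and sorted.txt are made up for illustration, and printf is used only to create the sample file.

```shell
# Throwaway scratch directory and sample data (illustrative only).
cd "$(mktemp -d)"
printf '3\n19\n1\n' > nums.txt

# -n and -r are flags without arguments (numeric order, reversed).
# -o is a flag that takes an argument (the file to write the result to).
# nums.txt is the argument to the command itself.
sort -n -r -o sorted.txt nums.txt

cat sorted.txt   # prints 19, 3, 1, one per line
```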

Let’s go into our newly-created directory and mess around with some data files:

cd foo

The cd (or change directory) command changes your current working directory, and prints nothing when it succeeds. Let's now combine a command with a redirection to create a data file. We will talk about this a bit later, but for now we just need a file to mess around with:

echo "Hello world..." > hello.txt

This won't produce any output, but it will create a file called hello.txt (as we told the shell to redirect stdout with > to a file) that contains a single line of text: Hello world... To see this, we can use the concatenate command, cat:

cat hello.txt

The following is what you should see on running the preceding command:

Hello world...

The cat command will print the contents of any file. If we only want to see the first, or last, few lines of a file, we could use head or tail instead of cat.
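A quick sketch of head and tail, assuming a made-up five-line file called lines.txt (printf is used only to create it):

```shell
# Throwaway scratch directory and sample data (illustrative only).
cd "$(mktemp -d)"
printf 'line 1\nline 2\nline 3\nline 4\nline 5\n' > lines.txt

head -n 2 lines.txt   # prints the first two lines: line 1, line 2
tail -n 2 lines.txt   # prints the last two lines: line 4, line 5
```

The -n flag takes the number of lines to show as its argument.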

If this all sounds pretty simple, there’s a good reason: each command in UNIX is intended to do one thing and do it well. Often options can be used to tailor a command’s behavior. The really neat stuff you can do starts to happen when we start tying commands together using pipes and redirection.

You see, almost every command in UNIX has some way to input data into it. The command then takes the input, and, depending on its parameters and flags, transforms that input into something else and outputs it. We can use the pipe, |, to take the output from one command, and feed it into the input of another command. This simple but extremely powerful idea will let us do a lot with a few commands.

Let's try a simple example. We will use echo with the -e flag, which tells it to interpret backslash escape sequences, to make a multi-line file (using \n for each line break) with one number on each line:

echo -e "1\n3\n19\n1\n25\n5" > numbers.txt
cat numbers.txt

The following is what you should see on running the preceding command:

1
3
19
1
25
5

Now, say we wanted to see those numbers sorted. The sort command does just this. Using the -n flag, which tells sort to treat the lines as numbers and not strings, we can pipe the output of cat into the sort command:

cat numbers.txt | sort -n

The following is what you should see on running the preceding command:

1
1
3
5
19
25

If we then want to see just the unique numbers in sorted order, we can pipe this output on to the uniq command, which collapses adjacent duplicate lines in its input (this is why we sort first):

cat numbers.txt | sort -n | uniq

The following is what you should see on running the preceding command:

1
3
5
19
25

And so on, and so on. We can build up the pipeline we want a bit at a time, debugging along the way. You will see this technique throughout this book.
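Here is a sketch of what building a pipeline one stage at a time can look like, using the same six numbers. It recreates numbers.txt in a scratch directory, with printf standing in for echo -e because it behaves the same across shells.

```shell
# Throwaway scratch directory and the same sample data (illustrative only).
cd "$(mktemp -d)"
printf '1\n3\n19\n1\n25\n5\n' > numbers.txt

# Stage 1: plain sort compares lines as text, so 19 and 25 land before 3.
sort numbers.txt                 # 1 1 19 25 3 5, one per line

# Stage 2: add -n to compare the lines as numbers instead.
sort -n numbers.txt              # 1 1 3 5 19 25, one per line

# Stage 3: uniq only drops duplicates that sit next to each other.
# Without sorting, the two 1s are not adjacent, so all 6 lines survive.
uniq numbers.txt | wc -l

# With sorting first, the duplicate 1 is removed and 5 lines remain.
sort -n numbers.txt | uniq | wc -l
```

Checking the output after each stage makes it easy to spot which step introduced a surprise.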

One last thing: in some of these commands, we have seen the >, or redirect. Redirection can be used for a number of things, but most of the time it's used to redirect the output of a command to a file:

<some pipeline of commands>  > <filename>

This will replace the contents of the file named filename with the output of the pipeline.
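A short sketch of that replace behavior, alongside the related >> operator, which appends instead of replacing. The file name log.txt is made up for illustration.

```shell
# Throwaway scratch directory (illustrative only).
cd "$(mktemp -d)"

echo "first"  > log.txt    # creates log.txt containing "first"
echo "second" > log.txt    # > replaces the old contents
cat log.txt                # second

echo "third" >> log.txt    # >> adds to the end of the file
cat log.txt                # second, then third
```

Because > silently overwrites, it is worth double-checking the file name before redirecting into a file you care about.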

With these simple tools, you have enough to get started hacking data with bash.

Hands-On Data Science with the Command Line