- Learning Boost C++ Libraries
- Arindam Mukherjee
- 2021-07-16 20:49:02
Handling command-line arguments
Command-line arguments, like API parameters, are the remote control buttons that help you tune the behavior of commands to your advantage. A well-designed set of command-line options is behind much of the power of a command. In this section, we will see how the Boost.Program_Options library helps you add support for a rich and standardized set of command-line options to your own programs.
Designing command-line options
C provides the most primitive abstraction for the command line of your program. Using the two arguments passed to the main function, the number of arguments (argc) and the list of arguments (argv), you can find out about each and every argument passed to the program and their relative ordering. The following program prints argv[0], which is the path with which the program was invoked. When run with a set of command-line arguments, the program also prints each argument on a separate line:
#include <iostream>

int main(int argc, char *argv[])
{
    // argv[0] is the path used to invoke the program
    std::cout << "Program name: " << argv[0] << '\n';

    // the remaining entries are the command-line arguments, in order
    for (int i = 1; i < argc; ++i) {
        std::cout << "argv[" << i << "]: " << argv[i] << '\n';
    }
}
Most programs need more logic and validation to verify and interpret command-line arguments, and hence a more elaborate framework is needed to handle them.
Programs usually document a set of command-line options and switches that modify their behavior. Let us take a look at the example of the diff command in Unix. The diff command is run like this:
$ diff file1 file2
It prints the difference between the content of the two files. There are several ways in which you can choose to print the differences. For each different chunk found, you may choose to print a few additional lines surrounding the difference to get a better understanding of the context in which the differing part appears. These surrounding lines or "context" do not differ between the two files. To do this, you can use one of the following alternatives:
$ diff -U 5 file1 file2
$ diff --unified=5 file1 file2
Here, you choose to print five additional lines of context. You can also choose the default of three by specifying:
$ diff --unified file1 file2
In the preceding examples, -U or --unified are examples of command-line options. The former is a short option consisting of a single leading hyphen and a single letter (-U). The latter is a long option with two leading hyphens and a multi-character option name (--unified).
The number 5 is an option value: an argument to the option (-U or --unified) preceding it. The option value is separated from a preceding short option by a space, but from a preceding long option by an equals sign (=).
If you are "diffing" two C or C++ source files, you can get more useful information using a command-line switch or flag, -p. A switch is an option that does not take an option value as an argument. Using this switch, you can print the name of the C or C++ function in the context of which a particular difference is detected. GNU diff also accepts the long form --show-c-function, but we use only the short form here.
The diff command is a very powerful tool with which you can find differences in the content of files in full directories. When diffing two directories, if a file exists in one but not the other, diff ignores this file by default. However, you may want to see the contents of the new file instead. To do this, you use the -N or --new-file switch. If we now want to run our diff command on two directories of C++ source code to identify changes, we can do it in this way:
$ diff -pN --unified=5 old_source_dir new_source_dir
You don't have to be eagle-eyed to notice that we used an option called -pN. This is actually not a single option but two switches, -p and -N, collapsed together.
Certain patterns or conventions should be evident from this case study:
- Starting short options with single hyphens
- Starting long options with double hyphens
- Separating short options and option-values with space
- Separating long options and option-value with equals
- Collapsing short switches together
These are de facto standard conventions on highly POSIX-compliant systems, such as Linux. They are, however, by no means the only conventions followed. Windows command lines often use a leading forward slash (/) in place of a hyphen. They often do not distinguish between short and long options, and sometimes use a colon (:) in place of an equals sign to separate an option and its option value. Java commands, as well as commands on several older Unix systems, use a single leading hyphen for both short and long options. Some of them use a space to separate an option and its option value, irrespective of whether it is a short option or a long one. How can you take care of so many complex rules that vary from platform to platform while parsing your command line? This is where the Boost Program Options library makes a big difference.
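To get a feel for what these conventions demand of a hand-written parser, here is a minimal sketch that classifies a single argument as a run of short switches, a long option (with an optional =value), or a plain operand. The names TokenKind, Token, and classify are hypothetical and not part of any library, and the sketch only covers the Unix conventions listed above; it ignores option values that follow a short option as a separate argument.

```cpp
#include <string>

// Kinds of command-line tokens under the Unix conventions discussed above.
enum class TokenKind { ShortOptions, LongOption, Operand };

struct Token {
    TokenKind kind;
    std::string name;   // option name(s) without hyphens, or the operand text
    std::string value;  // text after '=' for a long option, otherwise empty
};

// Classify one argument. Hypothetical helper, for illustration only.
Token classify(const std::string& arg)
{
    // Long option: two leading hyphens, optionally followed by =value.
    if (arg.size() > 2 && arg.compare(0, 2, "--") == 0) {
        const std::string body = arg.substr(2);
        const std::string::size_type eq = body.find('=');
        if (eq == std::string::npos) {
            return {TokenKind::LongOption, body, ""};
        }
        return {TokenKind::LongOption, body.substr(0, eq), body.substr(eq + 1)};
    }
    // Short options: one leading hyphen; "-pN" carries two collapsed switches.
    if (arg.size() > 1 && arg[0] == '-' && arg[1] != '-') {
        return {TokenKind::ShortOptions, arg.substr(1), ""};
    }
    // Anything else is treated as a plain operand, such as a file name.
    return {TokenKind::Operand, arg, ""};
}
```

Even this small sketch has to hard-code the prefix and separator characters, which is exactly the kind of platform-specific detail a declarative library can take off your hands.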
Using Boost.Program_Options
The Boost Program Options library provides you with a declarative way of parsing command lines. You can specify the set of options and switches and the type of option values for each option that your program supports. You can also specify which set of conventions you want to support for your command line. You can then feed all of this information to the library functions that parse and validate the command line and extract all the command-line data into a dictionary-like structure from which you can access individual bits of data. We will now write some code to model the previously mentioned options for the diff command:
Listing 2.12a: Using Boost Program Options
 1 #include <boost/program_options.hpp>
 2 #include <iostream>
 3 namespace po = boost::program_options;
 4 namespace postyle = boost::program_options::command_line_style;
 5
 6 int main(int argc, char *argv[])
 7 {
 8   po::options_description desc("Options");
 9   desc.add_options()
10     ("unified,U", po::value<unsigned int>()->default_value(3),
11      "Print in unified form with specified number of "
12      "lines from the surrounding context")
13     (",p", "Print names of C functions "
14      " containing the difference")
15     (",N", "When comparing two directories, if a file exists in"
16      " only one directory, assume it to be present but "
17      " blank in the other directory")
18     ("help,h", "Print this help message");
In the preceding code snippet, we declare the structure of the command line using an options_description object. Successive options are declared using an overloaded function call operator, operator(), on the object returned by add_options. You can cascade calls to this operator in the same way that you can print multiple values by cascading calls to the insertion operator (<<) on std::cout. This makes for a highly readable specification of the options.
We declare the --unified or -U option, specifying both the long and short options in a single string, separated by a comma (line 10). The second argument indicates that we expect a numeric argument, and that the default value will be taken as 3 if the argument is not specified on the command line. The third field is the description of the option and will be used to generate a documentation string.
We declare the short options -p and -N (lines 13 and 15), but as we declare no long options for them, they are introduced with a comma followed by the short option (",p" and ",N"). They also do not take an option value, so we just provide their description.
So far so good. We will now complete the code example by parsing the command line and fetching the values. First, we will specify the styles to follow in Windows and Unix:
Listing 2.12b: Using Boost Program Options
19   int unix_style    = postyle::unix_style
20                      |postyle::short_allow_next;
21
22   int windows_style = postyle::allow_long
23                      |postyle::allow_short
24                      |postyle::allow_slash_for_short
25                      |postyle::allow_long_disguise
26                      |postyle::case_insensitive
27                      |postyle::short_allow_next
28                      |postyle::long_allow_next;
The preceding code highlights some important differences between Windows and Unix conventions:
- A more or less standardized Unix style is available precanned as unix_style. However, we have to build the Windows style ourselves.
- The short_allow_next flag allows you to separate a short option and its option value with a space; this is used on both Windows and Unix.
- The allow_slash_for_short flag allows short options to be preceded by a forward slash, a common practice on Windows, and allow_long_disguise lets long options use the same single-character prefix.
- The case_insensitive flag is appropriate for Windows, where the usual practice is to have case-insensitive commands and options.
- The long_allow_next flag on Windows allows long options and option values to be separated by a space instead of an equals sign.
Now, let us see how we can parse a conforming command line using all of this information. To do this, we will declare an object of type variables_map to read all the data and then parse the command line:
Listing 2.12c: Using Boost Program Options
29   po::variables_map vm;
30   try {
31     po::store(
32       po::command_line_parser(argc, argv)
33         .options(desc)
34         .style(unix_style)  // or windows_style
35         .run(), vm);
36
37     po::notify(vm);
38
39     if (argc == 1 || vm.count("help")) {
40       std::cout << "USAGE: " << argv[0] << '\n'
41                 << desc << '\n';
42       return 0;
43     }
44   } catch (po::error& poe) {
45     std::cerr << poe.what() << '\n'
46               << "USAGE: " << argv[0] << '\n' << desc << '\n';
47     return EXIT_FAILURE;
48   }
We create a command-line parser using the command_line_parser function (line 32). We call the options member function on the returned parser to specify the parsing rules encoded in desc (line 33). We chain further member function calls: to the style member function of the parser for specifying the expected style (line 34), and to the run member function to actually perform the parsing. The call to run returns a data structure containing the data parsed from the command line. The call to boost::program_options::store stores the parsed data from this data structure inside the variables_map object vm (lines 31-35). Finally, we check whether the program was invoked without arguments or with the help option, and print the help string (line 39). Streaming the options_description instance desc to an ostream prints a help string that is automatically generated based on the command-line rules encoded in desc (line 41). All this is encapsulated in a try-catch block to trap any command-line parsing errors thrown by the call to run (line 35). In the event of such an error, the error details are printed (line 45) along with the usage details (line 46).
You may have noticed that we call a function named notify(…) on line 37. In more advanced uses, we may choose to use values that are read from the command line to set variables or object members, or perform other post-processing actions. Such actions can be specified for each option while declaring option descriptions, but these actions are only initiated by the call to notify. As a matter of consistency, do not drop the call to notify.
We can now extract the values passed via the command line:
Listing 2.12d: Using Boost Program Options
49   unsigned int context = 0;
50   if (vm.count("unified")) {
51     context = vm["unified"].as<unsigned int>();
52   }
53
54   bool print_cfunc = (vm.count("-p") > 0);  // short-only options are keyed as "-p"
If you were observant, you would have noticed that we did nothing to read the two file names, the two main operands of the diff command. We did this for simplicity, and we will fix this now. We run the diff command like this:
$ diff -pN --unified=5 old_source_dir new_source_dir
The old_source_dir and new_source_dir arguments are called positional parameters. They are not options or switches, nor are they arguments to any options. In order to handle them, we will have to use a couple of new tricks. First of all, we must tell the parser the number and type of these parameters that we expect. Second, we must tell the parser that these are positional parameters. Here is the code snippet:
 1 std::string file1, file2;
 2 po::options_description posparams("Positional params");
 3 posparams.add_options()
 4   ("file1", po::value<std::string>(&file1)->required(), "")
 5   ("file2", po::value<std::string>(&file2)->required(), "");
 6 desc.add(posparams);
 7
 8
 9 po::positional_options_description posOpts;
10 posOpts.add("file1", 1);  // second param == 1 indicates that
11 posOpts.add("file2", 1);  // we expect only one arg each
12
13 po::store(po::command_line_parser(argc, argv)
14             .options(desc)
15             .positional(posOpts)
16             .style(windows_style)
17             .run(), vm);
In the preceding code, we set up a second options description object called posparams to identify the positional parameters. We add options with names "file1" and "file2", and indicate that these parameters are mandatory, using the required() member function of the value parameter (lines 4 and 5). We also specify two string variables, file1 and file2, to store the positional parameters. All of this is added to the main options description object desc (line 6). For the parser to not look for actual options called "--file1" and "--file2", we must tell the parser that these are positional parameters. This is done by defining a positional_options_description object (line 9) and adding the options that should be treated as positional options (lines 10 and 11). The second parameter in the call to add(…) specifies how many positional parameters should be considered for that option. Since we want one file name each for the options file1 and file2, we specify 1 in both the calls. Positional parameters on the command line are interpreted according to the order in which they are added to the positional options description. Thus, in this case, the first positional parameter will be treated as file1, and the second parameter will be treated as file2.
In some cases, a single option may take multiple option values. For example, during compilation, you will use the -I option multiple times to specify multiple directories. To parse such options and their option values, you can specify the target type as a vector, as shown in the following snippet:
po::options_description desc("Options");
desc.add_options()
  ("include,I", po::value<std::vector<std::string> >(),
   "Include files.")
  (…);
This will work on an invocation like this:
$ c++ source.cpp -o target -I path1 -I path2 -I path3
In some cases, however, you might want to specify multiple option values, but you specify the option itself only once. Let us say that you are running a command to discover assets (local storage, NICs, HBAs, and so on) connected to each of a set of servers. You can have a command like this:
$ discover_assets --servers svr1 svr2 svr3 --uid user
In this case, to model the --servers option, you would need to use the multitoken() directive as shown here:
po::options_description desc("Options");
desc.add_options()
  ("servers,S",
   po::value<std::vector<std::string> >()->multitoken(),
   "List of hosts or IPs.")
  ("uid,U", po::value<std::string>(), "User name");
You can retrieve vector-valued parameters through the variable map like this:
std::vector<std::string> servers =
    vm["servers"].as<std::vector<std::string> >();
Alternatively, you can use variable hooks at the time of option definition like this:
std::vector<std::string> servers;
desc.add_options()
  ("servers,S",
   po::value<std::vector<std::string> >(&servers)
       ->multitoken(),
   "List of hosts or IPs.")…;
Make sure that you don't forget to call notify after parsing the command line, because the hooked variable is filled in only by that call.
The Program Options library uses Boost Any for its implementation. For the Program Options library to work correctly, you must not disable the generation of RTTI for your programs.