
A FEW USEFUL UNIX COMMANDS
==========================

This document presents a few commands and tricks that are useful
when running scientific programs. Its purpose is to give a concise
review of some of the most useful techniques.

Although the Unix commands themselves are to a large extent the same
on different types of machines (e.g. SunOS, Linux,...), there exist
several flavors of shells, i.e. several types of command line interface,
among which the user can choose. They each have a (slightly) different 
programming syntax and so we will restrict ourselves here to the
bash shell, as it is one of the more comfortable ones.


1) COMPILING AND RUNNING PROGRAMS

Copy this very simple C++ program into a file called prog.cpp:

////////////////////////////////////////////////////////////////////
#include <iostream>
#include <cmath>

using namespace std;

int main() {
  float temp;

  cout << "Please enter a number: ";
  cin >> temp;

  cout << endl << "The square root of " << temp << " is "
       << sqrt(temp) << "." << endl;

  return 0;
}
////////////////////////////////////////////////////////////////////

To compile it type

% g++ prog.cpp

The "%" sign stands for the shell prompt and should NOT be typed. It
only marks the Unix commands you have to type. The executable
created by g++ will be called a.out (that's the default).
To run this program, type

% ./a.out

If you would like to give the executable program a more descriptive 
name than a.out (e.g. prog.exe) you can type:

% g++ -o prog.exe prog.cpp

To tell the compiler to optimize the executable for speed, simply type

% g++ -O -o prog.exe prog.cpp


2) READING INPUT FROM AND SAVING RESULTS TO A FILE

Sometimes we would like to have the program read its input from a
file instead of the keyboard. To see how this can be done create
a file called prog.input with one line containing the input parameter. 
To have the program read its input from this file type

% ./prog.exe < prog.input

To run the program in the background,

% ./prog.exe < prog.input &

When running a program in the background it is often inconvenient to
have its output appear on the screen. To save the output in a file named
prog.output instead of displaying it on the screen, type

% ./prog.exe < prog.input > prog.output &

and have a look at the output with

% cat prog.output
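
As a small sketch of what happens when a job runs in the background,
the script below uses a shell function slow_prog as a stand-in for
prog.exe. The function, the scratch directory, and the use of "$!" and
"wait" are our own additions for illustration: "$!" holds the process
ID of the most recent background job, and "wait" blocks until that job
has finished.

```shell
# Hypothetical stand-in for a long-running prog.exe.
slow_prog() {
  sleep 1
  echo "finished"
}

workdir=$(mktemp -d)                    # scratch directory for the output file

slow_prog > "$workdir/prog.output" &    # start the job in the background
pid=$!                                  # process ID of that background job
echo "job $pid started, the shell is free for other commands"

wait "$pid"                             # block until the job has finished
cat "$workdir/prog.output"
```

The first message appears immediately; the content of the file is
shown only after the job has completed, about one second later.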

If the file prog.output exists already, the shell will overwrite
the existing file. If you would like the output to be appended to 
the file prog.output you can use >> instead

% ./prog.exe < prog.input >> prog.output &

With the previous commands, not everything is sent to the output file.
In particular, error messages keep being sent to the screen. If you
also want to have the error messages in the file, use 

% ./prog.exe < prog.input &> prog.output &

or, to append to the file instead of overwriting it (bash 4 or later),

% ./prog.exe < prog.input &>> prog.output &
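
The sketch below replays these redirections with a hypothetical shell
function noisy standing in for prog.exe (the function and the file
names are assumptions made for the example). It spells "&>" and "&>>"
in their long forms "> file 2>&1" and ">> file 2>&1", which bash
treats the same way and which also work in other shells.

```shell
# Hypothetical stand-in for prog.exe: writes one line to standard output
# and one line to standard error.
noisy() {
  echo "result: 42"
  echo "warning: something odd" >&2
}

workdir=$(mktemp -d)   # scratch directory, so no existing file is overwritten

noisy >  "$workdir/out.txt"          # output goes to the file, the warning to the screen
noisy >  "$workdir/both.txt" 2>&1    # long form of "&>": both streams go to the file
noisy >> "$workdir/both.txt" 2>&1    # long form of "&>>": both streams are appended

cat "$workdir/both.txt"
```

After these commands out.txt holds only the result line, while
both.txt holds the result and the warning twice, once per run.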

Sometimes one would like to see the output on the screen AND keep a
copy of it in a file. Again, that's easy:

% ./prog.exe < prog.input | tee prog.output

"tee" copies its standard input to standard output and also to the
files named on its command line.

3) WRITING SCRIPTS TO AUTOMATE THE EXECUTION OF PROGRAMS

Suppose you have a program calculating some physical properties given
a few input parameters. You now want to repeat these calculations for
a large number of values of the input parameters, each time saving the
results into different output files. How can you avoid having to do this 
by hand? The solution is to write a small script. For example,

######################################################################
#!/bin/bash
for temp in 10 20 30 40 50
do
  ./prog.exe > prog.output.$temp << MARKER
$temp
MARKER
done
######################################################################

Copy this script into a file called run_prog, make it executable (with
the command 'chmod u+x run_prog'), and run it (./run_prog). You will
see that the program prog.exe is called for the temperatures
temp=10, 20, ..., 50, with the results saved in prog.output.10, ...,
prog.output.50.
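
The construct "<< MARKER" is called a here-document: the lines up to
the next line consisting only of MARKER are fed to the program as its
standard input, after the shell has replaced $temp by its value. Here
is a minimal sketch, with a hypothetical shell function fake_prog
standing in for prog.exe so that it runs without compiling anything:

```shell
# Hypothetical stand-in for prog.exe: reads one value from standard input.
fake_prog() {
  read -r value
  echo "got input $value"
}

workdir=$(mktemp -d)   # scratch directory for the output files

for temp in 10 20 30
do
  # Everything between the two MARKER lines becomes standard input.
  fake_prog > "$workdir/prog.output.$temp" << MARKER
$temp
MARKER
done

cat "$workdir"/prog.output.*
```

Note that the closing MARKER must stand alone at the very beginning of
its line, otherwise the shell does not recognize it.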

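One can also make the list of sampling values a parameter of the
script. Replace the second line of the script (the "for" line) with
the following line, in which $* stands for all the arguments given to
the script on the command line:

  for temp in $*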

and try the command

% ./run_prog 5 15 25 35 45
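
Inside a script the arguments can be written as $* or as "$@". For
plain numbers both behave the same; they differ only when an argument
contains spaces. The sketch below illustrates this with a hypothetical
helper function count_args, using "set --" to simulate the arguments
of a script call.

```shell
# Hypothetical helper: prints how many arguments it received.
count_args() {
  echo $#
}

# Pretend the script was called as:  ./run_prog "1 2" 3
set -- "1 2" 3

count_args $*      # unquoted: "1 2" is split in two, so this prints 3
count_args "$@"    # quoted: each argument is kept intact, so this prints 2
```

For this reason "$@" is usually the safer choice when the arguments
might be file names or other strings containing blanks.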


4) PIPES AND ALL THAT

One of the philosophical principles behind the Unix command line is to
have small utility programs that each do only one type of operation
(but do it well). More complex operations are then realized by "linking"
together these building blocks. For example, the following command
prints a list of the integers from 1 to 10:

% echo `seq 1 10`

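The backquotes tell the shell to run the enclosed command and to
insert its output in its place. We can use this to generate the input
values for the script run_prog, with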

% ./run_prog `seq 1 10`

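We can now sort the results stored in all the output files. The "|"
symbol (a pipe) sends the output of one command to the input of the
next: grep keeps only the lines containing a digit, and sort orders
them numerically according to the fifth column (the input value):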

% cat prog.output.* | grep "[0-9]" | sort -n -k 5
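
To try this pipeline without running prog.exe first, the sketch below
writes three small files by hand in the format produced by the program
of section 1. The scratch directory and the sample values 9, 25 and
100 are assumptions made for the example.

```shell
workdir=$(mktemp -d)   # scratch directory for the hypothetical output files

# Three files in the format written by prog.exe (prompt line, then result).
printf 'Please enter a number: \nThe square root of 9 is 3.\n'    > "$workdir/prog.output.9"
printf 'Please enter a number: \nThe square root of 100 is 10.\n' > "$workdir/prog.output.100"
printf 'Please enter a number: \nThe square root of 25 is 5.\n'   > "$workdir/prog.output.25"

# Drop the prompt lines, then sort numerically on the fifth column.
sorted=$(cat "$workdir"/prog.output.* | grep "[0-9]" | sort -n -k 5)
echo "$sorted"
```

The lines come out in the order 9, 25, 100. Without the -n option
sort would compare the column as text, and 100 would come before 25.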

As a final example, we have seen during the lectures a small C++
program, which was able to take a text and extract from it a list of
all the words appearing in it. Here it is.

////////////////////////////////////////////////////////////////////
#include <iterator>
#include <iostream>
#include <algorithm>
#include <vector>
#include <string>

using namespace std;

int main()
{
  vector<string> data;
  copy(istream_iterator<string>(cin),istream_iterator<string>(),
       back_inserter(data));
  sort(data.begin(), data.end());
  unique_copy(data.begin(), data.end(),ostream_iterator<string>(cout,"\n"));
}
////////////////////////////////////////////////////////////////////

Here is essentially the same thing in one line, using the Unix shell
(this version also strips punctuation and converts all words to lower
case).

% cat text.txt | tr -cs "A-Za-z" "[\012*]" | tr A-Z a-z | sort | uniq | more

The power and flexibility of this approach are clear. Its weakness is
speed: compare the time needed for texts of increasing size with that
needed by the C++ program. For just one play by Shakespeare, however,
it is still fast enough. Suppose now that you also want to count how many
times each word appears in the text, and list the 50 most frequent
words in order of decreasing frequency. That's easy:

% cat text.txt | tr -cs "A-Za-z" "[\012*]" | tr A-Z a-z | sort \
  | uniq -c | sort -rn | head -50
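
To see what each stage contributes, the same pipeline can be tried on
a single sentence. In the sketch below the sample sentence and the
variable names are our own, printf takes the place of "cat text.txt",
and only the two most frequent words are kept.

```shell
# Hypothetical sample text, standing in for the content of text.txt.
text='The cat, and the dog; and THE bird.'

# tr -cs : turn every run of non-letters into a single newline
# tr A-Z a-z : convert to lower case
# sort | uniq -c : group identical words and count them
# sort -rn | head -2 : largest counts first, keep the top two
top=$(printf '%s\n' "$text" | tr -cs "A-Za-z" "[\012*]" | tr A-Z a-z \
  | sort | uniq -c | sort -rn | head -2)

echo "$top"
```

This lists "the" (3 occurrences) followed by "and" (2 occurrences).
The amount of white space in front of the counts depends on the
implementation of uniq.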

How would you modify the C++ program to do the same job?

