Parsing the Lines

In some ways, retrieving the IP address from a weblog line is the easiest of our parsing tasks, since the IP address is the first thing that occurs in a weblog line. As usual with parsing, strtok is always an option for performing this function. We can have it consider the decimal points in the IP address as delimiters.

At this point we must be careful. Since there are no further decimal points that act as delimiters in our weblog line, and since strtok will have placed a \0 at the last of the decimal points, we must provide it with a different delimiter in order for it to finish its job. This delimiter will be a space, which delimits the end of the IP address.

Note that strtok modifies the actual string it is working on, not a copy of it. This may seem counter-intuitive, but recall that a weblog line is a string, that is, an array of characters. When an array is passed to a function, no copy is made; the function receives a pointer to the first element of the real thing, even though we did not precede the argument with an ampersand. By passing this pointer, we effectively tell our function precisely where to find the actual data, and it is this data which is modified by strtok. This explains why we first copied the string to a temporary buffer before operating on it with our function.
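The effect of strtok on its input can be seen in a small sketch. The names copyLine, firstField and MAX_LINE are hypothetical and are used only for illustration; the buffer size is an assumption, not a value taken from the program itself.

```c
#include <assert.h>
#include <string.h>

#define MAX_LINE 512   /* assumed upper bound on the length of a weblog line */

/* Copies logline into scratch (which must hold MAX_LINE characters) so that
   strtok can modify the copy instead of the original.
   Returns scratch, or NULL if the line does not fit. */
char *copyLine(char *scratch, const char *logline)
{
    assert(scratch != NULL && logline != NULL);

    if (strlen(logline) >= MAX_LINE)
        return NULL;                 /* line too long for the scratch buffer */

    strcpy(scratch, logline);
    return scratch;
}

/* Returns the text before the first decimal point, taken from a private
   copy of logline. The original line is left untouched. */
const char *firstField(char *scratch, const char *logline)
{
    if (copyLine(scratch, logline) == NULL)
        return NULL;

    /* strtok overwrites the first '.' in scratch with '\0' */
    return strtok(scratch, ".");
}
```

After a call to firstField, the scratch buffer reads as a shorter string, because strtok has written a \0 over the first decimal point, while the line that was passed in is unchanged.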

We construct our getIP function as follows:

void getIP(ipaddress * ip, char * logline)
{
    ip->int1 = atoi(strtok(logline, "."));
    ip->int2 = atoi(strtok(NULL, "."));
    ip->int3 = atoi(strtok(NULL, "."));
    ip->int4 = atoi(strtok(NULL, " "));
    
    return;
}
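Note that strtok returns NULL when it finds no further token, and passing NULL to atoi is undefined behaviour, so a malformed line could crash getIP. One possible defensive variant is sketched below. The name getIPChecked and its return convention are hypothetical, and the layout of the ipaddress type is assumed from the way getIP uses it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumed layout of ipaddress, inferred from the fields getIP assigns. */
typedef struct {
    int int1, int2, int3, int4;
} ipaddress;

/* Like getIP, but returns 0 on success and -1 if any of the four numbers
   is missing. logline is modified by strtok, so pass in a scratch copy. */
int getIPChecked(ipaddress *ip, char *logline)
{
    /* the first three numbers end at a decimal point, the last at a space */
    static const char *delims[4] = { ".", ".", ".", " " };
    int *fields[4];
    char *token;
    int i;

    assert(ip != NULL && logline != NULL);

    fields[0] = &ip->int1;
    fields[1] = &ip->int2;
    fields[2] = &ip->int3;
    fields[3] = &ip->int4;

    for (i = 0; i < 4; i++) {
        /* only the first call names the string; later calls pass NULL */
        token = strtok(i == 0 ? logline : NULL, delims[i]);
        if (token == NULL)
            return -1;               /* ran out of tokens: malformed line */
        *fields[i] = atoi(token);
    }

    return 0;
}
```

This sketch only checks that four tokens are present. It does not check that each token is a number between 0 and 255.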

Our second task is to extract the HTTP code from our weblog line. This is slightly more difficult, since the code is embedded well into the line, and we must tell strtok exactly where to look for it. Fortunately it is not so complicated, since the HTTP code appears as an integer immediately after the second inverted comma in the line, separated from it by a single space. Because strtok skips any delimiters at the start of the text it has yet to examine, that space is passed over automatically when we use a space as the final delimiter. Again we can use strtok to extract this number:

void getHTTP(int * http, char * logline)
{
    strtok(logline, "\"");
    strtok(NULL, "\"");
    http[0] = atoi(strtok(NULL, " "));
    
    return;
}

Notice the \" in the delimiter strings. If we wish to have inverted commas inside a string, we must write them with a preceding backslash in this way so that C doesn't mistake them for the end of the string.
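To see the three strtok calls at work, it may help to try them on a sample line. The sketch below is a hypothetical variant named getHTTPChecked, which returns the code directly and reports a missing token instead of passing NULL to atoi. The sample line in the comment is made up for illustration and is not taken from a real log.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns the HTTP code found after the quoted request, or -1 if the
   expected tokens are missing. logline is modified by strtok, so pass in
   a scratch copy.

   For an invented line such as
       192.168.0.1 - - [01/Jan/2000:00:00:00 +0000] "GET /index.html HTTP/1.0" 404 512
   the first call consumes everything before the opening inverted comma,
   the second consumes the request between the inverted commas, and the
   third skips the space and stops at the end of the code. */
int getHTTPChecked(char *logline)
{
    char *token;

    assert(logline != NULL);

    if (strtok(logline, "\"") == NULL)   /* text before the opening quote */
        return -1;
    if (strtok(NULL, "\"") == NULL)      /* the quoted request itself */
        return -1;

    token = strtok(NULL, " ");           /* leading space is skipped */
    if (token == NULL)
        return -1;

    return atoi(token);
}
```

Like getHTTP, this sketch assumes that some text precedes the opening inverted comma, which holds for any line that begins with an IP address.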