The Data Types

Now that numLines contains the number of lines of data in our weblog, we can allocate memory for our array of pointers and for the data structures that will contain the data itself. We will define a data type called logType which will contain the IP address and HTTP code as isolated from a line of the weblog, along with a complete copy of the line itself. Defining our array is then simple. We add the following lines to our program:

logType * * sortArray;

sortArray = calloc(numLines, sizeof(logType *));

Note that we define the array of pointers, sortArray, to be a pointer itself (a pointer to a pointer to logType). This works because, in C, a pointer to the first element of a contiguous block of memory can be indexed exactly like an array, so sortArray[i] refers to the i-th pointer in the block. Setting sortArray equal to the return value of calloc makes it point to the first entry of our dynamically allocated array in memory.
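The allocation can fail, so it is worth checking the return value of calloc before using the array. The following is a minimal sketch of one way to do that, not part of the program as written: the helper name allocSortArray and the use of size_t for the line count are assumptions made for illustration, and logType is only forward-declared here because pointers to it are all that is needed.

```c
#include <stdio.h>
#include <stdlib.h>

/* Forward declaration only: the array stores pointers to logType,
   so the full structure definition is not needed at this point. */
typedef struct logStruct logType;

/* Hypothetical helper: allocate an array of numLines pointers to logType.
   calloc zero-fills the block, which on common platforms leaves every
   entry as a null pointer. Returns NULL if the allocation fails. */
logType **allocSortArray(size_t numLines)
{
    logType **sortArray = calloc(numLines, sizeof(logType *));

    if (sortArray == NULL) {
        fprintf(stderr, "could not allocate %zu pointers\n", numLines);
    }
    return sortArray;
}
```

The caller remains responsible for passing the returned pointer to free once the array is no longer needed.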

Note that we could not have defined our array as follows:

logType * sortArray[numLines];

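The reason for this is that numLines is not a constant, fixed for all time, but a variable whose value is only set at runtime. In C89, defining an array like this is only possible when the number of entries is known before the program is compiled. (C99 introduced variable-length arrays, which do accept a runtime size for local arrays, but they are typically placed on the stack and so are a poor fit for a weblog that may contain a very large number of lines.)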

An IP address (in its IPv4 form) is nothing more than an ordered collection of four integers. Each of them can be at most 255, so each fits comfortably in a short int. Thus we can define a type for storing an IP address as follows:

typedef struct ipaddress {
    short int int1, int2, int3, int4;
} ipaddress;
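To show how this type might be filled in, here is a hedged sketch that reads a dotted-quad address from the start of a string using sscanf. It assumes that each line of the weblog begins with the IP address, which is common but not guaranteed by anything above; the function name parseIp is hypothetical.

```c
#include <stdio.h>

typedef struct ipaddress {
    short int int1, int2, int3, int4;
} ipaddress;

/* Hypothetical helper: parse "a.b.c.d" from the start of text into *ip.
   Returns 1 on success, 0 if the text does not start with four
   dot-separated integers in the range 0 to 255. */
int parseIp(const char *text, ipaddress *ip)
{
    int a, b, c, d;

    /* sscanf returns the number of fields it converted successfully. */
    if (sscanf(text, "%d.%d.%d.%d", &a, &b, &c, &d) != 4) {
        return 0;
    }
    if (a < 0 || a > 255 || b < 0 || b > 255 ||
        c < 0 || c > 255 || d < 0 || d > 255) {
        return 0;
    }

    ip->int1 = (short int)a;
    ip->int2 = (short int)b;
    ip->int3 = (short int)c;
    ip->int4 = (short int)d;
    return 1;
}
```

Reading into plain int variables first makes the range check straightforward before the values are narrowed to short int.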

An HTTP code is simply an integer, so we can define our logType structure and type as follows:

typedef struct logStruct {
    ipaddress ip;
    int http;
    char * logline;
} logType;

The third field of this structure is a pointer to a complete copy of the corresponding line of the weblog, so that this information is never lost. This is important, for once we have sorted the data according to IP address or HTTP code, we will want to output the lines of the weblog in this new order. If we only retain the IP addresses and HTTP codes from the lines, we will have no way of doing this.
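Because logline is only a pointer, each entry needs its own copy of the line; pointing it at a shared read buffer would leave every entry referring to whatever line was read last. The sketch below shows one possible way to build and release an entry. The names makeEntry and freeEntry are hypothetical, and the IP address and HTTP code are assumed to have been extracted from the line already.

```c
#include <stdlib.h>
#include <string.h>

typedef struct ipaddress {
    short int int1, int2, int3, int4;
} ipaddress;

typedef struct logStruct {
    ipaddress ip;
    int http;
    char * logline;
} logType;

/* Hypothetical helper: allocate a logType holding the given fields and a
   private copy of line. Returns NULL if either allocation fails. */
logType *makeEntry(ipaddress ip, int http, const char *line)
{
    logType *entry = malloc(sizeof *entry);
    if (entry == NULL) {
        return NULL;
    }

    /* +1 leaves room for the terminating '\0'. */
    entry->logline = malloc(strlen(line) + 1);
    if (entry->logline == NULL) {
        free(entry);
        return NULL;
    }
    strcpy(entry->logline, line);

    entry->ip = ip;
    entry->http = http;
    return entry;
}

/* Release an entry created by makeEntry; a NULL argument is ignored. */
void freeEntry(logType *entry)
{
    if (entry != NULL) {
        free(entry->logline);
        free(entry);
    }
}
```

Each pointer returned by makeEntry can be stored in a slot of sortArray, and after sorting, the lines can be output in the new order by walking the array and printing each entry's logline.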