Module 3: Introduction

It's time to write a program that demonstrates a particular forte of the C language: file access! In this module you will put together a complete weblog "analyser". A weblog is the record of visits to a website, made as people click on the various webpages on that site. It records the IP address of each visitor who accesses the site, along with all the files that they download whilst there.

Of course the principles learned in this module can be used for any project that requires one to read a text file, sort the data and output some sort of "analysis". So even if you don't care for weblogs, this tutorial should be of independent interest.
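To give a flavour of the "read a text file" step, here is a minimal sketch in C. It is only an illustration under stated assumptions: the function names first_field and print_first_fields are hypothetical, and it assumes that each line of the file begins with the field of interest (for a weblog, the IP address) followed by whitespace. The actual layout of the data file is described later, so treat this as a rough outline rather than the final program.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy the first whitespace-delimited field of
   `line` into `out` (capacity `outsz`), truncating if necessary.
   Returns the number of characters copied. */
size_t first_field(const char *line, char *out, size_t outsz)
{
    size_t len = strcspn(line, " \t\r\n"); /* length up to first blank */

    if (outsz == 0)
        return 0;
    if (len >= outsz)
        len = outsz - 1;                   /* leave room for the '\0' */
    memcpy(out, line, len);
    out[len] = '\0';
    return len;
}

/* Hypothetical helper: read a text file with fgets and print the first
   field of each chunk read. Assumes that field is the one we care about.
   Returns the number of chunks read, or -1 if the file cannot be opened.
   Lines longer than the buffer are read in several chunks. */
long print_first_fields(const char *path)
{
    FILE *fp = fopen(path, "r");
    char line[1024];
    char field[64];
    long count = 0;

    if (fp == NULL)
        return -1;

    while (fgets(line, sizeof line, fp) != NULL) {
        if (first_field(line, field, sizeof field) > 0)
            puts(field);
        count++;
    }

    fclose(fp);
    return count;
}
```

The pattern is the usual one for text files in C: open with fopen, check for NULL, loop on fgets until it returns NULL, then close with fclose.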

The data file which we shall use for this project is an actual weblog: the second month of data from the FriedSpace.com site. Most of the accesses are from robots (tools used by major search engine companies to build their search directories), friends of mine, myself and a few nasty spambots. Of course there is no personal data contained in an IP address. At best it tells you which city the visitor was in when they accessed the site. Nevertheless I have changed some of the IP addresses, just to be sure.

Webmasters all over the world commonly need a program that will sort weblog data and give them some idea of where their website traffic is coming from. We will therefore design our program to meet this real-world need (albeit in a not terribly sophisticated way).

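The algorithm that we will use for sorting is a simple but important one, called a bubble sort. It has its limitations: the number of comparisons grows roughly with the square of the number of items, so it becomes slow on large data sets. Nevertheless it is easy to write and perfectly adequate for sorting modest amounts of real-world data.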
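For readers who have not met it before, a bubble sort can be sketched as follows. This is a minimal illustration on an array of integers, not the version we will write for the weblog data; the function name bubble_sort and the early exit when a pass makes no swaps are my own choices for the sketch.

```c
#include <stddef.h>

/* Sort the n integers in a[] into ascending order using bubble sort.
   Each pass compares adjacent elements and swaps any pair that is out
   of order, so the largest remaining value "bubbles" to the end. */
void bubble_sort(int *a, size_t n)
{
    size_t pass, i;

    if (n < 2)
        return;                       /* nothing to sort */

    for (pass = 0; pass < n - 1; pass++) {
        int swapped = 0;

        /* The last `pass` elements are already in their final place. */
        for (i = 0; i + 1 < n - pass; i++) {
            if (a[i] > a[i + 1]) {
                int tmp = a[i];
                a[i] = a[i + 1];
                a[i + 1] = tmp;
                swapped = 1;
            }
        }

        if (!swapped)
            break;                    /* no swaps: already sorted */
    }
}
```

Calling bubble_sort on an array holding 5, 2, 9, 1 leaves it holding 1, 2, 5, 9. The same idea works for any data that can be compared pairwise; only the comparison and the swap need to change.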