Further to my previous post, I have been mining the data I collected to see if I could find anything of interest, and to evaluate the data itself.
One of the things that interested me was the placement of a story compared to its position in the top ten over time. So I set my program to scrape the data every hour from 9AM through to 9PM; I didn’t want too much data to start with. I then spent a while writing SQL statements to capture the statistics I was looking for over that period. Below you can see a simple line graph (it uses a logarithmic scale):
I produced this using OpenOffice.
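For anyone curious, the kind of SQL involved might look something like the sketch below. The table name `scrape`, its columns, and the placement labels are made up for illustration and are not my real schema. It uses Python’s built-in sqlite3 module purely to keep the example self-contained, and it simply counts how many top ten stories came from each placement in each hourly scrape.

```python
import sqlite3


def placement_counts_by_hour(rows):
    """Count top ten stories per (hour, placement).

    rows is an iterable of (hour, placement, top_ten_rank) tuples.
    The schema here is hypothetical, for illustration only.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE scrape (hour INTEGER, placement TEXT, top_ten_rank INTEGER)"
    )
    conn.executemany("INSERT INTO scrape VALUES (?, ?, ?)", rows)
    cursor = conn.execute(
        "SELECT hour, placement, COUNT(*) AS stories "
        "FROM scrape "
        "GROUP BY hour, placement "
        "ORDER BY hour, placement"
    )
    result = cursor.fetchall()
    conn.close()
    return result


# Made-up sample rows: two hourly scrapes with a handful of stories each.
sample = [
    (9, "first", 1),
    (9, "second", 2),
    (9, "features", 3),
    (10, "first", 1),
    (10, "features", 2),
    (10, "features", 5),
]

for hour, placement, stories in placement_counts_by_hour(sample):
    print(hour, placement, stories)
```

Each row that comes back is one point on a line graph like the one above: one line per placement, with the hour along the x axis.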
I find it quite interesting that the third position on the indexes doesn’t get utilised so much. You could also suggest that as the day gets later, more people start reading other stories, i.e. features, second, and other stories. However, one can’t jump to any conclusions, for two reasons which stand out to me:
1 – This sample set is way too small!
2 – I think I need more detail. It would be interesting to know, for example, which positions within the features and analysis are being selected. The same goes for the other stories.
I decided I would cobble together a simple heat map of the index, aggregated over the time period, to see the “behaviour”. Feel free to take a look; personally I feel it reinforces my previous points (I used JavaFX Script 1.3 via NetBeans to produce this):
I think I might go back to the scraper program and see if I can expand the data gathered.
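As a footnote on the heat map, the aggregation behind it can be sketched roughly as below. This is not the JavaFX Script I actually used, just a minimal Python illustration under assumed names: `heat_intensities` and the placement labels are hypothetical. It counts how often each placement shows up across all the hourly scrapes, then scales the counts so the busiest placement is 1.0, which is the sort of value you would map to a colour.

```python
from collections import Counter


def heat_intensities(observations):
    """Turn (hour, placement) observations into a 0..1 intensity per placement.

    The hour is ignored on purpose: the heat map is aggregated over the
    whole time period. Names and labels are hypothetical.
    """
    counts = Counter(placement for _hour, placement in observations)
    if not counts:
        return {}
    peak = max(counts.values())
    return {placement: count / peak for placement, count in counts.items()}


# Made-up observations: a placement appears once per top ten story it produced.
sample = [
    (9, "first"),
    (10, "first"),
    (9, "features"),
    (10, "features"),
    (11, "features"),
    (11, "features"),
    (10, "third"),
]

for placement, intensity in sorted(heat_intensities(sample).items()):
    print(placement, intensity)
```

With more detailed scraping, the placement label could be swapped for an exact position within each section, which would give a finer-grained map.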