Posts Tagged ‘Data visualisation’

Further analysis of the scraped BBC data

May 4, 2011

Further to my previous post, I have been mining the data I collected to see whether I could find anything of interest, and to evaluate the data itself.

One of the things that interested me was the placement of the story compared to its position in the top ten over time. So I set my program off scraping the data every hour from 9AM through to 9PM; I didn’t want too much data to start with. I then spent a while writing SQL statements to capture the statistics I was looking for over the time period. Below you can see a simple line graph (it uses a logarithmic scale):

I produced this using Open Office.

I find it quite interesting that the third position on the indexes doesn’t get utilised much. You could also suggest that as the day goes on, more people start reading other stories, i.e. features, second, and other stories. However, one can’t draw any conclusions, for two reasons which jump out at me:

1 – This sample set is way too small!
2 – I think I need more detail. It would be interesting, for example, to know which positions within the features and analysis are being selected. The same goes for the other stories.

I decided I would cobble together a simple heat map of the index, aggregated over the time period, to see the “behaviour”. Feel free to take a look; personally I feel it reinforces my previous points (I used JavaFX Script 1.3 via NetBeans to produce this):

I think I might go back to the scraper program and see if I can expand the data gathered.
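For anyone curious about the aggregation behind a heat map like the one above, here is a minimal sketch in Java. It is not the code I used: the class name HeatMapGrid, the idea of numbering index slots from 0, and the sample data are all assumptions made for illustration. It simply counts how often each slot held a top-ten story and scales the counts to the range 0 to 1, which could then drive a colour intensity.

```java
import java.util.Arrays;

// Hypothetical sketch: counts how often each index slot held a top-ten story
// and scales the counts to 0..1 so they can drive a colour intensity.
class HeatMapGrid {
    private final int[] counts;

    HeatMapGrid(int slots) {
        counts = new int[slots];
    }

    // Record one scraped observation: a top-ten story was found in this slot.
    void record(int slot) {
        counts[slot]++;
    }

    // Scale every count against the busiest slot; all zeros if nothing was recorded.
    double[] intensities() {
        int max = Arrays.stream(counts).max().orElse(0);
        double[] out = new double[counts.length];
        for (int i = 0; i < counts.length; i++) {
            out[i] = max == 0 ? 0.0 : (double) counts[i] / max;
        }
        return out;
    }

    public static void main(String[] args) {
        HeatMapGrid grid = new HeatMapGrid(4);
        int[] observedSlots = {0, 0, 1, 3, 0, 1};   // made-up sample data
        for (int slot : observedSlots) {
            grid.record(slot);
        }
        System.out.println(Arrays.toString(grid.intensities()));
    }
}
```

Scaling against the busiest slot keeps the colours comparable within one time period, but not between periods; a fixed maximum would be needed for that.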


A treemap representing the amount spent on Housing benefit

November 14, 2010

A colleague and I recently requested “Housing benefit cost data for every local authority within the UK (England, Scotland, Wales and NI), including the amount per local authority broken down into local authority and private properties per local authority. Specifically the maximum, minimum, mean, median, quartile 1, quartile 3, 10th percentile and 90th percentile for each local authority and private properties within each local authority.” via a Freedom of Information request of the Department for Work and Pensions.

Of course, now housing benefit, well all benefits, are in the news so this is almost timely (maybe a few weeks out, but I have been really busy with my day job!)

The response to our request was “As you have specifically requested cost information by the mean, median, quartile 1, quartile 3, 10th percentile and 90th percentile we have interrogated our databases and produced the attached analysis which is based on weekly award of Housing Benefit at March 2010. Further information on Housing Benefit is available on the DWP website at: http://research.dwp.gov.uk/asd/hbctb.asp . In addition, Housing Benefit expenditure by local authority is available at: http://research.dwp.gov.uk/asd/asd4/expenditure.asp

The information you requested for maximum and minimum is being withheld as it falls under the exemption in section 40 of the Freedom of Information Act. This exemption covers personal information about a third party and if provided could lead to a reasonable chance of individual claimants and their families being identified. This would breach the families’ right to privacy contrary to the Data Protection Act. ”

We thought it might be interesting to view this data as a treemap. It gives a good overview of the data, allowing you to explore different areas and see which have the highest and lowest claims at each level. Each box represents the average weekly amount of housing benefit for the specified area, in sterling.

As before, I have produced the visualisation in Processing utilising the functionality I learnt about in Ben Fry’s book Visualizing Data. You can see my other example of a treemap here.

A Treemap to find a rental bike in London using Processing

August 30, 2010

I was recently searching the web for news stories about the new rental bikes in London. In the process I stumbled across an API, produced by Adrian Short.

Upon seeing this data I had an epiphany!!! I have come across numerous APIs in my travels of the web, but I realised I could do something with this data!

Boris Bikes

One of the comments journalists have made, along with a few friends, is that it is difficult to locate a bay of bikes, to know whether any bikes are available, or to know whether there is room to park your bike when you get there. Here the treemap comes into its own.

Some background on treemaps (for those who don’t know): “Ben Shneiderman, of the Human-Computer Interaction Laboratory (HCIL) at the University of Maryland, worked out a set of algorithms to subdivide 2D space to show the relative size of files and directories (see http://www.cs.umd.edu/hcil/treemap-history). He called the representation a treemap, referencing the two-dimensional mapping of tree structures.” [quote taken from Ben’s book]

I got the idea from Ben Fry’s book Visualizing Data, in which he uses Ben Shneiderman’s Treemap software (which I have also utilised) http://www.cs.umd.edu/hcil/treemap-history.

Before visualising this data as is, I decided I wanted to create a top layer: the area of London (i.e. North, East, South, West) you want to investigate. When you drill in you can view all of the regions in that area, and then you can drill in further to see the specifics. For each level I decided to have an info box follow the mouse pointer to give the full details of what you are looking at. Remember that the size of a rectangle represents how full the dock is: if it is full of bikes the rectangle is large, and if it is empty of bikes the rectangle is small.
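To give a feel for how rectangle size follows the number of bikes, here is a minimal sketch in Java of the simplest possible treemap step: slicing one strip of the screen into rectangles whose widths are proportional to each dock's count. This is not the layout the Treemap library uses, and the names SliceLayout and slice, as well as the dock counts, are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the simplest treemap step: slice one strip of the
// screen into rectangles whose widths are proportional to each dock's bike count.
class SliceLayout {

    // Returns {x, width} for each value, laid out left to right across totalWidth.
    static List<double[]> slice(int[] bikes, double totalWidth) {
        long sum = 0;
        for (int b : bikes) {
            sum += b;
        }
        List<double[]> rects = new ArrayList<>();
        double x = 0;
        for (int b : bikes) {
            // An empty dock gets zero width; guard against dividing by zero
            double w = sum == 0 ? 0 : totalWidth * b / sum;
            rects.add(new double[] {x, w});
            x += w;
        }
        return rects;
    }

    public static void main(String[] args) {
        int[] bikes = {10, 5, 0, 5};   // made-up dock counts
        for (double[] r : slice(bikes, 400)) {
            System.out.println("x=" + r[0] + " width=" + r[1]);
        }
    }
}
```

A real treemap applies a step like this recursively, alternating direction or squarifying the rectangles, which is the part the library takes care of.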

To drill into the data, left-click on a rectangle, which will take you in a level. To return a level, or more, right-click.

I can’t thank Adrian Short enough for this, Boris Bikes for hosting the link to the API, and the Processing project.

I really like how it looks (no surprise there!).

As ever, this is a working prototype and I would be interested to hear any feedback. Of course, because this is a Processing application it is a Java applet, and because the applet is accessing the API it needs to be signed. You will have to allow access for the applet to work. Have a play (http://bikefacts.co.uk/Processing/Boris/inde.html).

Trying to Replicate Processing’s House of Cards code in JavaFX

August 12, 2010

One of the books I am reading, on and off, is Beautiful Data. I recently read Chapter 10 – Building Radiohead’s House of Cards by Aaron Koblin with Valdean Klump and thoroughly enjoyed it. I was mightily impressed that Processing was used in the production of the video, which gave me the inspiration to:

  1. Have a look at the Processing code
  2. See if I can replicate the code in JavaFX!

In the first instance you can find the Processing code, which is part of the Google project, here. I encourage you to check it out, it is really good (especially after reading the chapter in the book!).

Ok, so the code for Processing is really straightforward (I have stripped out comments):

import processing.opengl.*;

int frameCounter =1;

void setup()
{
  size(1024,768, OPENGL);

  strokeWeight(1);
  
}
void draw()
{
  background(0);
  translate(width/2, height/2); 
  translate(-150,-150);
  scale(2);
  
  String[] raw = loadStrings(frameCounter+".csv");

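  // NOTE: the loop body was lost when this post was extracted. The lines
  // below are a reconstruction guided by the JavaFX port later in this post
  // (columns are x, y, z, intensity); the colour scaling in particular is
  // an assumption and may differ from the original.
  for(int i = 0; i < raw.length; i++)
  {
    String[] thisLine = split(raw[i], ",");
    float x = float(thisLine[0]);
    float y = float(thisLine[1]);
    float z = float(thisLine[2]);
    int intensity = int(thisLine[3]);

    stroke(intensity*1.1, intensity*1.6, 200, 255);
    line(x, y, z, x+1, y+1, z+1);
  }

  frameCounter++;
  if(frameCounter > 2101)
  {
    exit();
    println("done");
  }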
}

With the data in place (1000 CSV files), this works really nicely “straight out of the box”. It is a joy to play with it.

Here is a screen shot of the data represented in Processing:

So, being thoroughly motivated, I started work on writing the JavaFX equivalent. In the first instance I needed to load the CSV files. Processing has a function for that; for JavaFX I decided to write my own Java class to do it:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

// Stand-in for Processing's loadStrings(): reads every line of a file, given its URL
public class LoadStrings
{
    URL url;
    BufferedReader br;
    List<String> myLines = new ArrayList<String>();

    public LoadStrings(String fname)
    {
        String theLine;
        try
        {
            url = new URL(fname);
            br = new BufferedReader(new InputStreamReader(url.openStream()));
            while((theLine = br.readLine()) != null)
            {
                myLines.add(theLine);
            }

            br.close();

        }
        catch(IOException e)
        {
            // Covers a malformed URL as well as a failed read
            System.out.println(e.toString());
        }
        catch(Exception e)
        {
            System.out.println("It ain't worked!");
            System.out.println(e.toString());
        }
    }

    // Returns the loaded lines as an array, like Processing's loadStrings()
    public String[] getRows()
    {
        return myLines.toArray(new String[myLines.size()]);
    }
}

With that complete I started on the Main application. Processing’s draw() function is a continuous loop, I believe based on the frame rate. With this in mind I needed to replicate the draw function, so I chose to use an animator – Timeline. Before defining this animation I declared:

var lines : Line[] = [];
var theFrame = 0;

I defined my Timeline as:

def projector : Timeline = Timeline
{
    repeatCount: 1000
    framerate: 60

    keyFrames:
    [
        KeyFrame
        {
            time: 0.5s
            action: function()
            {
                delete lines;
                var dataPoints : DataPoint[] = [];

                var f: String = "{__DIR__}data/{theFrame+1}.csv";

                var fr = new LoadStrings(f);

                var rows : String[] = fr.getRows();

                var counter = 0;

                while (counter < rows.size())
                {
                    var thisLine : String[] = rows.get(counter).split(',');

                    var x = Number.parseFloat(thisLine[0]) + (1024/2) + -200;
                    var y = Number.parseFloat(thisLine[1]) + (768/2) + -200;
                    var sr = (Number.parseFloat(thisLine[3]) / (255.0 * 1.1) * (255.0));
                    var sg = (Number.parseFloat(thisLine[3]) / (255.0 * 1.6) * (255.0));

                    insert DataPoint
                    {
                        x: x
                        y: y
                        z: Number.parseFloat(thisLine[2]);
                        intensity: Number.parseFloat(thisLine[3]);
                        scaledRed: sr
                        scaledGreen: sg
                    } into dataPoints;
                    counter++;
                }

                for (dp in dataPoints)
                {
                    insert Line
                    {
                        startX: dp.x
                        endX: dp.x+1
                        startY: dp.y
                        endY: dp.y+1
                        stroke: Color.rgb(dp.scaledRed, dp.scaledGreen, 200,1);
                    } into lines
                }
                theFrame++;
            }
        }
    ]
}

projector.play();

So my stage looks like this:

var theStage : Stage = Stage
{
    title: "House of Cards"
    scene: Scene
    {
        width: 1024
        height: 768
        fill: Color.BLACK;
        content: bind
        [
            lines
        ]
    }
}

It is all very exciting, isn’t it…? Before implementing this code I managed to load the application showing just the first file and its data points, and it looked exceptional:

I was very excited, although a little apprehensive about the time it took both to compile (over 2 minutes on my machine with the full 1000 files) and to run the app. It was a lot slower to load than the Processing instance. I am writing this in NetBeans, on OS X, with 4GB of RAM.

Running this, with time: 0.5s results in a slow animation, which leaves you wanting more. I tried to reduce the time, even commenting it out, but the image couldn’t get updated in time and the effect was even worse.

Conclusion:

Ok, so JavaFX was not really designed for this type of work (well that is my understanding). However I am a little surprised that I can’t run this like I can with Processing. I even tried increasing the memory setting in Netbeans by changing the properties of the project so that Run JVM arguments are: -Xmx3072m.

This is my first attempt at anything like this, so perhaps I am misunderstanding something somewhere.

Perhaps I am misunderstanding animations in JavaFX?

I tried to look into the 3D functionality with JavaFx, but I struggled to find examples I could relate to.

All in all I feel I have tried (and there is never any harm in trying). I like what I have, but I have a feeling it could be better.

Any pointers / thoughts?

Update: 15th August

So after some excellent feedback from Jonathan Giles, I have changed the code as follows.

The LoadStrings now looks like:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

// Reads a CSV file from a URL and parses each row into a DataPointJ as it goes
public class LoadStrings
{
    URL url;
    BufferedReader br;
    List<String> myLines = new ArrayList<String>();
    List<DataPointJ> dataPoints = new ArrayList<DataPointJ>();

    public LoadStrings(String fname)
    {
        String theLine;
        try
        {
            url = new URL(fname);
            br = new BufferedReader(new InputStreamReader(url.openStream()));
            while((theLine = br.readLine()) != null)
            {
                myLines.add(theLine);
                // Columns are x, y, z, intensity
                String[] t = theLine.split(",");
                dataPoints.add(new DataPointJ(
                        Float.parseFloat(t[0]),
                        Float.parseFloat(t[1]),
                        Float.parseFloat(t[2]),
                        Integer.parseInt(t[3])));
            }

            br.close();

        }
        catch(IOException e)
        {
            // Covers a malformed URL as well as a failed read
            System.out.println(e.toString());
        }
        catch(Exception e)
        {
            System.out.println("It ain't worked!");
            System.out.println(e.toString());
        }
    }

    // Returns the raw lines as an array, like Processing's loadStrings()
    public String[] getRows()
    {
        return myLines.toArray(new String[myLines.size()]);
    }

    // Returns the parsed points, so the JavaFX side no longer has to split strings
    public List<DataPointJ> getDps()
    {
        return dataPoints;
    }
}
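The DataPointJ class itself is not shown above. A minimal sketch of what it might look like, inferred only from how it is called here (a four-argument constructor plus getX, getY, getScaledRed and getScaledGreen), is below. The field names, the extra getters and the clamping are assumptions; the colour scaling reuses the formulas from the first JavaFX version.

```java
// Hypothetical sketch of DataPointJ, inferred from how it is called in the post.
class DataPointJ {
    private final float x;
    private final float y;
    private final float z;
    private final int intensity;

    DataPointJ(float x, float y, float z, int intensity) {
        this.x = x;
        this.y = y;
        this.z = z;
        this.intensity = intensity;
    }

    float getX() { return x; }
    float getY() { return y; }
    float getZ() { return z; }
    int getIntensity() { return intensity; }

    // Same scaling as the first JavaFX version: intensity / (255 * 1.1) * 255
    int getScaledRed() {
        return clamp((int) (intensity / (255.0 * 1.1) * 255.0));
    }

    // Same scaling as the first JavaFX version: intensity / (255 * 1.6) * 255
    int getScaledGreen() {
        return clamp((int) (intensity / (255.0 * 1.6) * 255.0));
    }

    // Keep the channel inside the 0..255 range that an RGB colour expects
    private static int clamp(int v) {
        return Math.max(0, Math.min(255, v));
    }

    public static void main(String[] args) {
        DataPointJ p = new DataPointJ(10f, 20f, 30f, 128);   // made-up sample row
        System.out.println(p.getScaledRed() + ", " + p.getScaledGreen());
    }
}
```

Doing the colour arithmetic once, in Java, at load time means the animation only has to read ready-made integers when it builds each Line.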

The Timeline:

var projector : Timeline = Timeline
{
    repeatCount: 50

    keyFrames:
    [
        KeyFrame
        {
            time: 0.5s
            action: function()
            {
                var f: String = "{__DIR__}data/{theFrame+1}.csv";

                var fr = new LoadStrings(f);

                var dps = fr.getDps();

                lines = for ( i in dps)
                {
                    Line
                    {
                        startX: i.getX();
                        endX: i.getX()+1;
                        startY: i.getY();
                        endY: i.getY()+1;
                        stroke: Color.rgb(i.getScaledRed(), i.getScaledGreen(), 200, 1);
                    }
                }
                theFrame++;
            }
        }
    ]
}

projector.play();

This saw improvements; however, I could not run the timeline with a time of less than 0.5s. I decided that, with the new changes, I could load the data into memory. I realised I would have to do this incrementally, so I decided to define 10 sequences, each holding 50 instances of LoadStrings. Then I changed the animation so that it resolves each LoadStrings, gets the data and builds the lines. After every 50 frames it moves on to the next sequence and deletes the previous one:

function loadDataIntoMemory():Void
{
    for (tf in [0..499])
    {
        var f: String = "{__DIR__}data/{tf+1}.csv";

        var fr = new LoadStrings(f);
        // Files 0-49 go into the first sequence, 50-99 into the second, and so on,
        // matching the theFrame ranges used by the Timeline below
        if (tf < 50) insert fr into filePointer1;
        if (tf >= 50 and tf < 100) insert fr into filePointer2;
        if (tf >= 100 and tf < 150) insert fr into filePointer3;
        if (tf >= 150 and tf < 200) insert fr into filePointer4;
        if (tf >= 200 and tf < 250) insert fr into filePointer5;
        if (tf >= 250 and tf < 300) insert fr into filePointer6;
        if (tf >= 300 and tf < 350) insert fr into filePointer7;
        if (tf >= 350 and tf < 400) insert fr into filePointer8;
        if (tf >= 400 and tf < 450) insert fr into filePointer9;
        if (tf >= 450 and tf < 500) insert fr into filePointer10;

    }
}

The Timeline:

var projector4 : Timeline = Timeline
{
    repeatCount: 500
    framerate: 60

    keyFrames:
    [
        KeyFrame
        {
            action: function()
            {
                var fr;
                if (theFrame < 50) fr = filePointer1[theFrame];
                if (theFrame >= 50 and theFrame < 100)
                {
                    if (sizeof filePointer1 > 0)
                    {
                        delete filePointer1;
                    }
                    fr = filePointer2[theFrame-50];

                }
                if (theFrame >= 100 and theFrame < 150)
                {
                    if (sizeof filePointer2 > 0) delete filePointer2;

                    fr = filePointer3[theFrame-100];
                }
                if (theFrame >= 150 and theFrame < 200)
                {
                    if (sizeof filePointer3 > 0) delete filePointer3;

                    fr = filePointer4[theFrame-150];
                }
                if (theFrame >= 200 and theFrame < 250)
                {
                    if (sizeof filePointer4 > 0) delete filePointer4;

                    fr = filePointer5[theFrame-200];
                }
                if (theFrame >= 250 and theFrame < 300)
                {
                    if (sizeof filePointer5 > 0) delete filePointer5;

                    fr = filePointer6[theFrame-250];
                }
                if (theFrame >= 300 and theFrame < 350)
                {
                    if (sizeof filePointer6 > 0) delete filePointer6;

                    fr = filePointer7[theFrame-300];
                }
                if (theFrame >= 350 and theFrame < 400)
                {
                    if (sizeof filePointer7 > 0) delete filePointer7;

                    fr = filePointer8[theFrame-350];
                }
                if (theFrame >= 400 and theFrame < 450)
                {
                    if (sizeof filePointer8 > 0) delete filePointer8;

                    fr = filePointer9[theFrame-400];
                }
                if (theFrame >= 450 and theFrame < 500)
                {
                    if (sizeof filePointer9 > 0) delete filePointer9;

                    fr = filePointer10[theFrame-450];
                }
                var dps = fr.getDps();

                lines = for ( i in dps)
                {
                    Line
                    {
                        startX: i.getX();
                        endX: i.getX()+1;
                        startY: i.getY();
                        endY: i.getY()+1;
                        stroke: Color.rgb(i.getScaledRed(), i.getScaledGreen(), 200, 1);
                    }
                }

                theFrame++;
            }

        }

    ]
}
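The ten branches in the Timeline all do the same arithmetic: work out which sequence a frame lives in, and where it sits inside that sequence. A minimal sketch of that arithmetic in Java is below. The class and method names are hypothetical; the only thing taken from the code above is the bucket size of 50.

```java
// Hypothetical sketch: the arithmetic behind the ten if-branches in the Timeline.
class FrameBuckets {
    static final int BUCKET_SIZE = 50;

    // Which of the ten sequences (0-based) holds this frame
    static int bucketOf(int frame) {
        return frame / BUCKET_SIZE;
    }

    // Position of the frame inside its sequence
    static int offsetOf(int frame) {
        return frame % BUCKET_SIZE;
    }

    public static void main(String[] args) {
        int[] sampleFrames = {0, 49, 50, 499};
        for (int frame : sampleFrames) {
            System.out.println("frame " + frame
                    + " -> sequence " + bucketOf(frame)
                    + ", offset " + offsetOf(frame));
        }
    }
}
```

With the sequences themselves held in one list, the same two calls could replace both the loading branches and the playback branches, and frame 50 would land at offset 0 of the second sequence in both places.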

This allowed the script to run, for the first time, through 500 files showing the animation. It gobbles up the memory, and I know that this code can be improved, however I am not going to have any time this week so I thought I would post this update for the moment.

JavaFx used to represent BP Oil disaster news timeline

July 25, 2010

Outside of following the on-going Gulf Oil Spill, I have been playing around with JavaFX and different ways of displaying on-going information. I wanted a clean way to represent this information, and decided to try and create an interactive timeline of the news stories created around this event:

This is based on the BBC’s timeline index, where I have taken each story and mapped it to the relevant position on the timeline. Further to this, I grabbed each story’s text and saved it into the XML file which the JavaFX script parses. When you hover your mouse over an event, the text (or image) of the story “pops” up:

You have to close the “pop-up” to be able to view another event.
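Mapping a story to its position on the timeline comes down to a linear scale from dates to pixels. Here is a minimal sketch of that idea in Java rather than JavaFX Script. The class name TimelineScale, the method xFor and the sample dates are all made up for illustration; it is not the code behind the timeline above.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

// Hypothetical sketch: place a dated story along a horizontal timeline.
class TimelineScale {

    // Linear map from a date to a pixel x between 0 (start) and width (end)
    static double xFor(LocalDate date, LocalDate start, LocalDate end, double width) {
        long span = ChronoUnit.DAYS.between(start, end);
        if (span <= 0) {
            return 0;   // degenerate timeline: everything sits at the left edge
        }
        long offset = ChronoUnit.DAYS.between(start, date);
        return width * offset / span;
    }

    public static void main(String[] args) {
        // Made-up dates, purely to show the calculation
        LocalDate start = LocalDate.of(2010, 1, 1);
        LocalDate end = LocalDate.of(2010, 1, 11);
        LocalDate story = LocalDate.of(2010, 1, 6);
        System.out.println(xFor(story, start, end, 1000));
    }
}
```

Several stories on the same day map to the same x, which is one reason a timeline like this gets crowded quickly.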

I was amazed at how many stories there were, and this isn’t all, and how crowded the screen became once I loaded all the stories into the XML file.

Anyway I thought I would share it, a slightly different take on presenting on-going information. Oh, I stopped at the 15th (in the hope it was finally some good news, and the fact you have to stop somewhere).

Anyway let me know your thoughts, it is just an idea I was playing with. I plan to use it elsewhere to represent different data over time (watch this space).

There are a couple of warnings with this: (a) it is 1000 by 800 in size, so viewing it on a netbook doesn’t really work, as I have discovered; (b) it might be slow to load because the server I have placed it on is quite a cheap one; (c) it is only a prototype of an idea.