VirtualBox how to delete snapshots

December 14, 2011

Just a quick one. I wanted to clean up my VBox snapshots for an image (I am using the latest and greatest version) and suddenly realised that deleting them is no longer available in the VBox UI. :-(

You can run it from the command line though!

First things first: let’s say your image is called TestXP. Run the following:

vboxmanage showvminfo TestXP

This will give you the UUIDs of your snapshots.

For each snapshot that you want to delete, enter the following:

vboxmanage snapshot TestXP delete UUID

Simple as that... I prefer the UI method though :-)
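If there are a lot of snapshots, the two steps above can be scripted. Here is a rough sketch in Python. It assumes that `showvminfo --machinereadable` prints each snapshot as a `SnapshotUUID="..."` line (with suffixes such as `-1` for nested snapshots), so check the output of your own VirtualBox version first. The function names are my own invention.

```python
import re
import subprocess

# Assumed line format: SnapshotUUID="..." or SnapshotUUID-1-2="..."
# Anchored at the start of the line so CurrentSnapshotUUID is ignored.
_UUID_LINE = re.compile(r'^SnapshotUUID[-\d]*="([0-9a-fA-F-]+)"\s*$')


def snapshot_uuids(showvminfo_output):
    """Return the snapshot UUIDs in the order they appear in the output."""
    uuids = []
    for line in showvminfo_output.splitlines():
        match = _UUID_LINE.match(line)
        if match:
            uuids.append(match.group(1))
    return uuids


def delete_all_snapshots(vm_name):
    """Delete every snapshot of vm_name, in reverse listing order."""
    info = subprocess.run(
        ["vboxmanage", "showvminfo", vm_name, "--machinereadable"],
        check=True, capture_output=True, text=True,
    ).stdout
    for uuid in reversed(snapshot_uuids(info)):
        subprocess.run(
            ["vboxmanage", "snapshot", vm_name, "delete", uuid], check=True
        )
```

Calling `delete_all_snapshots("TestXP")` would then remove the lot, so try `snapshot_uuids` on its own first to see what it picks up.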

Java Application connecting to a DB2 instance – Solution to Sql Exception No Suitable driver found

November 26, 2011

Ok, so this one bugged me for a while, and it was only a throwaway comment on a forum that gave me the clue.

I had written a simple Java app to connect to a DB2 database, query the data and perform some analytics on it. However, my connection would never work!!!

This was my code:


   private Connection connection = null;
   private Statement statement = null;
   private PreparedStatement preparedStatement = null;
   private ResultSet resultSet = null;

   try
   {
      res = new Vector();

      Class.forName("COM.ibm.db2.jdbc.app.DB2Driver");
      connection = DriverManager.getConnection("jdbc:db2://"+host+":"+port+"/"+database,user,pass);

      statement = connection.createStatement();

      preparedStatement = connection.prepareStatement("SELECT * FROM "+table+" order by 1");

      resultSet = preparedStatement.executeQuery();

      ...
   }
   catch (Exception e)
   {
      throw e;
   }
   finally
   {
      close();
   }

This was based on an article I had read from IBM. Following that article, I had managed to locate db2java.jar in the client installation. However, every time I executed the code I got the error:

Specified connection failed with java.sql.SQLException: No suitable driver found

Searching for this, most articles suggested the connection string was wrong. I knew it wasn’t because I checked and double checked.

Then I read a comment in a forum which said to use jcc.DB2Driver and not app.DB2Driver. I looked in the library db2java.jar and I could not find DB2Driver under jcc! However, I then realised that the driver directory of my DB2 client held more Java libraries, and one of these was db2jcc.jar.
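Hunting through jars by hand gets tedious. Since a jar is just a zip file, a few lines of Python can report which jars in a directory contain a given class. This is only a sketch: the function name is made up, and you would point it at wherever your DB2 client keeps its Java libraries.

```python
import zipfile
from pathlib import Path


def jars_containing(directory, class_name):
    """Return the names of the jars in directory that contain class_name.

    class_name is a fully qualified name such as com.ibm.db2.jcc.DB2Driver.
    """
    # Classes are stored in a jar as path/to/ClassName.class
    entry = class_name.replace(".", "/") + ".class"
    found = []
    for jar in sorted(Path(directory).glob("*.jar")):
        with zipfile.ZipFile(jar) as archive:
            if entry in archive.namelist():
                found.append(jar.name)
    return found
```

For example, `jars_containing(driver_dir, "com.ibm.db2.jcc.DB2Driver")` should list only the jars that really ship that class, where `driver_dir` is your own driver directory.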

I swapped over my libraries and updated my code (note the change to Class.forName):


   private Connection connection = null;
   private Statement statement = null;
   private PreparedStatement preparedStatement = null;
   private ResultSet resultSet = null;

   try
   {
      res = new Vector();

      Class.forName("com.ibm.db2.jcc.DB2Driver");
      connection = DriverManager.getConnection("jdbc:db2://"+host+":"+port+"/"+database,user,pass);

      statement = connection.createStatement();

      preparedStatement = connection.prepareStatement("SELECT * FROM "+table+" order by 1");

      resultSet = preparedStatement.executeQuery();

      ...
   }
   catch (Exception e)
   {
      throw e;
   }
   finally
   {
      close();
   }

This worked perfectly. Just wanted to share it!

Building a standalone Java command line app with Netbeans

November 26, 2011

Pretty simple one really, and it was – once I remembered / found out.

Seeing as I swap around between IDEs, programming languages and OSes, I realised the other day that I didn’t know how to build a self-contained command line Java app within Netbeans (e.g. I had written a simple program and I wanted to ship it with the MySQL driver).

So after a bit of searching, and being put off by the prospect of using ant, I discovered a way to do it… I thought I would write this up in the hope I won’t forget, and to help others (it took a little while to find this answer).

Let’s imagine we have a simple Java program, good old hello world (see below):

public static void main(String[] args)
{
      Connection connection = null;
      Statement statement = null;

      try
      {
           Class.forName("com.mysql.jdbc.Driver");

           connection = DriverManager.getConnection("jdbc:mysql://localhost:3306/svm","username","password");

           statement = connection.createStatement();
      }
      catch(Exception e)
      {
           System.out.println("Oh boy - "+e.toString());
      }
      finally
      {
           try
           {
                 // connection is still null if getConnection failed
                 if (connection != null)
                 {
                       connection.close();
                 }
           }
           catch(Exception e)
           {
                 System.out.println("Oh boy, there is a problem! "+e.toString());
           }
      }

      System.out.println("Hello world");
}

And further to this, the simple program also connects to a MySQL database…

Now you can see the desire… you want to ship the jar including the MySQL library? Well, this can be achieved really easily (in Netbeans). Just do the following:

  • Within the root directory of the project there is a file named build.xml
  • Open this file in Netbeans. You will notice some XML at the top of the file about the project name etc.; within the opening project element you will see a lot of comments.
  • Scroll to the end of the file
  • Just before the closing element of the project, i.e. </project>, insert the following example lines

 <target name="-post-jar" >
     <jar jarfile="${dist.jar}" update="true">
         <zipfileset src="${dist.jar}" includes="**/*.class"/>
         <zipfileset src="dist/lib/mysql-connector-java-5.1.6-bin.jar" excludes="META-INF/*"/>
     </jar>
 </target>

These are the specific entries for adding the MySQL library to your jar file. Note that it references the library from my /project/dist/lib/ directory.

And that’s it! Clean and build and you will find your jar is bigger and contains your library!!!! Brilliant – ship it to the world now :-)

My source

Hadoop – examples of Java and Python Map and Reduce for the average word length

September 24, 2011

It’s been a while since my last post, which is a little disappointing. On the up side, my absence has not been down to sloth; I have started on a new project and I am deep in the world of ‘Big Data’!!!!

Over the past three/four months I have been getting to grips with DB2. Initially I couldn’t stand it, but it is growing on me now, especially when I switch back to MS SQL Server… But that is another conversation.

Excitingly on the horizon, in fact fast approaching, is Hadoop. So I have started playing with this technology, “joined” the Hadoop user group (well, when I say joined, I have started attending the meetings) and I am quite enjoying it all.

I thought I would share some of what I have learned so far, in the hope others will benefit. I am working through multiple exercises but I liked this one because it was fairly straightforward and yet showed you how to write your own Mapper and Reducer in Java and then in Python!

First off, I found an amazing blog / tutorial on setting up a Hadoop server, and another on writing a mapper and reducer in Python. Both are amazingly useful posts; I really appreciate their publication.

Ok, let’s get on with it. A simple exercise – the problem:

Write a MapReduce job that reads any text input and computes the average length of all words that start with each character.

The example we will use is:

Now is definitely the time

With the output looking like:

N 3
d 10
i 2
t 3.5
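
Before bringing Hadoop into it, the expected answer can be checked with a few lines of plain Python. This is only a sanity check of the arithmetic, not a MapReduce job, and the function name is my own.

```python
from collections import defaultdict


def average_word_length_by_initial(text):
    """Map each initial character to the mean length of the words starting with it."""
    totals = defaultdict(lambda: [0, 0])  # initial -> [sum of lengths, word count]
    for word in text.split():
        entry = totals[word[0]]
        entry[0] += len(word)
        entry[1] += 1
    return {initial: total / count for initial, (total, count) in totals.items()}


print(average_word_length_by_initial("Now is definitely the time"))
```

Note that the keys are case sensitive, so "Now" is filed under N and "is" under i.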

So I will provide you with two solutions for this, one in Java (the native language) and the other in Python.

I used Netbeans to write my Java code. First let’s look at the main section; it is basically the same as the WordCount main you will see everywhere. In this case, yes, I did hardcode the values, just for the sake of ease.

public static void main(String[] args)
    {
        JobClient client = new JobClient();

        JobConf conf = new JobConf(AvgLen.class);


        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(DoubleWritable.class);
        
        FileInputFormat.addInputPath(conf, new Path("avg/text.txt"));
        FileOutputFormat.setOutputPath(conf, new Path("avgout"));
        
        conf.setMapperClass(AvgLenMapper.class);

        conf.setReducerClass(AvgLenReducer.class);
        // No combiner: AvgLenReducer emits averages, and an average of
        // per-mapper averages is not the overall average.

        client.setConf(conf);

         try
        {
            JobClient.runJob(conf);

        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
        System.out.println("Done!");
    }

One thing to point out here is the use of the DoubleWritable class: the averages are not whole numbers, so we need doubles (or floats, I suppose) in our calculation rather than ints.

Now we have the Mapper:

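import java.io.IOException;

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class AvgLenMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, DoubleWritable>
{
    private DoubleWritable length = new DoubleWritable();
    private Text word = new Text();

    public void map(LongWritable l, Text t, OutputCollector<Text, DoubleWritable> o, Reporter r) throws IOException
    {
        String line = t.toString();

        int counter = 0; // index where the current word starts

        for (int i = 0; i <= line.length(); i++)
        {
            // a space, or the end of the line, closes the current word
            if (i == line.length() || line.charAt(i) == ' ')
            {
                if (i > counter) // skip empty words (blank lines, repeated spaces)
                {
                    word.set(line.substring(counter, counter + 1));
                    length.set(i - counter);
                    o.collect(word, length);
                }
                counter = i + 1;
            }
        }
    }
}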

Here we just collect the first letter of each word and the length of the word in our OutputCollector. The contents of this would be:

N 3
d 10
i 2
t 3
t 4

This is ready to be reduced. So let’s look at the reducer:

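import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class AvgLenReducer extends MapReduceBase
        implements Reducer<Text, DoubleWritable, Text, DoubleWritable>
{
    public void reduce(Text key, Iterator<DoubleWritable> values,
            OutputCollector<Text, DoubleWritable> output, Reporter reporter) throws IOException
    {
        double sum = 0.0;
        double counter = 0.0;

        // add up the word lengths for this letter and count the words
        while (values.hasNext())
        {
            sum += values.next().get();
            counter++;
        }

        output.collect(key, new DoubleWritable(sum / counter));
    }
}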

Again, notice the use of DoubleWritable over IntWritable. Here we reduce down the result set from the mapper. Pretty simple really, until we get to t, which involves 7/2 = 3.5.

With this done and compiled, we just execute it with:
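
One thing to watch if a reducer like this is ever registered as a combiner as well: a combiner only sees part of the values for a key, and an average of partial averages is not, in general, the overall average. A quick sketch in Python with made-up numbers shows the difference, along with the usual way round it, which is to pass (sum, count) pairs along and divide only at the very end. The names here are mine, not Hadoop’s.

```python
def mean(values):
    return sum(values) / len(values)


def merge_partials(partials):
    """Combine (sum, count) pairs from several combiners into one mean."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


# Lengths of words starting with "t", split across two hypothetical mappers
mapper_a = [3, 4, 5]
mapper_b = [9]

true_mean = mean(mapper_a + mapper_b)                    # 21 / 4
mean_of_means = mean([mean(mapper_a), mean(mapper_b)])   # (4 + 9) / 2
merged = merge_partials([(sum(mapper_a), len(mapper_a)),
                         (sum(mapper_b), len(mapper_b))])

print(true_mean, mean_of_means, merged)
```

On the tiny example in this post everything fits in one map task, so the problem would not show up, but it would on real data.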

hadoop jar AvgLen a b

Then we look at our output with:

hadoop fs -cat avgout/part-00000

Nice.

So let’s do the same but with Python!

Our Mapper:

#!/usr/bin/env python

import sys

for line in sys.stdin:
    # split() with no arguments also discards leading/trailing whitespace
    words = line.split()

    for word in words:
        # emit: first letter <TAB> word length
        print('%s\t%s' % (word[0:1], len(word)))

Our Reducer:

#!/usr/bin/env python

import sys

current_word = None
current_count = 0.0
counter = 1.0
word = None

# input arrives sorted by key, so all lines for a letter are adjacent
for line in sys.stdin:
    line = line.strip()

    try:
        word, count = line.split('\t', 1)
        count = int(count)
    except ValueError:
        # skip lines that are not "letter <TAB> number"
        continue

    if current_word == word:
        current_count += count
        counter = counter + 1
    else:
        if current_word:
            print('%s\t%s' % (current_word, current_count / counter))
        current_count = count
        current_word = word
        counter = 1.0

# emit the last letter, if there was any input at all
if current_word:
    print('%s\t%s' % (current_word, current_count / counter))

You can test your code before involving Hadoop with:

echo "Now is definitely the time" | /usr/local/hadoop/tmp/MyAvgMapper.py | sort | /usr/local/hadoop/tmp/MyAvgReducer.py

And if you are happy with the result, submit it to Hadoop with:

hadoop jar contrib/streaming/hadoop-*streaming*.jar -file /usr/local/hadoop/tmp/MyAvgMapper.py -mapper /usr/local/hadoop/tmp/MyAvgMapper.py  -file /usr/local/hadoop/tmp/MyAvgReducer.py -reducer /usr/local/hadoop/tmp/MyAvgReducer.py -input /user/hduser/avg/text.txt -output /user/hduser/pythonAvgTest

This command just rolls off the keyboard, doesn’t it :-)

Note I uploaded the file text.txt which contained the test text.

I hope this proves useful.

Further analysis of the scraped BBC data

May 4, 2011

Further to my previous post, I have been mining the data I collected to see if I could find anything of interest, and to evaluate it.

One of the things that interested me was the placement of a story compared to its position in the top ten over time. So I set my program off to scrape the data every hour from 9AM through to 9PM; I didn’t want too much data to start with. I then spent a while writing SQL statements to capture the statistics I was looking for over the time period. Below you can see a simple line graph (it is in logarithmic scale):

I produced this using Open Office.

I find it quite interesting that the third position on the indexes doesn’t get utilised so much. You could also suggest that, as the day gets later, more people start reading other stories, i.e. features, second, and other stories. However, one can’t jump to any conclusion, for two reasons which jump out at me:

1 – This sample set is way too small!
2 – I think I need more detail; it would be interesting (for example) to know which positions of the features and analysis are being selected. The same goes for the other stories.

I decided I would cobble together a simple heat map of the index, aggregated over the time period, to see the “behaviour”. Feel free to take a look; personally I feel it reinforces my previous points (I used JavaFX Script 1.3 via Netbeans to produce this):

I think I might go back to the scraper program and see if I can expand the data gathered.

