RDF on cloud number nine

The goal of this Studienarbeit or Diplomarbeit is to investigate how well Amazon's existing cloud computing offerings (in particular SimpleDB and the Elastic Compute Cloud) can be harnessed to process very large amounts of RDF data. The aim is to make retrieval over at least 30 million, and ideally 500 million, triples possible and easy (a real-life dataset of this size is available).


This topic will be challenging in many ways: you will have to deal with a novel computing paradigm for which little literature and few settled APIs are available. Dealing with large amounts of data will challenge you to write efficient, multi-threaded programs, even if it's just to get the data onto Amazon's servers. But this topic is also rewarding, because you can work at the cutting edge of two important internet technologies and, should processing RDF in the cloud work well, will find lots of people interested in your work.

For this topic I'm looking for students with a solid programming background. You will need some understanding of how to write fast multi-threaded programs (preferably in Java; other languages such as C, Python or Erlang are possible, but would mean that I won't be able to help with detailed programming problems). You will need to be quick in understanding and using APIs, and shouldn't be afraid of Linux (using the Elastic Compute Cloud will require you to remotely configure and use Linux installations). You need to be able to read English documentation and to converse fluently in English or German.

Besides the interesting topic, we offer a nice working atmosphere, monthly student networking events and a supervisor who is enthusiastic about the topic.

Contact:

Valentin Zacharias

IPE - WIM

Tel.: 07219654806