The Data Cube – a new way to handle big data for a big country (and beyond)

Written by Mark Engler, Embassy of Switzerland in Australia

Imagine you are an organisation that relies on data to do analytical work for a multitude of clients. Logically you would collect the data you need to produce the best possible analysis. Over the years you collect more and more data and your clients' requests become more and more complex. As the data piles up you eventually realize that you might have all the data you need to answer a request, but that locating it, extracting it and converting it into something you can work with will cost an enormous amount of time and money before you even get to start the interesting analytical work. So you face the prospect of telling your client that he can only get what he requires if he is willing to pay millions of dollars and wait several months. New tools to overcome this issue are now being developed, and this article discusses one of them and its applications.

Data Cube

This is exactly the situation Geoscience Australia, an Australian Government listed entity within the Industry and Science portfolio, found itself in. Stored in the archives beneath its building were 35 years' worth of Landsat satellite imagery of Australia with a resolution of 25 meters. If you wanted to do a specific analysis you had to request every picture individually. A robot would then dive into the archives and return with the image before going back in to get the next one. To be able to work with the pictures you would then still have to calibrate them (remove clouds, adjust for different lighting conditions and correct for the 2 meter shift of Australia to the north since the first picture was taken). Using this system, a set of, let's say, 1200 scenes to observe the development of one specific area over time would take 3-4 months to prepare before any analysis could even start. An analysis of all of Australia (with about 300,000 scenes) would take 15+ years!

To add to the problem, there are new satellites in the pipeline that will provide even more frequent and precise data. One is the geostationary Japanese Himawari-8, which is to provide updates every 10 (!) minutes. Others are the European Sentinel satellites, which will provide 10 meter resolution pictures every 10 days. While these satellites open up exciting new opportunities for analysis, it is clear that the old data management system would not be able to handle them. The existing Landsat data alone already took up several petabytes of storage, with one petabyte equalling a pile of CD-ROMs the height of Mount Everest.

The solution that Geoscience Australia (in collaboration with CSIRO and the Australian National University National Computational Infrastructure) came up with is the Data Cube. Creating the Cube meant physically moving the entire data archive to the national supercomputer at the National Computational Infrastructure, which is Australia's national research computing facility and the Southern Hemisphere's fastest supercomputer. All the images were then digitised, standardised and calibrated so that they were ready to use and could be immediately analysed to fit a client's needs. The data is openly accessible and clients can either do an independent analysis by developing their own algorithms or work with the experts of Geoscience Australia. Data access is free of charge and the software involved is open source. According to Geoscience Australia, this open approach and the research benefits flowing from it add much more value than any organization could hope to earn by charging for access to the data. Following the same logic, the European Commission has also decided to make access to the Sentinel data open and free. This is of course good news for data users, but bad news for commercial providers of satellite imagery, who could soon find themselves out of business.

With the Data Cube and the power of the supercomputer, the time for preparing the 1200 scenes to analyse one area has come down to 15 minutes, with the 300,000 scenes for all of Australia now taking only 3 hours. This opens up enormous new possibilities using the Landsat data alone, and even more when the data from the new satellites is included. The system is also capable of eventually integrating radar imagery and even aerial photographs, which would make it possible to go back further in time. Other organizations can also link in their data archives to broaden the analytic potential of the Cube. The Data Cube can for example be used to track water flows and usage, coastal erosion, land management, urban development or even shallow water bathymetry, as the satellites can see up to 30 meters below the water surface. According to Geoscience Australia the 10 minute updates of Himawari-8 should even make it possible to track the contrails of individual aircraft, something that would have been extremely helpful in the still ongoing search for flight MH370.
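To put the quoted preparation times side by side, the short Python sketch below works out the approximate speed-up they imply. It uses only the figures mentioned in this article; the conversions are assumptions made for illustration (a month counted as 30 days, "3-4 months" taken as 3.5 months, "15+ years" taken as exactly 15 years), so the results should be read as orders of magnitude rather than measurements.

```python
# Rough speed-up estimate from the figures quoted in the article.
# Assumptions (not from the source): a month is taken as 30 days,
# "3-4 months" as 3.5 months, and "15+ years" as exactly 15 years.

MINUTES_PER_HOUR = 60
MINUTES_PER_DAY = 24 * MINUTES_PER_HOUR
MINUTES_PER_MONTH = 30 * MINUTES_PER_DAY
MINUTES_PER_YEAR = 365 * MINUTES_PER_DAY


def speedup(before_minutes, after_minutes):
    """Return how many times shorter the new preparation time is."""
    return before_minutes / after_minutes


# One area: about 1200 scenes, 3-4 months before, 15 minutes with the Cube.
area_speedup = speedup(3.5 * MINUTES_PER_MONTH, 15)

# All of Australia: about 300,000 scenes, 15+ years before, 3 hours now.
continent_speedup = speedup(15 * MINUTES_PER_YEAR, 3 * MINUTES_PER_HOUR)

print(f"Single area: roughly {area_speedup:,.0f} times faster")
print(f"All of Australia: roughly {continent_speedup:,.0f} times faster")
```

Under these assumptions the preparation step comes out roughly ten thousand times faster for a single area and several tens of thousands of times faster for the whole continent.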

The Data Cube is a pioneering effort and a world first. Geoscience Australia is in discussion with partners from all over the world and many of them are very keen to set up something similar. As it is all open source, Geoscience Australia is happy to share and collaborate and has for example signed an MOU with the European Commission. The same algorithms used to analyse water management in the Australian Outback could for example be used to analyse areas in Europe, Central Asia or South America. One great advantage of satellite imagery is that you do not rely on collecting data on the ground. This can enormously reduce costs, and it becomes especially important when analysing fragile parts of the world where putting researchers on the ground would simply be too dangerous.

For more information on the Data Cube go to Geoscience Australia, where you can also watch a short introductory movie, or explore for yourself by looking at Water Observations from Space. If you are interested in knowing more or would like to engage with Geoscience Australia, the Science, Technology and Education Office of the Swiss Embassy in Canberra is happy to put you in touch.

This entry was written by Mark Engler, the STC in Canberra, Australia. You can get in touch with Mark at mark.engler@eda.admin.ch
