As I discussed in my last post, I’ve decided to have a look at the climate data managed by the Australian Bureau of Meteorology (BOM).
I specifically want to look at two aspects:
- How different are the raw and adjusted data sets?
- What mechanisms are used to adjust the data?
I’ve decided to limit my investigation to temperature data. The argument’s about global warming, isn’t it?
Accessing the data
The BOM maintains two main temperature data sets.
The raw, unadjusted data is available through a system called Climate Data Online or CDO. Click here to look at the BOM's Climate Data Online.
The second has the catchy name ACORN-SAT that has nothing to do with oak trees or satellites. Of course, it stands for ‘Australian Climate Observations Reference Network – Surface Air Temperature’. It contains the adjusted, homogenised temperature records. Click here to view the ACORN-SAT page at BOM.
Getting CDO data
After poking around the CDO pages for a while I came to a few disturbing realisations.
- There are no records for daily average temperatures. Remember, the argument goes that the polar bears need sun block because the average temperature of the Earth is rising and it’s all our fault for burning fossil fuels. The BOM does not provide this data. Instead, it provides two separate sets of data: one for daily high temperatures and one for daily low temperatures. I’ll assume the average is the sum of the high and low temperatures divided by two. I have computers. I can do this.
- I can only get the data for one weather station at a time. One download choice is a cute map with a dot for each weather station. If I click on a dot, I can get either the maximum or minimum temperatures for that station for every day the station’s been in service.
- Fortunately, there’s a Weather Station Directory. Unfortunately, it lists 1,817 separate weather stations, including several in Antarctica, Vanuatu and other islands around Australia.
- My other download choice is to enter a station number from the Directory and get the data that way. At one per minute, that’s around 30 hours for the minimum temperatures and another 30 hours for the maximum temperatures. At age 67, that’s too much of my remaining life expectancy.
- Once I’ve selected the data, I can download a ZIP file with all of the data in an Excel-style comma separated value (CSV) file and a text file containing explanatory notes with things like the station’s height above sea level, the state within Australia it’s in and the column layout of the CSV file.
- Each ZIP file has to be unzipped to extract these files. Then the individual CSV files need to be combined.
Fortunately, I’m a nerd and know several programming languages. After some mucking around I made and implemented the following decisions:
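For fellow nerds, here is a minimal Python sketch of the averaging assumption from the first point above: the daily mean is taken as (maximum + minimum) / 2. It is an illustration only, not the code I actually ran. The function name and the dictionary layout (date mapped to temperature) are my own assumptions, and any day missing either reading is simply skipped.

```python
from datetime import date


def daily_mean(max_by_day, min_by_day):
    """Return {day: (max + min) / 2} for days that have both readings.

    Both arguments are hypothetical dictionaries mapping a date to a
    temperature in degrees Celsius; None marks a missing observation.
    """
    means = {}
    for day, tmax in max_by_day.items():
        tmin = min_by_day.get(day)
        if tmax is None or tmin is None:
            continue  # skip days where either reading is missing
        means[day] = (tmax + tmin) / 2
    return means


# Made-up readings for one station, purely for illustration.
highs = {date(2014, 1, 1): 30.0, date(2014, 1, 2): 25.0}
lows = {date(2014, 1, 1): 20.0}
print(daily_mean(highs, lows))
```

Only 1 January appears in the result, because 2 January has no minimum reading.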
- I decided to put all of the data into a Microsoft SQL Server database. I’ve used this database, along with Access, Oracle, MySQL and others for various projects and tasks over the years and am quite comfortable with it.
- I’ll use Wolfram Mathematica for producing graphs and any complex calculations. No special reason other than I LOVE Mathematica.
- After downloading the Weather Station Directory, I used the SQL Server Import Wizard to load the directory into a SQL Server database I’ve created.
- I used a wonderful tool called iMacros to automate the download process. iMacros allowed me to create a separate CSV file with just the station ID numbers and feed it to a script that mimics the mouse clicks necessary to actually do the download. During the process Firefox, my web browser of choice, crashed a few times, so the whole download process happened over about an eight-hour period. Fortunately, there was little human intervention required other than restarting Firefox and cutting the station numbers that had already been downloaded out of the iMacros data source CSV file.
- At the end of the process I had 3,465 files, somewhat fewer than the expected 3,634 files (two for each of the 1,817 stations). I noticed while watching the process that sometimes either the minimum or maximum temperature data was not available for a particular weather station. I paused to ponder why a weather station wouldn’t record the minimum and maximum temperature. I failed to come up with an answer. It’s a weather station, for heaven’s sake. What’s it there for if not to record the temperature?
- The next problem I faced, of course, was how to unzip 3,465 separate files. Fortunately, I use VB.NET as my primary development environment for commercial applications. There’s a Windows component called Shell2 that allows extraction of files from zip archives. (Feel free to skip ahead if you feel your eyes glazing over. I’m recording this for other nerds who may wish to replicate my process. Feel free to post a comment if you want to request copies of my scripts and/or source code.)
- A few hours later, I had 6,930 files: 3,465 text files and 3,465 CSV files with station data. In order to limit future analysis to mainland Australia, I was keen to add the state to the weather station table in the database. I also thought the elevation (height above sea level) might be useful. Finally, I wanted to look through the column definitions to make sure all the CSV files had the same layout.
- I modified the VB.NET program to read all of the text files, extract the state and elevation, update the weather stations table and check the layouts. While I was at it, I added the name of the notes files for each station to the station table. That way I have a method of telling which stations are missing temperature data. I’ll use only stations that have both sets of data.
- Now the only task was to actually load the CSV files into the database. Unfortunately, the SQL Server Import Wizard isn’t made to load 6,930 files all at once. Another hour or two in VB.NET added that functionality. Actual runtime was many hours, so I left it running overnight and went for a beer with a friend.
- Next morning I found I had a total of 16,999,270 minimum temperature records and 17,245,194 maximum temperature records.
- Total elapsed time was three and a half days. Total effort was about two days, maybe a bit less. I also required the use of a range of specialist tools that I happen to have at hand due to my profession. I am also less than flat out with paid work, so I could afford to put in the time.
- The next steps involve performing a similar, but different, set of extractions and imports of the ACORN-SAT data. Fortunately, there are only 120 of them. This raises the question of why, if Australia has over 1,800 weather stations, the BOM uses data from just 120 of them.
The BOM’s facility for accessing Climate Data Online works, no doubt about that. My previous concern that it was designed to confuse rather than enlighten is, unfortunately, confirmed.
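For anyone wanting to replicate the unzip step without VB.NET, the following is a rough Python sketch of the same idea: walk a folder of downloaded ZIP archives, extract everything, then separate the CSV data files from the text notes files. It is not the program I used. The function names and folder names are hypothetical, and I am assuming only that each archive holds one .csv file and one .txt file, as described above.

```python
import zipfile
from pathlib import Path


def extract_all(zip_dir, out_dir):
    """Extract every .zip in zip_dir into out_dir; return the member names."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    extracted = []
    for archive in sorted(Path(zip_dir).glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(out)
            extracted.extend(zf.namelist())
    return extracted


def split_by_type(names):
    """Separate CSV data files from text notes files by extension."""
    csv_files = [n for n in names if n.lower().endswith(".csv")]
    note_files = [n for n in names if n.lower().endswith(".txt")]
    return csv_files, note_files


if __name__ == "__main__":
    # "downloads" and "extracted" are placeholder folder names.
    names = extract_all("downloads", "extracted")
    csv_files, note_files = split_by_type(names)
    print(len(csv_files), "CSV files,", len(note_files), "notes files")
```

Comparing the two counts is a quick sanity check: with one CSV file and one notes file per archive, they should both equal the number of archives.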
My next post will take you through the ACORN-SAT process. Then we can get on with looking at the original questions of “why the adjustments” and “how are the adjustments made?”
I’ll finish with a view of my first attempt at generating a map with Mathematica. It shows a map of Australia and surrounds with a red dot at the location of each weather station.
I’ve cropped lots of the outlying stations like those in Antarctica.
Not bad for a first effort, if I say so myself!