Statistical Analysis of NZ Parliament Bill Votes
This post will document how a statistical analysis of the 48th New Zealand Parliament bill votes can be conducted using the R statistics environment.
I’ll be reporting the results in another post, but you can follow the directions below to do the analysis yourself!
R is a free software environment for statistical computing and graphics. It was originally created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, and is now developed by the R Development Core Team. You can download and install R from here: www.r-project.org
Vote data csv
The data we will analyse is a matrix derived from votes of the 48th New Zealand Parliament. It covers third reading votes on bills, plus party votes on bills that were negatived at their first or second reading. The data is in a csv file formatted for import into R. You are welcome to use this data, but please attribute TheyWorkForYou.co.nz as the source (just in case I’ve made a mistake).
The header of the csv file lists the names of the 110 bills (or parent bills) voted on by the 48th Parliament. Each bill's vector in the matrix records how the parties voted on that bill. If a party voted aye, a value of +1 is recorded in that party’s place in the vector; if it voted no, -1 is recorded; if it abstained or did not vote, 0 is recorded. For a split party vote, the percentage of the party’s noes is deducted from the percentage of its ayes, giving a fractional value between -1 and 1.
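The split-vote rule can be written as a one-line R function. This is a minimal sketch: `vote_score` is a hypothetical helper, not the code used to build the csv, and it assumes both percentages are taken over the party's total number of votes, so MPs who abstained or did not vote count towards neither side.

```r
# Hypothetical helper: encode one party's vote on one bill on the -1..1 scale.
# Assumption: ayes and noes are shares of the party's total votes (party_size),
# so abstentions and non-votes pull the score towards 0.
vote_score <- function(ayes, noes, party_size) {
  stopifnot(ayes >= 0, noes >= 0, party_size > 0, ayes + noes <= party_size)
  ayes / party_size - noes / party_size
}

vote_score(50, 0, 50)  # whole party votes aye: 1
vote_score(0, 9, 9)    # whole party votes no: -1
vote_score(0, 0, 7)    # abstained or did not vote: 0
vote_score(6, 1, 7)    # split vote: 5/7, about 0.714
```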
Loading data
The first step is to install R. Once you have R installed, start the R console.
Next, save the data to a file on your computer, then load the data file via the R console:
x <- read.csv('/path/third_reading_and_negatived_votes.csv', header = TRUE)
It should look something like this (only the first bill’s vector is shown):
> x
Taxation..Annual.Rates.and.Urgent.Measures..Bill
ACT -1
Green 1
Labour 1
Maori Party 1
National -1
NZ First 1
Progressive 1
United Future 1 ...
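Before running any analysis it is worth confirming the import worked. This is a minimal sketch and `check_votes` is a hypothetical helper. If the party names were read as an ordinary column instead of as row names, `as.matrix()` produces a character matrix and the first check fails, which is a quick way to spot that problem. With the full file, the bill count should match the 110 bills in the header.

```r
# Hypothetical helper: sanity checks on the imported vote matrix.
check_votes <- function(x) {
  m <- as.matrix(x)
  stopifnot(is.numeric(m),          # every column parsed as a number
            !anyNA(m),              # no missing cells
            all(m >= -1 & m <= 1))  # votes are encoded on the -1..1 scale
  c(parties = nrow(m), bills = ncol(m))
}

# check_votes(x)
```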
Hierarchical cluster analysis
Now for the fun stuff. First off, you can run a hierarchical cluster analysis to see how the parties’ voting groups them into clusters:
# plclust() is deprecated in current versions of R; plot() draws the dendrogram instead
plot(hclust(dist(x)))
Four of the parties were in the Labour government, which explains part of the clustering you can see in the plot.
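To read group memberships off the tree instead of eyeballing the plot, `cutree()` can cut the dendrogram into a chosen number of groups. The sketch below uses the same `dist()` and `hclust()` defaults as the one-liner above (Euclidean distance, complete linkage). `party_clusters` is a hypothetical helper, and the number of groups `k` is your choice, not something the data dictates.

```r
# Hypothetical helper: cut the party dendrogram into k groups and list the members.
party_clusters <- function(x, k) {
  hc <- hclust(dist(x))          # Euclidean distance, complete linkage (R defaults)
  groups <- cutree(hc, k = k)    # named vector: party -> group number
  split(names(groups), groups)   # one character vector of party names per group
}

# party_clusters(x, k = 3)
```

Trying a few values of `k`, or `hclust(dist(x), method = "average")`, is a cheap way to check whether a grouping is robust or just an artefact of the linkage rule.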
Principal components analysis
For principal components analysis I found the reworking of R’s biplot function by Jose Claudio Faria useful.
Copy his biplot functions from the following post into the R console:
https://stat.ethz.ch/pipermail/r-help/2007-June/133873.html
The lines you want to copy and paste are:
#===============================================================================
# Name : biplot.s
# Author : Jose Claudio Faria (DCET/USC/BRAZIL)
...
Copy everything until you reach:
#===============================================================================
# Name : biplot.s_to_learn
# Author : Jose Claudio Faria (DCET/USC/BRAZIL)
If you enter the line of code below into the R console, it should render the first two principal components in a biplot graph in a new window:
biplot.s(x, center=FALSE, scale=TRUE)
The names of the bill variables in red are a bit distracting; we can change their colour like this:
biplot.s(x, center=FALSE, scale=TRUE, col.var='grey')
Now you should see the names of the parties positioned in the two-dimensional plot in a way that reflects the differences in how they voted. A simple way to think of this is that 110 dimensions (one per bill) have been collapsed into two.
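If you would rather not paste in third-party code, base R’s `prcomp()` and `biplot()` give a comparable picture. This is a sketch of a swapped-in technique, not Faria’s `biplot.s`. It assumes that `center = FALSE, scale. = TRUE` in `prcomp()` is a reasonable counterpart of the `biplot.s` arguments above, so the plot and numbers may not match exactly. `pca_votes` is a hypothetical helper.

```r
# Base-R PCA of the party x bill matrix (a sketch, not Faria's biplot.s).
pca_votes <- function(x) {
  m <- as.matrix(x)
  # With center = FALSE, prcomp() rescales each column by its root mean square,
  # which fails for an all-zero column (a bill on which every party scored 0).
  m <- m[, colSums(abs(m)) > 0, drop = FALSE]
  prcomp(m, center = FALSE, scale. = TRUE)
}

# Usage with the vote matrix loaded earlier:
# pc <- pca_votes(x)
# biplot(pc, col = c("black", "grey"))  # parties in black, bill arrows in grey
```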
> bp2 = biplot.s(x, center=F, scale=T, plot=F)
> bp2$expl
[1] 0.744
A loose interpretation is that the two dimensions shown on the plot explain 74.4% of the variance in the voting. You’ll see in the biplot that Labour and Progressive sit on top of each other; this is because they voted almost identically. (To make the plot clearer, you can open the csv file, remove the line for Progressive’s votes, and reload the data in R.)
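Instead of editing the csv file, you can drop Progressive’s row inside R. This is a minimal sketch, assuming the party names are the row names of `x` as in the output shown earlier; `drop_party` is a hypothetical helper.

```r
# Hypothetical helper: remove one party's row from the vote matrix.
drop_party <- function(x, party) {
  stopifnot(party %in% rownames(x))        # catch typos in the party name
  x[rownames(x) != party, , drop = FALSE]
}

# Distance between the two parties' vote vectors (0 would mean identical votes):
# as.matrix(dist(x))["Labour", "Progressive"]
# x_noprog <- drop_party(x, "Progressive")
# biplot.s(x_noprog, center=FALSE, scale=TRUE, col.var='grey')
```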
3D with rgl package
We can view a 3D plot of the first three principal components using the rgl package:
require(rgl)
clear3d()
rgl.bringtotop(stay=TRUE)
biplot.s(x, center=FALSE, scale=TRUE, lambda.end=3, rgl.use=TRUE)
The plot has a lot of distracting detail due to the bill names; we can reduce it as follows:
biplot.s(x, center=FALSE, scale=TRUE, lambda.end=3, rgl.use=TRUE, col.var='grey', var.factor=0.05)
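Another way to cut the clutter is to plot only the parties. The sketch below is a swapped-in alternative to `biplot.s(..., rgl.use=TRUE)`, not its implementation: it takes the first three principal component scores from base R’s `prcomp()` (assuming `center = FALSE, scale. = TRUE` as a counterpart of the settings above) and draws them with rgl’s `plot3d()` and `text3d()`. `party_scores_3d` is a hypothetical helper, and it skips drawing if rgl is not installed.

```r
# Hypothetical helper: 3D scatter of party scores on PC1-PC3, without bill arrows.
party_scores_3d <- function(x, plot = TRUE) {
  m <- as.matrix(x)
  m <- m[, colSums(abs(m)) > 0, drop = FALSE]  # all-zero columns cannot be rescaled
  pc <- prcomp(m, center = FALSE, scale. = TRUE)
  stopifnot(ncol(pc$x) >= 3)                   # need at least three components
  s <- pc$x[, 1:3, drop = FALSE]
  if (plot && requireNamespace("rgl", quietly = TRUE)) {
    rgl::plot3d(s[, 1], s[, 2], s[, 3], xlab = "PC1", ylab = "PC2", zlab = "PC3")
    rgl::text3d(s[, 1], s[, 2], s[, 3], texts = rownames(s), adj = c(0, -0.5))
  }
  s  # the scores, one row per party
}

# scores <- party_scores_3d(x)
```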
Another way to improve rendering of the 3d plot is to remove the bill name headings line from the csv file, and reload it like this:
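x <- read.csv('/path/third_reading_and_negatived_votes.csv', header = FALSE)
row.names(x) <- x[,1]   # use the first column (party names) as row names
x <- x[,-1]             # then drop that column from the data
rgl.bringtotop(stay=TRUE)
biplot.s(x, center=FALSE, scale=TRUE, lambda.end=3, rgl.use=TRUE, col.var='grey', var.factor=0.05)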
Adding a third principal component increases the proportion explained to 0.839:
> bp3 = biplot.s(x, plot=F, center=F, scale=T, lambda.end=3)
> bp3$expl
[1] 0.839
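A similar cumulative figure can be computed from a `prcomp()` fit as the share of the squared standard deviations captured by the first `k` components. Treat this as a sketch: `explained` is a hypothetical helper, and since this may not be exactly how `biplot.s` computes `expl`, the numbers may differ slightly from 0.744 and 0.839.

```r
# Hypothetical helper: cumulative share captured by the first k components.
# With center = FALSE this is a share of the uncentred sum of squares of the
# scaled matrix, which is why "variance explained" is only a loose reading.
explained <- function(pc, k) {
  v <- pc$sdev^2
  stopifnot(k >= 1, k <= length(v))
  sum(v[seq_len(k)]) / sum(v)
}

# pc <- prcomp(x, center = FALSE, scale. = TRUE)
# explained(pc, 2); explained(pc, 3)
# summary(pc)$importance["Cumulative Proportion", ] gives the same figures, rounded
```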
In my next post I’ll include images of the party vote analysis for those of you who haven’t had time to play with R. Hint: results show that statistically there are clearly more than two clusters of parties based on their party votes in parliament.

