Saturday, 27 July 2013

London versus Sydney

This week has been horribly busy (for a number of reasons that will become apparent in the next couple of weeks), and this afternoon I am jumping on a plane to a conference on dwarf galaxies. The main reason for the conference is to give us a chance to celebrate the 60th birthday of astronomer extraordinaire Mike Irwin. I'll write more about this once I get back.

So, today, a quickie. I've spent the last 13 years living in Sydney, but I did spend 3 years as an undergraduate in London. I was recently wondering about the relative sizes of the two cities. Wikipedia tells us that Sydney has a population of about 4.6 million, whereas London has a staggeringly precise 8,308,369 residents (and, bizarrely, this is a prime number). So, in terms of people, London is very roughly twice as large.
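If you'd like to check the prime claim for yourself, here is a minimal trial-division sketch in Python (the function name is just my choice for illustration). It tries every odd divisor up to the square root of the number, which is quick for a number of this size.

```python
import math


def is_prime(n: int) -> bool:
    """Trial division: n is prime if no divisor from 2 up to sqrt(n) divides it."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    # Once 2 is ruled out, only odd candidate divisors need checking
    for d in range(3, math.isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True


population = 8_308_369  # the London figure quoted above
print(population, "is prime:", is_prime(population))
```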

What about land area? Well, I saw this on a British Expats site recently
Possibly 20 times larger? I know London does feel large, but Sydney is no provincial town.

So, I decided to look, and found a nice website that lets you plot two maps on the same scale. And here you go:

Now we have the question: just how big is a city? Cities have official boundaries, but these don't always correspond to the density of people or houses.

For London, we can take the concept of Greater London. This is approximately the land within the M25 motorway (but only approximately). From Wikipedia, we can see that it looks like this,
although there are lots of people not quite in London, living around the edge.

What about Sydney? Well, Sydney does have a "Metropolitan Area" (although it doesn't seem to have a very rigid set of boundaries). In terms of local government, it looks like this.
But this only tells part of the story. Perhaps it's better to look at the map of the train network, and at which trains are "suburban" as opposed to intercity.

So, looking at the map, Sydney stretches from Penrith in the west to the "city" in the east, and from Campbelltown in the south to Berowra in the north. Those dimensions are not dissimilar to the extent of London!

But again, that's not the entire story. It helps to look at the cities in terms of distribution of housing.
Sydney has a lot of green! The link out to the west, through Plumpton and St Marys to Penrith, is bounded by fields and farmland. Up in the north, where I live, the suburbs are intertwined with bushland.

But there is no doubt that London is not 20 times the size of Sydney, although, having lived there and filed on to packed trains with millions of other people, I know that it can feel that way. Maybe a factor of two?

And just to wrap up, here's Sydney compared to New York
and Cardiff (well, actually a large chunk of Old South Wales)
I could go on comparing places for ever, but have just remembered that I have a plane to catch!

Sunday, 21 July 2013

Who discovered dark matter in galaxies?

Many astronomers get their history of the subject from textbooks, where discoveries are generally presented as neat, tied-up bags with statements like "X discovered Y". However, now and again, people look more deeply, and the story is always more complex.

What brought this to my mind was a recent series of papers on just who should be credited with the discovery of the expansion of the universe. This was recently summarised in a wonderful article by Virginia Trimble called Anybody but Hubble! (I wish I could write as well as Virginia).
I recommend you have a read. History is messy.

I decided to look at the question of who discovered dark matter in galaxies, as textbooks give a very "X discovered Y" answer to this, and I gave a brief presentation at our Astro Morning Tea. But I am not a historian, and this is not complete; it's just what I found after a couple of hours of looking around (thanks to Brad for his input).

OK, let's start with dark matter. The first clues about the existence of dark matter came not from galaxies, but from clusters of galaxies (and even this statement ain't that simple). The credit is generally given to this man, Fritz Zwicky.
He was looking at the motions of galaxies in clusters and concluded that there is more mass present than indicated by visible light, and that "dark matter" was needed to hold the clusters together.

Zwicky was famous for being a prickly person, and (I think I've mentioned this before) for inventing the phrase "spherical bastard",
but there is no doubt that he was a truly original thinker.

The textbook answer for galaxies begins in 1970 with the work of this famous astronomer, Vera Rubin.
With her collaborator, Kent Ford, she measured the speeds at which stars were moving in our nearest big cosmic companion, the Andromeda galaxy, both in the centre and then out into the disk. What did they find? Well, here's the rotation curve:
As you can see, stars are moving around at a few hundred kilometres per second. Importantly, however, as we go to larger distances, the speed doesn't drop. In fact, it stays roughly constant. If the only mass in Andromeda was associated with the stars, then it should drop off, so there must be more matter there than we can see.
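To see why a flat rotation curve points to hidden mass, here is a small Python sketch using Newton's circular-orbit relation v = sqrt(G M(<r) / r), where M(<r) is the mass enclosed within radius r. The galaxy mass, radii and the 250 km/s flat speed are hypothetical round numbers chosen for illustration, not Rubin and Ford's measurements.

```python
import math

G = 4.30091e-6  # gravitational constant in kpc (km/s)^2 per solar mass


def v_circ(enclosed_mass_msun: float, r_kpc: float) -> float:
    """Circular speed (km/s) set by the mass enclosed within radius r."""
    return math.sqrt(G * enclosed_mass_msun / r_kpc)


M_STARS = 1e11  # hypothetical: all of the visible mass lies inside 5 kpc


def stars_only(r_kpc: float) -> float:
    # Outside the stars the enclosed mass stops growing, so v falls as r**-0.5
    return v_circ(M_STARS, r_kpc)


def flat_curve(r_kpc: float, v0: float = 250.0) -> float:
    # A constant speed v0 needs M(<r) = v0**2 * r / G: mass growing linearly with r
    enclosed = v0 ** 2 * r_kpc / G
    return v_circ(enclosed, r_kpc)


for r in (5, 10, 20, 40):
    print(f"r = {r:2d} kpc: stars only {stars_only(r):6.1f} km/s, flat {flat_curve(r):6.1f} km/s")
```

If all the mass sat with the stars, quadrupling the radius would halve the speed; keeping the speed constant instead requires the enclosed mass to keep growing in proportion to r.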

With a decade's more work, Rubin had shown that Andromeda was not a one-off, but these constant rotation curves were seen in all of the spiral galaxies they looked at
and when we had radio observations reaching even beyond the stars, we could also see that the gas was moving around at the same speed as the stars, at much larger distances.
There is dark matter in spiral galaxies. Lots of it.

But from here in Australia, the picture is not so clear. Some of you may have heard that last year astronomer Ken Freeman won the (very much deserved) Prime Minister's Prize for Science
(Ken is the one in the middle :) The prize was for Ken's work in 1970, where (as the snippet from his paper shows) he discovered evidence for dark matter in galaxies, based upon how the light is distributed. Remember, this was the same time that Rubin and Ford were doing their work on Andromeda!

But reading Ken's paper reveals that he cites earlier work suggesting that there is more mass in galaxies than is visible, one piece of it from my own University of Sydney. Here's the paper, from 1966.
The important number in the summary is M_T/L, the total mass-to-light ratio. If this were about unity, it would mean there is as much mass there as indicated by the light, but in this galaxy, the lovely NGC 300, the ratio is almost 11!
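For a rotating galaxy, a rough dynamical mass follows from the circular-orbit relation M ≈ v² r / G, and dividing by the luminosity gives the mass-to-light ratio in solar units. This Python sketch uses purely illustrative round numbers (not the NGC 300 values from the 1966 paper), and treats the mass as if it were spherically distributed, which is only an approximation for a disk.

```python
G = 4.30091e-6  # gravitational constant in kpc (km/s)^2 per solar mass


def dynamical_mass(v_kms: float, r_kpc: float) -> float:
    """Mass (solar masses) inside r needed to keep material on a circular orbit at speed v.
    Treats the mass as spherically distributed, a rough approximation for a disk."""
    return v_kms ** 2 * r_kpc / G


def mass_to_light(v_kms: float, r_kpc: float, luminosity_lsun: float) -> float:
    """Total mass-to-light ratio M_T/L in solar units."""
    return dynamical_mass(v_kms, r_kpc) / luminosity_lsun


# Purely illustrative round numbers, not the NGC 300 measurements
ratio = mass_to_light(v_kms=100.0, r_kpc=10.0, luminosity_lsun=5e9)
print(f"M_T/L = {ratio:.1f}")
```

A value well above 1 means there is more mass than the light alone suggests.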

Even more interesting is that this paper cites an even earlier work, from 1961,
which calculated this value to be 9, again indicating that there is more mass than is apparent from the distribution of the light.

At this point I had to get back to work, and I didn't chase it further, but it should be clear that "who discovered dark matter in galaxies?" is not a clear-cut question with a straightforward answer. This is not to take away from the great work done by Rubin and her collaborators (which is often the textbook and Wikipedia answer), but just to illustrate that the actual history of scientific discovery is often messy.

If you are interested (and you should be) I recommend you have a read of these superb lecture notes by my colleague, Joss Bland-Hawthorn, especially the appendix that looks at this question in a lot more detail. It doesn't make things simpler :)

Saturday, 13 July 2013

Inferring the Andromeda Galaxy's mass from its giant southern stream with Bayesian simulation sampling

Back to research! And this week, it's a new paper from Mark Fardal. Mark is an expert in modelling galaxy collisions, and we've been looking at the big collision going on next door, the Giant Stellar Stream in Andromeda.

One of the toughest things to do is measure the mass of an object. Measuring the mass of a star is possible as long as it is orbiting another star, or has a planet orbiting it. Basically, the laws of gravity and motion given to us by Isaac Newton mean that we can unravel the masses involved.
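As a reminder of how this works for a star and its companion, Newton's version of Kepler's third law gives the total mass of a two-body system from the size a and period T of the orbit: M = 4π² a³ / (G T²). Here is a minimal Python sketch using the Earth's orbit as a check (the constants are rounded, and the function name is illustrative).

```python
import math

G = 6.674e-11  # gravitational constant, m^3 kg^-1 s^-2


def orbital_mass(semi_major_axis_m: float, period_s: float) -> float:
    """Total mass (kg) of a two-body system from Kepler's third law:
    M = 4 pi^2 a^3 / (G T^2)."""
    return 4 * math.pi ** 2 * semi_major_axis_m ** 3 / (G * period_s ** 2)


AU = 1.496e11               # Earth-Sun distance in metres
YEAR = 365.25 * 24 * 3600   # one year in seconds
print(f"{orbital_mass(AU, YEAR):.3e} kg")
```

This comes out close to 2 × 10^30 kg, the mass of the Sun (strictly, the Sun plus the Earth, but the Earth's share is tiny).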

However, measuring the mass of galaxies is much harder. Firstly, unlike stars, which can effectively be treated as point masses, the mass in galaxies is more complex. It is at its highest density in the middle, but is roughly continuous, falling off as we head outwards. In this situation, the laws of gravity and motion are messier to apply.

There is another, bigger problem: the majority of the mass in a galaxy is dark matter. While we can't see it, its gravitational pull drives the motions of the stars within a galaxy. I've written before that it is very difficult to measure the distribution of dark matter away from the visible plane of stars seen in galaxies like the Milky Way, as there are very few "tracers" (i.e. stars) out there whose motions we could use to measure the mass.

This is where this new paper comes in. In the Milky Way, we have the Sagittarius Stream which orbits over the pole, and we can use it to measure the dark matter distribution. In Andromeda, we have the Giant Stellar Stream.

The problem with stellar streams is that they are torn-apart dwarf galaxies, the gravitational fields in which they move are quite complicated, and their orbits are nothing like planetary orbits. Here's what we think the Sagittarius stream looks like:
We have the same thing with the Giant Stellar Stream in Andromeda, although the orientation of Andromeda and the stream means that it can be a little harder to see. Here's a picture from the paper, showing data that we obtained more than a decade ago with the Isaac Newton Telescope. As you know from my posts on the PAndAS program, the lovely image of Andromeda that we are used to is buried down in the middle of the mess of stellar debris.
In the left-hand image, you can see the Giant Stellar Stream sticking out from the bottom, but there are other features apparent. These are called "shelves" and are basically loops like those we can see in the Sagittarius image above, but viewed roughly edge-on. So, if you try to think in 3D, you can see that the stream wraps around the centre of Andromeda.

So, how do we use this to measure the mass of Andromeda? Basically, you throw dwarf galaxies into model Andromeda galaxies and see which reproduce the features we see above. In the right-hand panel are the patches of data that we want to compare to.

Of course, we can't throw dwarf galaxies into the real Andromeda, but we can on a supercomputer. And that's what we did, throwing in dwarfs on differing orbits, into Andromeda galaxies with differing mass distributions, and looking at the debris. Here are a few examples:
 As you can see, we can reproduce the generic features of the structure we see on the sky, but there is more. We don't only have the view on the sky, but (through lots of other observations) we know the distance to points on the stream, and how fast it's moving. So, we need to ensure that our modelled dwarfs also reproduce these observations, which they do!
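The real analysis runs an N-body simulation for every sample, which is far too expensive to show here. As a loose, hypothetical stand-in for the idea of "try a model, compare it with the data, keep the ones that fit", here is a toy Metropolis sampler in Python. The "simulation" is just the circular speed around a point mass, the data are mock velocities generated from a made-up true mass, and the only free parameter is log10 of the mass; none of the names, numbers or modelling choices come from the paper.

```python
import math
import random

G = 4.30091e-6  # gravitational constant in kpc (km/s)^2 per solar mass


def simulate(log_mass, radii_kpc):
    """Hypothetical stand-in for an N-body run: circular speeds around a point mass."""
    mass = 10 ** log_mass
    return [math.sqrt(G * mass / r) for r in radii_kpc]


def log_likelihood(log_mass, radii, observed, sigma):
    """Gaussian log-likelihood of the observed speeds given a model mass."""
    model = simulate(log_mass, radii)
    return -0.5 * sum(((o - m) / sigma) ** 2 for o, m in zip(observed, model))


def metropolis(radii, observed, sigma, n_steps=20_000, step=0.05, seed=1):
    """Random-walk Metropolis sampling of log10(M/Msun) with a flat prior on [10, 14]."""
    rng = random.Random(seed)
    current = 11.0
    current_ll = log_likelihood(current, radii, observed, sigma)
    chain = []
    for _ in range(n_steps):
        proposal = current + rng.gauss(0.0, step)
        if 10.0 <= proposal <= 14.0:  # outside the prior range: always reject
            proposal_ll = log_likelihood(proposal, radii, observed, sigma)
            # Accept with probability min(1, L_proposal / L_current)
            if math.log(1.0 - rng.random()) < proposal_ll - current_ll:
                current, current_ll = proposal, proposal_ll
        chain.append(current)
    return chain[n_steps // 2:]  # discard the first half as burn-in


# Mock "observations" made from a made-up true mass plus noise
noise = random.Random(42)
radii = [20.0, 40.0, 60.0, 80.0, 100.0]
sigma = 10.0
observed = [v + noise.gauss(0.0, sigma) for v in simulate(12.0, radii)]

samples = metropolis(radii, observed, sigma)
mean = sum(samples) / len(samples)
spread = math.sqrt(sum((s - mean) ** 2 for s in samples) / len(samples))
print(f"log10(M/Msun) = {mean:.2f} +- {spread:.2f}")
```

The chain wanders towards masses whose "simulated" speeds match the mock data, and the spread of the kept samples is the uncertainty on the mass.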
So, what do we find? We get a virial mass of log(M200) = 12.3 ± 0.1 (in units of the mass of the Sun, so roughly two trillion Suns),
which is exactly where we'd expect it to be. Now, that might not sound impressive, but estimates of the masses of the Milky Way and Andromeda see-saw back and forth, up and down, and there is what scientists like to refer to as "tension" (although when I hear the word, I can't help thinking of this man singing this song) between observations and expectations. Luckily, our result plonks things where you'd expect them to be.

But what of the dwarf that was broken up to form the giant stream? Is it completely destroyed? Well, we can look at our best model and ask where the progenitor is. And this is what we see:
The progenitor, while badly shaken up, is still there, nestled into the outer disk region of Andromeda.

Enjoy it, it isn't going to last much longer. Well done Mark!

Inferring the Andromeda Galaxy's mass from its giant southern stream with Bayesian simulation sampling

M31 has a giant stream of stars extending far to the south and a great deal of other tidal debris in its halo, much of which is thought to be directly associated with the southern stream. We model this structure by means of Bayesian sampling of parameter space, where each sample uses an N-body simulation of a satellite disrupting in M31's potential. We combine constraints on stellar surface densities from the Isaac Newton Telescope survey of M31 with kinematic data and photometric distances. This combination of data tightly constrains the model, indicating a stellar mass at last pericentric passage of log(M_s / Msun) = 9.5+-0.1, comparable to the LMC. Any existing remnant of the satellite is expected to lie in the NE Shelf region beside M31's disk, at velocities more negative than M31's disk in this region. This rules out the prominent satellites M32 or NGC 205 as the progenitor, but an overdensity recently discovered in M31's NE disk sits at the edge of the progenitor locations found in the model. M31's virial mass is constrained in this model to be log(M200) = 12.3+-0.1, alleviating the previous tension between observational virial mass estimates and expectations from the general galactic population and the timing argument. The techniques used in this paper, which should be more generally applicable, are a powerful method of extracting physical inferences from observational data on tidal debris structures.

Saturday, 6 July 2013

Battle of the Bayes

A little Saturday morning mathematical interlude.

There have been a few posts on the interwebs about an article by Brad Efron, a professor of statistics at Stanford University. The article appeared in the prestigious journal Science and has the title "Bayes' Theorem in the 21st Century".

Ted Bunn has a good discussion of the paper over on his blog, and I suggest you have a read before you read what's below. But the paper rekindles what is known as the Bayesian-Frequentist debate. It's a long-winded debate (that some think doesn't even exist), but it all focuses on the question "How do my prior beliefs change in the light of new data?". As ever, xkcd explains it perfectly:
(To explain, the frequentist worked out the chance of a false positive based on the roll of the dice, concluding that a false YES was unlikely. The Bayesian, however, also considers the prior information about the chance that the Sun had gone nova during the experiment; as this chance is extremely small, the Bayesian can be confident of the bet.)
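The comic's numbers make a nice little exercise in Bayes' rule. In the strip, the detector lies only if two dice both come up six, a 1 in 36 chance, which is the frequentist's p-value. The prior chance of the Sun having just gone nova is a hypothetical stand-in of my own (one in a billion); the point is only that it is tiny. A minimal Python sketch:

```python
def posterior_nova(prior_nova: float) -> float:
    """P(Sun has gone nova | detector says YES), via Bayes' rule.
    The detector lies only when two dice both come up six."""
    p_lie = 1 / 36
    p_yes_if_nova = 1 - p_lie   # truthful YES
    p_yes_if_fine = p_lie       # lying YES
    numerator = p_yes_if_nova * prior_nova
    evidence = numerator + p_yes_if_fine * (1 - prior_nova)
    return numerator / evidence


print(f"p-value of a YES if the Sun is fine: {1 / 36:.3f}")
# Hypothetical prior: a one-in-a-billion chance the Sun just went nova
print(f"posterior probability of a nova: {posterior_nova(1e-9):.1e}")
```

The YES only multiplies the tiny prior by about 35, so the Bayesian's bet is safe.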

Efron, however, brings up the usual complaint against Bayesian analysis, namely that the prior information is subjective - how much I think something is true might be different to you (as we have different information) - and people should be wary of Bayesian results.

Ted Bunn notes, before his analysis, that he has never taken a statistics course, and so is not really qualified to comment on the thoughts of a professor of statistics from Stanford. I, however, am substantially more qualified (I have an A/O-level in statistics, which I got in high school aged 17; I think I got a B), so, while I am not a professor of statistics, I'll give it a go.

Identical or non-Identical Twins!
The example given by Efron is a nice, simple question. Imagine you're pregnant and you go for a scan. The doctor tells you that you will be having two baby boys, and the question hits you "Will they be identical twins, or fraternal?" (I guess you need to know this because if they are identical, you can save money by buying identical sets of clothes and freaking people out)!
Efron has been using this particular example for a while, and it is based on a true story.

How do we work this out? Well, you need some information. The first thing is the proportion of twins that are identical. As Efron points out, this is a third, so we can write

$$p(I) = \frac{1}{3}, \qquad p(\bar{I}) = \frac{2}{3}$$

where the I means identical, and the I with the bar over it means "not identical".

So there's your answer, isn't it? The chance that your two boys are identical is one third?

No - there is a subtlety that we can't ignore, namely that identical twins can only be born in boy-boy or girl-girl combinations, whereas fraternal twins also have boy-girl combinations. So, writing BB for a boy-boy pair, we have

$$p(BB \mid I) = \frac{1}{2}$$

for the identical case, and

$$p(BB \mid \bar{I}) = \frac{1}{4}$$

for the fraternal case. The notation in the brackets, such as p(A|B), means "the probability of A given that B is true".

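So, how do we work out the probability that your twins are identical? We use Bayes' rule, which says

$$p(I \mid BB) = \frac{p(BB \mid I)\,p(I)}{p(BB \mid I)\,p(I) + p(BB \mid \bar{I})\,p(\bar{I})}$$

and we just plug everything in and find

$$p(I \mid BB) = \frac{\frac{1}{2}\cdot\frac{1}{3}}{\frac{1}{2}\cdot\frac{1}{3} + \frac{1}{4}\cdot\frac{2}{3}} = \frac{1/6}{1/6 + 1/6} = \frac{1}{2}$$

So, the probability that the twin boys are identical is a half.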

Everyone is happy with this, but Efron goes on to say that the problem is that we have used "prior information" to make this calculation, namely the proportion of twin births that are identical. If we did not know this, we would assign an "uninformative prior" of a 50-50 chance of identical versus fraternal, and so we would get the above probability to be two-thirds, not a half, and hence wrong.
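Both calculations are easy to check with exact fractions. A minimal Python sketch (the function name is illustrative), assuming P(two boys | identical) = 1/2 and P(two boys | fraternal) = 1/4 as in Efron's setup:

```python
from fractions import Fraction


def p_identical_given_two_boys(prior_identical: Fraction) -> Fraction:
    """Bayes' rule for Efron's twins: P(identical | two boys)."""
    p_bb_if_identical = Fraction(1, 2)   # identical pairs: BB or GG
    p_bb_if_fraternal = Fraction(1, 4)   # fraternal pairs: BB, BG, GB or GG
    numerator = p_bb_if_identical * prior_identical
    evidence = numerator + p_bb_if_fraternal * (1 - prior_identical)
    return numerator / evidence


print(p_identical_given_two_boys(Fraction(1, 3)))  # prior from birth records: 1/2
print(p_identical_given_two_boys(Fraction(1, 2)))  # 50-50 "uninformative" prior: 2/3
```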

Because of this, Efron claims
I read this a few times, and think it's quite, errrm (being polite), misplaced (and the word "frequentistically" is freaky).

Let's think about it
Until I read the Efron article, I had no idea what the rates of identical versus fraternal twins were. There was a time when no one on Earth knew what the rates were. At some point, people started to record the details of births, and we could work these out.

So, let's go back in time and pretend all we know is that we have twin boys, but have no clue about the relative numbers of identical and fraternal twins. We can assume that the relative fraction is between zero (where there are no identical twins) and 1 (where all twins are identical), but we have no preference for any particular value, so we have what's called a uniform prior.

The rest of what follows is very similar to how one would test whether a coin is fair. Let's see if we can see how this works with the picture below.
To quote Miranda's friend Tilly, "bear with". The x-axis is the chance of identical twins, and the y-axis is the probability of that chance. The blue horizontal line is my "prior probability", which means that I think that any incidence of identical twins is equally likely. And as I know nothing else, that's it.

But then an old crone from the local village says to me that a woman just had twins and they were fraternal. Data, the sauce of science! What does it tell us? Well, clearly the chance of all twin births being identical must be zero. 

So, let's consider the range of incidence of identical twins in the picture. If I=0, then the chance of getting the data we observed is 100%. If I=0.5 the chance of getting the data is 50%, and as we said, if I=1 then the chance of seeing the data is zero; basically, that bit of data has turned our blue line into the green line.

The crone (sorry, not sure why it ended up being a crone) then adds "oh, and there was another birth of twins, and they were identical". The argument is the same, but reversed: if I=0, the chance of getting the data would be zero, and if I=1, it would be 100%.

But we have two bits of information, so we multiply the probability distributions together and get the red curve. The chances for I=0 and I=1 are zero, and the probability peaks at I=0.5; it looks like there is a 50-50 chance for any particular birth to be identical.
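The two-observation update can be reproduced on a grid of possible values of I. This Python sketch multiplies the flat prior by (1 - I) for the fraternal birth and by I for the identical one, normalises, and finds the peak; the grid spacing of 0.001 is an arbitrary choice.

```python
# Grid of possible values for I, the fraction of twin births that are identical
grid = [i / 1000 for i in range(1001)]
dx = grid[1] - grid[0]

prior = [1.0] * len(grid)                                        # flat: the blue line
after_fraternal = [p * (1 - x) for p, x in zip(prior, grid)]     # the green line
after_both = [p * x for p, x in zip(after_fraternal, grid)]      # the red curve

# Normalise so the curve integrates to one (a simple Riemann sum)
total = sum(after_both) * dx
posterior = [p / total for p in after_both]

peak = grid[max(range(len(grid)), key=lambda i: posterior[i])]
print(f"posterior peaks at I = {peak}")  # I = 0.5
```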

Suddenly the crone says that she actually remembers many thousands of previous twin births, and starts to rattle off whether they were identical or not. As she does, you quickly add the data into your probability distribution and notice that it becomes narrower and shifts away from having a centre at 0.5. After 10,000 remembered births, your distribution becomes the spiky blue curve, highly peaked at I=1/3. What you've deduced is that the probability of twins being identical is about a third.
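With a flat prior, multiplying in r identical and n fraternal births always gives a Beta(r + 1, n + 1) distribution, so the narrowing can be tracked without a grid. In this Python sketch the tallies are hypothetical: I simply assume the crone's births split one-third identical, to show the peak settling near 1/3 and the width shrinking roughly as one over the square root of the number of births.

```python
import math


def beta_summary(r: int, n: int):
    """Posterior for I after r identical and n fraternal births, starting from a flat
    prior: Beta(r + 1, n + 1). Returns (peak, mean, standard deviation)."""
    a, b = r + 1, n + 1
    peak = (a - 1) / (a + b - 2) if a + b > 2 else float("nan")  # flat: no single peak
    mean = a / (a + b)
    sd = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    return peak, mean, sd


# Hypothetical tallies, assuming one-third of the remembered births were identical
for total in (2, 30, 300, 10_000):
    r = round(total / 3)
    n = total - r
    peak, mean, sd = beta_summary(r, n)
    print(f"{total:>6} births: peak {peak:.3f}, mean {mean:.3f}, width {sd:.4f}")
```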

But we still need to answer the question "What is the probability that your twins are identical, in light of the knowledge that they are two boys?"

So, we can take the Bayesian formula up there and write it in terms of the number of recorded births that are identical (we'll call that number r) and fraternal (call that n). I won't go through the maths here, but the probability that your hypothesis that the twins are identical is true is:

So, when we knew nothing (n = r = 0), this is two-thirds. This is the uninformative prior result noted by Efron. What happens when we include the information from the crone? You get the following

The red dots are the recorded identical twins, the blue dots are the recorded fraternal twins (the x-axis should be the number of reported twins plus one), and the green is the probability that your twins are identical. The first point starts off at two-thirds, bounces around as we collect more and more data, and approaches a half, as we expected from the above.
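Under the setup described above (a flat prior on the identical-twin fraction, with r identical and n fraternal records), one way to write this probability is in two steps. The chance that a new twin birth is identical is (r + 1)/(r + n + 2), which is Laplace's rule of succession, and Bayes' rule with the 1/2 and 1/4 factors from earlier then gives 2(r + 1)/(2r + n + 3). This is a sketch derived from those stated assumptions; it reproduces the two-thirds starting point and the approach to a half. In Python, with exact fractions (the function name is illustrative):

```python
from fractions import Fraction


def p_identical(r: int, n: int) -> Fraction:
    """P(your two boys are identical), given r identical and n fraternal recorded
    births and a flat prior on the identical-twin fraction."""
    # Step 1: chance a new twin birth is identical (Laplace's rule of succession)
    p_i = Fraction(r + 1, r + n + 2)
    # Step 2: Bayes' rule with P(two boys | identical) = 1/2, P(two boys | fraternal) = 1/4
    numerator = Fraction(1, 2) * p_i
    return numerator / (numerator + Fraction(1, 4) * (1 - p_i))


print(p_identical(0, 0))               # nothing recorded yet: 2/3
print(float(p_identical(3333, 6667)))  # hypothetical records, one-third identical: 0.5
```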

Wrapping it up
Hmm, this post has gotten to be much longer than I planned, so I think it's time to wrap this up.

I think what I want to reiterate is that Efron's comment is misplaced. Up there, I considered that I didn't know the chance of getting identical twins, but in reality, I could easily have expanded this to not knowing the chances of getting two boys as opposed to two girls, or even the relative shares in fraternal twins. The Bayesian framework is the way to continually update your beliefs in light of new data.

OK - all of that may have seemed a little esoteric, but humans are Bayesians. For the locals, here's an example based upon the race that stops a nation (I should point out that it doesn't stop me; I only like the Grand National).

Many people have a flutter, but don't know anything about the individual horses, and so are happy to be assigned a horse at random, or just pick one based upon something quirky, like the horse's name. If you are a little more racing-savvy (which I am not; I am not a gambling man), you might check out the odds. Here, someone has looked at the history of the horses and assigned probabilities to how they think the race will be run, and you will probably update your internal probabilities for betting on the horses.

However, if you are in the know, you might be privy to different information, such as the health of individual horses. Again, you will probably update your internal odds. As we have seen in Australia, often hiding prior information is considered naughty.

But the point is that there is no single "answer" to the probability distribution for the outcome of the race. Each person has a different distribution based upon their prior information. And, of course, this is true not only in horse racing, but in a huge range of human activities; the whole hoo-ha about insider dealing is that some people have information that makes the playing field uneven.

In science, we start off with little bits of data and differing opinions on what they mean, but we continually collect data and update our probability distributions; this is why we continue to ask questions like "is it really a Higgs boson?" The key point is that the data will overcome your prior distributions, and people will come to agree on what we are seeing (although some people's prior distributions are so strong that, no matter how much data we collect, they will cling to their ideas).

OK - it's now Sunday morning, and I am going to have some toast and Promite. I am going to shamelessly finish this blog with a wonderful picture from Ted Bunn; controversial, but I can't help but agree.
Thanks to the Gravitational Astrophysics Group and Brendon Brewer for invaluable input on this one.

Wednesday, 3 July 2013

Tough Crowd.....

I have just finished a major round of grant and postdoc reviews, and have been wiped out for a number of weeks. Back to science at last!

But last week I took some time out from reviewing and gave a talk on "My Life as an Astronomer" to the 2nd/3rd Pennant Hills Scouts Group.
(thanks to Buffalo for the use of the picture). We spoke about a lot of things, from what life is like as a professional astronomer to the Big Bang and the oldest stars, and from life on planets at the centre of the Milky Way to neutrino telescopes under the ice in Antarctica. It was a broad snapshot of the Universe as we know it.

The great thing about talking to kids about astronomy is the constant stream of questions (so many questions that the scout leaders had a hard time keeping everyone in check), and the fact that the questions were deep, really wanting to understand what is going on. When the first question is "What would happen if the Universe had corners?", you know that you are in for an interesting time.

The kids were enthusiastic (and not just because astronomy is cool!), and hopefully they will think that a career in science is a good thing to consider (although reporting seems to suggest that science in schools often beats this out of them.)