Monday, December 17, 2007

The significance of Significance

Did you ever wonder why we always collect 10,000 events in our files? Why not collect fewer? Or more? Well, to answer this question, you'll need to take a step back and stretch your brain to remember that pesky stats class you took as an undergraduate. You may remember that French mathematician with the funny name, Poisson. Well, Poisson's distribution describes the probability of a given number of events occurring in a fixed period of time IF the events occur at a known average rate AND independently of the time since the last event (don't believe me? Check for yourself on Wikipedia). With that in mind, to apply the Poisson distribution to our flow data we need to assume that events pass through the flow cytometer at a stable average rate. That's not a difficult assumption, knowing that our sample is pretty evenly distributed in a fluid and that the fluid passes through the instrument at a set flow rate. So, applying Poisson, the random and discrete occurrences or "arrivals" are the cells passing the laser intercept, one by one. Poisson also tells us that if N events, or "cells", are observed or "collected", then the standard deviation (SD) associated with that count is the square root of N. Brilliant! We can express the SD in friendlier terms as the coefficient of variation (CV), which is 100 x SD/N, or equivalently 100/√N. The CV gives us the spread of a distribution as a percentage of its mean: very tight distributions will have a low CV, and very broad distributions will have a high CV.
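To make the arithmetic concrete, here is a minimal Python sketch of the two formulas above. It assumes the event count follows a Poisson distribution, so that SD = √N and CV = 100 x SD/N. The function names `poisson_sd` and `poisson_cv_percent` are hypothetical helpers for illustration, not part of any flow cytometry package.

```python
import math


def poisson_sd(n_events: int) -> float:
    """Standard deviation of a Poisson count of n_events: the square root of N."""
    if n_events < 0:
        raise ValueError("n_events cannot be negative")
    return math.sqrt(n_events)


def poisson_cv_percent(n_events: int) -> float:
    """Coefficient of variation in percent: 100 * SD / N, which simplifies to 100 / sqrt(N)."""
    if n_events <= 0:
        raise ValueError("n_events must be positive")
    return 100.0 * poisson_sd(n_events) / n_events


# Show how slowly the CV falls as the event count grows.
for n in (100, 1_000, 10_000, 100_000):
    print(f"{n:>7} events: SD = {poisson_sd(n):8.2f}, CV = {poisson_cv_percent(n):5.2f}%")
```

Because the CV shrinks only with the square root of N, halving the CV takes four times as many events.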

Now, how is all this pertinent to flow cytometry? Well, let's say you collect your 10,000 cells, and you are analyzing a subset within those 10,000 cells that has a frequency of 10%. This means you've collected 1,000 cells in your subset. The count whose uncertainty you want to know is those 1,000 cells, so Poisson tells us that the SD of this population is the square root of 1,000, which is 31.62. If the SD is 31.62, then the CV is 100 x 31.62/1,000, or 3.16%. This means that the frequency you report for that subset carries a counting uncertainty (CV) of 3.16%. If you would like to bring your CV down to, say, 1%, you would need a subset of 10,000 cells, which means collecting a data file of 100,000 cells. Given this simple equation, you can calculate ahead of time just how many cells you need to collect in order to achieve a given CV. A nice table and description are available at the following link from the folks at Cardiff University in the UK. So, in answer to our first question, we need to collect as many cells as will allow us to draw conclusions from our data that are backed up by statistical significance.
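Working the calculation backwards, a sketch like the following estimates how many total events to collect for a target CV on a subset of known frequency. It assumes the same Poisson counting model, treats `subset_fraction` as the expected frequency of the gated population (0.10 for 10%), and ignores other sources of variability such as staining, gating, or instrument noise. The helpers `subset_cv_percent` and `events_needed` are hypothetical names used only for illustration.

```python
import math


def subset_cv_percent(total_events: int, subset_fraction: float) -> float:
    """Poisson CV (%) for a subset making up subset_fraction of all collected events."""
    n_subset = total_events * subset_fraction
    if n_subset <= 0:
        raise ValueError("the subset must contain some events")
    # CV% = 100 * sqrt(n) / n = 100 / sqrt(n)
    return 100.0 / math.sqrt(n_subset)


def events_needed(target_cv_percent: float, subset_fraction: float) -> int:
    """Smallest total event count whose subset reaches the target Poisson CV (%)."""
    if target_cv_percent <= 0:
        raise ValueError("target CV must be positive")
    if not 0 < subset_fraction <= 1:
        raise ValueError("subset_fraction must be in (0, 1]")
    # Invert CV% = 100 / sqrt(n_subset) to get the subset size required.
    n_subset = (100.0 / target_cv_percent) ** 2
    # Scale up to the whole file; the small epsilon keeps floating-point
    # noise (e.g. 100000.00000000001) from bumping the result up by one.
    return math.ceil(n_subset / subset_fraction - 1e-9)


# The example from the text: 10,000 events, subset at 10%.
print(f"CV for 10,000 events at 10%: {subset_cv_percent(10_000, 0.10):.2f}%")
# How many events for a 1% CV on that 10% subset?
print(f"Events needed for a 1% CV: {events_needed(1.0, 0.10):,}")
```

Run as-is, this reproduces the numbers in the text: a CV of about 3.16% for the 10,000-event file, and 100,000 events for a 1% CV on a 10% subset. The same sketch shows why rare populations are expensive: at a 1% frequency, a 1% CV would take 1,000,000 events.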