Saturday, February 20, 2010

Genome-Wide Associations of SNPs to Viral Infection Susceptibility

I just skimmed over a paper in which a genome-wide association study was performed in order to identify genes that are involved in susceptibility to viral infection:

Genome-Wide Identification of Susceptibility Alleles for Viral Infections Through a Population Genetics Approach

I think that this work is very provocative, but I'd like to see a plausible functional basis for these genes to be involved in viral infection susceptibility.

These are the premises of their argument:
1. The number of distinct viral species in a geographic region should correlate well with the amount of virus-driven selection in that region.
2. We have measurements of 660,832 single nucleotide polymorphisms that can be associated with geographic regions (specifically the HGDP-CEPH panel data).

They calculated Kendall's rank correlation (aka Kendall's tau) between the single nucleotide polymorphisms and their viral diversity data. They report 441 variants that survive a Bonferroni correction (alpha = 0.05), and note that many of these variants are either in genes known to have a role in viral infection, or in genes for which such a role is plausible.
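To make the statistics concrete, here is a minimal sketch of Kendall's tau (the tau-b variant, which handles ties) and of a Bonferroni-corrected significance threshold. The function names, the toy numbers, and the choice of tau-b are my own assumptions for illustration; I haven't checked exactly which variant or p-value calculation the authors used, and I'm assuming one test per SNP when computing the threshold.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b rank correlation for two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = tied_x_only = tied_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both variables: contributes to neither count
        elif dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif dx * dy > 0:
            concordant += 1  # the pair is ordered the same way in x and y
        else:
            discordant += 1  # the pair is ordered in opposite ways
        # Every pair is visited exactly once by combinations().
    untied = concordant + discordant
    denom = sqrt((untied + tied_x_only) * (untied + tied_y_only))
    return (concordant - discordant) / denom if denom else float("nan")


def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cutoff that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


# Hypothetical toy data: one value per population (NOT from the paper).
virus_richness = [10, 20, 30, 40, 50]      # distinct viral species per region
allele_freq = [0.1, 0.3, 0.2, 0.5, 0.4]    # frequency of one SNP allele

print(kendall_tau_b(virus_richness, allele_freq))
# Assumption: one test per SNP, using the 660,832 SNPs mentioned above.
print(bonferroni_threshold(0.05, 660832))
```

With one test per SNP, the per-SNP cutoff works out to roughly 7.6e-8, which gives a sense of how strong a correlation has to be before it survives the correction.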

Potential Issues

They mention the possibility that the SNP data is simply correlating with another variable that they've not controlled for. To test this, they calculated the same Kendall's rank correlation statistic for the following variables:
- average annual minimum temperature
- average annual maximum temperature
- short wave UV radiation flux

They found that none of the SNPs associated with virus diversity were correlated with any of these variables. It's obviously impossible to perform this control on every possible confounding variable, so whether there are alternative explanations for some of these genes remains an open question. I would be surprised if all of these variants ended up really being involved in viral response. I would imagine that viral diversity in some regions could be heavily correlated with bacterial diversity, and some of these genes might be involved in response to bacterial infection.

They also mentioned that their study doesn't have statistical power to pick up some regions of the genome known to have small but definite effects on viral infection in humans, such as HLA.

Finally, until we have a functional understanding of what these genes are actually doing, I don't think that we can be very confident in any of them. I am fairly confident that many of their associated SNPs are not directly related to viral infection. Because the SNPs are most likely only markers associated with a functional genetic difference, rather than the functional differences themselves, there is a lot of work left to do.

To make the point more plainly, we can divide the genes from their study into two groups:

1. Genes that have a known involvement in viral interactions.
2. Genes that have no known involvement in viral interactions.

The genes in group 1 are good to see, but we already know about those genes. The real test of this data will be to see how many genes from group 2 end up having validated functional roles in viral infection. We need more work to understand how many of these variants are truly associated with viral infections. Generating more hypotheses to test is a worthy goal in science, but the jury is still out on exactly how much we've learned from this particular experiment.