Saturday, December 25, 2010

Resolving Debates in Statistics

Occasionally there are debates in science about what models are formally appropriate for reasoning about scientific concepts. These include:

- The existence of race as a meaningful concept in light of current knowledge about human genetic diversity. See Richard Lewontin's article "The apportionment of human diversity" (1972) and Lewontin's Fallacy for more information.
- Quantum and classical physics.
- Bayesian and Frequentist methods for statistical inference.

Scientists have chosen sides in these debates based on education, philosophical preference, and sometimes practical concerns. Usually these debates can be reduced to the appropriateness of certain calculations in solving scientific problems. For example, the Bayesian-Frequentist debate largely centers on the incorporation of prior information into inferential calculations. In "Confidence Intervals vs Bayesian Intervals," E. T. Jaynes suggests that all of these debates over methods can be resolved by benchmarking methods on a set of focused problems:

I suggest we apply the same criterion in statistics: the merits of any statistical method are determined by the results it gives when applied to specific problems. 

Saturday, December 11, 2010

Sometimes Big Changes Are Better

In college, I was never formally taught the art of debugging. Often in my first few programming classes, I'd end up rewriting chunks of code that weren't giving proper results, as big changes can occasionally be easier than small changes. I'm also a fast typist, so the cost of typing a few hundred lines is relatively low for me.

I always felt strange about this, because it seemed like making big changes to the code in this way is a poor practice. Interestingly, Peter Norvig occasionally does the same thing, and doesn't even need to understand the details of the bug in order to fix it. Here's an excerpt from Peter Seibel's excellent book Coders at Work:

Seibel: On a different topic, what are your preferred debugging techniques and tools? Print statements? Formal proofs? Symbolic debuggers?

Norvig: I think it's a mix and it depends on where I am. Sometimes I'm using an IDE that has good tracing capability and sometimes I'm just using Emacs and don't have all that. Certainly tracing and printing. And thinking. Writing smaller test cases and watching them go, and breaking the functionality down to see where the test case failed. And I've got to admit, I often end up rewriting. Sometimes I do that without ever finding the bug. I get to the point where I can just feel that it's in this part here. I'm just not very comfortable about that part. It's a mess. It really shouldn't be that way. Rather than tweak it a little bit at a time, I'll just throw away a couple hundred lines of code, rewrite it from scratch, and often the bug is gone.

Sometimes I feel guilty about that. Is that a failure on my part? I didn't understand what the bug was. I didn't find the bug. I just dropped a bomb on the house and blew up all the bugs and built a new house. In some sense, the bug eluded me. But if it becomes the right solution, maybe it's OK. You've done it faster than you would have by finding it.

This is a really smart way to think about debugging. We don't have infinite time to understand the minute details of some mistake made in the past. What's important is to get things working as quickly as possible. As long as the code has the desired functionality and passes tests appropriately, sometimes rewriting big chunks of code can be an effective debugging method.

Monday, December 6, 2010

Using a Closure to Make Panel Labels in R

When making plots with several panels, I very often want to label the individual panels with letters or numbers. I like to have a simple function for labeling panels, like this:

# create a plot
plot( x, y )
# assign a unique label to the plot

Below the break is a definition for the panel_labeler() function using a closure in R. Shown are some examples of using panel_labeler() to assign labels to plots. The plots show different methods for combining mixtures of Normal distributions, but this approach could be used for plotting anything.
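Here's a minimal sketch of the idea, not the definition itself. It assumes a hypothetical factory function, make_panel_labeler(), with a made-up draw argument; the closure keeps a private counter so that each call hands out the next letter.

```r
# Hypothetical factory (not the author's definition): returns a function
# that remembers how many panels it has labeled so far.
make_panel_labeler <- function(labels = LETTERS) {
  count <- 0  # private counter, captured by the closure below
  function(draw = TRUE) {
    count <<- count + 1  # update the counter in the enclosing scope
    label <- labels[count]
    if (draw) {
      # write the label in bold above the top-left corner of the current panel
      mtext(label, side = 3, adj = 0, line = 1, font = 2)
    }
    invisible(label)
  }
}

panel_labeler <- make_panel_labeler()
first  <- panel_labeler(draw = FALSE)  # label for the first panel
second <- panel_labeler(draw = FALSE)  # label for the second panel
```

In a real figure you would call panel_labeler() with no arguments right after each plot( x, y ). The draw = FALSE option is only there so the counter can be checked without an open graphics device.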

Wednesday, December 1, 2010

Infrequently Asked Questions about Perl

I just read a few entries from Infrequently Asked Questions about Perl, a hilarious list of joke answers to serious questions by Mark Jason Dominus. Here are some of my favorite entries:

How do I get tomorrow's date?

Use this function:
sub tomorrow_date {
    sleep 86_400;
    return localtime();
}

How can I find out whether a number is odd?

sub odd {
    my $number = shift;
    return !even ($number);
}

How can I find out whether a number is even?

sub even {
    my $number = abs shift;
    return 1 if $number == 0;
    return odd ($number - 1);
}

Saturday, October 23, 2010

Action Shots of Water Balloons

Just saw a great article on NPR's web site about Edward Horsford's photographs of water balloons as they burst. The photos are brilliant: they capture the rupturing plastic and swirling water as the balloon breaks.
On his Flickr stream are several other fun shots, including a detailed close-up of a spider and another of a snowflake.  

Saturday, October 16, 2010

Illegal Manipulation of Automated Trading Systems

From this CNBC Article:
Two Norwegian day traders have been handed suspended prison sentences for market manipulation after outwitting the automated trading system of a big US broker.
The two men worked out how the computerized system would react to certain trading patterns – allowing them to influence the price of low-volume stocks.
This could become a more serious problem as trading systems become more automated, and I suspect that there are cases similar to this one that we never hear about.

One argument made by the defense is interesting: the defense lawyer admits that his clients misled the trading algorithm in order to make a profit, but argues that they weren't responsible for the actions of the computer. They're appealing the case.

Wednesday, August 18, 2010

String Lengths in R

For some reason, in R this does not do what I expect:

string = "mystring"
print( length(string) )
[1] 1

What I actually want is this function:

print( nchar(string) )
[1] 8
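
The reason is that R treats a string as a character vector with one element, so length() counts elements while nchar() counts the characters in each element. A small sketch with a made-up vector, words, shows the difference:

```r
# 'words' is a made-up example vector with three strings
words <- c("mystring", "R", "")

n_elements <- length(words)  # how many strings the vector holds
n_chars    <- nchar(words)   # how many characters each string holds

print(n_elements)
print(n_chars)
```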

Thursday, July 22, 2010

An Idea Whose Time Has Come

If you enjoy literature, politics and satire, you'll love ShakesPalin.

After Sarah Palin invented the word "refudiate" and subsequently compared herself to Shakespeare as a wordsmith, Twitter is abuzz with posts bizarrely fusing quotes from the Bard of Avon and the Thrilla From Wasilla. Here are some of the better ones:
Alas, poor Couric, I read them all.
But soft, what light from yonder window breaks? It is the East, and I can see Russia from my front porch.

Here's my personal favorite:
Be not afraid of mavericks: some are born mavericks, some achieve maverickry, and some have maverickness thrust upon 'em.

Sunday, June 27, 2010

Scientists Drawn By Seventh-Graders

Children draw and describe their perception of scientists before and after visiting Fermilab. Quoting from the project description page:

Few young people have ever met a scientist or engineer and most have an opportunity to see where they work. In 1985, middle school teachers asked us if they could bring their students to Fermilab. They helped us develop Beauty and Charm so that those students who came to a working research laboratory would come with a purpose.

Wednesday, May 26, 2010

Roger Ebert is a sarcastic genius

Some choice words from Roger Ebert's review of Sex and the City 2:

Some of these people make my skin crawl. The characters of "Sex and the City 2" are flyweight bubbleheads living in a world which rarely requires three sentences in a row. Their defining quality is consuming things. They gobble food, fashion, houses, husbands, children, vitamins and freebies. They must plan their wardrobes on the phone, so often do they appear in different basic colors, like the plugs you pound into a Playskool workbench.

It doesn't slow down from there. My favorite thing about this movie is the contempt and sarcasm that it inspires in Mr. Ebert.

Monday, May 10, 2010

R Graphics Parameters That I Like

Reproduced here so that I don't lose them:

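par( lwd = 1.5 )          # line width
par( mgp = c(3,1,0) )     # margin lines for the axis title, axis labels, and axis line
par( mar = c(4,5,3,2) )   # margins in lines: bottom, left, top, right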

Sunday, May 2, 2010

Rethinking Paper Money

Michael Tyznik has a great post in which he suggests new designs for American currency:

Graphic Design: Dollar Redesign by Michael Tyznik.

The best part is that rather than filling the reverse side of bills with a smorgasbord of random eyes and famous buildings, Tyznik's designs use tasteful pictures of important people in history and famous quotes from them.

Friday, April 23, 2010

Refusing to Stop

I just finished reading a story about Jure Robic, a Slovenian soldier who excels at ultra long-distance bike races. His races are characterized by mental instability: he frequently hallucinates and uses his hallucinations to motivate himself to work harder. It's very compelling to read the stories of how he and his racing team manage the intense demands that the racing exerts on both his body and his mind.

From Danny Coyle's article in the New York Times titled
That Which Does Not Kill Me Makes Me Stronger:

In a consideration of Robic, three facts are clear: he is nearly indefatigable, he is occasionally nuts, and the first two facts are somehow connected. The question is, How? Does he lose sanity because he pushes himself too far, or does he push himself too far because he loses sanity? Robic is the latest and perhaps most intriguing embodiment of the old questions: What happens when the human body is pushed to the limits of its endurance? Where does the breaking point lie? And what happens when you cross the line?

The brain is a machine made up of hundreds of billions of interconnected neurons, and when machines are pushed to their limits they can begin to malfunction. This story makes me wonder how much potential exists in all of us to push harder in our jobs and our hobbies, and what the cost of such exertion would be.

Thursday, April 15, 2010

Duffy-negative humans at risk from Plasmodium vivax

There is a blood group called Duffy-negative that provides increased resistance to infection by Plasmodium vivax, a parasite that causes malaria. The Duffy antigen is a protein that can be displayed on the surface of blood cells, and people in the Duffy-negative blood group do not display the Duffy antigen. It has been known for many years that vivax enters blood cells by leveraging the Duffy antigen, and the absence of this antigen on the surface of blood cells makes it more difficult for the parasite to enter them.

To understand this paper better, I looked up some statistics about malaria and they were very shocking. There are between 350 and 500 million cases of malaria per year, and one to three million people die per year from malarial infection. There is still not an effective vaccine for malaria, although the Gates Foundation is funding work in this area.

A recent paper in PNAS gives evidence that vivax may be evolving its way around the requirement for Duffy:

Plasmodium vivax clinical malaria is commonly observed in Duffy-negative Malagasy people

The striking sentence from their abstract is this:

"In Madagascar, P. vivax has broken through its dependence on the Duffy antigen for establishing human blood-stage infection and disease."

The authors investigated populations of people in Madagascar without malarial symptoms, 72% of whom are in the Duffy-negative blood group. Using PCR and microscopy, the authors found that 8.8% of Duffy-negative people were infected with vivax. After isolating parasite samples from these people, polymorphic markers in the parasites were genotyped to see if this ability to infect Duffy-negative people was from a single clonal population. The markers suggested that the parasites infecting Duffy-negative people were from distinct genetic backgrounds, which is consistent with the idea that the ability to infect Duffy-negative humans has arisen multiple times in evolution.

While the authors don't know the molecular basis of infection for these parasites with novel infection mechanisms, that will certainly be the focus of new work. The paper is very interesting, and has very important implications for how we treat one of the most devastating diseases in the world.

Unfortunately, people must be subscribers to PNAS to read this article. Here's a link to a recent ScienceDaily news article summarizing the study:

Duffy-Negative Blood Types No Longer Protected from P. Vivax Malaria

Tuesday, April 13, 2010

Catholic Church Forgives Beatles

The headline says it all. Here is the story.

Monday, March 22, 2010

High-Adventure Commuting, Sometimes in a Cloth Bag

The children in this article have a much more beautiful and exciting commute to school than I do. The best part of the article is this picture caption:
School run: Nine-year-old Daisy Mora makes the trip every day to get to lessons, with her five-year-old brother riding in a cloth bag.
How many older siblings have wanted to put their younger brothers and sisters into a cloth bag?

Friday, February 26, 2010

Common Math Symbols in HTML

From John D. Cook's web page: a table of common symbols in mathematics and their corresponding HTML entity entries.

An alternative for math nerds:
(1) ∃ a page that contains a table of HTML codes for common symbols used in math.
(2) Symbols in table ⊂ all math symbols.

Local Variables in Perl

Many programming languages allow you to associate variables with a scope. The scope of a variable defines the context in which a variable can be accessed and manipulated. In Perl, this is how you declare a local variable inside of a scope:

{
    my $x;
}

This means that the scalar variable $x can't be accessed outside of the braces. However, this is also valid Perl:

{
    local $x;
}

Perversely, this does not create a local variable named $x. Instead, it does the following:
  • save the current value of $x (outside the braces) somewhere safe.
  • make a new variable named $x that can be used inside of the braces.
  • when the scope of the new variable is exited, replace $x with the old value.

This is a little complicated, so here's an example that illustrates the behavior of local:

$x = 3;
{
    local $x = 'foo';
    print $x, "\n";
}
print $x, "\n";

The output of this program is:

foo
3

In general, you should use my much more frequently than local, and overuse of local can be considered a code smell. Read more about my and local in Coping with Scoping.

Tuesday, February 23, 2010

British to pay for placebo treatments out of pocket

A committee of British MPs has recommended that the British National Health Service discontinue homeopathy as a routine treatment for health issues. From an article by Andy Coghlan in New Scientist:

In preparing its report, the committee, which scrutinises the evidence behind government policies, took evidence from scientists and homeopaths, and reviewed numerous reports and scientific investigations into homeopathy. It found no evidence that such treatments work beyond providing a placebo effect.

This seems like a strong victory for the reality-based community. There are the usual complaints from people supporting homeopathy, such as:
Michael Dixon, medical director of the foundation adds: "Science is a vital tool in healthcare, but so are compassion and caring and treating patients with dignity. It is not clear that the Committee took that into account."
I'm not sure why ineffective remedies are supposed to be compassionate, and I think that the burden of proof rests with supporters of homeopathy to make the case that ineffective remedies are still useful.

Monday, February 22, 2010

One-Star Reviews for John Scalzi

John Scalzi is one of my favorite authors; I think he does great work as an author and as a blogger. He has a new post up on Whatever about One-Star Reviews. The basic idea is that he's found reviews of books that give a book the lowest possible rating and a detailed description of why such a rating is deserved. John posts such reviews of two books currently nominated for awards, and then says this:
I think it’s useful for all us writers to remember no one work pleases everyone, and you can’t make anyone like it if they don’t, and you can’t keep them from telling other people what they think of it, even if they hate it… and that’s fine. Learn to deal with it. Otherwise it doesn’t matter how much success or praise or satisfaction you earn through your writing, you’ll still obsess over those one-star reviews and it will eat away at your joy. That’s no way to live.
I have a strong perfectionist streak, and this is important advice for people like me. If you want to do good work, you need to work very hard. You need to avoid being overwhelmed by extreme amounts of praise or criticism, because both are distractions from what you're really trying to accomplish. Most of all, you should probably remember that in some ways criticism is partially praise - something about your work has drawn people in enough to give it a chance. Even though some people might not end up liking your work, work that everyone likes is bland, banal and boring.

Saturday, February 20, 2010

Genome-Wide Associations of SNPs to Viral Infection Susceptibility

I just skimmed over a paper in which a genome-wide association study was performed in order to identify genes that are involved in susceptibility to viral infection:

Genome-Wide Identification of Susceptibility Alleles for Viral Infections Through a Population Genetics Approach

I think that this work is very provocative, but I'd like to see a plausible functional basis for these genes to be involved in viral infection susceptibility.

These are the premises of their argument:
1. The number of distinct viral species in a geographic region should correlate well with the amount of viral-driven selection in that region.
2. We have measurements of 660,832 single nucleotide polymorphisms that can be associated with geographic regions (specifically the HGDP-CEPH panel data).

They calculated Kendall's rank correlation (aka Kendall's Tau) for single nucleotide polymorphisms and their viral diversity data. They report 441 variants that survive a Bonferroni correction (alpha = 0.05), and notice that many variants are either in genes known to have a role in viral infection, or in genes for which such a role is plausible.
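
To make the calculation concrete, here is a minimal sketch in R on simulated data. Nothing here comes from the paper: the number of populations, the number of SNPs, and every variable name (viral_diversity, allele_freq, and so on) are assumptions for illustration. It uses cor.test() with method = "kendall" for the rank correlation and p.adjust() with method = "bonferroni" for the correction.

```r
set.seed(1)

n_pop <- 50   # hypothetical number of populations
n_snp <- 100  # hypothetical number of SNPs (the paper has far more)

# simulated count of viral species per population
viral_diversity <- rpois(n_pop, lambda = 20)

# simulated allele frequencies: one row per population, one column per SNP
allele_freq <- matrix(runif(n_pop * n_snp), nrow = n_pop)

# force the first SNP to track viral diversity so there is something to find
allele_freq[, 1] <- rank(viral_diversity) / n_pop

# Kendall's rank correlation test for every SNP; exact = FALSE because the
# simulated diversity counts contain ties
p_values <- apply(allele_freq, 2, function(freq) {
  cor.test(freq, viral_diversity, method = "kendall", exact = FALSE)$p.value
})

# Bonferroni correction, then keep SNPs below alpha = 0.05
p_adjusted  <- p.adjust(p_values, method = "bonferroni")
significant <- which(p_adjusted < 0.05)
print(significant)
```

The Bonferroni step simply multiplies each p-value by the number of tests (capped at 1), which is why it is so conservative when hundreds of thousands of SNPs are tested at once.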

Potential Issues

They mention the possibility that the SNP data is simply correlating with another variable that they've not controlled for. To test this, they calculate the same Kendall's rank correlation statistic for the following variables:
- average annual minimum temperature
- average annual maximum temperature
- short wave UV radiation flux

They found that none of the SNPs associated with virus diversity were correlated with any of these variables. It's obviously impossible to perform this control on every possible confounding variable, so alternative explanations for some of these genes remain an open question. I would be surprised if all of these variants ended up really being involved in viral response. I would imagine that viral diversity in some regions could be heavily correlated with bacterial diversity, and some of these genes might be involved in response to bacterial infection.

They also mentioned that their study doesn't have statistical power to pick up some regions of the genome known to have small but definite effects on viral infection in humans, such as HLA.

Finally, until we have a functional understanding of what these genes are actually doing I don't think that we can be very confident in any of these genes. I am fairly confident that many of their associated SNPs are not directly related to viral infection. The fact that the SNPs are most likely only associated with a functional genetic difference means that there is a lot of work left to do.

To make the point more plainly, we can divide the genes from their study into two groups:

1. Genes that have a known involvement in viral interactions.
2. Genes that have no known involvement.

The genes in group 1 are good to see, but we already know about those genes. The real test of this data will be to see how many genes from group 2 end up having validated functional roles in viral infection. To really understand how many of these variants are really associated with viral infections we need more work. Generating more hypotheses to test is a worthy goal in science, but the jury is out about exactly how much we've learned from this particular experiment.

Dolphins Can Manage Diabetes

Stephanie Venn-Watson has just presented data suggesting that dolphins have the ability to switch on and off type II diabetes depending on the availability of food.

"By taking regular blood samples of the dolphins, she discovered that they could induce type II diabetes at times of fasting and then almost immediately turn it off again when food became available."

Here is the full news story by Richard Alleyne, published in the Telegraph. I'm very interested in reading the actual article. I'm guessing that the dolphins can somehow control their insulin production (some forms of type II diabetes affect your response to insulin in addition to production), and I wonder at what level this regulation is happening. I also wonder if there are other species that have this ability, as it's likely to be a generally useful trick for mammals. Perhaps there are even humans whose insulin production varies in and out of diabetic states in response to food scarcity.

I'd like to see more functional data on how this works, but it's a very interesting idea.

Amy Bishop - Strange and Deluded

Amy Bishop is clearly a disturbed individual who tragically attacked her colleagues with a gun, killing three and wounding several more. The more that I learn about her story, the stranger the picture of her situation becomes. Apparently on one of her papers she lists her husband and their three children as co-authors.

From Mary Agnes O'Connor at Shepherds and Black Sheep:

"Evidence strongly suggests that Dr. Bishop used her husband, her family and by all appearances the sham 'Cherokee Labsystems' to fabricate a record of recent accomplishments. Her use of essentially an online vanity publisher further diminishes her professional stature.

It should have been no surprise to Dr. Bishop that the University easily saw through the smoke and mirrors and that she would not receive tenure. But an oversized ego can be blinding."

The blog post is worth reading if you're interested in the life of this neurobiologist-turned-murderer.