Saturday, December 25, 2010

Resolving Debates in Statistics

Occasionally there are debates in science about which models are formally appropriate for reasoning about scientific concepts. These include:

- The existence of race as a meaningful concept in light of current knowledge about human genetic diversity. See Richard Lewontin's article "The apportionment of human diversity" (1972) and Lewontin's Fallacy for more information.
- Quantum and classical physics.
- Bayesian and Frequentist methods for statistical inference.

Scientists have chosen sides in these debates based on education, philosophical preference, and sometimes practical concerns. Usually these debates can be reduced to the appropriateness of certain calculations in solving scientific problems. For example, the Bayesian-Frequentist debate largely centers on the incorporation of prior information into inferential calculations. In "Confidence Intervals vs Bayesian Intervals," E. T. Jaynes suggests that all of these debates over methods can be resolved by benchmarking methods on a set of focused problems:

I suggest we apply the same criterion in statistics: the merits of any statistical method are determined by the results it gives when applied to specific problems. 
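
As a rough illustration of what such a benchmark might look like, here is a minimal R sketch that applies a frequentist and a Bayesian interval to the same small problem. The data (7 successes in 20 trials), the Beta priors, and the helper bayes_interval() are my own assumptions for illustration; none of them come from Jaynes's paper.

```r
# Hypothetical data: 7 successes in 20 binomial trials
x <- 7
n <- 20

# Frequentist: exact (Clopper-Pearson) 95% confidence interval
freq_interval <- as.numeric(binom.test(x, n, conf.level = 0.95)$conf.int)

# Bayesian: equal-tailed credible interval under a Beta(a, b) prior.
# With a Beta prior and binomial data, the posterior is
# Beta(a + x, b + n - x).
bayes_interval <- function(x, n, a = 1, b = 1, level = 0.95) {
  tail <- (1 - level) / 2
  qbeta(c(tail, 1 - tail), a + x, b + n - x)
}

flat_prior     <- bayes_interval(x, n)                  # uniform Beta(1, 1) prior
informed_prior <- bayes_interval(x, n, a = 10, b = 10)  # prior centered on 0.5

# Compare the three intervals side by side
print(rbind(frequentist = freq_interval,
            bayes_flat = flat_prior,
            bayes_informed = informed_prior))
```

The only place prior information enters is through a and b, so changing those two numbers shows how much the answer depends on it for a given data set.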

Saturday, December 11, 2010

Sometimes Big Changes Are Better

In college, I was never formally taught the art of debugging. Often in my first few programming classes, I'd end up rewriting chunks of code that weren't giving proper results, as big changes can occasionally be easier than small changes. I'm also a fast typist, so the cost of typing a few hundred lines is relatively low for me.

I always felt strange about this, because making big changes to the code in this way seemed like poor practice. Interestingly, Peter Norvig occasionally does the same thing, and doesn't even need to understand the details of the bug in order to fix it. Here's an excerpt from Peter Seibel's excellent book Coders at Work:

Seibel: On a different topic, what are your preferred debugging techniques and tools? Print statements? Formal proofs? Symbolic debuggers?

Norvig: I think it's a mix and it depends on where I am. Sometimes I'm using an IDE that has good tracing capability and sometimes I'm just using Emacs and don't have all that. Certainly tracing and printing. And thinking. Writing smaller test cases and watching them go, and breaking the functionality down to see where the test case failed. And I've got to admit, I often end up rewriting. Sometimes I do that without ever finding the bug. I get to the point where I can just feel that it's in this part here. I'm just not very comfortable about that part. It's a mess. It really shouldn't be that way. Rather than tweak it a little bit at a time, I'll just throw away a couple hundred lines of code, rewrite it from scratch, and often the bug is gone.

Sometimes I feel guilty about that. Is that a failure on my part? I didn't understand what the bug was. I didn't find the bug. I just dropped a bomb on the house and blew up all the bugs and built a new house. In some sense, the bug eluded me. But if it becomes the right solution, maybe it's OK. You've done it faster than you would have by finding it.

This is a really smart way to think about debugging. We don't have infinite time to understand the minute details of some mistake made in the past. What's important is to get things working as quickly as possible. As long as the code has the desired functionality and passes tests appropriately, sometimes rewriting big chunks of code can be an effective debugging method.

Monday, December 6, 2010

Using a Closure to Make Panel Labels in R

When making plots with several panels, I very often want to label the individual panels with letters or numbers. I like to have a simple function for labeling panels, like this:

# create a plot
plot( x, y )
# assign a unique label to the plot
panel_labeler()

Below the break is a definition for the panel_labeler() function using a closure in R. Shown are some examples of using panel_labeler() to assign labels to plots. The plots show different methods for combining mixtures of Normal distributions, but this approach could be used for plotting anything.
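
The original definition is not reproduced in this excerpt. A minimal sketch of how such a closure might look is below. The factory name make_panel_labeler(), the draw argument, and the choice of mtext() for placing the label are all assumptions of mine, not the original code.

```r
# Hypothetical factory: returns a function that remembers how many
# panels have been labeled so far. The counter i lives in the
# enclosing environment, which is what makes this a closure.
make_panel_labeler <- function(labels = LETTERS) {
  i <- 0
  function(draw = TRUE) {
    i <<- i + 1  # update the counter captured by the closure
    # cycle back to the first label if we run out
    label <- labels[(i - 1) %% length(labels) + 1]
    if (draw) {
      # write the label in bold above the top-left corner of the
      # current panel; requires that a plot has already been drawn
      mtext(label, side = 3, adj = 0, line = 1, font = 2)
    }
    invisible(label)
  }
}

# Each call to panel_labeler() after a plot() advances to the next letter
panel_labeler <- make_panel_labeler()
```

Because the counter is private to each closure, calling make_panel_labeler() again starts a fresh sequence for a new figure without touching any global variable.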

Wednesday, December 1, 2010

Infrequently Asked Questions about Perl

I just read a few entries from Infrequently Asked Questions about Perl, a hilarious list of joke answers to serious questions by Mark Jason Dominus. Here are some of my favorite entries:

How do I get tomorrow's date?

Use this function:
sub tomorrow_date {
    sleep 86_400;
    return localtime();
}

How can I find out whether a number is odd?

sub odd {
    my $number = shift;
    return !even ($number);
}

How can I find out whether a number is even?

sub even {
    my $number = abs shift;
    return 1 if $number == 0;
    return odd ($number - 1);
}