Tuesday, August 23, 2011
Combinations of weak predictors can be very effective.
From Boosting Algorithms: Regularization, Prediction and Model Fitting:
Kearns and Valiant [52] proved that if individual classifiers perform at least slightly better than guessing at random, their predictions can be combined and averaged, yielding much better predictions.
The reference is to Cryptographic limitations on learning Boolean formulae and finite automata.
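The claim is easy to sanity-check numerically. Below is a toy Perl simulation (not from the paper; the 101 classifiers, their 55% accuracy, and the assumption that their errors are independent are all illustrative choices, and real boosting does not rely on independence) showing that a majority vote over many barely-better-than-random classifiers is correct far more often than any single one:

#!/usr/bin/env perl
# Toy check of the claim quoted above: classifiers that are each only a
# little better than random, when combined by majority vote, are right far
# more often than any single one.  The 101 classifiers, their 55% accuracy,
# and the independence of their errors are illustrative assumptions only;
# real boosting does not require independent errors.
use strict;
use warnings;

my $n_classifiers = 101;
my $p_correct     = 0.55;
my $n_trials      = 10_000;

my ($single_hits, $vote_hits) = (0, 0);
for (1 .. $n_trials) {
    # Count how many of the weak classifiers get this case right.
    my $correct = grep { rand() < $p_correct } 1 .. $n_classifiers;
    $single_hits++ if rand() < $p_correct;           # one weak classifier alone
    $vote_hits++   if $correct > $n_classifiers / 2; # majority vote
}
printf "single weak classifier accuracy: %.3f\n", $single_hits / $n_trials;
printf "majority-vote accuracy (n=%d):   %.3f\n", $n_classifiers, $vote_hits / $n_trials;

Under these toy assumptions the lone classifier hovers around 55% while the vote lands well above 80%, which is the sense in which combining weak predictors yields much better predictions.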
Tuesday, March 22, 2011
Scientific Criticism
Michael Eisen on a recent talk by Felisa Wolfe-Simon:
The acid test of a scientist is how they respond when their work is criticized. The best scientists listen and consider what is being said, defend the things they still believe and, most importantly, recognize where their work fell short and use criticism to make their work better. This is, of course, not always so simple. It’s easy to get defensive instead – to view criticism as an attack, see sinister motives in its sources, and ignore its substance.
But I think the worst response is to view criticism as a kind of virtue. And there were signs in Wolfe-Simon’s talk that she is beginning to relish the role of the iconoclast. She appears to see herself as someone who has unconventional ideas that the scientific community can’t deal with. And that criticism of her work is not an effort to get at the truth but a conspiracy to suppress it. At several points she made reference to other scientists whose ideas were not accepted when they were proposed, but which turned out in the long run to be correct.
People laughed at Galileo. They also laughed at Groucho Marx, though in a different manner. Worst of all is being laughed at like so many laugh at Sarah Palin, as Galileo turned out to be right, and I always got the impression that Groucho was laughing too.
Link: Felisa Wolfe-Simon (of arsenic infamy) is no more convincing in person than in print
Wednesday, March 16, 2011
Where have all the bullies gone?
The Disappointing Taste of Revenge, a blog entry by Ta-Nehisi Coates:
It's sort of the same thing here. This kid--who shouldn't have put his hands on anyone--gets power-slammed on a concrete driveway, is stumbling out of the frame, and for all we know could be concussed, and you read the comments, and everyone's yelling "Damn right." This is a world filled with people who've been bullied--but no people who are, or ever were, actual bullies.
I'm certain that the real bullies are hanging out with Carl Sagan's fire-breathing dragon, or are orbiting the sun near Bertrand Russell's teapot.
Tuesday, March 15, 2011
Louise Glass on the cheapness of data and the importance of being focused
Louise Glass from Berkeley has just received a fellowship that will allow her to develop new projects relating to bioenergy. In a recent interview with Nature, she talks about the changes that she's experienced during her career in science:
What has been the biggest change in science during your career?
The pace. When I was a graduate student, a postdoc across the hall from me sequenced one kilobase of DNA. We have just finished sequencing the 40-megabase genomes of 100 wild Neurospora isolates. In this day and age, it is so easy to get data. The advantage is being able to ask very elegant questions because you are not limited by data. But it is also easy to lose sight of the biological problem you are trying to address. That is the danger.
Sunday, February 6, 2011
Difficulties in measuring the value of science
From "The mismeasurement of science," by Michael Nielsen:
In this essay I argue that heavy reliance on a small number of metrics is bad for science. Of course, many people have previously criticised metrics such as citation count or the h-index. Such criticisms tend to fall into one of two categories. In the first category are criticisms of the properties of particular metrics, for example, that they undervalue pioneer work, or that they unfairly disadvantage particular fields. In the second category are criticisms of the entire notion of quantitatively measuring science. My argument differs from both these types of arguments. I accept that metrics in some form are inevitable – after all, as I said above, every granting or hiring committee is effectively using a metric every time they make a decision. My argument instead is essentially an argument against homogeneity in the evaluation of science: it’s not the use of metrics I’m objecting to, per se, rather it’s the idea that a relatively small number of metrics may become broadly influential. I shall argue that it’s much better if the system is very diverse, with all sorts of different ways being used to evaluate science. Crucially, my argument is independent of the details of what metrics are being broadly adopted: no matter how well-designed a particular metric may be, we shall see that it would be better to use a more heterogeneous system.
Nielsen's argument is that measuring the value of science is very difficult, yet we effectively judge that value (or the expected return on investment in a project) every time we allocate research funds. His solution is to maintain multiple metrics for judging the value of science rather than relying on any single one.
Sunday, January 16, 2011
Script of the Week: sync.pl
I've started experimenting with gist for storing and sharing snippets of code online. Below is a summary of a script that I use to synchronize directories across different machines using rsync.
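The embedded gist doesn't reproduce here, so what follows is only a minimal sketch of the idea: a Perl wrapper around rsync with a hard-coded map of targets. The host names, directory names, and rsync options are placeholder assumptions, not the contents of the actual script:

#!/usr/bin/env perl
# Minimal sketch in the spirit of sync.pl: a thin Perl wrapper around rsync
# with a hard-coded map of targets.  The host and directory names below are
# placeholders, not the contents of the actual gist.
use strict;
use warnings;

my %targets = (
    projects => { local => "$ENV{HOME}/projects/", remote => 'user@remote.example.org:projects/' },
    notes    => { local => "$ENV{HOME}/notes/",    remote => 'user@remote.example.org:notes/' },
);

my $name = shift @ARGV or die "usage: $0 <target> [push|pull]\n";
my $mode = shift(@ARGV) // 'push';
my $t    = $targets{$name} or die "unknown target '$name'\n";

# -a archive mode, -v verbose, -z compress, --delete mirrors deletions.
my @cmd = ('rsync', '-avz', '--delete',
           $mode eq 'pull' ? ($t->{remote}, $t->{local})
                           : ($t->{local},  $t->{remote}));

print "running: @cmd\n";
system(@cmd) == 0 or die "rsync exited with status $?\n";

Keeping the target map inside the script means the same short command (for example, sync.pl projects push) works from any machine, with rsync doing the actual mirroring.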
Saturday, January 15, 2011
Forensic Bioinformatics and Clinical Trials
From a Nature News article titled Cancer trial errors revealed:
Now, in response to information obtained by Nature under the US Freedom of Information Act, Kornbluth and Cuffe have offered their account of the mistakes that led the trials to be restarted even after they learned of potential flaws in the underlying data. The affair will have an impact beyond Duke, as the Institute of Medicine, part of the US National Academies in Washington DC, begins to examine research on genome-based patient testing. Originally commissioned to investigate Duke's controversial trials, the institute's US$687,000 study is now expected to focus on providing broader recommendations for the design of clinical trials that similarly use genomic data from individual patients to tailor therapy.
This seems like a big mistake on the part of the panel.
Related:
Make Up Your Own Rules of Probability
Irreproducible Analysis