Bootstrapping Bioinformatics

Sanjib_Guha · April 4, 2010, 10:35pm

Hey guys,

Does anybody have any idea about bootstrapping?
Like what is it? Why is it useful? And how do we do it?

Any input will be highly appreciated…

luvkashyap · April 5, 2010, 8:48am

I guess to begin, you should go through this article, "The Bootstrapping Based Recognition of Conceptual Relationship for Text Retrieval".
Link: http://www.springerlink.com/content/k7r7452j8768k01q/
And this file to get an idea: http://www.clcbio.com/sciencearticles/BE-phylogenetics.pdf

In a nutshell, bootstrapping is a measure of how reliable a tree is: it tells you how consistently the data support each grouping, which is often read as a rough guide to whether the true phylogeny has been recovered (it is not strictly a probability). As I understand it, bootstrap numbers indicate how often a grouping is recovered when the data are resampled, not how similar the sequences are. For example, if A and B share the closest node, a bootstrap number of 900 for that node means that in 900 out of 1000 replicates A and B were grouped at that node.

Hope this helps,

Luv

kbradnam · April 5, 2010, 1:57pm

Bootstrap values can be attached to branches of a phylogenetic tree. Generally, very short branches are less reliable and less informative than very long branches. E.g., consider the following hypothetical tree:

```
                 ----------------| A
   -------------|
  |              -------------| B
--|
  |     ----| C
   ---|
      |
      |   ------| D
       --|
         --------| E
```

In most cases we would be more confident that A & B should be grouped together than C & D & E. The short branch that separates C from the DE group may not be real; it may be an artifact of the phylogenetic methods used (distance measures, alignment parameters etc.). Maybe a different program would group CD together with E separate from them, or maybe C would come out grouped with AB instead.

So the question is how reliable those branches are, and bootstrapping attempts to answer that. Bootstrapping effectively resamples the underlying sequence alignment N times and rebuilds the tree from each resampled alignment. The final value for each branch is either the absolute number of replicates (out of N) or the percentage of replicates in which the grouping to the right of the branch was recovered.
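To make the counting step concrete, here is a minimal Python sketch. It assumes each replicate tree has already been built and reduced to a list of groupings (sets of taxon names); the function name `clade_support` and the four replicate results are made up for illustration and do not come from any particular phylogenetics package.

```python
from collections import Counter

def clade_support(replicate_trees):
    """Count how often each grouping (set of taxa) appears across replicates.

    Each replicate tree is given as a list of groupings, and each grouping
    as a set of taxon names. This is a simplification for illustration.
    Returns {grouping: (count, percentage)}.
    """
    counts = Counter()
    for clades in replicate_trees:
        # Deduplicate so a grouping is counted at most once per replicate.
        for clade in {frozenset(c) for c in clades}:
            counts[clade] += 1
    n = len(replicate_trees)
    return {clade: (count, 100.0 * count / n) for clade, count in counts.items()}

# Hypothetical results from N = 4 replicates of the five-taxon tree above.
replicates = [
    [{"A", "B"}, {"D", "E"}, {"C", "D", "E"}],
    [{"A", "B"}, {"C", "D"}, {"C", "D", "E"}],
    [{"A", "B"}, {"D", "E"}, {"C", "D", "E"}],
    [{"A", "B"}, {"D", "E"}, {"A", "B", "C"}],
]
support = clade_support(replicates)
print(support[frozenset({"A", "B"})])  # (4, 100.0)
print(support[frozenset({"D", "E"})])  # (3, 75.0)
```

In this toy example AB is recovered in every replicate, while DE is recovered in only 3 of 4, which is the kind of difference the bootstrap value on each branch is meant to show. Real analyses typically use hundreds or thousands of replicates.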

As I understand it, basic bootstrapping randomly samples positions (columns) from the sequence alignment, with replacement, to form a new alignment of the same length. So any position can be sampled more than once, and some positions will be left out. If the phylogeny is robust, this resampling should not affect the resulting tree. But if the tree is not strong, individual branches may move or disappear in some of the resampled alignments, suggesting that the branch isn’t real.
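The resampling step can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the alignment is a dict of equal-length strings, the helper name `bootstrap_alignment` is hypothetical, and the toy sequences are invented. Building a tree from each replicate is left to whatever phylogenetics program you use.

```python
import random

def bootstrap_alignment(alignment, rng=random):
    """Return one bootstrap replicate of an alignment.

    alignment: dict mapping sequence name -> aligned sequence (equal lengths).
    Columns are drawn with replacement, so the replicate has the same
    length as the original, with some columns repeated and others missing.
    """
    length = len(next(iter(alignment.values())))
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all aligned sequences must have the same length")
    # Pick column indices once, then apply the same picks to every sequence
    # so that each sampled column stays intact across taxa.
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()}

# Toy alignment (made-up sequences, for illustration only).
toy = {"A": "ACGTACGTAC", "B": "ACGTACGTTC", "C": "ACGAACGTTC"}
rng = random.Random(42)  # fixed seed so the run is repeatable
replicates = [bootstrap_alignment(toy, rng) for _ in range(100)]
```

The important detail is that whole columns are sampled, not individual characters: every sequence gets the same set of positions, so each replicate is still a valid alignment.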

Basic bootstrapping methods do not attach any statistical significance to the branches. You should still use common sense. It is not easy to say that a tree with a 75% bootstrap value on a branch is any better than one with a 74% value.

Sanjib_Guha · April 6, 2010, 2:56am

Thanks a lot guys.
I really appreciate your help…