Scientist Solutions: Life Science Discussions

ortholog detection by reciprocal BLAST

miiruu

Frog Egg

Group: Member
Posts: 1
Joined: Jun 06, 2006








I have to detect orthologs by using reciprocal BLAST, i.e. blasting the set A against the set B and then blasting the set B against the set A.

If the sets A and B contain redundant entries, how should I deal with the redundancies in both sets (query sets and subject sets)? I was thinking about grouping the genes with the top scores into one. Someone told me that grouping them by score won't work, since the score depends on the sequence length. Another idea was to make non-redundant sets for A and B, each containing representatives from clusters of similar sequences within that set. I don't want to go down this road, since taking representatives means a loss of information.

Can you give me some advice on this? If there is any other thing that I should watch out for when doing the reciprocal blast, please let me know. I'd appreciate it very much.

.........................

 Posted Jun 06, 2006, 21:03 PM
ryan_m

Frog Laureate

Group: Moderators
Posts: 279
Joined: May 06, 2006








This is more of a Genomics-related question, so I will do my best to answer it. I think you should be able to identify your redundant sequences by plotting a histogram of the blastn percent identity (not score) between ESTs in a self-BLAST (all ESTs blasted against the same set of ESTs). You should see a bimodal distribution, with redundant ESTs having a percent identity around 98-100% (depending on whether they are quality-trimmed, etc.). You should see a second, broader peak representing paralogs or gene family members. You should be able to set a threshold for identifying redundant ESTs based on the spread of the first peak. At that point, you would want to choose one EST from each cluster to represent your gene. In the case of UniGene, I believe they use the longest one (which is usually a full-length cDNA sequence).
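
A minimal Python sketch of the histogram-and-threshold idea above. It assumes the self-BLAST was saved in tabular form (`-outfmt 6` in BLAST+, `-m 8` in legacy blastall), where the first three columns are query ID, subject ID, and percent identity. The function names, the 2% bin width, the 98% cutoff, and the single-linkage clustering are illustrative assumptions, not what UniGene does; the cutoff should be read off your own histogram.

```python
from collections import Counter


def parse_tabular(lines):
    """Yield (query, subject, percent_identity) from BLAST tabular rows."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        yield fields[0], fields[1], float(fields[2])


def identity_histogram(hits, bin_width=2.0):
    """Count non-self hits per percent-identity bin, keyed by the bin's lower edge."""
    counts = Counter()
    for query, subject, pident in hits:
        if query == subject:
            continue  # a sequence always matches itself at 100%
        lower = min(int(pident // bin_width) * bin_width, 100.0 - bin_width)
        counts[lower] += 1
    return dict(sorted(counts.items()))


def redundant_clusters(hits, threshold=98.0):
    """Single-linkage clusters of sequences joined by hits at or above threshold."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for query, subject, pident in hits:
        root_q, root_s = find(query), find(subject)  # registers both sequences
        if query != subject and pident >= threshold and root_q != root_s:
            parent[root_q] = root_s
    clusters = {}
    for seq in parent:
        clusters.setdefault(find(seq), set()).add(seq)
    return list(clusters.values())


def longest_representative(cluster, lengths):
    """Pick the longest sequence in a cluster (hypothetical lengths dict: ID -> length)."""
    return max(sorted(cluster), key=lambda name: lengths[name])


# Toy rows standing in for a real self-BLAST output file.
rows = [
    "est1\test1\t100.0\t500",
    "est1\test2\t99.2\t480",
    "est2\test1\t99.2\t480",
    "est1\test3\t85.0\t300",
    "est3\test1\t85.0\t300",
]
hits = list(parse_tabular(rows))
print(identity_histogram(hits))
print(redundant_clusters(hits))
```

In this toy input, est1 and est2 fall in the high-identity peak and are clustered together, while est3 (85% identity) stays separate as a likely paralog or gene family member.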

.........................

Posted Jun 06, 2006, 23:25 PM
ryan_m


miiruu said:

If there is any other thing that I should watch out for when doing the reciprocal blast, please let me know. I'd appreciate it very much.


Also, keep in mind that your EST sets most likely do not represent the entire gene set of your organism. As a result, the best hit between A and B, even if it is reciprocal, may not be a true ortholog pair. This could be because the true ortholog in species B was not in the EST library, so its paralog matched A reciprocally instead.
If you have a genome for one of the organisms (A or B), you can do some tricky stuff to compensate for this. Check out this paper if this is the case:
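
To make the pitfall concrete, here is a rough Python sketch of the reciprocal-best-hit test, assuming each search has already been reduced to (query, subject, bit score) tuples. The function names and the toy gene IDs are hypothetical, and ties are broken by whichever hit is seen first, which a real pipeline would want to handle more carefully.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject (first one wins on ties)."""
    best = {}
    for query, subject, bitscore in hits:
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Complete gene sets: A1 pairs with its true ortholog B1.
a_vs_b = [("A1", "B1", 250.0), ("A1", "B2", 180.0), ("A2", "B2", 300.0)]
b_vs_a = [("B1", "A1", 245.0), ("B2", "A2", 295.0), ("B2", "A1", 175.0)]
print(reciprocal_best_hits(a_vs_b, b_vs_a))

# Incomplete libraries: B1 and A2 were never sequenced, so the paralog B2
# becomes A1's reciprocal best hit even though it is not the true ortholog.
print(reciprocal_best_hits([("A1", "B2", 180.0)], [("B2", "A1", 175.0)]))
```

The second call still reports a reciprocal pair, which is exactly the false positive described above.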

Genome Res. Morin et al. 16: 796

Good Luck!

.........................

Posted Jun 06, 2006, 23:29 PM
ryan_m


If you find that RBH (reciprocal best hit) is problematic in your case, you may want to compare its performance to that of the RSD (reciprocal smallest distance) method, which is supposed to be more robust than RBH.

see documentation

.........................

Posted Jun 12, 2006, 15:39 PM
LennyB

Frog Egg

Group: Member
Posts: 1
Joined: Oct 16, 2006








- NRDB is a free software product provided by NCBI to eliminate redundancy in sequence sets without losing information.
- SLIM Search (a commercial product) can run the searches for a full genome pair-wise comparison in a few minutes on a typical PC (instead of months on a supercomputer), so you have more opportunity to find parameters that provide the best biological results.
- There are issues with using BLAST percent identities and scores, as these relate only to the largest local alignment of the sequence pair. As a result, a short fragment with 100% identity over 100 nt out of 1,000 can show up as a better hit than 80% identity distributed over 1,000 nt. This is particularly problematic with transcription factors, which show high conservation in short domains with high divergence between them.
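
One common way to soften the short-fragment problem in the last point is to scale percent identity by how much of the query the alignment covers. The Python sketch below is only an illustration under simplifying assumptions: one alignment per hit, alignment length taken as-is (gaps included), and hypothetical function and subject names. It does not merge multiple alignments between the same pair.

```python
def coverage_adjusted_identity(pident, align_len, query_len):
    """Percent identity scaled by the fraction of the query covered by the alignment."""
    covered = min(align_len, query_len)  # cap coverage at 100% of the query
    return pident * covered / query_len


def rank_hits(hits, query_len):
    """Sort (subject, pident, align_len) hits by coverage-adjusted identity, best first."""
    return sorted(
        hits,
        key=lambda hit: coverage_adjusted_identity(hit[1], hit[2], query_len),
        reverse=True,
    )


# The two cases from the post: 100% over 100 nt versus 80% over 1,000 nt,
# both against a 1,000 nt query.
hits = [("short_fragment", 100.0, 100), ("full_length", 80.0, 1000)]
for subject, pident, align_len in rank_hits(hits, query_len=1000):
    print(subject, coverage_adjusted_identity(pident, align_len, 1000))
```

With this adjustment the full-length match scores 80.0 and the short fragment 10.0, so the ranking flips relative to raw percent identity.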

.........................

Posted Oct 16, 2006, 8:32 AM
sichan

Frog Laureate

Group: Moderators
Posts: 27
Joined: Jul 30, 2008








Also, it would be useful to differentiate between out-paralogs and in-paralogs. Out-paralogs are paralogs that formed before the speciation event, while in-paralogs are those that formed after the speciation event. In-paralogs are together co-orthologs of the corresponding gene in the other species, whereas out-paralogs are not orthologs.

Inparanoid is a BLAST-based tool that attempts to differentiate between the two categories:

Remm M, Storm CE, Sonnhammer EL. Automatic clustering of orthologs and in-paralogs from pairwise species comparisons. 2001. J Mol Biol 314:1041-1052.

Online tool located here: http://inparanoid.sbc.su.se/cgi-bin/index.cgi
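
As a rough illustration of the in-paralog idea (a simplification, not Inparanoid's actual implementation): once a seed ortholog pair has been found, genes from the same species that score higher against the seed than the seed scores against its ortholog in the other species are candidate in-paralogs. The function name, the score dictionary layout, and the gene IDs below are hypothetical.

```python
def in_paralog_candidates(seed, ortholog_score, within_species_scores):
    """Return same-species genes more similar to the seed than the seed is to its ortholog.

    within_species_scores maps (gene1, gene2) pairs from one species to a
    similarity score such as a BLAST bit score.
    """
    return sorted(
        other
        for (gene, other), score in within_species_scores.items()
        if gene == seed and other != seed and score > ortholog_score
    )


# Seed ortholog pair A1/B1 scores 250. A1b is closer to A1 than B1 is, so it
# likely duplicated after speciation; A9 is more distant, so it is more likely
# an out-paralog.
within_a = {("A1", "A1b"): 400.0, ("A1", "A9"): 120.0}
print(in_paralog_candidates("A1", 250.0, within_a))
```

Inparanoid itself adds further steps, such as confidence values and rules for merging overlapping groups, that this sketch leaves out.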

.........................

Posted Aug 04, 2008, 22:50 PM