ortholog detection by reciprocal BLAST
miiruu
Group: Member Posts: 1 Joined: Jun 06, 2006

I have to detect orthologs using reciprocal BLAST, i.e., BLASTing set A against set B and then set B against set A.
If sets A and B contain redundant entries, how should I deal with the redundancies in both sets (query sets and subject sets)? I was thinking about grouping the genes with the top scores into one. Someone told me that grouping them by score won't work, since the score depends on sequence length. Another idea was to make non-redundant versions of A and B, each containing one representative from every cluster of similar sequences within that set. I don't want to go down this road, since keeping only a representative means losing information.
Can you give me some advice on this? If there is anything else I should watch out for when doing the reciprocal BLAST, please let me know. I'd appreciate it very much.
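As a rough illustration of the reciprocal-BLAST procedure described in the question, here is a minimal Python sketch. It assumes both searches were saved in BLAST+ tabular format (`-outfmt 6` with the default 12 columns, so column 2 is the subject ID and column 12 is the bit score); the function names `best_hits` and `reciprocal_best_hits` are hypothetical. It keeps the single highest-bit-score hit per query and reports a pair only when each sequence is the other's best hit.

```python
import csv


def best_hits(blast_tab_lines):
    """Return {query_id: subject_id}, keeping the highest-bit-score hit per query.

    Assumes BLAST+ tabular output (-outfmt 6) with the default 12 columns:
    column 1 = qseqid, column 2 = sseqid, column 12 = bitscore.
    """
    best = {}
    for row in csv.reader(blast_tab_lines, delimiter="\t"):
        # Skip blank lines and comment lines (e.g. from -outfmt 7).
        if not row or row[0].startswith("#"):
            continue
        query, subject, bitscore = row[0], row[1], float(row[11])
        # Ties keep the first hit seen; redundant subject entries with
        # identical scores are exactly where this choice becomes arbitrary.
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {q: s for q, (s, _) in best.items()}


def reciprocal_best_hits(a_vs_b_lines, b_vs_a_lines):
    """Return sorted (a, b) pairs where b is a's best hit and a is b's best hit."""
    a_to_b = best_hits(a_vs_b_lines)
    b_to_a = best_hits(b_vs_a_lines)
    return sorted((a, b) for a, b in a_to_b.items() if b_to_a.get(b) == a)
```

Because ties are broken by input order, collapsing exact duplicates before this step makes the result less arbitrary.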

Posted Jun 06, 2006, 21:03 PM

ryan_m
Group: Moderators Posts: 279 Joined: May 06, 2006

This is more of a genomics question, but I will do my best to answer it. You should be able to identify your redundant sequences by plotting a histogram of the blastn percent identity (not the score) between ESTs in a self-BLAST (all ESTs blasted against the same set of ESTs). You should see a bimodal distribution: redundant ESTs form a peak at around 98-100% identity (depending on whether they are quality-trimmed, etc.), and a second, broader peak represents paralogs or gene family members. You can then set a threshold for identifying redundant ESTs based on the spread of the first peak. At that point, choose one EST from each cluster to represent the gene. In the case of UniGene, I believe they use the longest one (which is usually a full-length cDNA sequence).
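The redundancy filter described above could be sketched as follows. This assumes the self-BLAST hits have already been parsed into (query ID, subject ID, percent identity) tuples and that sequence lengths are known. The names `identity_histogram` and `cluster_redundant` are hypothetical, and the identity cutoff is a parameter you would read off the spread of the first histogram peak. Clustering is single-linkage (any hit at or above the cutoff joins two clusters), and the longest member is kept as the representative, following the UniGene-style choice mentioned in the post.

```python
from collections import Counter, defaultdict


def identity_histogram(pidents, bin_width=1.0):
    """Count self-BLAST percent identities in fixed-width bins (bin start -> count)."""
    return Counter(int(p // bin_width) * bin_width for p in pidents)


def cluster_redundant(hits, lengths, min_identity):
    """Single-linkage clusters of sequences joined by hits at >= min_identity.

    hits: iterable of (query_id, subject_id, percent_identity) from a self-BLAST.
    lengths: {seq_id: length}; every ID in hits must appear here.
    Returns {representative_id: sorted list of cluster members}.
    """
    parent = {seq_id: seq_id for seq_id in lengths}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for query, subject, pident in hits:
        # Ignore self-hits; merge clusters linked by a high-identity hit.
        if query != subject and pident >= min_identity:
            parent[find(query)] = find(subject)

    clusters = defaultdict(list)
    for seq_id in lengths:
        clusters[find(seq_id)].append(seq_id)

    # Representative = longest member (ties broken by ID).
    return {max(members, key=lambda s: (lengths[s], s)): sorted(members)
            for members in clusters.values()}
```

Single-linkage can chain distinct genes together through intermediate sequences, so large clusters are worth inspecting by hand.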

Posted Jun 06, 2006, 23:25 PM

ryan_m
Group: Moderators Posts: 279 Joined: May 06, 2006

Quote (miiruu): "If there is any other thing that I should watch out for when doing the reciprocal blast, please let me know. I'd appreciate it very much."
Also, keep in mind that your EST sets most likely do not represent the entire gene set of your organisms. As a result, the best hit between A and B, even if it is reciprocal, may not be a true ortholog pair. This can happen because the true ortholog in species B was not in the EST library, so one of its paralogs matched the A sequence reciprocally instead. If you have a genome for one of the organisms (A or B), you can do some tricky stuff to compensate for this. If that is your case, check out this paper: Morin et al., Genome Res. 16:796. Good luck!

Posted Jun 06, 2006, 23:29 PM

ryan_m
Group: Moderators Posts: 279 Joined: May 06, 2006

If you find that RBH (reciprocal best hit) is problematic in your case, you may want to compare its performance with that of the RSD (reciprocal smallest distance) method, which is supposed to be more robust than RBH. See the RSD documentation.
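The reciprocity test at the heart of RSD could be sketched like this. It is a simplified illustration, not the RSD implementation. It assumes you already have, for each sequence, a list of candidate hits in the other set (for example, BLAST hits under an E-value cutoff) and a table of evolutionary distances for those pairs. The published method derives those distances from alignments by maximum likelihood, which is not shown here. All names are hypothetical. A pair is kept only when each sequence is the other's smallest-distance candidate.

```python
def reciprocal_smallest_distance(cands_a_to_b, cands_b_to_a, dist):
    """Return sorted (a, b, distance) triples that are mutual smallest-distance hits.

    cands_a_to_b: {a: [b, ...]} candidate hits of each A sequence in B.
    cands_b_to_a: {b: [a, ...]} candidate hits of each B sequence in A.
    dist: {(a, b): distance} for every candidate pair, keyed A-first.
    """
    pairs = []
    for a, bs in cands_a_to_b.items():
        # Forward step: closest B candidate for this A sequence.
        b = min(bs, key=lambda x: dist[(a, x)], default=None)
        if b is None:
            continue
        # Reverse step: closest A candidate for that B sequence.
        back = min(cands_b_to_a.get(b, []), key=lambda x: dist[(x, b)], default=None)
        if back == a:
            pairs.append((a, b, dist[(a, b)]))
    return sorted(pairs)
```

Compared with the RBH sketch based on bit scores, the only change is the ranking criterion: candidates are ordered by an estimated evolutionary distance instead of by BLAST score.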

Posted Jun 12, 2006, 15:39 PM

LennyB
Group: Member Posts: 1 Joined: Oct 16, 2006

- NRDB is a free software product provided by NCBI for eliminating redundancy in sequence sets without losing information.
- SLIM Search (a commercial product) can run the searches for a full-genome pairwise comparison in a few minutes on a typical PC (instead of months on a supercomputer), so you have more opportunity to find parameters that give the best biological results.
- There are issues with using BLAST percent identities and scores, because these relate only to the largest local alignment of the sequence pair. As a result, a short fragment with 100% identity over 100 nt of a 1,000-nt sequence can show up as a better hit than an alignment with 80% identity spread over all 1,000 nt. This is particularly problematic with transcription factors, which show high conservation in short domains and high divergence between them.
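To put numbers on the length effect, one option is to rescale an HSP's percent identity to the full query length: count the identical positions and divide by the query length. The helper below (`identity_over_query`, a hypothetical name) does this for a single HSP. With the figures from the post, the 100-nt fragment at 100% identity covers 10% of the query with identical positions, while the 80% alignment over 1,000 nt covers 80%. Summing identical positions over several non-overlapping HSPs would be a natural extension but is not shown.

```python
def identity_over_query(pident, aln_length, query_length):
    """Percent of the whole query covered by identical positions in one HSP.

    pident: BLAST percent identity of the HSP (0-100).
    aln_length: alignment length of the HSP.
    query_length: full length of the query sequence.
    """
    identical_positions = pident / 100.0 * aln_length
    return 100.0 * identical_positions / query_length
```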

Posted Oct 16, 2006, 8:32 AM

sichan
Group: Moderators Posts: 27 Joined: Jul 30, 2008

Also, it would be useful to differentiate between out-paralogs and in-paralogs. Out-paralogs are paralogs that arose from duplications before the speciation event separating the two species, while in-paralogs arose from duplications after it. In-paralogs are therefore co-orthologs: as a group, they are orthologous to the corresponding gene(s) in the other species.
InParanoid is a BLAST-based tool that attempts to differentiate between the two categories:
Remm M, Storm CE, Sonnhammer EL. Automatic clustering of orthologs and in-paralogs from pairwise species comparisons. J Mol Biol. 2001;314:1041-1052.
Online tool: http://inparanoid.sbc.su.se/cgi-bin/index.cgi
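The basic in-paralog idea can be sketched as follows. This is a deliberately simplified version of the criterion InParanoid is built on, not its actual algorithm, which adds overlap checks, confidence values, and rules for resolving conflicting clusters. It starts from a seed ortholog pair found by reciprocal BLAST. A gene from the same species counts as an in-paralog candidate if it scores higher against the seed than the seed scores against its ortholog in the other species. The function name and the layout of the score dictionary are assumptions.

```python
def inparalogs(seed, between_score, within_scores):
    """Return same-species genes closer to the seed than the seed is to its ortholog.

    seed: ID of one member of a reciprocal-best-hit seed pair.
    between_score: score between the seed and its ortholog in the other species.
    within_scores: {(gene_x, gene_y): score} from the same-species self-BLAST.
    """
    return sorted(other for (gene, other), score in within_scores.items()
                  if gene == seed and other != seed and score > between_score)
```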

Posted Aug 04, 2008, 22:50 PM