Help required with a very specific motif search
|
frasermoss
Group: Admin Posts: 731 Joined: Feb 22, 2005
|
I'm trying to establish whether any membrane protein with an intracellular C-terminus terminates with the residues YKI.
This is obviously a very short sequence and will produce thousands of hits in a search of Swissprot or other protein databases.
Does anyone know how I can set up search parameters to confidently limit my search to membrane proteins, and tell the search engine that I want these residues to be the terminal three residues of the protein? So far I have not found a way to do this in NCBI without manually trawling through thousands of potential hits. A positive control search would be YKV, which I know is present in some receptors.
I have also searched PDZbase, but this service is presently limited to only about 300 known interactions.
Any help would be greatly appreciated.
|
......................... "Opportunity is missed by most people because it is dressed in overalls and looks like work". Edison
|
Posted Feb 01, 2007, 19:17 PM |
|
|
|
ryan_m
Group: Moderators Posts: 284 Joined: May 06, 2006
|
So basically you want to find out how many protein sequences terminate in YKI, then test the null hypothesis that these are not over-represented in proteins localized to any one cellular component? The alternative being that they are enriched among membrane proteins?
Ryan
|
.........................
|
| Posted Feb 01, 2007, 19:23 PM |
|
|
|
bgood
Group: Moderators Posts: 156 Joined: Apr 12, 2006
|
fraser,
First, what are your criteria for accepting that a protein is a membrane protein? Is it sufficient to have annotation to a GO subcellular location of, for example, 'plasma membrane' (GO:0005886)? Do you want to constrain your protein list based on organism?
I don't know, but I suspect that you may need to do a little coding (or get someone else to) in order to answer your question.
You could, for example, retrieve a set of protein sequences from NCBI or other that have the GO annotation you seek, and then quite easily count how many of them end in the motif you are interested in.
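A minimal sketch of that counting step in Python, assuming the annotated proteins have already been downloaded as FASTA. The names `read_fasta` and `count_c_terminal` and the toy records are made up for illustration and do not come from any existing tool.

```python
import io

def read_fasta(handle):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequences may wrap over several lines
    if header is not None:
        yield header, "".join(chunks)

def count_c_terminal(handle, motif):
    """Return (hits, total): records ending in `motif`, and records read."""
    motif = motif.upper()
    total = hits = 0
    for _header, seq in read_fasta(handle):
        total += 1
        # Assumption: some exports mark the stop codon with a trailing '*'.
        if seq.upper().rstrip("*").endswith(motif):
            hits += 1
    return hits, total

# Made-up records, only to show the call; real input would be the downloaded file.
toy = io.StringIO(">toy1\nMAAAY\nKI\n>toy2\nMGGGYKV\n")
hits, total = count_c_terminal(toy, "YKI")
print(f"{hits} of {total} sequences end in YKI")
```

Joining the wrapped lines before testing matters here, because the last line of a wrapped record can be shorter than the three-residue motif.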
|
.........................
|
| Posted Feb 01, 2007, 20:02 PM |
|
|
|
bgood
Group: Moderators Posts: 156 Joined: Apr 12, 2006
|
Anyone know if there is software that enables search of protein databases that includes both semantic constraints (e.g. ontology terms) and regular expression matching (for motif search) ?
??
|
.........................
|
| Posted Feb 01, 2007, 20:05 PM |
|
|
|
|
|
ryan_m
Group: Moderators Posts: 284 Joined: May 06, 2006
|
I can't get the link to the actual tool to work either (e.g. http://sirw.embl.de/)
Ryan
|
.........................
|
| Posted Feb 01, 2007, 20:14 PM |
|
|
|
frasermoss
Group: Admin Posts: 731 Joined: Feb 22, 2005
|
Here are my constraints:
Organism: does not really matter, although you could eliminate any plant species.
Subcellular localization: plasma membrane.
Structure: membrane protein with at least 1 transmembrane spanning domain and an intracellular carboxy terminus.
The carboxy terminus ends with the sequence YKI.
To reiterate: I just want to determine whether any mammalian membrane proteins at all terminate with the sequence YKI. On paper it conforms to a PDZ type II interacting motif, but to date I have not found an example of it occurring in any known membrane protein, whereas the homologous YKV motif does occur and has been shown to interact with GRIP and PICK1. By the way, a negative result (i.e. it does not exist in nature) is a perfectly good result.
However my searching has been limited to manually looking at raw sequences one by one because the shortness of the sequence always overwhelms the search program. I really want to have the computers do the leg work for me.
Any more bright ideas? Thanks for the preliminary help and the speed of your replies so far.
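The constraints listed above can be written as a single filter. This is only a sketch in Python: the `ProteinRecord` type, its field names and the example records are hypothetical, and it assumes the annotations (localization, transmembrane domains, C-terminal topology) have already been collected from a database.

```python
from dataclasses import dataclass

@dataclass
class ProteinRecord:
    """Hypothetical pre-annotated record; the field names are illustrative."""
    accession: str
    kingdom: str                 # e.g. "animal" or "plant"
    location: str                # annotated subcellular localization
    tm_domains: int              # number of transmembrane spanning domains
    c_term_intracellular: bool   # True if the carboxy terminus is cytoplasmic
    sequence: str

def matches_constraints(rec, motif="YKI"):
    """Apply every constraint from the post; all of them must hold."""
    return (
        rec.kingdom != "plant"
        and rec.location == "plasma membrane"
        and rec.tm_domains >= 1
        and rec.c_term_intracellular
        and rec.sequence.upper().endswith(motif.upper())
    )

# Made-up records, purely to exercise the filter.
records = [
    ProteinRecord("X1", "animal", "plasma membrane", 4, True, "MAAAYKI"),
    ProteinRecord("X2", "animal", "plasma membrane", 0, True, "MCCCYKI"),
    ProteinRecord("X3", "plant", "plasma membrane", 2, True, "MDDDYKI"),
    ProteinRecord("X4", "animal", "plasma membrane", 1, True, "MEEEYKV"),
]
selected = [rec.accession for rec in records if matches_constraints(rec)]
print(selected)
```

Calling `matches_constraints(rec, motif="YKV")` gives the positive control with the same filter.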
|
.........................
|
| Posted Feb 01, 2007, 20:23 PM |
|
|
|
frasermoss
Group: Admin Posts: 731 Joined: Feb 22, 2005
|
hey guys - is this the new link do you reckon?
http://elm.eu.org/index.html
|
.........................
|
| Posted Feb 01, 2007, 20:43 PM |
|
|
|
ryan_m
Group: Moderators Posts: 284 Joined: May 06, 2006
|
If you have access to a machine running linux or unix with a perl interpreter:
1) download swissprot or any other fasta-formatted protein database of your choice
2) make a file called check_carboxy.pl (see code below)
3) cat fasta_file.fa | perl check_carboxy.pl
4) all proteins that are printed to the screen are your positives.
Cheers,
Ryan
Code:
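#!/usr/bin/perl
use strict;
use warnings;

my $motif = "YKI";
my ($header, $seq) = ("", "");

while (<>) {
    s/\s+$//;    # strip line ending
    if (/^>(.+)/) {
        # new record: report the previous one if it ends in the motif
        # (\*? tolerates a trailing stop-codon '*')
        print "$header\n" if $seq =~ /$motif\*?$/i;
        ($header, $seq) = ($1, "");
    }
    else {
        $seq .= $_;    # join wrapped sequence lines
    }
}
# the last record has no header after it, so check it here
print "$header\n" if $seq =~ /$motif\*?$/i;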
|
.........................
|
| Posted Feb 01, 2007, 20:46 PM |
|
|
|
frasermoss
Group: Admin Posts: 731 Joined: Feb 22, 2005
|
| ryan_m said: | If you have access to a machine running linux or unix with a perl interpreter:
1) download swissprot or any other fasta-formatted protein database of your choice
|
By this you mean perform a search, e.g. "plasma membrane", and then download all 10998 hits as FASTA files? Then apply your code? Please excuse my naivety, but my bioinformatics skills are pretty green.
|
.........................
|
| Posted Feb 01, 2007, 21:11 PM |
|
|
|
ryan_m
Group: Moderators Posts: 284 Joined: May 06, 2006
|
I was thinking that you would use the script to find all proteins that match your sequence requirement (i.e. ending in YKI), then take those and search their GO terms to see if any localise to the PM. However, the other way round would work too; it all depends on whether you have access to that set of sequences. Thinking out loud here, I wonder if you could get this using the BioMart portal to the Ensembl database. I'll check it out.
Cheers,
Ryan
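A rough Python sketch of the motif-first route described above: collect the IDs of sequences ending in the motif, then keep those annotated with the plasma membrane term GO:0005886 mentioned earlier in the thread. The function names, the protein IDs and the annotation table are invented for illustration, and the lookup matches the exact term only (annotations to more specific child terms are not considered).

```python
PLASMA_MEMBRANE = "GO:0005886"  # 'plasma membrane', as quoted earlier in the thread

def motif_hits(records, motif):
    """Return IDs of (id, sequence) pairs whose sequence ends in `motif`."""
    motif = motif.upper()
    # Assumption: a trailing '*' (stop codon marker in some exports) is ignored.
    return [pid for pid, seq in records
            if seq.upper().rstrip("*").endswith(motif)]

def with_go_term(ids, go_annotations, term=PLASMA_MEMBRANE):
    """Keep only the IDs whose annotation set contains `term`."""
    return [pid for pid in ids if term in go_annotations.get(pid, ())]

# Made-up sequences and annotations, only to show the two steps.
records = [("P_A", "MAAAYKI"), ("P_B", "MCCCYKI"), ("P_C", "MDDDYKV")]
go_annotations = {
    "P_A": {"GO:0005886"},   # plasma membrane
    "P_B": {"GO:0005634"},   # nucleus
}
hits = motif_hits(records, "YKI")
membrane_hits = with_go_term(hits, go_annotations)
print(hits, membrane_hits)
```

Doing the motif search first keeps the annotation lookup small, since only the handful of motif hits needs GO terms.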
|
.........................
|
| Posted Feb 01, 2007, 21:15 PM |
|
|
|
ryan_m
Group: Moderators Posts: 284 Joined: May 06, 2006
|
| frasermoss said: | | ryan_m said: | If you have access to a machine running linux or unix with a perl interpreter:
1) download swissprot or any other fasta-formatted protein database of your choice
|
By this you mean perform a search, e.g. "plasma membrane", and then download all 10998 hits as FASTA files?
Then apply your code?
Please excuse my naivety, but my bioinformatics skills are pretty green. |
OK. You should be able to use Ensembl BioMart to get the data you need (e.g. set a filter to get only the proteins with the GO term you want). You can set BioMart to download the sequences as protein sequences in FASTA format.
Ryan
|
.........................
|
| Posted Feb 01, 2007, 21:20 PM |
|
|
|
frasermoss
Group: Admin Posts: 731 Joined: Feb 22, 2005
|
Job done! Thanks everyone.
|
.........................
|
| Posted Feb 02, 2007, 2:29 AM |
|
|
|
ryan_m
Group: Moderators Posts: 284 Joined: May 06, 2006
|
Great! Glad to have helped out. I'm interested now. Did you find the result you hoped for?
Ryan
|
.........................
|
| Posted Feb 02, 2007, 5:56 AM |
|
|
|
frasermoss
Group: Admin Posts: 731 Joined: Feb 22, 2005
|
Yep. I did not find any membrane proteins ending in YKI in human, mouse, rat, dog or C. elegans.
If anyone has the time or the inclination to double check for me that would be cool.
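For anyone double-checking, one possible approach (a sketch in Python, not the procedure actually used above) is to tally the motif and the YKV positive control side by side for each species, so that a zero count for YKI can be told apart from a search that finds nothing at all. The function names and the species and sequences below are placeholders.

```python
def ends_with(seq, motif):
    """True if `seq` ends in `motif`, ignoring case and a trailing '*'."""
    return seq.upper().rstrip("*").endswith(motif.upper())

def tally(proteomes, motif="YKI", control="YKV"):
    """Count motif and control hits per species.

    `proteomes` maps a species name to an iterable of protein sequences,
    e.g. the membrane-protein set downloaded for that species.
    """
    report = {}
    for species, seqs in proteomes.items():
        seqs = list(seqs)
        report[species] = {
            "total": len(seqs),
            "motif": sum(ends_with(s, motif) for s in seqs),
            "control": sum(ends_with(s, control) for s in seqs),
        }
    return report

# Placeholder data; real input would be one sequence set per species.
proteomes = {
    "species_a": ["MAAAYKV", "MCCCLLL"],
    "species_b": ["MDDDGGG"],
}
report = tally(proteomes)
for species, counts in report.items():
    note = "" if counts["control"] else " (control not found, treat with caution)"
    print(species, counts, note)
```

A species where the control count is also zero says little either way, because the sequence set or the parsing may be at fault rather than the motif being absent.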
|
.........................
|
| Posted Feb 02, 2007, 6:52 AM |
|
|
|