Scientist Solutions: Life Science Discussions

Help required with a very specific motif search

frasermoss

Frog Laureate, Group: Admin, Posts: 731, Joined: Feb 22, 2005

I'm trying to establish whether any membrane protein with an intracellular C-terminus ends in the residues YKI.

This is obviously a very short sequence and will produce thousands of hits in a search of Swiss-Prot or other protein databases.

Does anyone know how I can set up search parameters to confidently limit my search to membrane proteins, and tell the search engine that I want these residues to be the terminal three residues of the protein? So far, short of manually trawling through thousands of potential hits, I have not found a way to do this in NCBI. A positive control search would be YKV, which I know is present in some receptors.

I have also searched PDZbase, but this service is presently limited to only about 300 known interactions.

Any help would be greatly appreciated.

.........................
"Opportunity is missed by most people because it is dressed in overalls and looks like work". Edison

 Posted Feb 01, 2007, 19:17 PM
ryan_m

Frog Laureate, Group: Moderators, Posts: 284, Joined: May 06, 2006

So basically you want to find out how many protein sequences terminate in YKI, then test the null hypothesis that these are not over-represented among proteins localized to any one cellular component, the alternative being that they are enriched among membrane proteins?
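One way to frame that test is a one-sided Fisher's exact test, which reduces to a hypergeometric upper tail. The sketch below is only an illustration: the function name and all four counts are hypothetical placeholders, not figures from any database.

```python
from math import comb


def hypergeom_upper_tail(total, membrane, motif, overlap):
    """P(X >= overlap) under the null hypothesis.

    total    : number of proteins in the database (hypothetical input)
    membrane : how many of those are annotated as membrane proteins
    motif    : how many proteins end in the motif
    overlap  : how many motif-terminated proteins are membrane proteins
    """
    upper = min(membrane, motif)
    favourable = sum(
        comb(membrane, k) * comb(total - membrane, motif - k)
        for k in range(overlap, upper + 1)
    )
    return favourable / comb(total, motif)


if __name__ == "__main__":
    # Made-up counts, purely to show the call.
    print(hypergeom_upper_tail(total=1000, membrane=200, motif=10, overlap=5))
```

A small p-value would argue for enrichment among membrane proteins. For the narrower question of whether any such protein exists at all, a plain count is enough.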

Ryan

.........................

Posted Feb 01, 2007, 19:23 PM
bgood

Frog Laureate, Group: Moderators, Posts: 156, Joined: Apr 12, 2006

fraser,

First, what are your criteria for accepting that a protein is a membrane protein? Is it sufficient to have annotation to a GO subcellular location of, for example, 'plasma membrane' (GO:0005886)? Do you want to constrain your protein list based on organism?

I don't know, but I suspect that you may need to do a little coding (or get someone else to) in order to answer your question.

You could, for example, retrieve a set of protein sequences from NCBI or another source that have the GO annotation you seek, and then quite easily count how many of them end in the motif you are interested in.
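As a rough sketch of that counting step, assuming the annotated sequences have already been retrieved into a list of strings: the function name is hypothetical, and the stripping of a trailing "*" is an assumption about how some exports mark the stop codon.

```python
from collections import Counter


def c_terminal_counts(sequences, length=3):
    """Tally the last `length` residues of every sequence."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper().rstrip("*")  # drop an optional stop-codon marker
        if len(seq) >= length:
            counts[seq[-length:]] += 1
    return counts


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    toy = ["MKTAYKI", "mgsykv", "AAYKV"]
    counts = c_terminal_counts(toy)
    print(counts["YKI"], counts["YKV"])
```

Tallying every C-terminal tripeptide, rather than only YKI, also shows how the motif of interest compares with a control such as YKV in the same set.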

.........................

Posted Feb 01, 2007, 20:02 PM
bgood


Does anyone know of software that can search protein databases using both semantic constraints (e.g. ontology terms) and regular-expression matching (for motif search)?

??

.........................

Posted Feb 01, 2007, 20:05 PM
surferchic

Frog Egg, Group: Member, Posts: 11, Joined: Dec 12, 2006

You might have a look here - though I had some connection issues so couldn't try it out

http://www.embl-heidelberg.de/~chenna/elm_2.html

.........................

Posted Feb 01, 2007, 20:10 PM
ryan_m


surferchic said:
You might have a look here - though I had some connection issues so couldn't try it out

http://www.embl-heidelberg.de/~chenna/elm_2.html


I can't get the link to the actual tool to work either (e.g. http://sirw.embl.de/)

Ryan

.........................

Posted Feb 01, 2007, 20:14 PM
frasermoss


Here are my constraints:

Organism: does not really matter, although you could eliminate any plant species.

Subcellular localization: plasma membrane.

Structure: membrane protein with at least one transmembrane-spanning domain and an intracellular carboxy terminus.

Sequence: the carboxy terminus ends with YKI.
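These constraints combine annotation filters with an anchored regular expression, so they can be written as a single predicate. The sketch below assumes a record type whose field names are all hypothetical; they would need mapping onto whatever the chosen annotation source actually provides.

```python
import re
from dataclasses import dataclass

PLASMA_MEMBRANE = "GO:0005886"  # GO term for 'plasma membrane'
# Anchored at the end of the sequence; tolerates one trailing "*".
MOTIF = re.compile(r"YKI\*?$", re.IGNORECASE)


@dataclass
class ProteinRecord:
    # Every field here is an illustrative assumption, not a real schema.
    accession: str
    is_plant: bool
    go_terms: frozenset
    tm_segments: int        # number of transmembrane segments
    c_terminus_side: str    # "cytoplasmic" or "extracellular"
    sequence: str


def matches_constraints(rec):
    """True if the record satisfies all four constraints listed above."""
    return (
        not rec.is_plant
        and PLASMA_MEMBRANE in rec.go_terms
        and rec.tm_segments >= 1
        and rec.c_terminus_side == "cytoplasmic"
        and MOTIF.search(rec.sequence) is not None
    )
```

Swapping the pattern for `YKV\*?$` gives the positive control with the same filters.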

To reiterate: I just want to determine whether or not any mammalian membrane proteins at all terminate with the sequence YKI. On paper it conforms to a PDZ type II interacting motif, but to date I have not found an example of it occurring in any known membrane protein, whereas the homologous YKV motif does occur and has been shown to interact with GRIP and PICK1. By the way, a negative result (i.e. it does not exist in nature) is a perfectly good result.

However, my searching has been limited to manually looking at raw sequences one by one, because the shortness of the sequence always overwhelms the search program. I really want to have the computers do the leg work for me.

Any more bright ideas? Thanks for the preliminary help and the speed of your replies so far.

.........................

Posted Feb 01, 2007, 20:23 PM
frasermoss


hey guys - is this the new link do you reckon?

http://elm.eu.org/index.html

.........................

Posted Feb 01, 2007, 20:43 PM
ryan_m


If you have access to a machine running Linux or Unix with a Perl interpreter:

1) Download Swiss-Prot or any other FASTA-formatted protein database of your choice.
2) Make a file called check_carboxy.pl (see code below).
3) Run: cat fasta_file.fa | perl check_carboxy.pl
4) All proteins that are printed to the screen are your positives.

Cheers,

Ryan

Code:

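#!/usr/bin/perl
# Print the header of every FASTA record whose sequence ends in $seq.
use strict;
use warnings;

my $seq = "YKI";     # C-terminal motif to look for
my $header_info;     # header of the record currently being read
my $sequence = "";   # residues of that record, wrapped lines joined

# Print the current header if its sequence ends in the motif. Some FASTA
# exports mark the stop codon with a trailing "*", so allow one.
sub report {
    return unless defined $header_info;
    print "$header_info\n" if $sequence =~ /\Q$seq\E\*?$/i;
}

while (<>) {         # reads the files named as arguments, or STDIN
    chomp;
    s/\r$//;         # tolerate Windows line endings
    if (/^>(.+)/) {
        report();    # the previous record is complete
        $header_info = $1;
        $sequence    = "";
    }
    else {
        $sequence .= $_;
    }
}
report();            # the last record has no header after it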
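For anyone without Perl to hand, the same check can be sketched in Python. This is not Ryan's script but an equivalent under two assumptions: the input is plain FASTA, and a trailing "*" (if present) is only a stop-codon marker. The function names are illustrative.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:  # the final record has no header after it
        yield header, "".join(chunks)


def headers_ending_with(handle, motif="YKI"):
    """Return the headers of all records whose sequence ends in `motif`."""
    motif = motif.upper()
    return [
        header
        for header, seq in read_fasta(handle)
        if seq.upper().rstrip("*").endswith(motif)
    ]


if __name__ == "__main__":
    # Two made-up records; the first ends in YKI across a line wrap.
    toy = io.StringIO(">toy1 made-up record\nMKTA\nYKI\n>toy2 made-up record\nMKYKIA\n")
    print(headers_ending_with(toy))
```

To run it on a real download, pass an open file instead of the toy `StringIO` object, e.g. `with open("fasta_file.fa") as fh: print(headers_ending_with(fh))`.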

.........................

Posted Feb 01, 2007, 20:46 PM
frasermoss


ryan_m said:
If you have access to a machine running Linux or Unix with a Perl interpreter:

1) Download Swiss-Prot or any other FASTA-formatted protein database of your choice.



By this, do you mean performing a search (e.g. "plasma membrane") and then downloading all 10998 hits as FASTA files?

Then apply your code?

Please excuse my naivety, but my bioinformatics skills are pretty green.

.........................

Posted Feb 01, 2007, 21:11 PM
ryan_m


I was thinking that you would use the script to find all proteins that match your sequence requirement (i.e. ending in YKI), then take those and search their GO terms to see if any localise to the PM. However, the other way round would work too; it all depends on whether you have access to a pre-filtered set like that. Thinking out loud here, I wonder if you could get this using the BioMart portal to the Ensembl database. I'll check it out.
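The "motif first, GO terms second" order can be sketched as a simple lookup. The two-column, tab-separated annotation format assumed here (accession, then GO ID) is hypothetical, as are the function names; a real annotation file would need its own parser.

```python
from collections import defaultdict

PLASMA_MEMBRANE = "GO:0005886"


def load_annotations(lines):
    """Parse 'accession<TAB>GO ID' lines into {accession: {GO IDs}}."""
    annotations = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # skip malformed lines rather than guess
        annotations[fields[0]].add(fields[1])
    return annotations


def localised_hits(motif_hit_ids, annotations, go_term=PLASMA_MEMBRANE):
    """Keep only the motif hits that carry the given GO term."""
    return sorted(
        acc for acc in set(motif_hit_ids)
        if go_term in annotations.get(acc, set())
    )
```

Filtering by GO term first and scanning for the motif second should give the same final set; the choice mostly affects how much sequence has to be downloaded.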

Cheers,

Ryan

.........................

Posted Feb 01, 2007, 21:15 PM
ryan_m


frasermoss said:
ryan_m said:
If you have access to a machine running Linux or Unix with a Perl interpreter:

1) Download Swiss-Prot or any other FASTA-formatted protein database of your choice.



By this, do you mean performing a search (e.g. "plasma membrane") and then downloading all 10998 hits as FASTA files?

Then apply your code?

Please excuse my naivety, but my bioinformatics skills are pretty green.


OK. You should be able to use Ensembl BioMart to get the data you need (e.g. set a filter to get only the proteins with the GO term you want). You can set BioMart to download the results as protein sequences in FASTA format.

Ryan

.........................

Posted Feb 01, 2007, 21:20 PM
frasermoss


Job done! Thanks everyone.

.........................

Posted Feb 02, 2007, 2:29 AM
ryan_m


Great!
Glad to have helped out. I'm interested now. Did you find the result you hoped for?

Ryan

.........................

Posted Feb 02, 2007, 5:56 AM
frasermoss


Yep. I did not find any membrane proteins ending in YKI in human, mouse, rat, dog or C. elegans.

If anyone has the time or the inclination to double check for me that would be cool.
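For anyone double checking, a negative result is more convincing when the YKV positive control is run through exactly the same pipeline. A minimal sketch, assuming the filtered sequences are already loaded per species; the function names and the toy data in the test are hypothetical.

```python
def ends_with(seq, motif):
    """Case-insensitive C-terminal match, ignoring a trailing '*'."""
    return seq.upper().rstrip("*").endswith(motif.upper())


def motif_and_control_counts(sequences_by_species, motif="YKI", control="YKV"):
    """Return {species: (motif_count, control_count)}."""
    return {
        species: (
            sum(ends_with(s, motif) for s in seqs),
            sum(ends_with(s, control) for s in seqs),
        )
        for species, seqs in sequences_by_species.items()
    }


def supported_negatives(counts):
    """Species with zero motif hits where the control did produce hits."""
    return sorted(sp for sp, (m, c) in counts.items() if m == 0 and c > 0)
```

A species where both counts are zero says little, since the filter or the download may simply have excluded everything relevant.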

.........................

Posted Feb 02, 2007, 6:52 AM