Transcription Factor motif (PERL)
ABC

I want to use Perl to find transcription factor DNA binding sites, for instance CACTTGAN. I only have a basic grasp of writing Perl, but I can follow a script fairly well. Thanks.

ABC
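The N in CACTTGAN is an IUPAC ambiguity code meaning "any base", so it has to become a character class before Perl can use the motif as a regex. A minimal sketch, assuming motifs are written with the standard IUPAC nucleotide codes; the %IUPAC table and the iupac_to_regex subroutine are hypothetical names for illustration, not part of any library. It only checks one strand.

```perl
use strict;
use warnings;

# Standard IUPAC nucleotide codes mapped to regex character classes
my %IUPAC = (
    A => 'A',      C => 'C',      G => 'G',      T => 'T',
    R => '[AG]',   Y => '[CT]',   S => '[CG]',   W => '[AT]',
    K => '[GT]',   M => '[AC]',   B => '[CGT]',  D => '[AGT]',
    H => '[ACT]',  V => '[ACG]',  N => '[ACGT]',
);

# Hypothetical helper: turn a motif such as CACTTGAN into a compiled regex
sub iupac_to_regex {
    my ($motif) = @_;
    my $re = '';
    for my $code (split //, uc $motif) {
        die "Unknown IUPAC code '$code'\n" unless exists $IUPAC{$code};
        $re .= $IUPAC{$code};
    }
    return qr/$re/;
}

my $tfbs = iupac_to_regex('CACTTGAN');    # equivalent to qr/CACTTGA[ACGT]/
for my $seq (qw(GGCACTTGATCC GGCACTTGCTCC)) {
    printf "%s: %s\n", $seq, ($seq =~ $tfbs ? 'match' : 'no match');
}
```

Run as is, it reports a match for GGCACTTGATCC and no match for GGCACTTGCTCC.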
.........................

 Posted Mar 14, 2007, 0:50 AM
ryan_m

This would be a rather crude way to search for TFBSs. There are many 'scanner' tools available that use PWMs or PSSMs (position weight matrices or position-specific scoring matrices) to find putative sites and score them. I have used TRANSFAC's Match tool as well as patser with some success. If you can get hold of a PSSM for your transcription factor, I would suggest scanners rather than strict regexes. Failing that, you could probably do what you need in a few lines of Perl. Try what I have put below (keep in mind I haven't tested it).

#!/usr/bin/perl
# Input is a FASTA file of upstream sequences; pre-extracted 1 kb and 5 kb
# upstream sequences can be downloaded from the UCSC Genome Browser download page.
use strict;
use warnings;
use Bio::SeqIO;

# CACTTGAN, with the ambiguous N written as a character class
my $pattern = qr/CACTTGA[ACGT]/;

my $io = Bio::SeqIO->new(-file => "upstream_1kb.fa", -format => 'fasta');

while (my $seq_obj = $io->next_seq) {
    my $id = $seq_obj->display_id;

    # forward (+) strand
    my $full_sequence = uc($seq_obj->seq);
    if ($full_sequence =~ $pattern) {
        # this sequence matches, do something with it
        print "$id has match on (+) strand\n";
    }

    # reverse complement (-) strand
    my $rc_seq = uc($seq_obj->revcom->seq);
    if ($rc_seq =~ $pattern) {
        # this sequence matches, do something with it
        print "$id has match on (-) strand\n";
    }
}
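For comparison with the strict regex, here is a minimal sketch of the PSSM-style scoring described above, in plain Perl. It assumes a few aligned example sites (made up here), a pseudocount of 1 per base and a uniform 0.25 background; build_pssm, scan_pssm and the threshold of 5 bits are hypothetical choices for illustration. Each window's score is the sum of per-column log2(observed/background) values, so near-matches can still score well instead of failing outright. Dedicated scanners such as Match or patser remain the better choice for real analyses.

```perl
use strict;
use warnings;

my @BASES = qw(A C G T);

# Build a log-odds PSSM (in bits) from equal-length aligned sites.
# Assumptions: pseudocount of 1 per base, uniform 0.25 background.
sub build_pssm {
    my @sites = map { uc $_ } @_;
    my $width = length $sites[0];
    my @pssm;
    for my $i (0 .. $width - 1) {
        my %count = (A => 1, C => 1, G => 1, T => 1);    # pseudocounts
        $count{ substr($_, $i, 1) }++ for @sites;
        my $total = @sites + @BASES;
        my %score;
        $score{$_} = log(($count{$_} / $total) / 0.25) / log(2) for @BASES;
        push @pssm, \%score;
    }
    return \@pssm;
}

# Score every window of one strand; keep windows at or above $threshold.
# Returns [0-based start, window, score] for each hit.
sub scan_pssm {
    my ($pssm, $seq, $threshold) = @_;
    $seq = uc $seq;
    my $w = @$pssm;
    my @hits;
    for my $start (0 .. length($seq) - $w) {
        my $window = substr($seq, $start, $w);
        next if $window =~ /[^ACGT]/;    # skip windows containing N etc.
        my $score = 0;
        $score += $pssm->[$_]{ substr($window, $_, 1) } for 0 .. $w - 1;
        push @hits, [$start, $window, $score] if $score >= $threshold;
    }
    return @hits;
}

# Made-up training sites and threshold, for illustration only
my $pssm = build_pssm(qw(CACTTGAA CACTTGAC CACTTGAT CACTTGAG CACTTGAA));
printf "start %d  %s  score %.2f\n", @$_ for scan_pssm($pssm, 'TTTCACTTGATTTGGG', 5);
```

With these inputs it should report a single hit, CACTTGAT at start 3, scoring about 9.74 bits. Only the strand passed in is scanned; run it on the reverse complement as well, as the BioPerl script does, to cover both strands.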
.........................

Posted Mar 13, 2007, 22:13 PM
surferchic

Ya..

Why do you want to use a script? Are you after TFBSs that aren't already in the databases, e.g. TRANSFAC or ORegAnno?

-SC
.........................

Posted Mar 13, 2007, 22:17 PM
ABC

I'm not partial to Perl at all; I know there are easier ways. I am learning to find known TF binding sites first so that I can go on to design scripts to find unknown TFs. So ultimately I'd like to use Perl to search both strands (forward and reverse complement). Thanks.

ABC
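Both strands can also be scanned in core Perl, without BioPerl, while also reporting where each site sits. A minimal sketch, assuming a plain DNA string; revcom and find_sites are hypothetical helper names. It reverse-complements with tr///, wraps the pattern in a zero-width lookahead so overlapping sites are all found, and converts minus-strand hits back to forward-strand coordinates.

```perl
use strict;
use warnings;

# Reverse complement of a DNA string (anything other than ACGT is left as-is)
sub revcom {
    my ($seq) = @_;
    (my $rc = reverse $seq) =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

# Find every site on both strands. Each hit is [strand, start, site], where
# start is the 0-based forward-strand position of the site's leftmost base.
sub find_sites {
    my ($seq, $re) = @_;
    $seq = uc $seq;
    my $len = length $seq;
    my @hits;
    for my $strand ('+', '-') {
        my $target = $strand eq '+' ? $seq : revcom($seq);
        # A zero-width lookahead lets overlapping sites all be reported
        while ($target =~ /(?=($re))/g) {
            my ($site, $pos) = ($1, $-[0]);
            my $start = $strand eq '+' ? $pos : $len - $pos - length($site);
            push @hits, [$strand, $start, $site];
        }
    }
    return @hits;
}

my $seq = 'GGCACTTGATCCATCAAGTGAA';    # made-up example
printf "%s strand, start %d: %s\n", @$_ for find_sites($seq, qr/CACTTGA[ACGT]/);
```

For the example sequence this prints one hit per strand: + strand at start 2, and - strand at start 12 (where the forward strand reads ATCAAGTG). A palindromic site would be reported once for each strand.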
.........................

Posted Mar 14, 2007, 2:39 AM
ABC

Are you familiar with the $& variable? I was told it may help.
.........................

Posted Mar 14, 2007, 2:49 AM
ryan_m

ABC said:
Are you familiar with the $& variable? I was told it may help.


After applying your regex, $& holds the portion of the sequence that matched. So in the code I supplied, you could loop over the matches with the /g modifier and store each $& in an array if you want to know the actual sequences of the matching sites. $` and $' give you the left and right flanking sequences as well (in other words, $` . $& . $' is your original sequence).

However, considering what you say about discovering novel sites, I think you are looking at this in an overly simplistic way. How does knowing how to match known regular expressions lead to a way to find novel ones? Identifying novel sites generally involves some sort of alignment of the promoter sequences of co-regulated genes (using Gibbs sampling, for example).
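A minimal sketch of that idea on a made-up sequence: a while loop with the /g modifier visits every site, each $& is pushed into an array along with its flanks, and the loop checks that $` . $& . $' rebuilds the original sequence. It also records the start from the @- array, which gives positions without the special variables. On Perls older than 5.20, using $&, $` or $' anywhere in a program slows down every regex match, so the @- array with substr, or the /p modifier with ${^MATCH}, is the usual alternative there.

```perl
use strict;
use warnings;

my $seq = 'GGCACTTGATCCTTCACTTGACAA';    # made-up example sequence
my $re  = qr/CACTTGA[ACGT]/;

my @sites;
while ($seq =~ /$re/g) {
    # $& is the matched site; $` and $' are the flanks to its left and right
    die "flanks do not rebuild the sequence\n" unless $` . $& . $' eq $seq;
    # $-[0] is the 0-based start of the match (the @- array needs no string copy)
    push @sites, { site => $&, start => $-[0], left => $`, right => $' };
}

printf "%s at %d (left flank %d bp, right flank %d bp)\n",
    $_->{site}, $_->{start}, length($_->{left}), length($_->{right})
    for @sites;
```

It should find CACTTGAT at position 2 and CACTTGAC at position 14.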

Ryan
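For the novel-site route, here is a bare-bones sketch of the Gibbs site-sampling idea, not the code of any published sampler. It assumes ACGT-only sequences with exactly one site of a known width in each, a uniform 0.25 background and pseudocounts of 1; gibbs_sites and its arguments are hypothetical. Each iteration holds out one sequence, builds a base-frequency profile from the current sites in the others, weights every window of the held-out sequence by profile probability over background probability, and samples its new site in proportion to those weights. Because it is stochastic it can settle on a shifted or wrong site, so real tools add refinements such as multiple random restarts and a score for choosing the best alignment.

```perl
use strict;
use warnings;

my @BASES = qw(A C G T);

# Bare-bones Gibbs site sampler. Assumptions: ACGT-only sequences, exactly one
# site of width $w per sequence, uniform 0.25 background, pseudocounts of 1.
# Returns the 0-based site start chosen in each sequence.
sub gibbs_sites {
    my ($seqs, $w, $iterations, $seed) = @_;
    srand($seed) if defined $seed;                 # fixed seed => repeatable run
    my @seqs = map { uc $_ } @$seqs;
    die "every sequence must be at least $w bp\n" if grep { length($_) < $w } @seqs;
    my $n   = @seqs;
    my @pos = map { int rand(length($_) - $w + 1) } @seqs;    # random starts

    for my $iter (0 .. $iterations - 1) {
        my $i = $iter % $n;                        # hold out one sequence in turn

        # 1. Profile (base frequencies per column) from the other sequences' sites
        my @freq;
        for my $k (0 .. $w - 1) {
            my %count = (A => 1, C => 1, G => 1, T => 1);
            $count{ substr($seqs[$_], $pos[$_] + $k, 1) }++ for grep { $_ != $i } 0 .. $n - 1;
            my %f;
            $f{$_} = $count{$_} / ($n - 1 + 4) for @BASES;
            push @freq, \%f;
        }

        # 2. Weight each window of the held-out sequence by profile / background
        my @weight;
        for my $start (0 .. length($seqs[$i]) - $w) {
            my $wt = 1;
            $wt *= $freq[$_]{ substr($seqs[$i], $start + $_, 1) } / 0.25 for 0 .. $w - 1;
            push @weight, $wt;
        }

        # 3. Sample a new start with probability proportional to its weight
        my $sum = 0;
        $sum += $_ for @weight;
        my $r   = rand($sum);
        my $new = 0;
        $new++ while $new < $#weight && ($r -= $weight[$new]) > 0;
        $pos[$i] = $new;
    }
    return @pos;
}

# Made-up sequences, each containing one CACTTGAT site
my @seqs   = qw(TTGACCACTTGATGCA GCACTTGATAGGTCAT ATGTCGCACTTGATTG);
my @starts = gibbs_sites(\@seqs, 8, 500, 1);
print substr($seqs[$_], $starts[$_], 8), "\n" for 0 .. $#seqs;
```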
.........................

Posted Mar 14, 2007, 2:55 AM
ABC

I'll be using my output to compare results obtained through, for example, Patser and MEME, and then see which program returned sequences that are actually known to bind TFs. I will be using Gibbs sampling etc. for the novel sites; I tried to simplify what I need (a Perl script) for now. Thanks for the interest.
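A minimal sketch of the comparison described above, assuming the hits from your own script, the Patser and MEME predictions, and the known sites have already been converted into a common [sequence_id, start, end] form. That format, the subroutine names and the example sites are hypothetical, and parsing each tool's output is not shown. A prediction counts as correct if it overlaps a known site on the same sequence; strand is ignored for simplicity.

```perl
use strict;
use warnings;

# Hypothetical site format: [sequence_id, start, end], inclusive coordinates.
sub overlaps {
    my ($x, $y) = @_;
    return $x->[0] eq $y->[0] && $x->[1] <= $y->[2] && $y->[1] <= $x->[2];
}

# Returns (precision, sensitivity):
#   precision   = fraction of predicted sites that overlap some known site
#   sensitivity = fraction of known sites overlapped by some prediction
sub compare_sites {
    my ($predicted, $known) = @_;
    my $hits  = grep { my $p = $_; grep { overlaps($p, $_) } @$known } @$predicted;
    my $found = grep { my $k = $_; grep { overlaps($_, $k) } @$predicted } @$known;
    my $precision   = @$predicted ? $hits / @$predicted : 0;
    my $sensitivity = @$known     ? $found / @$known    : 0;
    return ($precision, $sensitivity);
}

# Made-up sites, for illustration only
my @known     = (['geneA', 100, 107], ['geneB', 40, 47]);
my @predicted = (['geneA', 103, 110], ['geneA', 300, 307], ['geneC', 40, 47]);
printf "precision %.2f, sensitivity %.2f\n", compare_sites(\@predicted, \@known);
```

Here one of three predictions overlaps a known site and one of two known sites is recovered, so it prints precision 0.33 and sensitivity 0.50. Running the same comparison on each program's predictions gives a simple side-by-side measure.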
.........................

Posted Mar 14, 2007, 18:02 PM