Transcription Factor motif (PERL)
|
ABC
Group: Member Posts: 11 Joined: Mar 13, 2007
|
I want to use Perl to find transcription factor DNA binding sites, for instance CACTTGAN. I have only a basic grasp of writing Perl but can follow a script fairly well. Thanks.
ABC
|
.........................
|
Posted Mar 14, 2007, 0:50 AM |
|
|
|
ryan_m
Group: Moderators Posts: 274 Joined: May 06, 2006
|
This would be a rather crude way to search for TFBSs. There are many 'scanner' tools available that use PWMs or PSSMs to find putative sites (and score them). I have used TRANSFAC's "match" tool as well as "patser" with some success. If you can get hold of a PSSM for your transcription factor, I would suggest a scanner rather than strict regexes. Failing that, you could probably do what you need in a few lines of Perl. Try what I have put below (keep in mind I haven't tested it).
#!/usr/bin/perl
# Input is a FASTA file of upstream sequences. You can get 1 kb and 5 kb
# upstream sequences pre-extracted from the UCSC Genome Browser download page.
use strict;
use warnings;
use Bio::SeqIO;

my $pattern = "CACTTGA[ACTG]";
my $io = Bio::SeqIO->new(-file => "upstream_1kb.fa", -format => 'fasta');

while (my $seq_obj = $io->next_seq) {
    my $id = $seq_obj->display_id;

    # Search the forward (+) strand.
    my $full_sequence = uc($seq_obj->seq);
    if ($full_sequence =~ /$pattern/) {
        # this sequence matches, do something with it
        print "$id has match on (+) strand\n";
    }

    # Search the reverse complement (-) strand.
    my $rc = $seq_obj->revcom;
    my $rc_seq = uc($rc->seq);
    if ($rc_seq =~ /$pattern/) {
        # this sequence matches, do something with it
        print "$id has match on (-) strand\n";
    }
}
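For comparison with the regex approach, here is a rough sketch of what a PSSM-style scanner does: turn per-position base counts into log-odds scores, slide a window along the sequence, and report windows that score above a cutoff. Everything specific here is an assumption for illustration: the count matrix is invented (it is not a TRANSFAC matrix), the background is taken as uniform, and the pseudocount, threshold, and the names score_window and scan are hypothetical. It is not how "match" or "patser" are implemented.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical base counts per motif position, 10 aligned sites per column.
# These numbers are invented for illustration only.
my @counts = (
    { A => 0,  C => 10, G => 0,  T => 0 },
    { A => 10, C => 0,  G => 0,  T => 0 },
    { A => 0,  C => 10, G => 0,  T => 0 },
    { A => 1,  C => 0,  G => 0,  T => 9 },
    { A => 0,  C => 0,  G => 1,  T => 9 },
    { A => 0,  C => 0,  G => 10, T => 0 },
    { A => 8,  C => 1,  G => 1,  T => 0 },
    { A => 3,  C => 2,  G => 2,  T => 3 },
);

my $pseudo     = 0.5;     # assumed pseudocount, avoids log(0)
my $background = 0.25;    # assumed uniform base frequency

# Convert counts into log2-odds scores against the background.
my @pssm;
for my $col (@counts) {
    my $total = 0;
    $total += $col->{$_} + $pseudo for qw(A C G T);
    my %col_scores;
    for my $base (qw(A C G T)) {
        my $freq = ($col->{$base} + $pseudo) / $total;
        $col_scores{$base} = log($freq / $background) / log(2);
    }
    push @pssm, \%col_scores;
}

# Sum the per-position scores for one window; undef if it has a non-ACGT base.
sub score_window {
    my ($window) = @_;
    my @bases = split //, uc $window;
    my $score = 0;
    for my $i (0 .. $#pssm) {
        return unless exists $pssm[$i]{ $bases[$i] };
        $score += $pssm[$i]{ $bases[$i] };
    }
    return $score;
}

# Slide the matrix along $seq; return [start, window, score] for each hit.
sub scan {
    my ($seq, $threshold) = @_;
    my $width = scalar @pssm;
    my @hits;
    for my $pos (0 .. length($seq) - $width) {
        my $window = substr($seq, $pos, $width);
        my $score  = score_window($window);
        push @hits, [ $pos, $window, $score ]
            if defined $score && $score >= $threshold;
    }
    return @hits;
}

# Made-up test sequence and an arbitrary cutoff of 8 bits.
for my $hit (scan("GGCACTTGATTT", 8)) {
    printf "%d\t%s\t%.2f\n", @$hit;
}
```

Unlike the strict regex, a near-miss site still gets a (lower) score here, so the threshold decides what counts as a putative site. To cover the (-) strand you would run the same scan on the reverse complement.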
|
.........................
|
| Posted Mar 13, 2007, 22:13 PM |
|
|
|
surferchic
Group: Member Posts: 11 Joined: Dec 12, 2006
|
Yeah, why do you want to use a script? Are you after TFBSs that aren't already in the databases, e.g. TRANSFAC or ORegAnno? -SC
|
.........................
|
| Posted Mar 13, 2007, 22:17 PM |
|
|
|
ABC
Group: Member Posts: 11 Joined: Mar 13, 2007
|
I'm not partial to Perl at all; I know there are easier ways. I am learning to find known TF binding sites first so I can proceed to design scripts to find unknown TFs. So ultimately I'd like to use Perl to search both strands (forward and reverse complement) to do this. Thanks.
ABC
|
.........................
|
| Posted Mar 14, 2007, 2:39 AM |
|
|
|
ABC
Group: Member Posts: 11 Joined: Mar 13, 2007
|
Are you familiar with the $& function? I was told this may help.
|
.........................
|
| Posted Mar 14, 2007, 2:49 AM |
|
|
|
ryan_m
Group: Moderators Posts: 274 Joined: May 06, 2006
|
| ABC said: | Are you familiar with the $& function? I was told this may help.
|
After applying your regex, $& stores the portion of the sequence that matched. So in the code I supplied you with, you could store all the $& values in an array if you want to know the actual sequence of the matching sites. $` and $' give you the left and right flanking sequences as well (in other words, $` . $& . $' is your original sequence).
However, considering what you say about discovering novel sites, I think you are looking at this in an overly simplistic way. How does knowing how to match known regular expressions lead to a way to find novel ones? The identification of novel sites generally uses some sort of alignment of the promoter sites of co-regulated genes (using Gibbs sampling, for example).
Ryan
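A minimal sketch of the match variables described above, using core Perl only. The test sequence is made up, and the helper names find_sites and revcomp are hypothetical; the reverse complement is done with reverse and tr/// instead of BioPerl so the snippet runs on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up sequence with one site on each strand; not a real promoter.
my $seq     = "GGCACTTGATTTATCAAGTGCC";
my $pattern = qr/CACTTGA[ACGT]/;

# Return every non-overlapping match as [0-based start offset, matched text].
sub find_sites {
    my ($strand_seq) = @_;
    my @sites;
    while ($strand_seq =~ /$pattern/g) {
        push @sites, [ $-[0], $& ];    # $-[0] is where the match starts
    }
    return @sites;
}

# Reverse complement: reverse the string, then swap complementary bases.
sub revcomp {
    my ($dna) = @_;
    (my $rc = reverse $dna) =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

# Single match: prematch, match, and postmatch reassemble the input.
if ($seq =~ $pattern) {
    my ($left, $site, $right) = ($`, $&, $');
    print "first site: $site (left flank '$left', right flank '$right')\n";
    print "reassembles: ", ($left . $site . $right eq $seq ? "yes" : "no"), "\n";
}

# All matches on both strands; (-) offsets are relative to the
# reverse-complemented string, not the original.
for my $strand ([ '+', $seq ], [ '-', revcomp($seq) ]) {
    my ($label, $strand_seq) = @$strand;
    for my $site (find_sites($strand_seq)) {
        my ($start, $text) = @$site;
        print "($label) strand: $text at offset $start\n";
    }
}
```

Note that a /g loop like this skips overlapping occurrences, because each search resumes after the end of the previous match.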
|
.........................
|
| Posted Mar 14, 2007, 2:55 AM |
|
|
|
ABC
Group: Member Posts: 11 Joined: Mar 13, 2007
|
I'll be using my output to compare against results obtained through Patser and MEME, for example, and then seeing which program reports sequences that are actually known to bind TFs. I will be using Gibbs sampling etc. for the novel sites; I tried to simplify what it is I need (a Perl script) for now. Thanks for the interest.
|
.........................
|
| Posted Mar 14, 2007, 18:02 PM |