Transcription Factor motif (PERL)
|
ABC
Group: Member Posts: 11 Joined: Mar 13, 2007
|
I want to use Perl to find transcription factor DNA binding sites, for instance CACTTGAN (where N is any base). I only have basic Perl writing skills but can follow a script fairly well. Thanks.
ABC
|
.........................
|
Posted Mar 14, 2007, 0:50 AM |
|
|
|
ryan_m
Group: Moderators Posts: 280 Joined: May 06, 2006
|
This would be a rather crude way to search for TFBSs. There are many 'scanner' tools available that use PWMs or PSSMs to find putative sites (and score them). I have used TRANSFAC's "match" tool as well as "patser" with some success. If you can get ahold of a PSSM for your transcription factor, I would suggest scanners rather than strict regexes. Failing that, you could probably do what you need in a few lines of Perl. Try what I have put below (keep in mind I haven't tested it).
#!/usr/bin/perl
# Input is a FASTA file of upstream sequences. You can get 1kb and 5kb upstream
# sequences pre-extracted from the UCSC genome browser download page.
use strict;
use warnings;
use Bio::SeqIO;

my $pattern = "CACTTGA[ACTG]";
my $io = Bio::SeqIO->new(-file => "upstream_1kb.fa", -format => 'fasta');

while (my $seq_obj = $io->next_seq) {
    my $id = $seq_obj->display_id;

    # search the forward strand
    my $full_sequence = uc($seq_obj->seq);
    if ($full_sequence =~ /$pattern/) {
        # this sequence matches, do something with it
        print "$id has match on (+) strand\n";
    }

    # search the reverse complement
    my $rc = $seq_obj->revcom;
    my $rc_seq = uc($rc->seq);
    if ($rc_seq =~ /$pattern/) {
        # this sequence matches, do something with it
        print "$id has match on (-) strand\n";
    }
}
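For comparison with the regex approach, here is a rough sketch of what a PSSM scanner does internally: turn a count matrix into log-odds scores, slide it along the sequence, and keep windows above a threshold. It is not how "match" or "patser" are actually implemented. The count matrix, the pseudocount of 0.5, the uniform background, the threshold of 4, and the names build_pssm, scan and revcomp are all assumptions made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @bases = qw(A C G T);

# Hypothetical count matrix for a 4-bp motif (one array entry per position).
# These numbers are invented for illustration; they are NOT from TRANSFAC.
my %counts = (
    A => [  0, 10,  0, 0 ],
    C => [ 10,  0, 10, 0 ],
    G => [  0,  0,  0, 1 ],
    T => [  0,  0,  0, 9 ],
);
my $width = scalar @{ $counts{A} };

# Convert counts to log2-odds scores against a uniform background (0.25).
# The pseudocount keeps unseen bases from producing log(0).
sub build_pssm {
    my ($counts, $pseudo) = @_;
    my %pssm;
    for my $pos (0 .. $width - 1) {
        my $total = 0;
        $total += $counts->{$_}[$pos] + $pseudo for @bases;
        for my $base (@bases) {
            my $freq = ($counts->{$base}[$pos] + $pseudo) / $total;
            $pssm{$base}[$pos] = log($freq / 0.25) / log(2);
        }
    }
    return \%pssm;
}

# Slide the matrix along the sequence and return [start, site, score]
# for every window scoring at or above the threshold.
sub scan {
    my ($pssm, $seq, $threshold) = @_;
    my @hits;
    $seq = uc $seq;
    for my $start (0 .. length($seq) - $width) {
        my $site = substr($seq, $start, $width);
        next if $site =~ /[^ACGT]/;    # skip windows containing N etc.
        my $score = 0;
        $score += $pssm->{ substr($site, $_, 1) }[$_] for 0 .. $width - 1;
        push @hits, [ $start, $site, $score ] if $score >= $threshold;
    }
    return @hits;
}

# Reverse complement of a plain A/C/G/T string.
sub revcomp {
    my ($seq) = @_;
    my $rc = reverse uc $seq;
    $rc =~ tr/ACGT/TGCA/;
    return $rc;
}

my $pssm = build_pssm(\%counts, 0.5);
my $seq  = "GGCACTTGAA";
printf "(+) start %d site %s score %.2f\n", @$_ for scan($pssm, $seq, 4);
# Starts on the (-) strand are relative to the reverse-complemented sequence.
printf "(-) start %d site %s score %.2f\n", @$_ for scan($pssm, revcomp($seq), 4);
```

Unlike a strict regex, this gives every candidate site a score, so near-matches can be ranked instead of simply being missed.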
|
.........................
|
| Posted Mar 13, 2007, 22:13 PM |
|
|
|
surferchic
Group: Member Posts: 11 Joined: Dec 12, 2006
|
Ya..
Why do you want to use a script? Are you after TFBSs that aren't in the databases already (e.g. TRANSFAC or ORegAnno)?
-SC
|
.........................
|
| Posted Mar 13, 2007, 22:17 PM |
|
|
|
ABC
Group: Member Posts: 11 Joined: Mar 13, 2007
|
I'm not partial to Perl at all... there are easier ways, I know. I am learning to find known TF binding sites first so I can proceed to design scripts to find unknown TFs. So ultimately I'd like to use Perl to search both strands (forward and reverse complement) to do this. Thanks.
ABC
|
.........................
|
| Posted Mar 14, 2007, 2:39 AM |
|
|
|
ABC
Group: Member Posts: 11 Joined: Mar 13, 2007
|
Are you familiar with the $& function? I was told this may help.
|
.........................
|
| Posted Mar 14, 2007, 2:49 AM |
|
|
|
ryan_m
Group: Moderators Posts: 280 Joined: May 06, 2006
|
ABC said: Are you familiar with the $& function? I was told this may help.
|
After applying your regex, $& stores the portion of the sequence that matched. So in the code I supplied you with, you could store all the $& in an array if you want to know the real sequence of the matching sites (use a global match in a loop for that, since $& only holds the most recent match). $` and $' give you the left and right flanking sequences as well (in other words, $` . $& . $' is your original sequence).

However, considering what you say about discovering novel sites, I think you are looking at this in an overly simplistic way. How does knowing how to match known regular expressions lead to a way to find novel ones? The identification of novel sites generally uses some sort of alignment of the promoter regions of co-regulated genes (using Gibbs sampling, for example).

Ryan
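To make the match variables concrete, here is a minimal, dependency-free sketch (no BioPerl) that collects every matching site and its start position with a global match in a loop, then shows the flanks for a single match. The helper names find_sites and revcomp and the example sequences are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return [start, matched_site] for every match, including overlapping ones.
sub find_sites {
    my ($seq, $pattern) = @_;
    my @sites;
    while ($seq =~ /$pattern/g) {
        my $start = length($`);        # $` is everything left of the match
        push @sites, [ $start, $& ];   # $& is the matched text itself
        pos($seq) = $start + 1;        # restart one base on, to catch overlaps
    }
    return @sites;
}

# Reverse complement of a plain A/C/G/T string.
sub revcomp {
    my ($seq) = @_;
    my $rc = reverse uc $seq;
    $rc =~ tr/ACGT/TGCA/;
    return $rc;
}

my $pattern = "CACTTGA[ACGT]";
my $seq     = "TTCACTTGAGGG";

# Single match: the three variables reassemble the original sequence.
if ($seq =~ /$pattern/) {
    my ($left, $site, $right) = ($`, $&, $');
    print "left flank:  $left\n";
    print "site:        $site\n";
    print "right flank: $right\n";
    print "flanks plus site rebuild the sequence\n" if $left . $site . $right eq $seq;
}

# All matches on both strands. Starts on the (-) strand are relative
# to the reverse-complemented sequence.
print "(+) $_->[1] at $_->[0]\n" for find_sites($seq, $pattern);
print "(-) $_->[1] at $_->[0]\n" for find_sites(revcomp($seq), $pattern);
```

One caveat: on Perl versions before 5.20, using $&, $` or $' anywhere in a program slows down every regex in it. If that matters, the /p modifier with ${^PREMATCH}, ${^MATCH} and ${^POSTMATCH} (Perl 5.10 and later), or the @- and @+ offset arrays, give the same information.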
|
.........................
|
| Posted Mar 14, 2007, 2:55 AM |
|
|
|
ABC
Group: Member Posts: 11 Joined: Mar 13, 2007
|
I'll be using my output to compare against results obtained through Patser and MEME, for example, and then seeing which program outputs sequences that are actually known to bind TFs. I will be using Gibbs etc. for the novel sites; I just tried to simplify what it is I need for now (a Perl script). Thanks for the interest.
|
.........................
|
| Posted Mar 14, 2007, 18:02 PM |
|
|
|