Transcription Factor motif (PERL)
|
ABC
Group: Member Posts: 11 Joined: Mar 13, 2007
|
I want to use Perl to find transcription factor DNA binding sites, for instance CACTTGAN. I have only a basic grasp of writing Perl, but I can follow a script fairly well. Thanks.
ABC
|
.........................
|
Posted Mar 14, 2007, 0:45 AM |
|
|
|
jonatmudd
Group: Member Posts: 28 Joined: Oct 07, 2005
|
Hi ABC, It's not clear to me what you want to do--do you simply want to detect whether a particular sequence appears? Or do you want the position of where it occurs? Or.....?
For the simplest case, to find a sequence of say CACTTGAN, all you have to do is the following for simple string matching
$All_Base_Pairs =~ /CACTTGAN/
The regular expression above is case sensitive, so if the file containing all the base pairs is in lowercase, make sure you match against lowercase, not uppercase (or add the /i modifier to ignore case).
$All_Base_Pairs = "ACTTTAGGGCACTTGANACCTATACCTATGG";
(I just made up some sequence including the one you are looking for.)
You may need to do some simple file manipulation if you have the sequences stored in files and don't want to tediously cut and paste them in.
For now, the above will return true if the binding site you are looking for is present.
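One caveat on the match above: inside a regular expression the letter N only matches a literal N, whereas in sequence notation N usually stands for any base. Here is a minimal sketch of a presence check that treats N as a wildcard. The sub name has_site is made up for illustration, and it assumes N means any of A, C, G or T.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $motif occurs in $sequence, otherwise 0.
# Assumption: an N in the motif means "any base" (A, C, G or T).
sub has_site {
    my ($sequence, $motif) = @_;
    (my $regex = $motif) =~ s/N/[ACGT]/g;    # expand each N into a character class
    return $sequence =~ /$regex/i ? 1 : 0;   # /i ignores upper/lower case
}

my $All_Base_Pairs = "ACTTTAGGGCACTTGATACCTATACCTATGG";   # made-up sequence
print has_site($All_Base_Pairs, "CACTTGAN") ? "site present\n" : "site absent\n";
```

Because the N is expanded to a character class, a motif sitting at the very end of the sequence with no base after it will not count as a match.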
Hope that gets you started...
|
.........................
|
| Posted Mar 16, 2007, 16:00 PM |
|
|
|
ABC
Group: Member Posts: 11 Joined: Mar 13, 2007
|
I'd like to find the sequence and its position. My infile is just bare capital-letter bases, with no FASTA formatting or other characters.
|
.........................
|
| Posted Mar 16, 2007, 20:22 PM |
|
|
|
jonatmudd
Group: Member Posts: 28 Joined: Oct 07, 2005
|
Since you want to keep track of position, you'll need to do a progressive match. Try something like:
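my $sequence = <INFILE>;  # where INFILE is however you choose to input the sequence within which you are searching (here, an already-opened filehandle)
my $pattern = "[CG]CACTTGA[ATCG]";
# pos() is the offset just past the match and $& is the text that matched, so the difference is the 0-based start
while ($sequence =~ /$pattern/gi) { printf "Found a sequence at %d\n", pos($sequence) - length($&); }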
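To run the progressive match end to end, the sequence has to be read from the file first. The following is only a sketch, assuming the file holds bare bases that may be split over several lines. The sub names read_sequence and find_sites and the file name sequence.txt are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: read a file of bare bases into one string,
# dropping line breaks and any other whitespace.
sub read_sequence {
    my ($path) = @_;
    open(my $in, '<', $path) or die "Cannot open $path: $!";
    my $sequence = do { local $/; <$in> };   # slurp the whole file at once
    close($in);
    $sequence =~ s/\s+//g;
    return $sequence;
}

# Hypothetical helper: return the 0-based start of every match of $pattern.
sub find_sites {
    my ($sequence, $pattern) = @_;
    my @starts;
    while ($sequence =~ /$pattern/gi) {
        push @starts, $-[0];   # $-[0] is the offset where the last match began
    }
    return @starts;
}

# With a real file this would be: my $sequence = read_sequence("sequence.txt");
my $sequence = "ACTTTAGGGCACTTGATACCTATACCTATGG";   # made-up sequence
printf "Found a sequence at %d\n", $_ for find_sites($sequence, "[CG]CACTTGA[ATCG]");
```

Positions are counted from 0, so add 1 if you want the first base to be position 1. Note that a plain /g loop resumes after the end of each match, so two sites that share a base are reported only once.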
|
.........................
|
| Posted Mar 16, 2007, 20:58 PM |
|
|
|
ABC
Group: Member Posts: 11 Joined: Mar 13, 2007
|
Thanks, how would I go about printing the exact match since there is variation in the beginning and end bases?
|
.........................
|
| Posted Mar 17, 2007, 0:23 AM |
|
|
|
jonatmudd
Group: Member Posts: 28 Joined: Oct 07, 2005
|
Hi there, sorry for the slow reply. I'm not a super expert at Perl either, and that is a tricky question.
I know everyone loves Perl because it is fast and free, but you might think about MATLAB. It has some very nice built-in functions to do exactly what you want, probably in only 3 lines of code. If you have MATLAB, are interested, and need help, let me know. I could rig it up in a few minutes, no problem.
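For anyone who would rather stay in Perl: after a successful match the matched text is available too, for example through a capture group ($1). Here is a minimal sketch, not a definitive solution. The sub name report_sites is made up for illustration, and the lookahead is my own addition so that sites sharing a base are both reported.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a list of [start, matched_text] pairs.
sub report_sites {
    my ($sequence, $pattern) = @_;
    my @hits;
    # The lookahead (?=...) captures the site in $1 without consuming it,
    # so the search can find another site that starts inside this one.
    while ($sequence =~ /(?=($pattern))/gi) {
        push @hits, [ $-[0], $1 ];   # $-[0] is the 0-based start of the match
    }
    return @hits;
}

my $demo = "ACTTTAGGGCACTTGATACCTATACCTATGG";   # made-up sequence
for my $hit (report_sites($demo, "[CG]CACTTGA[ATCG]")) {
    printf "Found %s at %d\n", $hit->[1], $hit->[0];
}
```

The matched text is returned exactly as it appears in the sequence, so the variable first and last bases show up in the output.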
|
.........................
|
| Posted Mar 20, 2007, 21:50 PM |
|
|
|
ABC
Group: Member Posts: 11 Joined: Mar 13, 2007
|
I figured it out. Thanks for the assistance.
|
.........................
|
| Posted Mar 21, 2007, 16:34 PM |
|
|
|
sichan
Group: Member Posts: 22 Joined: Jul 30, 2008
|
A good book for basic Perl-ing for common bioinformatics tasks is James Tisdall's 'Beginning Perl for Bioinformatics.' A slightly more advanced book would be its sequel, 'Mastering Perl for Bioinformatics', also written by Tisdall.
|
.........................
|
| Posted Aug 04, 2008, 23:12 PM |