Transcription Factor motif (PERL)

ABC
I want to use Perl to find transcription factor DNA binding sites, for instance CACTTGAN (where N is any base). I have only a basic grasp of writing Perl, but I can follow a script fairly well. Thanks.

ABC

jonatmudd
Hi ABC,
It's not clear to me what you want to do. Do you simply want to detect whether a particular sequence appears, or do you want the position where it occurs? Or something else?

For the simplest case, where you only need to know whether a site such as CACTTGAN is present, simple string matching with a regular expression is all you need. Since N stands for any base, write it as the character class [ACGT] rather than a literal N:

my $All_Base_Pairs = "ACTTTAGGGCACTTGAAACCTATACCTATGG";
print "Site found\n" if $All_Base_Pairs =~ /CACTTGA[ACGT]/;

(I just made up a sequence that includes the site you are looking for.)

The regular expression above is case-sensitive, so if the bases in your file are lowercase, either match lowercase letters or add the /i modifier, as in /CACTTGA[ACGT]/i.

You may need to do some simple file handling if your sequences are stored in files and you don't want to tediously cut and paste them in.

For now, the above will return true if the binding site you are looking for is present.
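A motif like CACTTGAN uses the IUPAC ambiguity code N for "any base", so a literal match on the letter N will not find real sites. If you have many motifs written with IUPAC codes (N = any base, R = A/G, Y = C/T, and so on), a small helper can build the regex for you. This is a minimal sketch; `%IUPAC` and `motif_to_regex` are names made up for this example, not part of any module.

```perl
use strict;
use warnings;

# IUPAC nucleotide codes mapped to regex character classes.
my %IUPAC = (
    A => 'A',    C => 'C',    G => 'G',    T => 'T',
    R => '[AG]', Y => '[CT]', S => '[CG]', W => '[AT]',
    K => '[GT]', M => '[AC]', B => '[CGT]', D => '[AGT]',
    H => '[ACT]', V => '[ACG]', N => '[ACGT]',
);

# Hypothetical helper: turn a motif such as "CACTTGAN" into a
# case-insensitive compiled regex; dies on letters that are not IUPAC codes.
sub motif_to_regex {
    my ($motif) = @_;
    my $re = '';
    for my $base (split //, uc $motif) {
        die "Unknown IUPAC code '$base'\n" unless exists $IUPAC{$base};
        $re .= $IUPAC{$base};
    }
    return qr/$re/i;
}

# Made-up sequence containing CACTTGAA, which matches CACTTGAN.
my $seq  = "ACTTTAGGGCACTTGAAACCTATACCTATGG";
my $site = motif_to_regex("CACTTGAN");
print "Site found\n" if $seq =~ $site;
```

Run as is, this prints "Site found". Because the regex is compiled with /i, it also works on lowercase sequence files.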

Hope that gets you started...

ABC
I'd like to find the sequence and its position. My input file contains only bare uppercase bases, with no FASTA header or other characters.

jonatmudd
Since you want to keep track of position, you'll need a progressive (global, /g) match that steps through the sequence one hit at a time. Try something like:

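use strict; use warnings;

# 'sequence.txt' is a placeholder; use whatever file holds your bare bases
open(my $in, '<', 'sequence.txt') or die "Cannot open sequence.txt: $!";
my $sequence = do { local $/; <$in> };   # read the whole file into one string
$sequence =~ s/\s+//g;                    # remove line breaks

my $pattern = qr/[CG]CACTTGA[ATCG]/i;
while ($sequence =~ /$pattern/g) {
    # $-[0] is the 0-based offset where this match starts; +1 makes it 1-based
    printf "Found a sequence at %d\n", $-[0] + 1;
}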
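One caveat with a /g loop: after each hit, the search resumes at the end of the previous match, so a site that overlaps an earlier hit is skipped. Wrapping the pattern in a zero-width lookahead, (?=...), makes each match consume no characters, so every start position gets checked. A minimal sketch, using a made-up sequence that contains two overlapping sites:

```perl
use strict;
use warnings;

# Made-up sequence: the two sites share the base at 0-based index 8.
my $sequence = "CCACTTGACCACTTGAT";

# The lookahead matches an empty string in front of each site, so pos()
# equals the 0-based start of the site and /g then moves on by one base.
my @starts;
while ($sequence =~ /(?=[CG]CACTTGA[ACGT])/gi) {
    push @starts, pos($sequence) + 1;   # store 1-based positions
}
print "Found a site at position $_\n" for @starts;
```

This reports positions 1 and 9, whereas the plain /g loop would report only position 1.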

ABC
Thanks. How would I go about printing the exact match, since the first and last bases can vary?

jonatmudd
Hi there, sorry for the slow reply.
I'm not a Perl expert either, and that is a tricky question.

I know everyone loves Perl because it is fast and free, but you might consider MATLAB. It has very nice built-in functions for exactly this (for example, regexp can return both the matched text and its start positions), so it would probably take only three lines of code. If you have MATLAB, are interested, and need help, let me know; I could rig it up in a few minutes.

ABC
I figured it out. Thanks for the assistance.
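The thread doesn't say how the exact-match question was solved. One common Perl approach (a sketch, not necessarily what ABC did) is to put parentheses around the pattern so the bases that actually matched, including the variable first and last positions, are captured in $1, and to read the start offset of that group from the @- array. The sequence below is made up for illustration.

```perl
use strict;
use warnings;

# Made-up sequence; in practice this would be read from the input file.
my $sequence = "TTGCACTTGACAAACCACTTGATGG";

my @sites;
# Parentheses capture the bases that actually matched into $1,
# including the variable first and last positions.
while ($sequence =~ /([CG]CACTTGA[ACGT])/g) {
    my $start = $-[1] + 1;   # $-[1] is the 0-based start of group 1; +1 for 1-based
    push @sites, "$1 at position $start";
}
print "$_\n" for @sites;
```

This prints "GCACTTGAC at position 3" and "CCACTTGAT at position 15".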

sichan
A good book for basic Perl-ing for common bioinformatics tasks is James Tisdall's 'Beginning Perl for Bioinformatics.'
A slightly more advanced book would be its sequel, 'Mastering Perl for Bioinformatics', also written by Tisdall.