How to find DNA binding motifs in a protein/DNA sequence?

Amritha Nair
Hi,
I need to find out whether there are any DNA binding motifs in the gab1 cDNA/protein. I am a complete novice at using bioinformatics tools and was wondering what tools I could use. I could really use the help.
Thanks

Edward Dougherty
To look for binding motifs of various transcription factors within DNA promoter sequences, try:

Transcription Factor Search at http://www.cbrc.jp/research/db/TFSEARCH.html

or

Cis Element Cluster Finder (Cister) at
http://zlab.bu.edu/~mfrith/cister.shtml

or

Web Promoter Scan Service at
http://www-bimas.cit.nih.gov/molbio/proscan/

Searching for DNA binding motifs within protein sequences is much more difficult because these motifs vary so widely, but you can try ExPASy (http://ca.expasy.org/tools/). Look under "primary structure analysis" and search by specific domain type (e.g. coiled coil, leucine zipper), though not all DNA binding motifs are represented there.
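
If you would rather check a protein sequence yourself, here is a minimal Python sketch of a pattern search for one of those domain types. It assumes the classic leucine zipper pattern L-x(6)-L-x(6)-L-x(6)-L (a leucine at every seventh position, four times); the function name and the toy sequence are made up for illustration and are not gab1. A pattern hit on its own is only a hint, since many non-zipper proteins match this pattern by chance.

```python
import re

# Leucine zipper pattern L-x(6)-L-x(6)-L-x(6)-L written as a regex.
# The lookahead lets overlapping matches be reported as well.
LEUCINE_ZIPPER = re.compile(r"(?=(L.{6}L.{6}L.{6}L))")

def find_leucine_zippers(protein):
    """Return (1-based start position, matched segment) for each pattern hit."""
    seq = protein.upper()
    return [(m.start() + 1, m.group(1)) for m in LEUCINE_ZIPPER.finditer(seq)]

# Hypothetical toy sequence used only to exercise the function.
toy = "MK" + "LAAAAAA" * 3 + "LGG"
hits = find_leucine_zippers(toy)
for start, segment in hits:
    print(start, segment)
```

Paste your own one-letter amino acid sequence in place of `toy` to try it.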

Good luck.

Amritha Nair
Hi,
I tried using TFSEARCH and fed in the DNA sequence. How does one interpret the results? Does it give you the similarity between your query sequence and the known transcription factors in the database?
Help please!

Edward Dougherty
Amritha Nair wrote:

Hi,
I tried using TFSEARCH and fed in the DNA sequence. How does one interpret the results? Does it give you the similarity between your query sequence and the known transcription factors in the database?
Help please!

When you enter your DNA sequence, choose the appropriate matrix (e.g. vertebrate, yeast). Your results should look similar to the example I have posted below.

CAGATTTGTT TATTTGTTTT TTACTAAGAC CTGCTCTTTC AGGTCTGTTG

------------>
M00131 HNF-3b 91.9

------------>
M00131 HNF-3b 89.0

In the above example, "CAGATTTGTT TATTTGTTTT TTACTAAGAC CTGCTCTTTC AGGTCTGTTG" is a region of the sequence you entered.

------------>
(These arrow lines mark the portion of the above sequence that matches a known transcription factor binding site. In this example, that is "TTGTTTATT".)

M00131 HNF-3b 91.9

(This part tells you the transcription factor for which there appears to be a binding site (HNF-3b) and gives the score (91.9). The score = 100.0 * ('weighted sum' - min) / (max - min).)
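
To make the score formula concrete, here is a small Python sketch. It assumes that "min" and "max" are the lowest and highest weighted sums the matrix can give for any site, which the formula above does not spell out; the function name and the 3-position weight matrix are invented for illustration and are not taken from the TFSEARCH database.

```python
def tfsearch_style_score(site, matrix):
    """Rescale the weighted sum of a site to 0..100 using the matrix's min and max.

    matrix is a list with one dict per position, mapping base -> weight.
    """
    if len(site) != len(matrix):
        raise ValueError("site and matrix must have the same length")
    weighted_sum = sum(column[base] for base, column in zip(site, matrix))
    lowest = sum(min(column.values()) for column in matrix)    # assumed "min"
    highest = sum(max(column.values()) for column in matrix)   # assumed "max"
    return 100.0 * (weighted_sum - lowest) / (highest - lowest)

# Hypothetical 3-position weight matrix (not a real TFSEARCH entry).
toy_matrix = [
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 8, "C": 0, "G": 2, "T": 0},
    {"A": 0, "C": 1, "G": 9, "T": 0},
]

for site in ("TAG", "TGG", "ACA"):
    print(site, round(tfsearch_style_score(site, toy_matrix), 1))
```

Under that assumption, a site that carries the best-scoring base at every position gets 100 and one that carries the worst-scoring base at every position gets 0, so a score like 91.9 means the site is close to the best possible match for that matrix.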

Regions where several overlapping lines show the same factor with scores close to 100 are the best candidates for binding the suggested factor. However, the score does not take into account chromatin configuration, overlapping sites that may be bound by other proteins, and so on, so a high score is far from definitive, but it is a good place to start.
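
If your results are long, a short script can pull out the strong hits for you. This is only a sketch: it assumes each hit line has the three fields shown in the example (matrix ID, factor name, score), and the function names and the cutoff of 85 are arbitrary choices for illustration, not TFSEARCH settings.

```python
from collections import defaultdict

def parse_hits(report):
    """Collect (matrix_id, factor, score) from lines like 'M00131 HNF-3b 91.9'."""
    hits = []
    for line in report.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[0].startswith("M"):
            try:
                hits.append((parts[0], parts[1], float(parts[2])))
            except ValueError:
                continue  # third field was not a number, so not a hit line
    return hits

def strong_hits_by_factor(hits, threshold=85.0):
    """Group scores at or above the (arbitrary) threshold by factor name."""
    grouped = defaultdict(list)
    for _matrix_id, factor, score in hits:
        if score >= threshold:
            grouped[factor].append(score)
    return dict(grouped)

# The example output from this thread, pasted as text.
report = """\
CAGATTTGTT TATTTGTTTT TTACTAAGAC CTGCTCTTTC AGGTCTGTTG

------------>
M00131 HNF-3b 91.9

------------>
M00131 HNF-3b 89.0
"""
summary = strong_hits_by_factor(parse_hits(report))
print(summary)
```

A factor that shows up several times in the summary is one to look at first, with the same caveats as above.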

Edward Dougherty
Sorry, in my last post the -------> lines do not line up where they should, but hopefully you get the idea.

R Bishop
Guys, I'm moving this discussion to the BioInformatics

Rb