Help with bioinformatics for next-generation sequencing
|
ryan_m
Group: Moderators Posts: 284 Joined: May 06, 2006
|
I came across this forum (SeqAnswers) which focuses on the bioinformatics for handling/using data from next-generation sequencing platforms. Of course, anyone with problems or questions in that area could post here and find more than a few people working in this area who could probably advise them!
Ryan
|
.........................
|
| Posted Feb 18, 2008, 6:10 AM |
|
|
|
khan
Group: Member Posts: 2 Joined: Dec 01, 2007
|
I am new to SNP genotyping and would appreciate help understanding it. For example, if you discover 13,000 to 15,000 SNPs in EST sequences through 454 sequencing, how do you genotype them in a population of 50 individuals? Do you need to design 13,000 to 15,000 primers, and which genotyping method would be the most cost-effective?
Thank you for your time and effort
|
.........................
|
| Posted Feb 21, 2008, 12:13 PM |
|
|
|
ryan_m
Group: Moderators Posts: 284 Joined: May 06, 2006
|
Hi Khan. Once you have your SNPs of interest, it is better to go to the highly parallel genotyping assays such as those provided by Illumina. There are many companies that will design and perform your arrays for you, see for example, this site.
Regards,
Ryan
|
.........................
|
| Posted Feb 21, 2008, 15:52 PM |
|
|
|
khan
Group: Member Posts: 2 Joined: Dec 01, 2007
|
Thanks for your reply. I would appreciate your help in understanding how it works. Suppose I found three thousand SNPs in EST sequences. Would I then have to design three thousand primers and use them in multiplex in assays like the CMMT genotyping assays? Thanks
|
.........................
|
| Posted Feb 21, 2008, 21:01 PM |
|
|
|
ryan_m
Group: Moderators Posts: 284 Joined: May 06, 2006
|
| khan said: | Thanks for your reply. I would appreciate your help in understanding how it works. Suppose I found three thousand SNPs in EST sequences. Would I then have to design three thousand primers and use them in multiplex in assays like the CMMT genotyping assays? Thanks |
That is the basic idea, yes. But many companies would design the probes for you, so probably all you would need to supply would be the positions of the polymorphisms and the two alleles.
|
.........................
|
| Posted Feb 21, 2008, 22:11 PM |
|
|
|
JayM
Group: Member Posts: 4 Joined: May 12, 2008
|
I have worked with 454 data for transcriptome analysis and SNPs, and working with that data is not a big problem. Now the shift from 454 to solexa seems daunting (we recently acquired the sequencer). Is there anyone out there who can recommend assembly software? For 454 I used CodonCode Aligner and it did what I wanted; the problem is that its memory limits make it impractical for solexa data.
I am looking for software that is robust enough to handle solexa data without having to stretch its capabilities too much.
|
.........................
|
| Posted May 12, 2008, 8:12 AM |
|
|
|
zee
Group: Member Posts: 1 Joined: Sep 03, 2008
|
We have written some software specifically for this. You can get it for free research/nonprofit use at www.novocraft.com
|
.........................
|
| Posted Sep 03, 2008, 3:54 AM |
|
|
|
ryan_m
Group: Moderators Posts: 284 Joined: May 06, 2006
|
| zee said: | | We have written some software specifically for this. You can get it for free research/nonprofit use at www.novocraft.com |
In my experience, novoalign can consume upwards of 15 gigabytes of RAM when mapping solexa reads to human, though you can change some parameters when creating your index to reduce this. For the people reading this thread, zee, could you let us know what the suggested lower RAM limit is for running novoalign with solexa reads against the human reference genome? Thanks, Ryan
|
.........................
|
| Posted Sep 04, 2008, 12:43 PM |
|
|
|
sparks
Group: Member Posts: 4 Joined: Sep 07, 2008
|
Hi Ryan, using a 14-mer index with a step of 3, the human genome can be indexed in approx. 6 GB of RAM, and novoalign & novopaired will then run quite happily on a workstation with 8 GB of RAM. By default, novoindex will look at how much RAM a server has and then choose the k-mer length and step size to give optimum performance on that server. If your server has 16 GB of RAM, that might mean building a 12 GB index, but you can always specify k and s and have a 6 GB index on a 16 GB server. Up to a limit (4^k < genome length / s), larger k and smaller s will improve performance. Colin
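For readers who want to check a k/s pair against the limit quoted above (4^k < genome length / s), here is a minimal Python sketch. It only evaluates that inequality; it does not estimate index size or RAM use, and it is not taken from novoindex itself. The function name and the approximate human genome length of 3.1 billion bases are assumptions made for illustration.

```python
# Assumed approximate length of the human reference genome in bases.
# This figure is an illustration value, not something stated in the thread.
HUMAN_GENOME_BASES = 3_100_000_000


def within_index_limit(k: int, s: int, genome_length: int = HUMAN_GENOME_BASES) -> bool:
    """Return True when 4^k < genome_length / s.

    The comparison is rearranged to 4^k * s < genome_length so that it
    stays in exact integer arithmetic.
    """
    if k <= 0 or s <= 0:
        raise ValueError("k and s must be positive integers")
    return (4 ** k) * s < genome_length


if __name__ == "__main__":
    # The k/s pairs mentioned in this thread, plus one that exceeds the limit.
    for k, s in [(14, 3), (14, 1), (15, 2), (16, 1)]:
        status = "within limit" if within_index_limit(k, s) else "beyond limit"
        print(f"k={k} s={s}: {status}")
```

Under the assumed genome length, the pairs discussed here (k=14 s=3, k=14 s=1, k=15 s=2) all satisfy the inequality, while k=16 s=1 does not.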
|
.........................
|
| Posted Sep 07, 2008, 20:26 PM |
|
|
|
ryan_m
Group: Moderators Posts: 284 Joined: May 06, 2006
|
Thanks for the details, Colin. And just to confirm, the parameters used when the index is created do not affect the quality of the results (i.e. result in some missed alignments); they just lead to increased runtime to complete the process, correct?
Thanks again. Ryan
|
.........................
|
| Posted Sep 07, 2008, 23:35 PM |
|
|
|
sparks
Group: Member Posts: 4 Joined: Sep 07, 2008
|
Hi Ryan, you're right, the index k-mer length and step size really only affect runtime performance. They shouldn't affect the alignment location of a read. Colin
|
.........................
|
| Posted Sep 08, 2008, 2:51 AM |
|
|
|
G_nome
Group: Member Posts: 11 Joined: Feb 13, 2007
|
Hi Sparks, I have a dual quad-core MacPro (64-bit Xeon processors) with 16 GB of RAM. This should be able to run novoalign just fine on the human genome, but I am having trouble getting novoindex to run. It seems that novoindex 'thinks' it is on a 32-bit machine and is complaining about memory limitations. Here is a piece of the error output:
Error: Sequence Index cannot fit in available RAM
Error: RAM available: 2048Mb
Error: Minimum RAM req'd: 4027Mb
Is there some way to get around this? Have others out there had success running novoindex and/or novoalign on a Mac?
|
.........................
|
| Posted Oct 20, 2008, 16:57 PM |
|
|
|
sparks
Group: Member Posts: 4 Joined: Sep 07, 2008
|
Hi G_nome,
The problem is in determining how much memory is available. Can you tell me what version you are using?
If you specify the k & s parameters then it shouldn't be a problem. On a 16 GB server and using the human reference genome you could use either -k14 -s1 or -k15 -s2
Best Regards, Colin
|
.........................
|
| Posted Oct 20, 2008, 20:41 PM |
|
|
|
G_nome
Group: Member Posts: 11 Joined: Feb 13, 2007
|
| sparks said: | Hi G_nome,
The problem is in determining how much memory is available. Can you tell me what version you are using?
If you specify the k & s parameters then it shouldn't be a problem. On a 16 GB server and using the human reference genome you could use either -k14 -s1 or -k15 -s2
Best Regards, Colin |
Thank you for your fast reply, Colin. The novoindex version is 1.5. As per your suggestion, it seems to work OK with -k15 -s2. Regards, Sean
|
.........................
|
| Posted Oct 21, 2008, 12:17 PM |
|
|
|
G_nome
Group: Member Posts: 11 Joined: Feb 13, 2007
|
Hi again. I am not getting alignments as quickly as I would have expected from the rough benchmarks (and comparisons to Maq and Eland). I started 8 novopaired jobs (on an 8-CPU machine) a week ago, one lane of data per job. Some jobs are 42-bp reads and some are 76-bp reads. Currently, each job has aligned between 1 and 5 million reads. Each lane has about 20 million reads (10 million pairs), so it is looking like I have many more weeks to wait. Am I doing something wrong? The only non-default option I am using is (-Q 30), hoping that would provide a speed-up by ignoring low quality alignments. By the way, this is on a MacPro with 16 GB of memory.
I appreciate your help.
Sean
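To put the "many more weeks" estimate in numbers, here is a small Python sketch of the arithmetic, using only the figures in the post above (about 20 million reads per lane, 1 to 5 million aligned after one week). It assumes the alignment rate stays constant, which may not hold in practice; the function name is made up for illustration.

```python
def weeks_remaining(total_reads: float, aligned_reads: float, days_elapsed: float) -> float:
    """Estimate the weeks left for a job, assuming a constant alignment rate."""
    if aligned_reads <= 0 or days_elapsed <= 0:
        raise ValueError("aligned_reads and days_elapsed must be positive")
    reads_per_day = aligned_reads / days_elapsed
    days_left = (total_reads - aligned_reads) / reads_per_day
    return days_left / 7


if __name__ == "__main__":
    lane_reads = 20_000_000  # reads per lane, as quoted in the post
    for aligned in (1_000_000, 5_000_000):
        weeks = weeks_remaining(lane_reads, aligned, days_elapsed=7)
        print(f"{aligned:,} reads aligned after 7 days: about {weeks:.0f} more weeks")
```

At that pace the slowest jobs would need roughly 19 more weeks and the fastest about 3.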
|
.........................
|
| Posted Nov 07, 2008, 11:04 AM |
|
|
|