I need some information on reference assemblers. Which ones are you using, and which do you prefer?
What is a "reference assembler"? One that uses a reference genome, or one that you want to use as a reference for comparison with others?
Also: similar question, same user: http://www.biostars.org/post/show/44956/ion-torrent-reference-assembly/. Best to avoid posting multiple, highly-similar questions.
AMOScmp-shortreads is working well for me, but it takes a bit longer.
What files are you using to do the message file conversion with toAmos?
I have been using toAmos_new (in the new versions only) to convert the fastq to a bnk, and then I start on step 20 in amos using -s 20 so that I can trick it into starting on a bnk file instead of an afg file. It's buried in my script but I think it's something like
toAmos_new -Q run.fastq -t SANGER -b amos.bnk
AMOScmp-shortreads -s 20 amos
You'll need amos.1con and amos.bnk in the same directory for this to work. You can use "amos" or any other prefix, but it must be the same between files.
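A minimal sketch of a pre-flight check for the requirement above, assuming the only condition is that `<prefix>.1con` and `<prefix>.bnk` exist side by side in one directory. The function name `missing_amos_inputs` is hypothetical and is not part of AMOS.

```python
from pathlib import Path


def missing_amos_inputs(prefix, directory="."):
    """Return the expected input paths that are absent for a given prefix."""
    base = Path(directory)
    # Both inputs must share the same prefix, e.g. amos.1con and amos.bnk.
    expected = [base / f"{prefix}.1con", base / f"{prefix}.bnk"]
    # exists() is true for files and directories alike, so this does not
    # assume how the bank is stored on disk.
    return [str(path) for path in expected if not path.exists()]


if __name__ == "__main__":
    missing = missing_amos_inputs("amos")
    if missing:
        print("Missing before running with -s 20:", ", ".join(missing))
```

Running this in the working directory before the `-s 20` step should catch a mismatched prefix early.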
What's the advantage of starting with a bnk file? Also, are the options you posted above new in toAmos_new? I'm starting a pipeline with Ion Torrent data, so all I have is an sff. I use sffextract to get the fasta, qual, and xml, and I was going to use toAmos to convert to afg, but should I not?
The only advantage is that toAmos_new can read in a fastq file and therefore you skip 1) converting to fasta/qual and then 2) converting to afg. Internally, AMOScmp converts first to a bnk anyway and doesn't use the afg anymore.
Thanks! This is really helpful.
After validating it, AMOScmp unfortunately does not perform as well as I thought it should. I had a few more false-positives than when I worked with bowtie2. Sorry to do this, but I withdraw this recommendation in favor of newer tools. BWA came in as a close second to bowtie2 and was still better than AMOScmp.
Edit: I mean AMOScmp-shortReads.
Thanks for following up.
Bowtie doesn't output contigs though, correct? I need my reads to be assembled.
You'll have to follow up with samtools and "vcfutils.pl vcf2fq".
I am writing a script to automate this step but it is not finalized yet.
My reference sequence (to be indexed) has ambiguous nucleotides. This is apparently not supported by bowtie. Have you run into this problem, and if so, how did you work around it? I can't just replace those V, H, etc. with a nucleotide because it would bias the alignment.
I noticed bowtie offers a --ntoa option on bowtie-build, but that just changes all Ns to As. Wouldn't that create a bias? I also have other ambiguity codes like V and H, as stated above, which --ntoa wouldn't fix.
And I have some gaps :(
Replace all the ambiguous letters with N. You can do that with sed, or in a text editor like vim. I'd also use sed to get rid of the - characters. Putting in an A will create a bias.
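The same cleanup can be sketched in Python instead of sed. This is only an illustration: it assumes a plain FASTA input in which every sequence character other than A, C, G, T, or N (either case) is an ambiguity code, and the helper name `clean_fasta_lines` is made up for this example.

```python
import re

# Anything that is not A, C, G, T or N (either case) is treated as ambiguous.
_AMBIGUOUS = re.compile(r"[^ACGTNacgtn]")


def clean_fasta_lines(lines):
    """Yield FASTA lines with gaps removed and ambiguity codes set to N."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            yield line  # header lines are left untouched
            continue
        sequence = line.replace("-", "")  # drop gap characters
        if sequence:
            yield _AMBIGUOUS.sub("N", sequence)


if __name__ == "__main__":
    example = [">ref\n", "ACGTVH-N\n", "ac-gty\n"]
    for cleaned in clean_fasta_lines(example):
        print(cleaned)
```

Note that deleting gap characters shifts every downstream coordinate relative to the gapped original, which matters if positions are compared later.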
bwa will not crash on a genome like that. I'm pretty sure it will treat them all like N's.
Bowtie won't take Ns, I wish it did ;(
Those are aligners. Assemblers are programs like Velvet.
Velvet is a de novo assembler. If you need to assemble against a reference, then you need an aligner.
This seems to be an ongoing debate on this forum.
I need my reads to be assembled based on a reference. With tools like Bowtie and BWA I get the percentage aligned and I can see the aligned regions, but the reads are not being assembled based on the reference. I believe this is the difference between the two.
After alignment, you can construct (assemble) a consensus sequence from the BAM/SAM file using samtools: http://samtools.sourceforge.net/cns0.shtml
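As a rough sketch, the pipeline described on that page for older samtools releases can be assembled like this. The option strings are the legacy samtools/bcftools 0.1.x ones as I recall them; newer releases changed these options, so treat the command as an assumption and check the help output of your installed version. The function name and the file names are placeholders.

```python
import shlex


def legacy_consensus_command(ref_fasta, bam, out_fastq):
    """Build the samtools 0.1.x-style consensus pipeline as a shell string."""
    # Quote each path so names with spaces survive the shell.
    ref, aln, out = (shlex.quote(p) for p in (ref_fasta, bam, out_fastq))
    return (
        f"samtools mpileup -uf {ref} {aln}"  # pileup against the reference
        " | bcftools view -cg -"             # legacy consensus/variant calling
        " | vcfutils.pl vcf2fq"              # convert the calls to a FASTQ consensus
        f" > {out}"
    )


if __name__ == "__main__":
    print(legacy_consensus_command("ref.fa", "aln.bam", "cns.fq"))
```

The function only builds the command string; it does not run anything, so the result can be inspected before it is pasted into a shell.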
How can we have the SNPs and indels substituted into the consensus, and have the uncovered regions of the genome represented as a series of Ns in the consensus assembly?
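For the masking half of this question only, here is a minimal sketch of the idea. It assumes a consensus string and a per-base read depth list of the same length are already available; how they are produced is outside the sketch. The function `mask_uncovered` and the `min_depth` threshold are hypothetical. It does not substitute SNPs or indels, and it would not be correct if indels have already changed the consensus length relative to the depth coordinates.

```python
def mask_uncovered(consensus, depth, min_depth=1):
    """Replace bases whose read depth is below min_depth with N."""
    if len(consensus) != len(depth):
        raise ValueError("consensus and depth must have the same length")
    return "".join(
        base if d >= min_depth else "N"
        for base, d in zip(consensus, depth)
    )


if __name__ == "__main__":
    # Positions 2, 4 and 5 have no coverage and become N.
    print(mask_uncovered("ACGTAC", [3, 0, 2, 0, 0, 5]))
```

Raising `min_depth` would also mask positions that are covered too thinly to trust.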