Question: What Are You Using For A Reference Assembler?
1
7.1 years ago by
diltsjeri440
Chicago, IL
diltsjeri440 wrote:

I need some information on reference assemblers. What are you using? Which is the most preferable reference assembler?

reference • 2.5k views
ADD COMMENTlink written 7.1 years ago by diltsjeri440
1

What is a "reference assembler"? One that uses a reference genome, or one that you want to use as a reference for comparison with others?

ADD REPLYlink written 7.1 years ago by Neilfws48k

Also: similar question, same user: http://www.biostars.org/post/show/44956/ion-torrent-reference-assembly/. Best to avoid posting multiple, highly-similar questions.

ADD REPLYlink modified 7.1 years ago • written 7.1 years ago by Neilfws48k
2
7.1 years ago by
Lee Katz2.9k
Atlanta, GA
Lee Katz2.9k wrote:

AMOScmp-shortreads is working well for me, but it takes a bit longer.

ADD COMMENTlink written 7.1 years ago by Lee Katz2.9k

What files are you using to do the message file conversion with toAmos?

ADD REPLYlink modified 7.1 years ago • written 7.1 years ago by diltsjeri440

I have been using toAmos_new (available only in the newer versions) to convert the fastq to a bnk. Then I start at step 20 in AMOS using -s 20, which tricks it into starting from a bnk file instead of an afg file. It's buried in my script, but I think it's something like:

toAmos_new -Q run.fastq -t SANGER -b amos.bnk
AMOScmp-shortreads -s 20 amos

You'll need amos.1con and amos.bnk in the same directory for this to work. You can use "amos" or any other prefix, but it must be the same between files.

ADD REPLYlink modified 7.1 years ago • written 7.1 years ago by Lee Katz2.9k
1

What's the advantage of starting with a bnk file? Also, are the options you posted above new to toAmos_new? I'm starting a pipeline with Ion Torrent data, so all I have is an sff. I use sff_extract to get the fasta, qual, and xml, and I was going to use toAmos to convert to afg. Should I not?

ADD REPLYlink modified 7.1 years ago • written 7.1 years ago by diltsjeri440

The only advantage is that toAmos_new can read in a fastq file and therefore you skip 1) converting to fasta/qual and then 2) converting to afg. Internally, AMOScmp converts first to a bnk anyway and doesn't use the afg anymore.

ADD REPLYlink written 7.1 years ago by Lee Katz2.9k

thanks! this is really helpful.

ADD REPLYlink written 7.1 years ago by diltsjeri440

After validating it, AMOScmp unfortunately does not perform as well as I thought it should. I had a few more false-positives than when I worked with bowtie2. Sorry to do this, but I withdraw this recommendation in favor of newer tools. BWA came in as a close second to bowtie2 and was still better than AMOScmp.

Edit: I mean AMOScmp-shortReads.

ADD REPLYlink modified 7.1 years ago • written 7.1 years ago by Lee Katz2.9k

Thanks for following up.

ADD REPLYlink written 7.1 years ago by diltsjeri440

Bowtie doesn't output contigs though, correct? I need my reads to be assembled.

ADD REPLYlink written 7.1 years ago by diltsjeri440
1

You'll have to follow up with samtools and "vcfutils.pl vcf2fq".

I am writing a script to automate this step but it is not finalized yet.

ADD REPLYlink written 7.1 years ago by Lee Katz2.9k

My reference sequence (to be indexed) has ambiguous nucleotides. This is apparently not supported by bowtie. Have you run into this problem, and if so, how did you work around it? I can't just replace those V, H, etc. with a nucleotide because that would bias the alignment.

I noticed bowtie-build offers an --ntoa option, but that just changes all Ns to As. Wouldn't that create a bias? I also have other ambiguity codes like V and H, as stated above, which --ntoa wouldn't fix.

And I have some gaps :(

ADD REPLYlink modified 7.0 years ago • written 7.0 years ago by diltsjeri440

Replace all the ambiguous letters with N. You can do that with sed, or in a text editor like vim. I'd use sed to get rid of the - gap characters as well. Putting in an A will create a bias.

bwa will not crash on a genome like that. I'm pretty sure it will treat them all like N's.

ADD REPLYlink written 7.0 years ago by swbarnes25.8k
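The sed replacement suggested above could look roughly like the sketch below. It is a minimal example, not a tested pipeline step: the function name `mask_ambiguous` and the file names are made up for illustration, and it assumes a plain FASTA file in which only the standard IUPAC ambiguity codes (R, Y, S, W, K, M, B, D, H, V) and `-` gaps need handling. Header lines are left alone.

```shell
# Hypothetical helper: mask IUPAC ambiguity codes with N and drop gap
# characters in a FASTA stream. Lines starting with ">" are headers and
# are passed through unchanged.
mask_ambiguous() {
  sed -e '/^>/!s/-//g' \
      -e '/^>/!s/[RYSWKMBDHVryswkmbdhv]/N/g'
}

# Example usage (file names are placeholders):
#   mask_ambiguous < reference.fa > reference.masked.fa
```

Removing gaps shortens some sequence lines, so the output may need re-wrapping before tools that expect uniform line lengths (samtools faidx, for example) will index it.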

Bowtie won't take Ns, I wish it did ;(

ADD REPLYlink written 7.0 years ago by diltsjeri440
1
7.1 years ago by
Nikolay Vyahhi1.2k
St. Petersburg, Russia
Nikolay Vyahhi1.2k wrote:
ADD COMMENTlink modified 7.1 years ago • written 7.1 years ago by Nikolay Vyahhi1.2k
2

Those are aligners. Assemblers are programs like Velvet.

ADD REPLYlink written 7.1 years ago by swbarnes25.8k

Velvet is a de novo assembler. If you need to assemble against a reference, then you need an aligner.

ADD REPLYlink written 7.1 years ago by Nikolay Vyahhi1.2k

This seems to be an ongoing debate on this forum.

ADD REPLYlink written 7.1 years ago by diltsjeri440

I need my reads to be assembled based on a reference. With tools like Bowtie and BWA I get the percentage aligned and I can see the aligned regions, but the reads are not being assembled based on the reference. I believe this is the difference between the two.

ADD REPLYlink written 7.1 years ago by diltsjeri440
1

After alignment, you can construct (assemble) consensus sequence from BAM/SAM-file using samtools: http://samtools.sourceforge.net/cns0.shtml

ADD REPLYlink written 7.0 years ago by Nikolay Vyahhi1.2k
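For readers who want the align-then-consensus route in one place, here is a hedged sketch of the samtools/vcfutils.pl step mentioned in this thread. The function name `consensus_from_bam` and all file names are placeholders. It assumes a reasonably recent bcftools, where pileup and calling are `bcftools mpileup` and `bcftools call`. Flags have changed between releases, so check the versions you have installed before relying on it.

```shell
# Sketch only: build a consensus FASTQ from reads already aligned to a
# reference (e.g. with bowtie2 or bwa). Arguments are placeholders.
consensus_from_bam() {
  ref="$1"   # reference FASTA the reads were aligned to
  bam="$2"   # coordinate-sorted BAM of the alignments
  out="$3"   # consensus FASTQ to write

  # Pile up the reads, call a consensus genotype at each site, then
  # convert the calls into a consensus sequence.
  bcftools mpileup -f "$ref" "$bam" \
    | bcftools call -c \
    | vcfutils.pl vcf2fq > "$out"
}

# Older samtools 0.1.x releases spelled the same pipeline as:
#   samtools mpileup -uf ref.fa aln.bam | bcftools view -cg - | vcfutils.pl vcf2fq > cns.fq

# Example usage (file names are placeholders):
#   consensus_from_bam ref.fa aln.sorted.bam cns.fq
```

The result is a single consensus sequence per reference sequence rather than a set of contigs, which is what "assembling by reference" amounts to with this approach.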

How can we have the SNPs and indels replaced, and have the uncovered regions of the genome represented as a series of Ns in the consensus assembly?

ADD REPLYlink written 12 months ago by deepti1rao20