Question: latest tool for genome guided assembly
3.7 years ago, Bioinfonext (250) wrote:

I am not sure which tool to use for genome-guided assembly: Cufflinks or StringTie?


I've never used StringTie, but I do recommend avoiding Cufflinks. However, you will get more useful responses if you provide more information about what you want to do, and what kind of data you have.

modified 3.7 years ago • written 3.7 years ago by Brian Bushnell (17k)

It's RNA-seq data from different developmental stages of a plant species.

written 3.7 years ago by Bioinfonext (250)

When asking for advice on assembly, it's useful to say something like...

"I have 20Gbp of 2x150bp Illumina reads, with target 270bp insert size, sequenced as strand-specific RNA-seq using protocol X on a HiSeq2500. This is from a tetraploid plant (P. bulbasaurus) with a genome of 2 Gbp and transcriptome of 100 Mbp, typical intron lengths of 200bp, and an estimated 30000 genes. The sample is wild-type rather than inbred, with a typical het rate of 1/300. Previous attempts at assembly used approach Y and yielded poor continuity with an L50 of 120 bp."

If you provide that kind of information, it's possible to give helpful suggestions.

written 3.7 years ago by Brian Bushnell (17k)

Hi Brian,

I have around 200 million paired-end reads of 125 bp from radish. The genome size is 530 Mb, with around 60,000 genes. The minimum intron length is 21 bp and the maximum intron length is 21,000 bp. I need to estimate the insert size from a SAM file using BBMap, but my SAM file contains both mapped and unmapped reads, so it is not able to estimate it, and I think BBMap requires a SAM file with only mapped reads.

Can you suggest how to extract only the mapped reads from a SAM file, keeping the output in SAM format?

written 3.7 years ago by Bioinfonext (250)

With BBMap, you can use "outm=mapped.sam" instead of "out=all.sam", to just get the mapped reads. Alternatively, once you already have a sam file, you can do this: in=x.sam out=mapped_only.sam mappedonly primaryonly
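For readers without those tools at hand, the same kind of filtering can be sketched in plain Python by checking the FLAG column (column 2) of each SAM record. This is only an illustration, not how BBMap does it: the function name filter_mapped_primary is hypothetical, and treating "primary" as "neither secondary (0x100) nor supplementary (0x800)" is an assumption of this sketch.

```python
# Minimal sketch: keep header lines plus mapped, primary alignments.
# FLAG bit meanings come from the SAM specification.
UNMAPPED = 0x4          # read itself is unmapped
SECONDARY = 0x100       # secondary alignment
SUPPLEMENTARY = 0x800   # supplementary (chimeric) alignment


def filter_mapped_primary(sam_in, sam_out):
    """Copy headers and mapped primary records from sam_in to sam_out.

    sam_in and sam_out are text file objects (hypothetical helper).
    Returns the number of alignment records kept.
    """
    kept = 0
    for line in sam_in:
        if line.startswith("@"):
            sam_out.write(line)  # header lines are passed through unchanged
            continue
        fields = line.split("\t")
        if len(fields) < 11:
            continue  # skip malformed records (SAM has 11 mandatory columns)
        flag = int(fields[1])
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue  # drop unmapped and non-primary records
        sam_out.write(line)
        kept += 1
    return kept


# Example use with files on disk (file names are placeholders):
# with open("x.sam") as fin, open("mapped_only.sam", "w") as fout:
#     print(filter_mapped_primary(fin, fout), "records kept")
```

Note that this keeps a mapped read even if its mate is unmapped; add a check for bit 0x8 if both mates must be mapped.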

You can also do similar filtering with samtools. But anyway, it's not clear to me why you need to extract the mapped reads; for your purposes, I would expect a "reference-guided assembler" to just ignore the unmapped ones. Typically, for DNA assembly, I would use a normal assembler like SPAdes; for RNA assembly, I'd suggest Trinity, which is not perfect but seems to do a decent job. I am quite wary of reference-guided assembly, as it can yield incorrect results when the sample differs substantially from the reference.

Also, if you want insert-size statistics, you can get those with BBMap using the flag "ihist=ihist.txt".
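If you only have a SAM file and want a rough cross-check, an insert-size histogram can also be approximated from the TLEN column (column 9). The sketch below is an illustration under stated assumptions, not BBMap's method: the function names are hypothetical, only properly paired primary records are counted, and each pair is counted once via the mate with positive TLEN. For RNA-seq reads aligned to a genome, TLEN is measured in reference coordinates, so pairs spanning an intron will look longer than the real fragment.

```python
from collections import Counter


def insert_size_histogram(sam_lines):
    """Count observed template lengths (TLEN) from an iterable of SAM lines."""
    hist = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # skip malformed records
        flag = int(fields[1])
        tlen = int(fields[8])
        # Require paired (0x1) and properly paired (0x2).
        if not (flag & 0x1 and flag & 0x2):
            continue
        # Skip unmapped read/mate (0x4, 0x8), secondary (0x100),
        # and supplementary (0x800) records.
        if flag & (0x4 | 0x8 | 0x100 | 0x800):
            continue
        # The leftmost mate carries the positive TLEN; counting only
        # positive values counts each pair once.
        if tlen <= 0:
            continue
        hist[tlen] += 1
    return hist


def median_from_histogram(hist):
    """Return the (lower) median insert size, or None if hist is empty."""
    total = sum(hist.values())
    if total == 0:
        return None
    midpoint = (total + 1) // 2
    running = 0
    for size in sorted(hist):
        running += hist[size]
        if running >= midpoint:
            return size


# Example use (file name is a placeholder):
# with open("mapped.sam") as fin:
#     hist = insert_size_histogram(fin)
# print(median_from_histogram(hist))
```

The median is usually a safer summary than the mean here, since a few intron-spanning pairs can inflate the mean considerably.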

modified 3.7 years ago • written 3.7 years ago by Brian Bushnell (17k)

In case of doubt, try to find a review publication comparing those two tools, or just use both yourself and compare the results you get.

written 3.7 years ago by WouterDeCoster (44k)