Question

Latest tool for genome-guided assembly

0

8.5 years ago

Bioinfonext ▴ 470

I am not sure which tool to use for genome-guided assembly: Cufflinks or StringTie?

Assembly • 2.5k views

ADD COMMENT • link 8.5 years ago by Bioinfonext ▴ 470

0

I've never used StringTie, but I do recommend avoiding Cufflinks. However, you will get more useful responses if you provide more information about what you want to do, and what kind of data you have.

ADD REPLY • link 8.5 years ago by Brian Bushnell 20k

0

It's RNA-seq data from different developmental stages of a plant species.

ADD REPLY • link 8.5 years ago by Bioinfonext ▴ 470

1

When asking for advice on assembly, it's useful to say something like...

"I have 20Gbp of 2x150bp Illumina reads, with target 270bp insert size, sequenced as strand-specific RNA-seq using protocol X on a HiSeq2500. This is from a tetraploid plant (P. bulbasaurus) with a genome of 2 Gbp and transcriptome of 100 Mbp, typical intron lengths of 200bp, and an estimated 30000 genes. The sample is wild-type rather than inbred, with a typical het rate of 1/300. Previous attempts at assembly used approach Y and yielded poor continuity with an L50 of 120 bp."

If you provide that kind of information, it's possible to give helpful suggestions.

ADD REPLY • link 8.5 years ago by Brian Bushnell 20k

0

Hi Brian,

I have around 200 million paired-end reads of 125 bp from Radish. The genome size is 530 Mb, with around 60,000 genes. The minimum intron length is 21 bp and the maximum intron length is 21,000 bp. I need to estimate the insert size from a SAM file using BBMap, but my SAM file contains both mapped and unmapped reads, so it is not able to produce an estimate. I think BBMap requires a SAM file containing only mapped reads.

Can you suggest how to extract only the mapped reads from a SAM file into a new SAM file?

ADD REPLY • link 8.5 years ago by Bioinfonext ▴ 470

0

With BBMap, you can use "outm=mapped.sam" instead of "out=all.sam", to just get the mapped reads. Alternatively, once you already have a sam file, you can do this:

reformat.sh in=x.sam out=mapped_only.sam mappedonly primaryonly
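
For readers who want to see what this kind of filter does under the hood, here is a minimal Python sketch that works directly on SAM text. It relies only on the FLAG bits defined in the SAM specification (0x4 unmapped, 0x100 secondary, 0x800 supplementary). The function name `filter_mapped` and the `primary_only` parameter are made up for this illustration, and treating "primary" as "neither secondary nor supplementary" is an assumption here, so the output may not match what reformat.sh keeps.

```python
UNMAPPED = 0x4         # SAM FLAG bit: the read itself is unmapped
SECONDARY = 0x100      # SAM FLAG bit: secondary alignment
SUPPLEMENTARY = 0x800  # SAM FLAG bit: supplementary alignment


def filter_mapped(lines, primary_only=True):
    """Yield header lines plus alignment lines of mapped reads.

    `lines` is any iterable of SAM text lines (hypothetical helper,
    not part of BBTools or samtools).
    """
    for line in lines:
        if not line.strip():
            continue                      # skip blank lines
        if line.startswith("@"):
            yield line                    # keep the header unchanged
            continue
        flag = int(line.split("\t")[1])   # column 2 of a SAM record is FLAG
        if flag & UNMAPPED:
            continue                      # drop unmapped reads
        if primary_only and flag & (SECONDARY | SUPPLEMENTARY):
            continue                      # drop non-primary alignments
        yield line


if __name__ == "__main__":
    import sys
    # Example: python filter_mapped.py < x.sam > mapped_only.sam
    sys.stdout.writelines(filter_mapped(sys.stdin))
```

This streams line by line, so it should cope with large files, but it will be much slower than the dedicated tools and is only meant to show the logic.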

You can also do similar filtering with samtools. But anyway - it's not clear to me why you need to extract the mapped reads; for your purposes, I would expect a "reference-guided assembler" to just ignore the unmapped ones. Typically, for DNA assembly, I would use a normal assembler like SPAdes; for RNA assembly, I'd suggest Trinity, which is not perfect but seems to do a decent job. I am quite wary of reference-guided assembly, as it can yield incorrect results when the sample differs substantially from the reference.

Also, if you want insert-size statistics, you can get those with BBMap using the flag "ihist=ihist.txt".
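
As a rough illustration of the idea behind an insert-size histogram, the sketch below tallies the TLEN field (column 9) of a SAM file in Python. This is not how BBMap computes its `ihist` output; the function names, the `max_size` cutoff, and the choice to count only properly paired primary alignments with positive TLEN (so each pair is counted once) are all assumptions made for this example. Note also that for RNA-seq reads mapped to a genome, TLEN is measured on the reference, so pairs that span an intron will look longer than the real fragment.

```python
from collections import Counter


def insert_size_histogram(lines, max_size=1000):
    """Count TLEN values of properly paired, mapped, primary alignments.

    Hypothetical helper for illustration; `max_size` is an arbitrary
    cutoff that discards very long (e.g. intron-spanning) pairs.
    """
    hist = Counter()
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue                          # skip blanks and header
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])                 # column 2: FLAG
        tlen = int(fields[8])                 # column 9: TLEN
        if not flag & 0x2:
            continue                          # not a proper pair
        if flag & (0x4 | 0x100 | 0x800):
            continue                          # unmapped or non-primary
        if 0 < tlen <= max_size:              # positive TLEN: one count per pair
            hist[tlen] += 1
    return hist


def median_insert(hist):
    """Return the (lower) median insert size, or None if hist is empty."""
    total = sum(hist.values())
    if total == 0:
        return None
    running = 0
    for size in sorted(hist):
        running += hist[size]
        if running * 2 >= total:
            return size
```

Calling `median_insert(insert_size_histogram(open("mapped_only.sam")))` would give a single summary number; the file name is just the one used in the command above.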

ADD REPLY • link 8.5 years ago by Brian Bushnell 20k

0

In case of doubt, try to find a review publication comparing those two tools, or just use both yourself and compare the results you get.

ADD REPLY • link 8.5 years ago by WouterDeCoster 48k