Question: Cufflinks genome-guided de novo transcript assembly: increase in ambiguous reads
5.2 years ago by
United States
sunil.mangalam10 wrote:

Hi forum,
I am doing de novo transcript assembly using Cufflinks. In parallel, I am using featureCounts to count against Gencode M7. My data is 100 bp paired-end, unstranded mouse RNA-seq.
Our workflows are: STAR -> featureCounts (against Gencode M7), or
STAR -> Cufflinks de novo transcript assembly -> cuffmerge -> featureCounts (against the de novo GTF file).
Comparing our counts from Gencode M7 vs the de novo GTF, I see a decrease (~40%) in Unassigned_NoFeatures reads with the de novo GTF, which is encouraging because Cufflinks is probably detecting new transcripts (or extending the exon boundaries of already known transcripts).
But at the same time, I find a huge increase (~400%) in reads that are Unassigned_Ambiguity. This seems to be related to these samples being unstranded, because when I run some 100 bp single-end stranded data through the same workflows, I get a decrease in both Unassigned_Ambiguity and Unassigned_NoFeatures reads, and an increase in total counts assigned to transcripts.
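To make the comparison above concrete, here is a minimal sketch that computes the percent change of one category between two featureCounts `.summary` files. It assumes the usual tab-separated layout (a `Status` header row, then one row per category with a single count column); the function name `pct_change` and the file names in the usage comment are hypothetical, not taken from the original runs.

```shell
# pct_change CATEGORY SUMMARY_A SUMMARY_B
# Prints the percent change of CATEGORY going from SUMMARY_A to SUMMARY_B.
# Assumes a tab-separated featureCounts summary with one sample column.
pct_change() (
    # Pull the count for the requested category out of each summary file.
    a=$(awk -F'\t' -v c="$1" '$1 == c { print $2 }' "$2")
    b=$(awk -F'\t' -v c="$1" '$1 == c { print $2 }' "$3")
    # Guard against a missing or zero baseline before dividing.
    awk -v a="$a" -v b="$b" 'BEGIN {
        if (a + 0 == 0) { print "NA"; exit }
        printf "%.1f\n", (b - a) / a * 100
    }'
)

# Hypothetical usage:
#   pct_change Unassigned_Ambiguity gencode.txt.summary denovo.txt.summary
#   pct_change Unassigned_NoFeatures gencode.txt.summary denovo.txt.summary
```

Running it for both `Unassigned_Ambiguity` and `Unassigned_NoFeatures` on the same sample shows directly whether reads are moving from "no feature" into "ambiguous" rather than into "assigned".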
The Cufflinks command we used for each sample generally looks like this:
bsub -P ssTissue -n 12 -M 250000 -o /home/jtobias/ss/cuff_noM/logs/CV_CV1_cuff.out -e /home/jtobias/ss/cuff_noM/logs/CV_CV1_cuff.err cufflinks -p 8 --max-bundle-frags 300000 -q --library-type fr-unstranded -o /home/jtobias/ss/cuff_noM/CV_CV1 /home/ss/bam_noM/CV_CV1_noM.bam

and for featureCounts against Gencode M7 or the de novo GTF:
bsub -q max_mem30 -n 12 featureCounts -T 12 -t exon -g gene_id -a ~/m7/m7gtf.gtf -s 1 -o /home/ss/counts/SEgencodeGene.txt
Pretty much the standard options. It looks like we are losing counts against known exons when we use the de novo GTF. Has anyone experienced this before? Any help is much appreciated!

5.2 years ago by
andrew.j.skelton736.1k wrote:

Why are you assembling de novo when there's a perfectly good reference genome? If you're looking for novel things, then Cufflinks will do that for you, using the reference genome to inform its decisions (as far as I can work out, at least). If transcripts are not being assembled over well-known exons, then that's down to Cufflinks' transcript assembly methodology. Would it be worth comparing three Cufflinks runs: de novo (as you've already done), against the reference genome with no novel detection, and against the reference genome with novel detection enabled?
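The three runs suggested above could be set up roughly as follows. This is a sketch under assumptions: it reads "against the reference" as supplying a reference annotation GTF, using Cufflinks' `-G/--GTF` (quantify annotated transcripts only) and `-g/--GTF-guide` (annotation-guided assembly that can still report novel transcripts). The helper name `cufflinks_cmd`, the mode names, and the paths in the usage comment are hypothetical. The helper only prints each command so it can be inspected or passed to a scheduler such as `bsub`.

```shell
# cufflinks_cmd MODE BAM GTF OUTDIR
# Prints (does not run) a cufflinks command line for one of three modes.
cufflinks_cmd() {
    mode=$1 bam=$2 gtf=$3 out=$4
    # Options shared by all three runs, mirroring the original command.
    common="cufflinks -p 8 --library-type fr-unstranded"
    case "$mode" in
        denovo)    echo "$common -o $out/denovo $bam" ;;              # no annotation
        ref_only)  echo "$common -G $gtf -o $out/ref_only $bam" ;;    # annotated transcripts only
        ref_guide) echo "$common -g $gtf -o $out/ref_guide $bam" ;;   # annotation-guided, novel allowed
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}

# Hypothetical usage:
#   for m in denovo ref_only ref_guide; do
#       cufflinks_cmd "$m" sample.bam m7gtf.gtf cuff_compare
#   done
```

Counting each resulting GTF with identical featureCounts settings would show whether the jump in Unassigned_Ambiguity appears only when novel, possibly overlapping, loci are added.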
