I am doing de novo transcript assembly using Cufflinks. In parallel, I am using featureCounts to count against Gencode M7. My data is 100 bp paired-end, unstranded mouse RNA-seq.
Our workflows are: STAR -> featureCounts (against Gencode M7) or
STAR -> Cufflinks de novo transcript assembly -> cuffmerge -> featureCounts (against the de novo GTF file).
Comparing our counts from Gencode M7 vs the de novo GTF, I see a decrease (~40%) in Unassigned_NoFeatures reads with the de novo GTF, which is encouraging because Cufflinks is probably detecting new transcripts (or extending the exon boundaries of already known transcripts).
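For reference, the percentage changes I quote in this post are per-category differences between the two featureCounts .summary files. A minimal sketch of that comparison, assuming each summary has a single BAM column (the function name and file names are hypothetical):

```shell
# Percent change of each featureCounts status category between two
# .summary files (tab-separated: status name, then one count column).
# Assumption: only the first count column of each file is compared.
compare_summaries() {
    local ref_summary="$1" new_summary="$2"
    awk -F'\t' '
        FNR == 1  { next }                    # skip the "Status" header line of each file
        NR == FNR { ref[$1] = $2; next }      # first file: remember reference counts
        ($1 in ref) && ref[$1] > 0 {          # second file: report change vs reference
            printf "%s\t%s\t%s\t%+.1f%%\n", $1, ref[$1], $2, 100 * ($2 - ref[$1]) / ref[$1]
        }
    ' "$ref_summary" "$new_summary"
}

# Usage (hypothetical paths):
#   compare_summaries gencodeM7.txt.summary denovo.txt.summary
```

Each output line gives the category, the reference count, the new count, and the signed percent change. Categories with a zero reference count are skipped to avoid dividing by zero.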
But at the same time, I find a huge increase (~400%) in reads that are Unassigned_Ambiguity. This seems to have something to do with these samples being unstranded, because when I align some 100 bp single-end stranded data, I get a decrease in both Unassigned_Ambiguity and Unassigned_NoFeatures reads, and an increase in total counts assigned to transcripts.
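One check that might help with the strandedness angle is to tally the strand column of the exon records in the de novo GTF. This is only a rough sketch: it assumes a standard tab-separated GTF with the strand in column 7, and the function name is hypothetical.

```shell
# Count exon records per strand value (GTF column 7).
# A large share of "." entries would mean many assembled exons
# have no strand assigned.
exon_strand_tally() {
    awk -F'\t' '
        /^#/         { next }                 # skip comment/header lines
        $3 == "exon" { n[$7]++ }              # tally exons by strand symbol
        END          { for (s in n) printf "%s\t%d\n", s, n[s] }
    ' "$1" | sort
}

# Usage (hypothetical path):
#   exon_strand_tally merged.gtf
```

Running it on both the Gencode M7 GTF and the merged Cufflinks GTF would show whether the two annotations differ much in how many exons carry a "+" or "-" strand versus ".".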
The Cufflinks command we used for each sample generally looks like this:
bsub -P ssTissue -n 12 -M 250000 -o /home/jtobias/ss/cuff_noM/logs/CV_CV1_cuff.out -e /home/jtobias/ss/cuff_noM/logs/CV_CV1_cuff.err cufflinks -p 8 --max-bundle-frags 300000 -q --library-type fr-unstranded -o /home/jtobias/ss/cuff_noM/CV_CV1 /home/ss/bam_noM/CV_CV1_noM.bam
and for featureCounts against Gencode M7 or the de novo GTF:
bsub -q max_mem30 -n 12 featureCounts -T 12 -t exon -g gene_id -a ~/m7/m7gtf.gtf -s 1 -o /home/ss/counts/SEgencodeGene.txt
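Note that -s 1 tells featureCounts the library is stranded. For the unstranded paired-end samples the call would presumably use -s 0 (unstranded) and -p (paired-end input) instead. A sketch of that variant, with a hypothetical wrapper name and the GTF, output file, and BAM files passed as arguments; newer Subread releases may additionally need --countReadPairs to count fragments rather than reads, so check the version in use:

```shell
# Sketch: gene-level counting for unstranded, paired-end BAM files.
#   -s 0  unstranded library
#   -p    input contains paired-end reads
# Arguments: annotation GTF, output file, then one or more BAM files.
count_unstranded_pe() {
    local gtf="$1" out="$2"
    shift 2
    featureCounts -T 12 -t exon -g gene_id -s 0 -p -a "$gtf" -o "$out" "$@"
}

# Usage (hypothetical paths):
#   count_unstranded_pe merged.gtf denovo_counts.txt sample1.bam sample2.bam
```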
Pretty much the standard options. It looks like we are losing counts against known exons when we use the de novo GTF. Has anyone experienced this before? Any help is much appreciated!