Question

Closed: SplitNCigarReads + StringTie

6.5 years ago

sorrymouse ▴ 120

I am working with RNA-seq data from a non-model Drosophila species, mapping reads with HISAT2 and using StringTie to reconstruct transcripts.

Initially I did the assembly without the GATK SplitNCigarReads step. The StringTie output then had multiple exons per transcript in the .gtf file, as follows:

transcript  18495   19529   1000    -   .   gene_id "STRG.7"; transcript_id "STRG.7.1"; cov "3.623000"; FPKM "4.150527"; TPM "4.448586";
exon    18495   18666   1000    -   .   gene_id "STRG.7"; transcript_id "STRG.7.1"; exon_number "1"; cov "1.837597";
exon    18740   18908   1000    -   .   gene_id "STRG.7"; transcript_id "STRG.7.1"; exon_number "2"; cov "5.701381";
exon    18996   19229   1000    -   .   gene_id "STRG.7"; transcript_id "STRG.7.1"; exon_number "3"; cov "3.265812";
exon    19278   19529   1000    -   .   gene_id "STRG.7"; transcript_id "STRG.7.1"; exon_number "4"; cov "3.779449";

Wanting to make sure my analysis was robust, I ran SplitNCigarReads on the alignment files and reran the transcript assembly and abundance estimation with StringTie.

Following this, only one exon was estimated per transcript, like so:

transcript  6866    7438    1000    +   .   gene_id "STRG.1"; transcript_id "STRG.1.1"; cov "12.038646"; FPKM "2.925254"; TPM "2.443733";
exon    6866    7438    1000    +   .   gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "1"; cov "12.038646";
transcript  12315   12592   1000    +   .   gene_id "STRG.2"; transcript_id "STRG.2.1"; cov "2.542066"; FPKM "0.617693"; TPM "0.516016";
exon    12315   12592   1000    +   .   gene_id "STRG.2"; transcript_id "STRG.2.1"; exon_number "1"; cov "2.542066";
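To compare the two runs across the whole file rather than by eye, the exons per transcript can be tallied from the .gtf. The following is a minimal sketch, assuming the attribute layout shown in the excerpts above; the function name `exons_per_transcript` and the shortened `sample` lines are hypothetical, made up for illustration.

```python
import re
from collections import Counter

# Matches the transcript_id attribute as it appears in the excerpts above.
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def exons_per_transcript(lines):
    """Count exon records per transcript_id in GTF-like lines."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        # Positional columns are everything before the first attribute key.
        columns = line.split("gene_id", 1)[0].split()
        if "exon" not in columns:
            continue
        match = TRANSCRIPT_ID.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical shortened lines modelled on the two outputs above.
sample = [
    'transcript 18495 19529 1000 - . gene_id "STRG.7"; transcript_id "STRG.7.1";',
    'exon 18495 18666 1000 - . gene_id "STRG.7"; transcript_id "STRG.7.1"; exon_number "1";',
    'exon 18740 18908 1000 - . gene_id "STRG.7"; transcript_id "STRG.7.1"; exon_number "2";',
    'exon 18996 19229 1000 - . gene_id "STRG.7"; transcript_id "STRG.7.1"; exon_number "3";',
    'exon 19278 19529 1000 - . gene_id "STRG.7"; transcript_id "STRG.7.1"; exon_number "4";',
    'transcript 6866 7438 1000 + . gene_id "STRG.1"; transcript_id "STRG.1.1";',
    'exon 6866 7438 1000 + . gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "1";',
]

counts = exons_per_transcript(sample)
multi_exon = sum(1 for n in counts.values() if n > 1)
print(f"{multi_exon} of {len(counts)} transcripts have more than one exon")
```

Passing an open file handle for each .gtf instead of `sample` would give the fraction of multi-exon transcripts before and after the split step.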

I think including SplitNCigarReads is an appropriate part of the workflow, but having only one exon per transcript in the output file seems intuitively wrong. In addition, I do not understand how SplitNCigarReads would cause this difference.
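For context on what the split step changes: GATK documents SplitNCigarReads as splitting reads that contain N operators in their CIGAR string (the operator a spliced aligner uses to mark a skipped intron). The toy sketch below only illustrates that idea on CIGAR strings. It is not GATK's implementation (which also clips the split pieces), the helper names `split_at_n` and `has_junction` are hypothetical, and the example CIGAR is invented to span the 73-base gap between exons 1 and 2 of STRG.7.1.

```python
import re

# One CIGAR operation: a length followed by an operator character.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def split_at_n(cigar):
    """Return the CIGAR pieces on either side of each N (skipped region)."""
    pieces, current = [], []
    for length, op in CIGAR_OP.findall(cigar):
        if op == "N":
            # The N itself is dropped; whatever came before becomes its own piece.
            if current:
                pieces.append("".join(current))
            current = []
        else:
            current.append(f"{length}{op}")
    if current:
        pieces.append("".join(current))
    return pieces


def has_junction(cigar):
    """True if the alignment still records a skipped region (N operator)."""
    return "N" in cigar


spliced = "50M73N26M"  # invented read spanning a 73-base intron
pieces = split_at_n(spliced)
print(has_junction(spliced), pieces, [has_junction(p) for p in pieces])
```

After splitting, none of the pieces carries an N, so an assembler that reads intron positions from spliced alignments would no longer see the junction. Whether that fully explains the single-exon output here is an assumption, not something the sketch verifies.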

I'm pretty confused. If anybody has any insight into this issue, or more experience with what StringTie output should look like, that would be most appreciated.

Thanks!

RNA-Seq GATK stringtie • 341 views
