I am using StringTie v3.0.0 to discover novel transcripts from short-read RNA-seq data. After quantification, I noticed that some transcripts/exons have a coverage (cov) of 0, even though reads are aligned to these regions according to samtools view. Pipeline Used:
1-Transcript assembly per sample:
`stringtie -p 8 -c 2 -j 2 "$BAM_FILE" -G $GTF_FILE -o "$OUTPUT_GTF"`
2-Merging assembled transcripts:
`stringtie --merge $INPUT_DIR/*reconstructed.gtf -o $OUT_DIR/stringtie_merged_without_ref_output.gtf`
3-Quantification of transcripts:
`stringtie -p 12 -e -B $BAM_FILE -G $GTF_merged -o $OUTPUT_GTF/*_quantified.gtf`
Issue Observed:
For example, the transcript MSTRG.7.2 has zero coverage in t_data.ctab and _quantified.gtf despite read alignments. Results from grep in the .gtf file:
grep MSTRG.7.2 X2-WT_quantified.gtf
1 StringTie transcript 4561613 4567577 . - . gene_id "MSTRG.7"; transcript_id "MSTRG.7.2"; cov "0.0"; FPKM "0.000000"; TPM "0.000000";
1 StringTie exon 4561613 4562891 . - . gene_id "MSTRG.7"; transcript_id "MSTRG.7.2"; exon_number "1"; cov "0.0";
1 StringTie exon 4563995 4564086 . - . gene_id "MSTRG.7"; transcript_id "MSTRG.7.2"; exon_number "2"; cov "0.0";
1 StringTie exon 4565359 4566165 . - . gene_id "MSTRG.7"; transcript_id "MSTRG.7.2"; exon_number "3"; cov "0.0";
1 StringTie exon 4566514 4567577 . - . gene_id "MSTRG.7"; transcript_id "MSTRG.7.2"; exon_number "4"; cov "0.0";
Check in t_data.ctab:
grep MSTRG.7.2 t_data.ctab
t_id chr strand start end t_name num_exons length gene_id gene_name cov FPKM
9 1 - 4561613 4567577 MSTRG.7.2 4 3242 MSTRG.7 . 0.000000 0.000000
Check in e_data.ctab (exon coverage):
e_id chr strand start end rcount ucount mrcount cov cov_sd mcov mcov_sd
46 1 - 4561613 4562891 56 56 56.00 2.6654 1.8036 2.6654 1.8036
43 1 - 4563995 4564086 1 1 1.00 2.8043 1.5965 2.8043 1.5965
44 1 - 4565359 4566165 18 18 18.00 1.4597 1.3443 1.4597 1.3443
45 1 - 4566514 4567577 6 6 6.00 0.3327 0.5931 0.3327 0.5931
Read counts verification with samtools:
samtools view -c X2-WT_S21Aligned.sortedByCoord.out.bam 1:4561613-4562891
58
samtools view -c X2-WT_S21Aligned.sortedByCoord.out.bam 1:4563995-4564086
6
samtools view -c X2-WT_S21Aligned.sortedByCoord.out.bam 1:4565359-4566165
21
samtools view -c X2-WT_S21Aligned.sortedByCoord.out.bam 1:4566514-4567577
6
We can see that reads are aligned to these exons, but cov remains 0 in t_data.ctab and _quantified.gtf. Questions: Why does StringTie assign a coverage of 0 even though reads are present in samtools view?