I've been exploring some STAR-aligned tumor RNA-Seq data in IGV and noticed that, for one particular gene, coverage across both of its reference coding transcripts (per Ensembl) is low throughout (e.g. some exons with 2 reads, some with 0), except for a depth of ~100 at one exon at the end of the gene.
Any thoughts on how I should interpret and further analyze this? Is this likely due to sequencing bias, or does it suggest that only that particular exon is expressed?
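One way to make the IGV impression quantitative is to summarize depth per exon and see what share of the gene's total base coverage falls in that last exon. The sketch below assumes per-base depth from `samtools depth -a` (columns: chrom, 1-based position, depth) and exon coordinates you supply yourself; the helper names and the toy numbers are hypothetical.

```python
def parse_samtools_depth(lines, chrom):
    """Parse `samtools depth -a` output (chrom, 1-based pos, depth) for one chromosome."""
    depth = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == chrom:
            depth[int(fields[1])] = int(fields[2])
    return depth


def per_exon_coverage(depth, exons):
    """Return {exon_name: (mean_depth, summed_base_coverage)}.

    exons: list of (name, start, end), 1-based inclusive coordinates.
    Positions missing from `depth` are treated as zero coverage.
    """
    result = {}
    for name, start, end in exons:
        depths = [depth.get(pos, 0) for pos in range(start, end + 1)]
        result[name] = (sum(depths) / len(depths), sum(depths))
    return result


def coverage_share(result, exon_name):
    """Fraction of the gene's summed base coverage that falls in one exon."""
    total = sum(bases for _, bases in result.values())
    return result[exon_name][1] / total if total else 0.0


# Toy example mimicking the pattern described above (illustrative numbers only).
lines = [f"chr1\t{p}\t2" for p in range(100, 110)]
lines += [f"chr1\t{p}\t0" for p in range(200, 210)]
lines += [f"chr1\t{p}\t100" for p in range(300, 310)]
exons = [("exon1", 100, 109), ("exon2", 200, 209), ("exon3", 300, 309)]

summary = per_exon_coverage(parse_samtools_depth(lines, "chr1"), exons)
print(summary)
print(round(coverage_share(summary, "exon3"), 3))
```

If the same 3'-heavy pattern shows up across many unrelated genes, that points toward a library-wide bias rather than exon-specific expression of this one gene.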
Relatedly, when I run Kallisto with --genomebam and view the pseudoalignments in IGV, I see a similar distribution, except that the depths under these exons are about an order of magnitude higher (thousands of reads overlapping that final exon instead of ~100). I'm trying to figure out where that order-of-magnitude difference comes from.
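One thing worth checking (a hypothesis, not a known cause) is whether the higher depth comes from the same read being written as several records rather than from more reads. The sketch below compares the number of records overlapping an interval with the number of distinct read names among them; it assumes you have already extracted (read name, start, end) tuples for the region, for example from `samtools view` output. The helper name is hypothetical. For paired-end data, mates share a read name, so up to two records per name is expected.

```python
def overlap_counts(records, start, end):
    """Count records overlapping [start, end] and the distinct read names among them.

    records: iterable of (read_name, aln_start, aln_end), 1-based inclusive.
    Returns (n_records, n_distinct_names).
    """
    names = []
    for name, a_start, a_end in records:
        # Two closed intervals overlap if each starts before the other ends.
        if a_start <= end and a_end >= start:
            names.append(name)
    return len(names), len(set(names))


# Toy records: r1 appears three times at the same locus (illustrative only).
records = [
    ("r1", 100, 150),
    ("r1", 100, 150),
    ("r1", 100, 150),
    ("r2", 120, 170),
    ("r3", 500, 550),
]
print(overlap_counts(records, 100, 200))
```

A records-to-names ratio well above 1 (or above 2 for paired-end reads) in the Kallisto BAM, but not in the STAR BAM, would suggest the inflation is in how records are written rather than in the underlying read count.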
I'd appreciate any insight into either of the above questions. Thanks so much!
What library prep kit was used? Some library preps produce strongly 3'-biased coverage, which would result in what you see.
Thank you, that's a great point. I'm not sure about the library prep yet. I do see that Picard's MEDIAN_3PRIME_BIAS is 0.85 and MEDIAN_5PRIME_TO_3PRIME_BIAS is 0.08, and if I'm interpreting these metrics correctly, the latter (5' coverage only ~8% of 3' coverage) seems to confirm a 3' bias throughout.
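To see what these numbers mean, here is a rough sketch of end-bias metrics in the spirit of how Picard's CollectRnaSeqMetrics describes them: per transcript, mean coverage of the 5'-most (or 3'-most) 100 bases divided by mean coverage of the whole transcript, then the median across transcripts. This is an approximation for intuition, not Picard's implementation; the function names and toy profiles are hypothetical, and Picard restricts the calculation to the most highly expressed transcripts.

```python
from statistics import median


def end_bias(coverage, end_len=100):
    """Return (5' bias, 3' bias) for one transcript, or None if it has no coverage.

    coverage: per-base depth along the transcript, ordered 5' -> 3'.
    """
    whole = sum(coverage) / len(coverage)
    if whole == 0:
        return None
    five = coverage[:end_len]
    three = coverage[-end_len:]
    return (sum(five) / len(five)) / whole, (sum(three) / len(three)) / whole


def median_biases(transcripts, end_len=100):
    """Median 5' bias, 3' bias, and 5'/3' ratio over covered transcripts."""
    fives, threes, ratios = [], [], []
    for cov in transcripts:
        biases = end_bias(cov, end_len)
        if biases is None:
            continue  # skip transcripts with zero coverage
        five, three = biases
        fives.append(five)
        threes.append(three)
        if three > 0:
            ratios.append(five / three)
    return median(fives), median(threes), median(ratios)


# Toy profiles: one 3'-heavy transcript and one uniformly covered transcript.
skewed = [1] * 100 + [5] * 100 + [10] * 100
uniform = [4] * 300
print(median_biases([skewed, uniform, [0] * 300]))
```

For the skewed profile alone the 5'/3' ratio is 0.1, i.e. the 5' end carries a tenth of the 3' end's coverage, which is the kind of value a strongly 3'-biased library would produce.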