I have a 3 kb gene sequence to which I am aligning 150 bp RNA-seq reads with bowtie2. There are no known introns in the gene. The first 90% of the sequence has relatively consistent coverage of 50-200 reads, but the last 300 bp or so has about 10x the coverage of the rest of the gene.
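In case it helps to put a number on that difference, here is a minimal sketch of how the two regions can be compared. It assumes a per-base depth table in the three-column, tab-separated layout (reference name, 1-based position, depth) that `samtools depth` writes. The function name `mean_depth_split` and the `tail_start` cutoff are made up for illustration, and positions missing from the table are simply not counted.

```python
def mean_depth_split(lines, tail_start):
    """Return (mean depth before tail_start, mean depth from tail_start onward).

    lines: iterable of 'ref<TAB>pos<TAB>depth' strings (assumed layout).
    tail_start: hypothetical 1-based position where the high-coverage region begins.
    """
    head, tail = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        _ref, pos, depth = line.split("\t")[:3]
        # Route each position's depth to the region it belongs to.
        (tail if int(pos) >= tail_start else head).append(int(depth))
    head_mean = sum(head) / len(head) if head else 0.0
    tail_mean = sum(tail) / len(tail) if tail else 0.0
    return head_mean, tail_mean
```

For a 3 kb gene with the jump in the last 300 bp, this would be called with something like `tail_start=2701` on the lines of the depth file.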
My first suspicion was that this sequence is duplicated elsewhere in the genome, so that reads from another genomic region are spuriously aligning to the gene I'm looking at. However, BLASTing this 300 bp sequence against the genome or against NCBI's full database returns matches only to my gene of interest.
The full gene has 41% GC, while the last 300 bp has 36% GC. That doesn't seem different enough to cause such an effect...
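For anyone who wants to check the GC figures on their own sequence, a minimal sketch of the calculation follows. It assumes the gene is available as a plain string of bases. The helper names `gc_percent` and `gc_full_vs_tail` are hypothetical, and ambiguous bases such as N count toward the length but not toward GC.

```python
def gc_percent(seq: str) -> float:
    """Return the percentage of G and C bases in seq, ignoring case."""
    seq = seq.upper()
    if not seq:
        return 0.0
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / len(seq)


def gc_full_vs_tail(seq: str, tail_len: int = 300) -> tuple[float, float]:
    """Return (GC% of the whole sequence, GC% of its last tail_len bases)."""
    return gc_percent(seq), gc_percent(seq[-tail_len:])
```

Calling `gc_full_vs_tail(gene_seq)` on the gene sequence gives the two percentages side by side.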
What are other likely explanations when you see this kind of heterogeneity in coverage?
Looking forward to learning from you all.