Hello,
I want to quantify the abundance of reads mapping downstream of genes with kallisto. I have RNA-seq data that contains reads arising from read-through transcription (transcription that continues downstream of transcript 3' ends).
I use two different transcriptome files:
- One reference transcriptome (containing only the real, genic transcript sequences)
- One modified transcriptome, containing the exact same sequences plus the sequences of downstream regions
This means that the second, modified transcriptome has the same genic target sequences plus a number of intergenic target sequences.
My problem is:
For some samples, the number of pseudo-aligned reads is higher when I use the non-modified transcriptome, even though the modified transcriptome contains the EXACT same sequences plus a few additional target sequences. I wonder how this is possible: since both de Bruijn graphs contain the same genic target sequences, the number of pseudo-aligned reads with the modified transcriptome should be equal or higher, not lower. I expected that some of the reads that map to genic target sequences with the non-modified transcriptome would be aligned to intergenic regions, because the equivalence class of transcripts for such a read might be extended with intergenic target sequences.
I double-checked that my transcriptome files really contain the same sequences. I would be glad if someone could explain to me how it is possible that some reads cannot be aligned with my modified transcriptome, even though it contains the same target sequences.
Thank you!
You are right! Thank you very much! :) Somehow I did not think about this.
Now that more k-mers are mappable, they are no longer ignored by kallisto. For reads that previously pseudo-aligned, the intersection of all k-compatibility classes is therefore no longer guaranteed to be non-empty.
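To convince myself, here is a toy sketch of the effect in Python. It is only an illustration under simplifying assumptions, not kallisto's actual implementation: the sequences, the names `gene` and `downstream`, the helper functions and k = 4 are all made up, reverse complements are ignored, and a plain dictionary stands in for the de Bruijn graph. A read spanning the 3' end of the gene pseudo-aligns against the reference, but gets an empty intersection once the downstream target is added.

```python
K = 4  # toy k-mer length, chosen only to keep the example small


def build_index(targets, k):
    """Map every k-mer to the set of target names that contain it."""
    index = {}
    for name, seq in targets.items():
        for i in range(len(seq) - k + 1):
            index.setdefault(seq[i:i + k], set()).add(name)
    return index


def pseudoalign(read, index, k):
    """Intersect the target sets of all k-mers of the read found in the index.

    k-mers that are absent from the index are skipped.
    An empty result means the read does not pseudo-align.
    """
    compatible = None
    for i in range(len(read) - k + 1):
        hits = index.get(read[i:i + k])
        if hits is None:
            continue  # unmappable k-mer: ignored
        compatible = set(hits) if compatible is None else compatible & hits
    return compatible or set()


# Hypothetical sequences: the read covers the end of the gene and the
# start of the region directly downstream of it.
reference = {"gene": "ACGTTGCA"}
modified = {"gene": "ACGTTGCA", "downstream": "GGATCCAT"}
read = "TTGCAGGATC"

ref_result = pseudoalign(read, build_index(reference, K), K)
mod_result = pseudoalign(read, build_index(modified, K), K)

print("reference:", ref_result)  # {'gene'}
print("modified: ", mod_result)  # set(), i.e. no pseudo-alignment
```

With the reference index, the downstream k-mers of the read are unknown and skipped, so only the genic k-mers vote. With the modified index, the same k-mers map to `downstream` only, and intersecting that with the `gene` set leaves nothing.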
I will have a look at Salmon's selective alignment approach!