Hello,
I am using STAR to map reads from paired-end mRNA sequencing.
The mapping stats look fine overall, but in all my samples about 15% of reads are reported as "unmapped: too short".
When looking into the unmapped reads, I discovered there are reads that do not map when I map them as paired-end. However, when I map R1 and R2 separately, they do map fine.
I just do not get why the pair would not map: each mate has a full, perfect alignment, and the base quality scores are good.
Any idea?
Best, Jurgen
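For context on the "too short" category: as far as I understand the STAR manual, the default filters `--outFilterMatchNminOverLread` and `--outFilterScoreMinOverLread` are both 0.66 and, for paired-end data, are normalized to the sum of the two mates' lengths. A pair in which only one mate can be placed consistently therefore falls below the cutoff even when that mate aligns perfectly. Here is a rough sketch of the arithmetic; the 2x100 bp read length is an assumption for illustration (it is not stated in this thread), `passes_length_filter` is a made-up helper, and STAR's exact rounding may differ slightly.

```python
def passes_length_filter(matched_bases, mate_lengths, min_over_lread=0.66):
    """Approximate STAR's 'too short' check (a sketch, not STAR's actual code).

    matched_bases  -- number of bases matched by the best alignment
    mate_lengths   -- lengths of the mates, e.g. (100, 100) for a pair
    min_over_lread -- default value of --outFilterMatchNminOverLread
    """
    total_length = sum(mate_lengths)  # paired-end: sum of both mates
    return matched_bases >= int(min_over_lread * total_length)


# Assumed 2x100 bp reads: the pair needs about 132 matched bases.
print(passes_length_filter(100, (100, 100)))  # only one mate placed
print(passes_length_filter(100, (100,)))      # same mate mapped single-end
print(passes_length_filter(200, (100, 100)))  # both mates placed together
```

With these assumed numbers, a single perfectly aligned mate passes when mapped on its own but fails as part of a pair, which would match the behaviour described above.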
Thank you for your response. I did not do anything to the files, and I have now checked whether the files are in sync; they seem to be.
Yes, I am using the standard options in STAR. It is very weird.
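For anyone who wants to repeat the sync check, a minimal sketch could look like the one below. It assumes well-formed four-line FASTQ records and that mates share the same read ID (optionally with a `/1` or `/2` suffix); the function names are made up for this example.

```python
import gzip
from itertools import zip_longest


def read_names(lines):
    """Yield the read ID from each FASTQ header (every 4th line)."""
    for i, line in enumerate(lines):
        if i % 4 == 0 and line.strip():
            name = line.split()[0].lstrip("@")
            # Older Illumina headers mark mates with a /1 or /2 suffix.
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            yield name


def first_mismatch(r1_lines, r2_lines):
    """Return (record_index, r1_name, r2_name) for the first record that is
    out of sync, or None if both inputs list the same reads in the same order.
    A missing record on one side shows up as None in that position."""
    pairs = zip_longest(read_names(r1_lines), read_names(r2_lines))
    for index, (name1, name2) in enumerate(pairs):
        if name1 != name2:
            return index, name1, name2
    return None


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)
```

Usage would be something like `first_mismatch(open_fastq("sample_R1.fastq.gz"), open_fastq("sample_R2.fastq.gz"))`, where the file names are placeholders; a result of `None` means the two files are in sync.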
Can you check for a few reads to see how far apart they are aligning? In general paired end reads are expected to align between 300-600 bp apart (if you have a standard library). Are they mapping on the same chromosome?
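One way to do this check in bulk, as a rough sketch: map R1 and R2 separately, then compare the primary alignment of each mate by read name. The code below parses plain SAM text and relies only on the standard SAM columns (QNAME, FLAG, RNAME, POS) and flag bits; the function names are made up for this example. The distance is measured between leftmost alignment positions, so it is only a proxy for insert size, and for RNA-seq it also includes any intron between the mates.

```python
def primary_hits(sam_lines):
    """Map read name -> (reference name, leftmost position) for primary,
    mapped alignments in an iterable of SAM lines."""
    hits = {}
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue
        flag = int(fields[1])
        # Skip unmapped (0x4), secondary (0x100), supplementary (0x800).
        if flag & (0x4 | 0x100 | 0x800):
            continue
        hits[fields[0]] = (fields[2], int(fields[3]))
    return hits


def mate_separation(r1_sam_lines, r2_sam_lines):
    """For reads mapped in both files, return read name -> distance in bp
    between the mates' leftmost positions, or None if the mates are on
    different references (chromosomes or scaffolds)."""
    hits1 = primary_hits(r1_sam_lines)
    hits2 = primary_hits(r2_sam_lines)
    result = {}
    for name in hits1.keys() & hits2.keys():
        (ref1, pos1), (ref2, pos2) = hits1[name], hits2[name]
        result[name] = abs(pos1 - pos2) if ref1 == ref2 else None
    return result
```

The inputs can be open file handles for the two SAM files produced by the separate single-end runs. Entries with `None` are the pairs whose mates land on different references.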
I noticed that the two reads actually map to two different genes, which makes it even weirder. It is a mosquito genome, and the genes the reads align to are also on different scaffolds.
I just don't get why this is happening.
Are the two genes close by or on different parts of same/different chromosome? Is it possible that some kind of chimeric inserts were created in your library prep? That may be the only explanation. Unless you have support with a lot of reads piling up in the same way (mates aligned to those two locations) it seems hard to imagine that those are legitimate translocations (are they known to happen in mosquito?).
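To look for that kind of pile-up, one could count how many read pairs support each combination of two locations. A minimal sketch follows, assuming you already have, for each mate file, a dictionary from read name to a `(scaffold, position)` tuple for its best alignment; that input format and the function name are assumptions made for this example.

```python
from collections import Counter


def scaffold_pair_counts(r1_hits, r2_hits):
    """Count read pairs whose mates align to two different scaffolds.

    r1_hits, r2_hits -- dicts of read name -> (scaffold, position)
    Returns a Counter keyed by the sorted pair of scaffold names.
    """
    counts = Counter()
    for name in r1_hits.keys() & r2_hits.keys():
        scaffold1 = r1_hits[name][0]
        scaffold2 = r2_hits[name][0]
        if scaffold1 != scaffold2:
            counts[tuple(sorted((scaffold1, scaffold2)))] += 1
    return counts
```

Calling `.most_common(10)` on the result lists the scaffold pairs with the most supporting read pairs. Many pairs spread thinly over many different combinations would point more towards chimeric inserts or assembly problems than towards one real rearrangement.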
Thank you for your reply!
The mosquito genome I work with (Aedes albopictus) is not very well annotated and, let's say, not of great quality. The genome is not divided into chromosomes but into scaffolds. I have now looked at a couple of read pairs, and each time R1 and R2 align to different genes on different scaffolds, most likely (much) further apart than 300-600 bp.
I don't know whether I can expect legitimate translocations in Aedes albopictus, but it is a good suggestion and I will have a look. (The mosquito strain I use is different from the reference strains, so there could be differences there.)
You suggested that chimeric inserts may have been created during the library prep? Honestly, I have no idea. I used the Illumina TruSeq mRNA library prep kit. I might contact Illumina to ask whether this happens more often.
Do you know whether chimeric inserts are commonly generated during library prep?
Thank you so much for thinking along, I really appreciate it.
Based on that information it may perhaps be best to ignore these reads since you can't be sure if they are showing reality or some sort of chimerism. Proving that these reads are real may require some PCR etc to show if the piece of DNA is contiguous over the two alignments.
Generally chimeric inserts should not be formed in normal lib prep.
Yes, I think it is indeed best to ignore these. Thanks for helping out!