4.7 years ago
Sharon ▴ 560

Hi All

I have some samples with low mapping rates in Salmon (40% or less) that have higher alignment rates in TopHat, and I am trying to troubleshoot.

I picked some of the unmapped reads (from Salmon's --writeUnmappedNames option) and ran BLAT on them against the human genome.

Some have 2 or more matches with 99% to 100% identity, and some have so many matches that the results page is very long. Many of these matches are at 100% identity, and the rest range between 85% and 100%.

I also looked into "ambig_info.tsv" and found some records with 0 unique mappings and more than 100 ambiguous mappings, but I couldn't relate them to the unmapped reads.
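For anyone wanting to screen that file programmatically, here is a minimal sketch. It assumes that ambig_info.tsv has a header line with the columns UniqueCount and AmbigCount and that its rows follow the same transcript order as quant.sf; both points should be checked against your Salmon version. The transcript names and counts below are toy values. Since the file appears to be summarized per transcript rather than per read, it may not be possible to tie its rows back to individual unmapped read names.

```python
import csv
import io


def flag_ambiguous(ambig_tsv, names, min_ambig=100):
    """Return transcripts with 0 unique reads and more than min_ambig ambiguous reads.

    ambig_tsv: open text handle of ambig_info.tsv (assumed columns: UniqueCount, AmbigCount)
    names:     transcript names in the same row order (assumed to match quant.sf)
    """
    reader = csv.DictReader(ambig_tsv, delimiter="\t")
    flagged = []
    for name, row in zip(names, reader):
        if int(row["UniqueCount"]) == 0 and int(row["AmbigCount"]) > min_ambig:
            flagged.append(name)
    return flagged


# Toy stand-in for the real file; the values are made up for illustration.
toy_ambig = io.StringIO("UniqueCount\tAmbigCount\n12\t3\n0\t150\n0\t40\n")
toy_names = ["tx1", "tx2", "tx3"]
print(flag_ambiguous(toy_ambig, toy_names))  # prints ['tx2']
```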

This is how one match for one of them looks, for one mate and the other:

ACTIONS      QUERY           SCORE START  END QSIZE IDENTITY CHRO STRAND  START    END      SPAN

browser details YourSeq           22   100   122   151 100.0%    10   +   37378275  37378303     29
browser details YourSeq           22    52    74   151 100.0%    10   -   37378275  37378303     29
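One thing worth checking in hits like these is how much of the read is actually covered. The sketch below parses a BLAT summary line and computes the fraction of the query that aligned, assuming the column order shown in the header above and inclusive start/end coordinates. The function name and the returned keys are hypothetical. A hit that covers only a small part of a 151-base read may well fall below what a mapper accepts, though that depends on the mapper's settings.

```python
def parse_blat_hit(line):
    """Parse one 'browser details' summary line from the BLAT results page."""
    fields = line.split()
    # fields[0:2] are the 'browser details' action links, fields[2] is the query name
    score, q_start, q_end, q_size = (int(x) for x in fields[3:7])
    aligned = q_end - q_start + 1  # assumes inclusive coordinates
    return {
        "query": fields[2],
        "score": score,
        "identity": float(fields[7].rstrip("%")),
        "chrom": fields[8],
        "strand": fields[9],
        "aligned_bases": aligned,
        "query_coverage": aligned / q_size,
    }


hit = parse_blat_hit(
    "browser details YourSeq           22   100   122   151 100.0%    10   +   37378275  37378303     29"
)
print(hit["aligned_bases"], hit["query_coverage"])
```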


So why is this not counted as mapped, for example? Any hints or clues?

Thanks

RNA-Seq Salmon Mapping • 2.7k views

Take a look at: Salmon very low mapping

Even if these reads do align/map, you don't want to use them when counting, since you can't be sure where each read came from.

bbmap.sh has an option to place such a read at one of its best-scoring locations, chosen at random (ambig=random). You could try using that option to recover some of this data.


1. You mean I can fix this by using BBMap for alignment, then using the BAM file from BBMap (if it generates one) for alignment-based Salmon quantification, and then feeding the result to edgeR?

2. What makes a read map to many places like this? Is there a preprocessing failure, for example, that I can fix?

3. When TopHat has a higher alignment rate for these reads (alignment rather than mapping, I know), does that mean it picks a location for each of them somehow, either randomly or by some other rule? Thanks so much, genomax.

1. You are not fixing it. It is one way of handling multi-mappers. Take a look at the ambig= options to see others. If you want to be strict about it, then you throw the multi-mappers away and do not count them.
You can use the BBMap-generated alignment for featureCounts and then DESeq2/edgeR.

2. There are genes with multiple copies (e.g. the rDNA repeat, which has ~400-500 copies in the human genome). If your reads are short(er), there is a chance that they may spuriously map to multiple locations (in addition to the multi-copy case).

3. TopHat will by default place a multi-mapper in up to 20 top spots where the read aligns well (you can check that number).
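To make the difference between these strategies concrete, here is a toy sketch of three ways to count multi-mappers: discard them, place each at one random best location, or count every location. It is only loosely modeled on the ideas above and is not how BBMap, TopHat, or featureCounts are implemented. The function name, the policy names, and the read and gene names are all hypothetical.

```python
import random
from collections import defaultdict


def count_reads(alignments, policy="discard", seed=0):
    """Count reads per feature under a given multi-mapper policy.

    alignments: dict mapping read name -> list of features where it aligns equally well
    policy:     'discard' (strict), 'random' (one location at random), 'all' (every location)
    """
    if policy not in {"discard", "random", "all"}:
        raise ValueError(f"unknown policy: {policy}")
    rng = random.Random(seed)  # fixed seed so the random policy is reproducible
    counts = defaultdict(int)
    for read, hits in alignments.items():
        if not hits:
            continue  # unaligned read
        if len(hits) == 1:
            counts[hits[0]] += 1  # unique mapper, always counted
        elif policy == "random":
            counts[rng.choice(hits)] += 1
        elif policy == "all":
            for feature in hits:
                counts[feature] += 1
        # policy == 'discard': multi-mappers are not counted at all
    return dict(counts)


toy = {
    "read1": ["geneA"],
    "read2": ["geneA", "geneB"],
    "read3": ["geneB", "geneC", "geneD"],
}
for policy in ("discard", "random", "all"):
    print(policy, count_reads(toy, policy))
```

Under the strict policy only read1 contributes, whereas counting every location inflates the totals beyond the number of reads, which is one reason the choice matters for downstream differential expression.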


Got it, and many thanks for the explanation, genomax. Much appreciated!


Another one with a high span that doesn't map either:

browser details YourSeq          126     2   127   151 100.0%    17   +   43164303  43164428    126

browser details YourSeq          126     2   127   151 100.0%    17   -   43164302  43164427    126

4.7 years ago

Salmon maps reads to the transcriptome, while TopHat aligns to the genome (assuming that you are using TopHat in the "normal" way). I suspect that if you examine the reads that align with TopHat but do not map with Salmon, some (perhaps large) proportion of them will fall in intergenic and/or intronic regions. Those regions are not part of the transcriptome, so Salmon will not map such reads. Looking at the TopHat BAM file in a genome browser, or using a tool like RNA-SeQC, can help quantify the intergenic/intronic reads.
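As a rough illustration of what that quantification involves, the sketch below classifies aligned read positions as exonic, intronic, or intergenic against a small annotation. The gene model and the read positions are made-up toy values, and the function name is hypothetical; a real analysis would read the annotation from a GTF file and the positions from the TopHat BAM file, or simply use a dedicated QC tool.

```python
from collections import Counter


def classify_position(chrom, pos, genes):
    """Return 'exonic', 'intronic', or 'intergenic' for a genomic position.

    genes: list of dicts with 'chrom', 'start', 'end' and a list of
           (start, end) 'exons', all with inclusive coordinates.
    """
    in_gene = False
    for gene in genes:
        if gene["chrom"] != chrom or not (gene["start"] <= pos <= gene["end"]):
            continue
        in_gene = True
        if any(start <= pos <= end for start, end in gene["exons"]):
            return "exonic"
    # Inside a gene body but in no exon means intronic; otherwise intergenic.
    return "intronic" if in_gene else "intergenic"


# Toy annotation: one gene with two exons (coordinates are illustrative only).
toy_genes = [
    {"chrom": "chr1", "start": 1000, "end": 5000, "exons": [(1000, 1500), (4000, 5000)]},
]
# Toy read start positions standing in for alignments from a BAM file.
toy_reads = [("chr1", 1200), ("chr1", 2500), ("chr1", 4500), ("chr1", 9000), ("chr2", 1200)]

summary = Counter(classify_position(c, p, toy_genes) for c, p in toy_reads)
print(summary)
```

Only the exonic reads have a chance of being mapped by a transcriptome-based tool, so a large intronic or intergenic share would account for part of the gap between the two mapping rates.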


Great, got that. Thank you so much, Sean.