Alignment dropped with Salmon
6.4 years ago
Sharon ▴ 600

Hi All

I have 3 samples whose alignment rate was around 70% using TopHat, but with Salmon the rate dropped to 40-50%. The warning was "0.0043844% of fragments were shorter than the k used to build the index (31)", and this warning also showed up for samples with a good alignment rate. I changed the k used, but the rate is still 40-50%. What do you think could be the reason?

Thanks

RNA-Seq Salmon Tophat • 3.9k views

There is a difference between alignment and mapping. You are "mapping" with Salmon.

Sharon: You should stop using TopHat for new projects, and you should not use its results as a benchmark for comparison either. STAR, BBMap, HISAT2 etc. are current RNA-seq aligners, or you could do mapping with salmon/kallisto. Pay particular attention to multi-mapping reads, since each aligner handles them differently. Check whether you have rRNA contamination in your data as well.


Got it. Thanks so much, genomax. Much appreciated!


As genomax mentions, if you map reads to the genome with a spliced mapper but salmon doesn't map these reads to the transcriptome, the big potential culprits are rRNA, intron retention, or some other sort of genomic contamination. If a read comes from an annotated transcript in the reference transcriptome, salmon should be able to map it if the other tools could.


Where did you get the transcript reference (fasta) from?

6.4 years ago
Sharon ▴ 600

For the sake of anyone going through the same problem, this is how the mapping rate increased. I built the Salmon index in FMD mode rather than quasi-mapping mode, and the mapping rate increased from 40% to 73%, which is close to the TopHat alignment rate.

So, this is how I rebuilt the index:

   ${SALMON}/salmon index -t ${HumanTRANSCRIPTS} -i fmhuman-index --type fmd

What I was doing before was:

   ${SALMON}/salmon index --index human-index --type quasi --transcripts ${HumanTRANSCRIPTS}

While it is good that you were able to bring those numbers up, it would be interesting to see what that did to the count data.


Can you explain more please, genomax? Thanks.


Did you compare the quantification files for the two runs (quant.sf, ambig_info.tsv)?
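
One way to make that comparison concrete is to join the two quant.sf files on transcript name and list the transcripts whose estimated read counts changed the most. This is only a minimal sketch: it assumes a tab-separated quant.sf with one header line, the transcript name in column 1 and NumReads in column 5 (check the header of your own files, since the layout may differ between releases). The function name `compare_quant` and the directory names in the usage comment are placeholders, not part of Salmon.

```shell
# compare_quant QUANT_A QUANT_B
# Prints: Name <tab> NumReads_A <tab> NumReads_B <tab> (B - A),
# sorted by absolute difference, largest first.
# Assumption: column 1 is the transcript name, column 5 is NumReads.
compare_quant() {
  awk -F'\t' '
    FNR == 1 { next }                  # skip the header line of each file
    NR == FNR { a[$1] = $5; next }     # first file: remember NumReads per transcript
    ($1 in a) {                        # second file: transcripts present in both runs
      d = $5 - a[$1]
      printf "%s\t%s\t%s\t%.3f\t%.3f\n", $1, a[$1], $5, d, (d < 0 ? -d : d)
    }
  ' "$1" "$2" | sort -t "$(printf '\t')" -k5,5nr | cut -f1-4
}

# Hypothetical usage (output directory names are placeholders):
# compare_quant quasi_out/quant.sf fmd_out/quant.sf | head -n 20
```

Transcripts near the top of that list are the ones whose counts depend most on the index type, which makes them a good starting point for deciding which run to trust.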


OK, I will. If they differ, on what basis should I choose between them?


The other big difference between these modes is the minimum size of an exact match that can be used by default. What is the length of your reads? You could also try building the default (quasi) index with a smaller k value, for example -k 21, to see how that affects the mapping rate.
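
To answer the read-length question before choosing k, the length distribution can be tallied directly from the FASTQ file. A rough sketch, assuming an uncompressed FASTQ with exactly four lines per record; `read_length_hist`, the file names and the index name are made up for illustration, and the commented `salmon index` line simply reuses the `-t`, `-i` and `-k` options mentioned in this thread.

```shell
# read_length_hist FASTQ
# Prints "length <tab> count" for every read length, shortest first.
# Assumption: four lines per record (sequences are not wrapped).
read_length_hist() {
  awk 'NR % 4 == 2 { n[length($0)]++ }   # line 2 of each record is the sequence
       END { for (l in n) printf "%d\t%d\n", l, n[l] }' "$1" | sort -n
}

# Hypothetical usage: inspect the lengths, then rebuild the index with a smaller k.
# read_length_hist sample_R1.fastq
# salmon index -t transcripts.fa -i human-index-k21 -k 21
```

If a large share of reads is shorter than the chosen k, those reads cannot contribute to the mapping rate regardless of the index type.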


I see. Smaller k values for the quasi-mapping index did not work either. That's the reason I tried fmd. The minimum read length I have is 20, and the percentage reported in the quasi-mapping warning was very low. Thanks genomax, your comments are always helpful. Much appreciated. Thanks Rob.


@Rob: Are the defaults available in the documentation? Are there recommended use cases for the two methods?


Those defaults (k=19 for fmd and k=31 for quasi) are mentioned at the command line, but I'm not actually sure if they are in the documentation. We're preparing a new release, so I'll be sure to add them.


The default for fmd is not mentioned; only the default for quasi is.
Thanks Rob.


I have found a similar increase in mapping rate (~50% to ~70%) when I use FMD mode rather than quasi-mapping mode. However, looking at the meta_info.json files, I am actually getting fewer reads (num_reads) in FMD mode than in quasi-mapping mode. In quasi-mapping mode, mapping rate = num_reads/num_processed, but is this not true for FMD mode?
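
To check that relationship on your own runs, the ratio can be recomputed from each meta_info.json and compared with the rate Salmon reports. This is a sketch under assumptions: it treats the file as flat JSON with integer values, and it defaults to the key names quoted in this comment (`num_reads`, `num_processed`). Key names may differ between Salmon versions, so they can be overridden as arguments. `json_int` and `mapping_rate` are hypothetical helper names.

```shell
# json_int FILE KEY
# Prints the integer stored under "KEY" in a flat JSON file (first match only).
json_int() {
  sed -n 's/.*"'"$2"'"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1" | head -n 1
}

# mapping_rate FILE [NUMERATOR_KEY] [DENOMINATOR_KEY]
# Prints 100 * numerator / denominator with two decimals, or NA if the
# denominator is missing or zero. Default keys are assumptions taken from
# the comment above, not from the Salmon documentation.
mapping_rate() {
  num=$(json_int "$1" "${2:-num_reads}")
  den=$(json_int "$1" "${3:-num_processed}")
  awk -v n="$num" -v d="$den" \
    'BEGIN { if (d > 0) printf "%.2f\n", 100 * n / d; else print "NA" }'
}

# Hypothetical usage (paths are placeholders):
# mapping_rate quasi_out/aux_info/meta_info.json
# mapping_rate fmd_out/aux_info/meta_info.json
```

If the recomputed value differs from the reported rate in one mode but not the other, that points to the two modes counting mapped reads differently.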

6.4 years ago

The main reason for the discrepancy is most likely that with TopHat you align against a genome, whereas with Salmon you quantify against transcripts. The latter reference is much shorter and consists only of the known and expressed transcripts.

I don't think "mapping" is the right word when using a pseudo-aligner. Mapping implies coordinates. In contrast, with Salmon the reads are assigned to transcripts.
