Should we keep only the uniquely mapped reads for gene expression
4 weeks ago
User000 ▴ 460

Hello,

I am analysing my paired-end RNA-seq reads with two pipelines:

a. HISAT2 --> StringTie --> DESeq2
b. STAR --> salmon --> DESeq2


Is it necessary to keep only the uniquely mapped reads before counting reads per gene, for example with a MAPQ filter like this?

samtools view -b -q 40 -o output.bam alignments.bam

RNA-seq • 295 views
4 weeks ago

No. Well, at least not for salmon: half the point of it is that it handles multimappers intelligently via a modified expectation-maximization algorithm. Most other reasonable quantification programs (e.g. RSEM, kallisto) will attempt to deal with multimappers as well. I haven't used StringTie in a very long time, so I don't remember what it does with them.
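To make the idea concrete, here is a minimal sketch of plain expectation maximization on a toy example. It is not salmon's actual algorithm (which is modified and also accounts for things like transcript length and bias); the function name `em_abundance` and the toy reads are made up for illustration.

```python
from collections import defaultdict

def em_abundance(read_compat, n_iter=50):
    """Toy EM: read_compat is a list of lists, each holding the
    transcripts one read is compatible with (hypothetical input format)."""
    transcripts = sorted({t for hits in read_compat for t in hits})
    # Start from a uniform guess of relative abundance.
    theta = {t: 1.0 / len(transcripts) for t in transcripts}
    n_reads = len(read_compat)
    for _ in range(n_iter):
        expected = defaultdict(float)
        # E-step: split each read across its hits in proportion
        # to the current abundance estimates.
        for hits in read_compat:
            total = sum(theta[t] for t in hits)
            for t in hits:
                expected[t] += theta[t] / total
        # M-step: new abundances are the normalized expected counts.
        theta = {t: expected[t] / n_reads for t in transcripts}
    return theta

# 8 reads unique to A, 2 unique to B, 10 that map equally well to both.
reads = [["A"]] * 8 + [["B"]] * 2 + [["A", "B"]] * 10
theta = em_abundance(reads)
print(theta)
```

In this toy case the estimates settle near A = 0.8 and B = 0.2, so the ten multimappers end up shared roughly 8:2, guided by the unique reads, rather than being thrown away.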


The paper says that if a fragment aligns in n places, then that fragment alignment will contribute 1/n to the edge capacity. But it is not clear to me at all. Thanks for your reply.
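As far as I can tell, the 1/n part of that sentence just means each alignment of a multimapping fragment is down-weighted. A rough sketch of only that weighting, assuming a simple list of hits per fragment (the function name `fractional_counts` and the gene names are invented, and this is not StringTie's flow-network code):

```python
from collections import defaultdict

def fractional_counts(fragment_hits):
    """Give each of a fragment's n alignments a weight of 1/n."""
    counts = defaultdict(float)
    for hits in fragment_hits:
        n = len(hits)
        for locus in hits:
            counts[locus] += 1.0 / n
    return dict(counts)

fragments = [
    ["geneA"],                             # unique: weight 1
    ["geneA", "geneB"],                    # 2 places: 1/2 each
    ["geneA", "geneB", "geneC", "geneD"],  # 4 places: 1/4 each
]
weights = fractional_counts(fragments)
print(weights)
```

Each fragment still contributes a total weight of 1, so the weights sum to the number of fragments; the multimappers are just spread evenly over their locations.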


bbmap.sh will allow you to select a random location from among all locations where a read maps equally well (ambig=random option). I can't vouch for the statistical validity of that approach, but it seems logical if you don't want to throw away multi-mapping data.
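For what it's worth, the random-assignment idea can be sketched in a few lines. This is only an illustration of picking one location uniformly at random per read, not BBMap's implementation; `assign_randomly`, the seed, and the toy reads are assumptions of mine.

```python
import random
from collections import Counter

def assign_randomly(read_hits, seed=0):
    """Assign each read to one of its equally good locations at random."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    counts = Counter()
    for hits in read_hits:
        counts[rng.choice(hits)] += 1
    return counts

# 1000 reads that map equally well to geneA and geneB, plus 100 unique to geneA.
reads = [["geneA", "geneB"]] * 1000 + [["geneA"]] * 100
counts = assign_randomly(reads)
print(counts)
```

Every read is counted exactly once, and the ambiguous reads are split about evenly between the two genes regardless of how many unique reads each gene has. That is the main difference from an EM-style approach.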


Yeah, I don't think Salmon will work properly unless you keep the multimappers in.

And I just wouldn't use StringTie for quantification. But if I were to, I'd exclude multimapping reads. I'm also not sure it's possible to use DESeq2 on the results of StringTie. I think it outputs TPMs rather than counts.


Thanks for your replies. StringTie outputs FPKM, and I actually used it with DESeq2 as log(FPKM+1). For the salmon output, should I use tximportData to merge all the counts (not TPMs)? Sorry, I know this is another question.


Yes, follow the tximport vignette for salmon. It handles it quite painlessly.


You can't use log(FPKM+1) in DESeq2; the input has to be raw read counts. It will run, but the results it produces will not be valid. DESeq2 models its input with a negative binomial distribution, which suits read counts; log(FPKM+1) values will instead follow something closer to a log-normal distribution.
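A quick numeric sketch of why the transformed values are no longer counts, using the usual FPKM definition (fragments per kilobase of transcript per million mapped fragments). The gene lengths, the count of 100, and the library size are made-up numbers.

```python
import math

def fpkm(count, gene_length_bp, total_fragments):
    """FPKM = count * 1e9 / (gene length in bp * total mapped fragments)."""
    return count * 1e9 / (gene_length_bp * total_fragments)

total = 20_000_000  # hypothetical library size

# Two genes with the same raw count (100) but different lengths.
short_gene = fpkm(100, 500, total)
long_gene = fpkm(100, 5000, total)

print(short_gene, math.log(short_gene + 1))
print(long_gene, math.log(long_gene + 1))
```

The same raw count of 100 turns into two different, non-integer values once it is scaled by length and library size and then logged, so the information about how many reads were actually observed is gone. A count-based model needs that information to estimate dispersion.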