Question

Should we keep only the uniquely mapped reads for gene expression?

0

2.8 years ago

User000 ▴ 690

Hello,

I am aligning my paired-end RNA-seq reads with two pipelines:

a. HISAT2 --> StringTie --> DESeq2
b. STAR --> salmon --> DESeq2

Is it necessary to keep only the uniquely mapped reads before doing the gene counts? For example:

samtools view -b -q 40 -o output.bam alignments.bam

RNA-seq • 1.7k views

ADD COMMENT • link updated 2.8 years ago by jared.andrews07 ★ 16k • written 2.8 years ago by User000 ▴ 690

score 3 · Answer 1 · 2021-06-23

3

2.8 years ago

jared.andrews07 ★ 16k

No. Well, at least not for salmon, as half the point of it is that it deals with multimappers in an intelligent way via a modified expectation maximization algorithm. Most other reasonable gene quantification programs (e.g. RSEM, kallisto) will attempt to deal with multimappers as well. I haven't used stringtie in a very long time, so I don't remember what it does with them.
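To give a feel for what "dealing with multimappers via expectation maximization" means, here is a minimal toy sketch. It is not salmon's actual algorithm (it ignores transcript lengths, bias models, and everything else salmon does), and the function name and input format are made up for illustration. It only shows the core idea: reads that are compatible with several transcripts get split in proportion to the current abundance estimates, and the estimates are then updated from those splits.

```python
def em_abundances(read_compat, n_iter=100):
    """Toy EM: read_compat is a list of tuples, each tuple holding the
    transcripts one read is compatible with (hypothetical input format)."""
    transcripts = sorted({t for ts in read_compat for t in ts})
    # Start from uniform relative abundances.
    theta = {t: 1.0 / len(transcripts) for t in transcripts}
    for _ in range(n_iter):
        expected = {t: 0.0 for t in transcripts}
        # E-step: split each read across its compatible transcripts
        # in proportion to the current abundance estimates.
        for ts in read_compat:
            total = sum(theta[t] for t in ts)
            for t in ts:
                expected[t] += theta[t] / total
        # M-step: new abundance = share of all reads assigned to the transcript.
        theta = {t: expected[t] / len(read_compat) for t in transcripts}
    return theta


# 6 reads unique to t1, 2 unique to t2, 4 that map equally well to both.
reads = [("t1",)] * 6 + [("t2",)] * 2 + [("t1", "t2")] * 4
theta = em_abundances(reads)
print(theta)
```

In this toy case the unique reads favour t1 three to one, so the four multimappers end up split roughly three to one as well. Discarding them beforehand would throw away a third of the data without changing the ratio.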

ADD COMMENT • link 2.8 years ago by jared.andrews07 ★ 16k

0

The paper says: if a fragment aligns in n places, then that fragment alignment will contribute 1/n to the edge capacity. But it is not clear to me at all. Thanks for your reply.
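One way to read the 1/n rule is as fractional weighting: a fragment with n alignments adds 1/n to each place it aligns, so every fragment still contributes a total weight of 1. The sketch below illustrates only that arithmetic. It is not StringTie's code and does not model the flow network or edge capacities; the function name and the per-gene input format are hypothetical.

```python
from collections import defaultdict


def fractional_counts(alignments):
    """alignments maps a fragment id to the list of loci it aligns to
    (hypothetical input format). Each locus receives a weight of 1/n."""
    weights = defaultdict(float)
    for loci in alignments.values():
        n = len(loci)
        for locus in loci:
            weights[locus] += 1.0 / n
    return dict(weights)


example = {
    "frag1": ["geneA"],                             # unique: contributes 1
    "frag2": ["geneA", "geneB"],                    # 2 places: 1/2 each
    "frag3": ["geneA", "geneB", "geneC", "geneD"],  # 4 places: 1/4 each
}
counts = fractional_counts(example)
print(counts)
```

Here geneA receives 1 + 1/2 + 1/4 = 1.75, and the weights over all genes still add up to the three fragments that went in.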

ADD REPLY • link 2.8 years ago by User000 ▴ 690

0

bbmap.sh will allow you to select a random location from among all locations where a read maps equally well (ambig=random option). I can't vouch for the statistical validity of that approach, but it seems logical if you don't want to throw away multi-mapping data.
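As a rough illustration of the random-assignment idea (not BBMap's implementation; the function name and input format are hypothetical), each read is given exactly one of its equally good locations, chosen uniformly at random:

```python
import random


def assign_randomly(alignments, seed=0):
    """alignments maps a read id to its equally good locations
    (hypothetical input format). Returns one chosen location per read."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return {read_id: rng.choice(loci) for read_id, loci in alignments.items()}


ambiguous = {
    "read1": ["chr1:100"],              # unique read keeps its only location
    "read2": ["chr1:500", "chr7:900"],  # multimapper: one location is picked
}
assigned = assign_randomly(ambiguous)
print(assigned)
```

Every read is counted exactly once, so no data is discarded, but an individual multimapper may well land on the wrong copy.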

ADD REPLY • link 2.8 years ago by GenoMax 141k

0

Yeah, I don't think salmon will work properly unless you keep the multimappers in.

And I just wouldn't use StringTie for quantification. But if I were to, I'd exclude multimapping reads. I'm also not sure it's possible to use DESeq2 on the results of StringTie. I think it outputs TPMs rather than counts.

ADD REPLY • link 2.8 years ago by i.sudbery 19k

0

Thanks for your replies. StringTie outputs FPKM, and I actually used it with DESeq2 as log(FPKM+1). For the salmon output, should I use tximport to merge all the counts (not TPMs)? Sorry, I know this is another question.

ADD REPLY • link 2.8 years ago by User000 ▴ 690

1

Yes, follow the tximport vignette for salmon. It handles it quite painlessly.

ADD REPLY • link 2.8 years ago by jared.andrews07 ★ 16k

0

You can't use log(FPKM+1) in DESeq2; it has to be read counts (it will run, but the results it produces will not be valid). Read counts are the only things that follow the negative binomial distribution used by DESeq2. FPKM values instead follow something closer to a log-normal distribution, so log(FPKM+1) is closer to normal.
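A small worked example of why the counts matter, using the standard FPKM definition (fragments per kilobase of transcript per million mapped fragments). The gene lengths, counts, and library size below are invented for illustration.

```python
import math


def fpkm(count, gene_length_bp, total_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return count * 1e9 / (gene_length_bp * total_fragments)


# Two hypothetical genes in a library of 20 million fragments.
short_gene = fpkm(count=10, gene_length_bp=500, total_fragments=20_000_000)
long_gene = fpkm(count=200, gene_length_bp=10_000, total_fragments=20_000_000)

print(short_gene, long_gene)
print(math.log(short_gene + 1), math.log(long_gene + 1))
```

Both genes come out with the same FPKM and the same log(FPKM+1), even though one estimate rests on 10 fragments and the other on 200. A count-based model needs exactly that information to judge how noisy each estimate is, and the normalization removes it.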

ADD REPLY • link 2.8 years ago by i.sudbery 19k