Question

featureCounts: strandedness problem/inconsistency


3.4 years ago

NGS_enthusiast • 0

Hi all, I'm analyzing a published paired-end RNA-seq dataset. I checked some previous posts, but they only partially answered my question. I used HISAT2 (in Galaxy) to align both FASTQ files to the mouse genome, specifying "forward" for the strand information (specifying "reverse" gave me similar results, which puzzled me). I then ran ReorderSam followed by featureCounts. The problem is that when I use -s 1, -s 2 or -s 0, I usually get two close but different numbers for the two stranded options, and with -s 0 I often get roughly the sum of both. However, sometimes I get a totally different number. Some examples (read counts for individual genes):

-s 0   -s 1   -s 2
 110     53     57
   4      2      2
 337    233    230
   1     67     60
1340    666    674

So I'm confused about which option I should choose. I loaded the BAM file produced by HISAT2 into IGV, and it looks like the data are not stranded (I'm not sure I used the right view, but I saw many red and blue reads on the same gene). So first, should I redo the HISAT2 alignment with the unstranded setting? Second, I guess I should then use -s 0 in featureCounts, but why does one gene get only 1 read with -s 0 versus 67 and 60 with -s 1 and -s 2? Why do I lose so many reads for this gene? Thanks for your help, best regards
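One common way to make the comparison above systematic is to compare the total number of reads assigned with -s 1 and with -s 2: for an unstranded library the two are roughly equal, while for a stranded library one of them clearly dominates. The sketch below assumes you take the "Assigned" totals from the featureCounts .summary files; the function name infer_strandedness and its cut-offs (0.2, 0.8 and a 0.1 band around 0.5) are hypothetical choices, not featureCounts defaults. As a rough proxy, summing the five rows of the table gives 1021 reads for -s 1 and 1023 for -s 2.

```python
def infer_strandedness(assigned_s1: int, assigned_s2: int,
                       low: float = 0.2, high: float = 0.8,
                       balanced: float = 0.1) -> tuple[str, float]:
    """Guess the library type from reads assigned with -s 1 versus -s 2.

    Hypothetical helper: the cut-offs are illustrative assumptions,
    not values used by featureCounts or any other tool.
    """
    total = assigned_s1 + assigned_s2
    if total == 0:
        raise ValueError("no assigned reads to compare")
    frac = assigned_s1 / total  # share of reads consistent with -s 1
    if frac >= high:
        return "forward (-s 1)", frac
    if frac <= low:
        return "reverse (-s 2)", frac
    if abs(frac - 0.5) <= balanced:
        return "unstranded (-s 0)", frac
    return "unclear", frac


# Per-gene counts from the table above (a rough stand-in for the
# genome-wide "Assigned" totals in the .summary files).
s1_counts = [53, 2, 233, 67, 666]
s2_counts = [57, 2, 230, 60, 674]

label, frac = infer_strandedness(sum(s1_counts), sum(s2_counts))
print(label, round(frac, 3))
```

With these numbers the fraction is about 0.5, which is consistent with an unstranded library. RSeQC's infer_experiment.py performs a similar check directly on the BAM file against a gene model.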

RNA-seq


Great, thanks for your answer. I tried it on one of the samples, and indeed running HISAT2 as stranded or unstranded does not change the featureCounts results. I will try your test, but I will then continue with the unstranded -s 0 option. Thanks.

Reply by NGS_enthusiast, 3.4 years ago

Answer, 2020-12-03

The most likely explanation for -s 0 = 1, -s 1 = 67 and -s 2 = 60 is that transcripts are annotated on both strands in that region. By default, featureCounts does not count reads that overlap more than one feature (they are reported as Unassigned_Ambiguity in the summary), and this happens more often with -s 0, because without a strand requirement a read overlaps the features on both strands at once. My guess is that if you run featureCounts with -s 0 and -O (which allows reads overlapping multiple features to be counted for each of them), you would get about 67 + 60 reads for that feature. You can try this to verify the hypothesis, but I do not recommend using those results for downstream analysis, because of the read assignment uncertainty.
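The mechanism described above can be reproduced with a toy model. The sketch below is a simplified, hypothetical re-implementation of the assignment rule, not featureCounts' actual code: each read (or fragment) has one strand; a feature is a hit if it overlaps the read by at least one base and passes the strand check for the chosen -s value; a read with exactly one hit is counted; a read with several hits is ambiguous and dropped unless multi-overlap counting (the behaviour -O enables) is switched on. The genes geneA and geneB, their coordinates and the read numbers are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    name: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


@dataclass(frozen=True)
class Read:
    start: int
    end: int
    strand: str  # strand of the read (or of the fragment, for paired-end data)


def strand_ok(read: Read, feature: Feature, s: int) -> bool:
    """Strand check mimicking the meaning of -s 0/1/2."""
    if s == 0:  # unstranded: any strand matches
        return True
    if s == 1:  # stranded: read strand must equal feature strand
        return read.strand == feature.strand
    if s == 2:  # reversely stranded: read strand must be opposite
        return read.strand != feature.strand
    raise ValueError(f"unknown strandedness value: {s}")


def count_reads(features, reads, s, allow_multi_overlap=False):
    """Return (per-feature counts, number of ambiguous reads)."""
    counts = {f.name: 0 for f in features}
    ambiguous = 0
    for r in reads:
        # Features overlapping the read by >= 1 base that pass the strand check.
        hits = [f for f in features
                if r.start <= f.end and f.start <= r.end and strand_ok(r, f, s)]
        if len(hits) == 1 or (len(hits) > 1 and allow_multi_overlap):
            for f in hits:
                counts[f.name] += 1
        elif len(hits) > 1:
            ambiguous += 1  # dropped, like the default ambiguity handling
    return counts, ambiguous


# Hypothetical region: two genes on opposite strands overlapping at 300-500.
features = [Feature("geneA", 100, 500, "+"), Feature("geneB", 300, 700, "-")]
reads = ([Read(350, 400, "+")] * 5     # in the overlap, plus strand
         + [Read(350, 400, "-")] * 4   # in the overlap, minus strand
         + [Read(150, 200, "+")])      # geneA only, plus strand

if __name__ == "__main__":
    for s in (0, 1, 2):
        print(f"-s {s}:", count_reads(features, reads, s))
    print("-s 0 with multi-overlap:",
          count_reads(features, reads, 0, allow_multi_overlap=True))
```

With these made-up reads, geneA gets 6 reads with -s 1 and 4 with -s 2, but only 1 with -s 0, because the 9 reads in the overlap hit both genes and are dropped as ambiguous. This is the same pattern as the 67/60/1 row in the question. Allowing multi-overlap gives geneA 10 reads, at the cost of counting the same reads for geneB as well.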

Regarding your other question, you do not need to redo the HISAT2 alignment in unstranded mode if your data do not carry strand information anyway.