Paired-end reads somehow counted twice?
5 weeks ago
Simon Ahn

Hi. I'm new to bioinformatics and am trying to extract read counts from FASTQ files.

I compared my result with the answer count matrix, and my read counts are doubled.

[Image: read count comparison. Left: counts from the answer read count matrix. Right: my result.]

Could you please tell me what went wrong? Here are the commands I ran on Ubuntu to get my result:

hisat2 -p 50 \
-x [ENSEMBL reference index basename] \
-1 [fastq file_1] \
-2 [fastq file_2] \
-S [output file name].sam

samtools sort -@ 8 -o [output file name].bam [input file name].sam

featureCounts -p -T 10 -a [GTF file] \
-o [output file name] \
[input file name].bam

I think I failed to apply the paired-end option in one of these commands, but I couldn't figure out which one.

5 weeks ago
GenoMax

The latest version of featureCounts has an explicit option to count read pairs (--countReadPairs), which is meant to be used together with -p. You would also want to provide the correct strandedness option (-s) in your command.
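For illustration, a minimal sketch of what the counting step might look like with both options. fc_paired_args is a hypothetical helper (not part of Subread) that only prints the arguments; the file names in the usage comment are placeholders, and the right -s value depends on your library prep, so check it rather than copying the example value.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Subread): print featureCounts arguments,
# one per line, that count fragments (read pairs) instead of individual mates.
# Usage: fc_paired_args <gtf> <output> <bam> [strand]
#   strand is the -s value: 0 = unstranded, 1 = stranded, 2 = reversely stranded
fc_paired_args() {
  local gtf="$1" out="$2" bam="$3" strand="${4:-0}"
  # -p declares paired-end input; --countReadPairs makes each pair count once
  printf '%s\n' -p --countReadPairs -s "$strand" -T 10 -a "$gtf" -o "$out" "$bam"
}

# Example (placeholder file names; needs featureCounts >= 2.0.2 on PATH):
#   mapfile -t args < <(fc_paired_args genes.gtf counts.txt sample.bam 2)
#   featureCounts "${args[@]}"
```

Building a bash array keeps each argument intact even if a path contains spaces.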


Basically, it sounds like they have tacitly changed how the tool operates, which makes most training materials outdated and leads to bugs and inconsistencies down the line...

I don't even understand this:

Release 2.0.2, 29 March 2021 New parameter '--countReadPairs' is added to featureCounts to explicitly specify that read pairs will be counted, and the '-p' option in featureCounts now only specifies if the input reads are paired end (it also implied that counting of read pairs would be performed in previous versions).
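If a pipeline has to run on more than one Subread version, one option is to choose the flags from the version number, following that release note: from 2.0.2 on, add --countReadPairs; before that, -p alone implied pair counting. This is only a sketch: version_ge and pair_flags are hypothetical helpers, and version_ge assumes GNU sort with the -V (version sort) option.

```shell
#!/usr/bin/env bash
# version_ge A B: succeed if version A >= version B (assumes GNU sort -V)
version_ge() {
  # The smallest of the two versions sorts first; if that is B, then A >= B
  [ "$(printf '%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# pair_flags VERSION: print the paired-end flags for that featureCounts version
pair_flags() {
  if version_ge "$1" "2.0.2"; then
    printf '%s\n' "-p --countReadPairs"
  else
    printf '%s\n' "-p"   # before 2.0.2, -p alone already counted read pairs
  fi
}

# `featureCounts -v` prints the version (with a leading "v" to strip), e.g.:
#   pair_flags 2.0.1   # prints: -p
#   pair_flags 2.0.3   # prints: -p --countReadPairs
```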

It kind of sounds like in the past -p alone would count pairs, whereas now one needs to pass -p and --countReadPairs together.

But then what effect does -p alone have?


Problem solved thanks to you guys!

Specify that input data contain paired-end reads. featureCounts will terminate if the type of input reads (single-end or paired-end) is different from the specified type. To count fragments (instead of reads) for paired-end reads, the --countReadPairs parameter should also be specified.

According to the featureCounts manual, I should have added --countReadPairs to count each fragment (forward + reverse mate of a paired-end read) once. That explains why my counts were doubled. As I understand it, -p alone only makes the run stop when the input is the wrong data type. Thanks a lot!
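A toy awk sketch of why the counts come out roughly doubled: when both mates of a pair land on the same gene, counting reads adds 2 while counting fragments adds 1. The input is made up (one line per aligned mate: read name, tab, gene), count_reads_and_fragments is a hypothetical helper, and real featureCounts applies further rules, so treat this only as an illustration of the arithmetic.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "read_name<TAB>gene" lines (one per aligned mate)
# and report per gene how many reads (mates) and fragments (pairs) it has.
count_reads_and_fragments() {
  awk -F'\t' '
    { reads[$2]++ }                         # every mate counts once
    !seen[$1 FS $2]++ { frags[$2]++ }       # only the first mate of a pair per gene
    END { for (g in reads) printf "%s\treads=%d\tfragments=%d\n", g, reads[g], frags[g] }
  ' | sort
}

# pair1 and pair2 have both mates on GENE_A; pair3 has only one mate on GENE_B
printf 'pair1\tGENE_A\npair1\tGENE_A\npair2\tGENE_A\npair2\tGENE_A\npair3\tGENE_B\n' \
  | count_reads_and_fragments
# Output (tab-separated):
# GENE_A  reads=4  fragments=2
# GENE_B  reads=1  fragments=1
```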

