featureCounts is so fast that you simply try both strands on one of the BAM files. From R, just try:
library(Rsubread)
fc1 <- featureCounts(yourbamfile, annot.ext = whatever, isGTFAnnotationFile = whatever,
strandSpecific = 1, isPairedEnd = TRUE, nthreads = whatever)
fc2 <- featureCounts(yourbamfile, annot.ext = whatever, isGTFAnnotationFile = whatever,
strandSpecific = 2, isPairedEnd = TRUE, nthreads = whatever)
If both choices give roughly the same number of counts, then the sequencing was unstranded, in which case you should go back to strandSpecific=0
. If fc1 gives far more reads than fc2, then the sequencing was stranded. If fc2 gives far more reads, then the sequencing was reverse-stranded.
Will it work on BAM files that I obtained from HISAT2 with default option of strandedness, i.e. unstranded? Or STAR aligned BAM files (which does not consider the strandedness) can also be used?
Yes, the code I gave you will work on any BAM file. In general, strand information is used only when assigning read alignments to genes, not when aligning the reads to the reference genome in the first place. The BAM file stores only the alignment.
I tried with 2 paired-end BAM files from different GSE IDs. One of them gave 35149287 fc0 reads, 18809574 fc1 reads and 18645129 fc2 reads. The other BAM file gave 29345428 fc0 reads, 27236575 fc1 reads and 27239921 fc2 reads. For the second file, fc1 and fc2 give roughly the same reads while for the first file, although fc1 and fc2 reads are close enough, fc0 is approximately double of fc1 or fc2. Should I consider both of these files as unstranded?
Please post the complete
featureCounts
assignments rather than just read numbers. Which can be found in the.summary
file.It is the assigned reads that are important, not total reads. I will assume that you are reporting the number of assigned reads, although your comment does not specify.
The first file is clearly unstranded.
The results that you report from the other file are seemingly impossible if you have run the commands correctly and have provided a GTF file that contains strand information. Something is very wrong, but I cannot diagnose what that might be from the very limited information you have provided.
Yes, I mentioned the assigned reads from the featurecounts output. There was a mistake in the Hisat2 alignment command. The R2 pair files were not specified. So, it reported most of the reads as chimera. I corrected it and below is the complete featurecounts output:
OK now. Clearly the sequencing was unstranded. Just use featureCounts with default
strandSpecific=0
.