featureCounts paired end reads wrongly assigned on the single end mode
1
0
Entering edit mode
13 days ago
UserA • 0

Hello! I have RNAseq data from samples containing 2 to 3 bacterial strains each. For the analysis I performed FastQC, trimmomatic to remove adapters and then Bowtie2 to align the reads to the reference genomes. I have one bowtie2 output file per sample with the paired ends and another one with the singles. I want to make one output file per reference genome with featurecounts for the paired ends on one side and the singles on the other.

For the paired end I run this command : featureCounts -T 24 -p -M -g Parent -a $x -o${Output}/${genome}.txt${Samples}/${genome}_paired In output I get this: I don't understand why in the running featurecounts part assign them in single end mode ? when I look at one of the paired end input files I get this: 97320876 + 0 in total (QC-passed reads + QC-failed reads) 97320876 + 0 primary 0 + 0 secondary 0 + 0 supplementary 0 + 0 duplicates 0 + 0 primary duplicates 591739 + 0 mapped (0.61% : N/A) 591739 + 0 primary mapped (0.61% : N/A) 97320876 + 0 paired in sequencing 48660438 + 0 read1 48660438 + 0 read2 499606 + 0 properly paired (0.51% : N/A) 518434 + 0 with itself and mate mapped 73305 + 0 singletons (0.08% : N/A) 0 + 0 with mate mapped to a different chr 0 + 0 with mate mapped to a different chr (mapQ>=5)  featureCounts • 454 views ADD COMMENT 1 Entering edit mode You need to include  --countReadPairs Count read pairs (fragments) instead of reads. This option is only applicable for paired-end reads.  along with  -p Specify that input data contain paired-end reads. To perform fragment counting (ie. counting read pairs), the '--countReadPairs' parameter should also be specified in addition to this parameter.  to count the read pairs. ADD REPLY 0 Entering edit mode Thanks a lot for the answer, from now on featurecounts takes well into account that they are paired reads. On the other hand, the output results give me very high values of : Unassigned_Unmapped and Unassigned_NoFeatures. For example : As I work with samples that contain several bacteria this does not really surprise me for the other reference annotation genomes as I have much less RNA corresponding to these strains in my samples. However, for this strain (which takes almost 70% of my samples) I also get low results which worries me more. for the strain that represents about 70% of my samples I have : and for another strain that represents like 1% I have : I'm doing this script : featureCounts -T 24 -p --countReadPairs -s 2 -g Parent -a$x -o ${Output}/${genome}.txt ${Samples}/${genome}_paired

I took the reference genome and annotation file from the same webpage on NCBI website, did the alignement with Bowtie2, converted sam to bam with samtools view and sorted the samples by read name with samtools sort.

1
Entering edit mode

If you are working with a mix of genomes then it is possible that your strain is only a small fraction of the total data.

You need to address what @swbarnes2 pointed out below. featureCounts only works with data that is aligned in first place. And if you have really poor alignment % then that needs to be addressed first.

0
Entering edit mode

For the bacteria representing ~70% (i'm talking to 70% because I made qPCR on the samples) of my sample I have this :

The result of 591739 + 0 mapped (0.61% : N/A) is from a bacteria representing like 1% of the sample

I thought that since this bacteria mapped well (95 to 98% depending on the sample) I would have better results for the featurecounts output with it, more than 1 to 8% succesfully assigned alignements.

1
Entering edit mode

If you are trying to make a bam file of the reads aligned to a single reference, don't do it like that. Subset the bam you have to just the reads aligning where you want them. Though since there might be some similar regions between all your bacteria, this pipeline might not be appropriate for what you ae doing. Neither Bowtie nor FeatureCounts is very smart about reads which might align to multiple places.

0
Entering edit mode

Have you looked at the alignments in IGV or a similar browser? If your reads are multi-mapping then those are going to be not counted by default. You may also have a lot of rRNA in your data so that would be something else to check on.

0
Entering edit mode

I haven't looked at the alignment on IGV, I'm very new to it, but I'll try! In Bowtie2 output for this strain which is the most present in my samples I have from 1 to 7% of reads that align more than once.

At the sequencing platform there was depletion of rRNA so I assumed it was ok on that side but I will check. Do you check this with IGV too?

1
Entering edit mode

Most the data appears to align one time so that looks reasonable. Do you get more assignments if you try -s 0 (non-stranded) option? Are you certain your libraries are "reverse stranded"?

0
Entering edit mode

I get a little more successfully assigned alignements with the -s 0 option. I'm not sure, I sent my samples to Illumina NovaSeq 6000 sequencing. I will ask the sequencing platform to be sure. I don't understand why I have a good percentage of alignement for this strain (between 80 and 95%) but a low percentage of successfully assigned alignements (between 2 and 9%).

0
Entering edit mode
10 days ago

591739 + 0 mapped (0.61% : N/A)

Unless you really expect this for some reason, forget about FeatureCounts until you fix this.