Qualimap bamqc with very high N%
8 months ago
Priyanka ▴ 10


I am running qualimap bamqc on some human RNA-seq samples and I am seeing very high N content in my reads. I have done proper QC of my data, and I am not sure why such high N content is being reported for the BAM. Can anyone advise as to why I am seeing this high N content and how to interpret it?

This is my chromosome-wise coverage.


How could you have 435% Ns? The percentages for A, T, G, and C add up to almost 100% as well.

It seems like something is wrong with these numbers.


Yes, I also found it very strange. I am not sure whether it is an error in the tool or something else.


"I have done proper qc of my data"

What did that include?

Generally there should be no Ns in data you receive from sequencing nowadays, since the technical aspects are well worked out. Ns indicate some sort of issue (hardware/software/libraries) with the run. ~12 billion Ns seems rather high.


I trimmed low-quality bases and adapter content, then checked read quality with FastQC to ensure the quality of all the reads used downstream for alignment.


This is my fastqc report for this sample.


FastQC has another plot that shows the sequence composition of the reads. That would show you if you have Ns.


There are no significant Ns (plot is at top), so the Ns in qualimap must be coming from the CIGAR strings, as you predicted.


Thank you so much for the help.

8 months ago

Perhaps qualimap counts the Ns in the CIGAR string.

Depending on the aligner, the CIGAR string can contain Ns that indicate a spliced alignment (the aligner puts Ns over the intronic regions that the read skips).

In that case the N is pretty much meaningless in the context of quality.
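To check this idea on your own alignments, here is a minimal sketch that compares, for one SAM record, the number of N bases in the read sequence with the number of reference bases skipped by N operations in the CIGAR string. The helper names (`cigar_op_totals`, `summarize_sam_line`) and the example record are made up for illustration, and whether qualimap really counts CIGAR Ns this way is only the guess made above, not something confirmed here.

```python
import re

# One CIGAR operation: a length followed by an operation letter (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_op_totals(cigar):
    """Sum the lengths per CIGAR operation, e.g. '6M2000N4M' -> {'M': 10, 'N': 2000}."""
    totals = {}
    for length, op in CIGAR_OP.findall(cigar):
        totals[op] = totals.get(op, 0) + int(length)
    return totals


def summarize_sam_line(line):
    """Count both kinds of N for a single tab-separated SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    cigar, seq = fields[5], fields[9]  # SAM columns 6 (CIGAR) and 10 (SEQ)
    return {
        "n_bases_in_read": seq.upper().count("N"),
        "n_skipped_in_cigar": cigar_op_totals(cigar).get("N", 0),
        "read_length": len(seq),
    }


# Hypothetical spliced read: 6 aligned bases, a 2000 bp skipped region, 4 aligned bases.
example = "\t".join(
    ["read1", "0", "chr1", "1000", "255", "6M2000N4M", "*", "0", "0",
     "ACGTACGTAC", "IIIIIIIIII"]
)
summary = summarize_sam_line(example)
print(summary)
```

In this made-up record the read itself contains no N bases, yet the CIGAR string skips 2000 reference bases, far more than the 10 bases in the read. If a tool counted those skipped bases against the read bases, an N percentage well above 100% would be possible.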


I used the STAR aligner for mapping. Does that have anything to do with this N%?


As I mentioned before, the letter N can stand for two different things:

  1. An N in the FASTQ file (an ambiguous base)
  2. An N in the CIGAR string, which marks an intronic region skipped by a spliced alignment

The first kind of N is a problem; the second kind is not.

You most likely have the second kind of N.
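To rule out the first kind directly, you could count the ambiguous bases in the FASTQ itself. This is a rough sketch that assumes a plain four-lines-per-record FASTQ (no wrapped sequence lines); the function name `fastq_n_fraction` and the two example reads are invented for illustration.

```python
import io


def fastq_n_fraction(handle):
    """Return (n_count, total_bases) over the sequence lines of a 4-line FASTQ."""
    n_count = 0
    total = 0
    for i, line in enumerate(handle):
        if i % 4 == 1:  # the second line of each record holds the bases
            seq = line.strip().upper()
            n_count += seq.count("N")
            total += len(seq)
    return n_count, total


# Two made-up reads; the first one contains a single ambiguous base.
fastq_text = "@r1\nACGTN\n+\nIIII#\n@r2\nACGTA\n+\nIIIII\n"
n_count, total = fastq_n_fraction(io.StringIO(fastq_text))
print(f"{n_count} N out of {total} bases ({100 * n_count / total:.1f}%)")
```

For a real file you would pass an open file handle instead of the `io.StringIO` object, for example `gzip.open(path, "rt")` for a gzipped FASTQ. A result close to 0% here, together with a high N% from qualimap, would point to the CIGAR explanation.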


Thank you so much for the help. It is whole-transcriptome bulk RNA-seq, so it would not be unexpected to have intronic reads present.
