Question: EBI-metagenome - Unequal number of reads in introduction and taxonomy
6 months ago, agata88 (630) wrote:

Hi all!

I was downloading and testing a metagenome sample stored at EBI Metagenomics.

Here is the Introduction:

And here is the taxonomy:

The sample name is ERS1069635, the run ID is ERR1298503, and the title of the experiment is "16s rRNA gene amplicon sequencing of 50 week-old mouse gut microbiota as performed on Illumina MiSeq and Oxford Nanopore MinION sequencer" (ERP014408).

During analysis I saw that the total number of raw reads in the FASTQ files (PE, paired-end) is 249583 in the R1 file and 249583 in the R2 file. When viewing the taxonomy results stored in the database for this sample, I saw that the total number of raw reads is 402734, and that number is then divided among the taxonomy levels in further steps.
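For anyone who wants to double-check the raw counts, here is a minimal sketch of one way to count reads in a FASTQ file. It assumes standard four-line records (no wrapped sequence lines), and the function name `count_fastq_reads` is just an illustrative choice, not part of any EBI tool.

```python
import gzip
import sys
from pathlib import Path


def count_fastq_reads(path):
    """Count records in a FASTQ file, assuming exactly four lines per record."""
    path = Path(path)
    # Assumption: a ".gz" suffix means the file is gzip-compressed.
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4


if __name__ == "__main__":
    # Pass the R1 and R2 files on the command line; each is counted separately.
    for name in sys.argv[1:]:
        print(name, count_fastq_reads(name))
```

For a paired-end run, the R1 and R2 counts should match each other, as they do here.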

I have no idea how 249583 became 402734. Is this an error? Could anyone have a look at this experiment and give me a tip? Maybe it is something that needs to be reported...

I would appreciate any help.

Best regards,



A complete guess, but the pipeline description says that overlapping reads are first merged and then fed into the QC analysis. Therefore the number of initial reads is less than 2*249583.

written 6 months ago by microfuge (730)

But since the reads are merged, there should NOT be more than 249583 reads in total to process... that's my opinion. A read from R1 is merged with its mate from R2, and that is not 2 reads but 1 merged read...

written 6 months ago by agata88 (630)

Again, this is my assumption, but not all pairs get merged; only the ones that overlap do. So the output could be pair1+pair2+merged. But as Istvan says, it could be a reporting issue as well.
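To see what that guess would imply for the numbers in the question, here is a rough back-of-the-envelope sketch. It assumes, purely for illustration, that every merged pair is counted as 1 read, every unmerged pair as 2 reads, and that nothing is dropped by QC before counting. None of this is confirmed by the pipeline description, and the function name `implied_merged_pairs` is made up for this example.

```python
def implied_merged_pairs(n_pairs, reported_total):
    """Solve reported_total = merged + 2 * (n_pairs - merged) for merged."""
    merged = 2 * n_pairs - reported_total
    if not 0 <= merged <= n_pairs:
        raise ValueError("reported total is not consistent with this model")
    return merged


n_pairs = 249583         # read pairs (reads per file), from the question
reported_total = 402734  # total on the taxonomy page, from the question

merged = implied_merged_pairs(n_pairs, reported_total)
unmerged_pairs = n_pairs - merged
print(merged, unmerged_pairs)  # 96432 153151
```

Under those assumptions, 96432 merged reads plus 2 * 153151 unmerged mates gives 402734, so the reported total is at least arithmetically compatible with the idea. That does not show the pipeline actually counts reads this way.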

written 6 months ago by microfuge (730)
6 months ago, Istvan Albert ♦♦ 75k (University Park, USA) wrote:

I think this might be a reporting issue (or inconsistency).

100 read pairs correspond to 200 measurements, where the two measurements within a pair are not independent. The two reads of a pair come from the same DNA fragment, but they may still cover different regions of it.

Depending on the method used to perform the classification, the two non-independent reads of a pair may still be used and classified separately. Hence each read may support the classification at a taxonomic level, so it makes sense to report them independently even though the reads are linked pairwise.
