Question: EBI-metagenome - Unequal number of reads in introduction and taxonomy
12 months ago by
agata88670 wrote:

Hi all!

I was downloading and testing a metagenome sample stored at EBI Metagenomics.

Here is the Introduction:

And here is the taxonomy:

The sample name is ERS1069635, the run ID is ERR1298503, and the title of the experiment is: 16s rRNA gene amplicon sequencing of 50 week-old mouse gut microbiota as performed on Illumina MiSeq and Oxford Nanopore MinION sequencer (ERP014408).

During the analysis I saw that the total number of raw reads in the paired-end (PE) fastq files is 249583 in the R1 file and 249583 in the R2 file. When viewing the taxonomy results stored in the database for this sample, I saw that the total number of raw reads is given as 402734, and that number is then divided among the taxonomy levels in the further steps.
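For anyone who wants to reproduce the raw counts quoted above, here is a minimal sketch of counting the records in a FASTQ file. It assumes the usual four lines per record (no multi-line sequences), and the file names are hypothetical placeholders for whatever the downloaded R1 and R2 files are called.

```python
import gzip
from pathlib import Path


def count_fastq_reads(path):
    """Count FASTQ records, assuming exactly 4 lines per record."""
    # Use gzip for .gz files, plain open otherwise.
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4


if __name__ == "__main__":
    # Hypothetical file names; substitute the files downloaded for the run.
    for name in ("ERR1298503_1.fastq.gz", "ERR1298503_2.fastq.gz"):
        if Path(name).exists():
            print(name, count_fastq_reads(name))
```

For a proper paired-end run, both files should give the same count.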

I have no idea how 249583 became 402734. Is this an error? Could anyone have a look at this experiment and give me a tip? Maybe it is something that needs to be reported ...

I would appreciate any help.

Best regards,



A complete guess, but the pipeline description says that overlapping reads are first merged and then fed into the QC analysis. Therefore the number of initial reads is less than 2*249583.

written 12 months ago by microfuge930

But since the reads are merged, there should NOT be more than 249583 reads in total to process ... that's my opinion. A read from R1 is merged with its mate from R2, and that gives 1 merged read, not 2 reads...

written 12 months ago by agata88670

Again, this is my assumption, but not all pairs get merged. Only those that overlap get merged, so the output could be the unmerged R1 reads + the unmerged R2 reads + the merged reads. But as Istvan says, it could be a reporting issue as well.

written 12 months ago by microfuge930
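To make the guess above concrete, here is a small arithmetic sketch. It assumes, purely for illustration, that every merged pair is counted once, every unmerged pair is counted as two reads, and no reads are dropped by QC. None of this is confirmed by the pipeline description, and the function name is made up for the example.

```python
def merged_pairs_needed(n_pairs, reported_total):
    """Merged-pair count that would explain reported_total.

    Assumption (not confirmed by the pipeline): a merged pair counts as
    1 read, an unmerged pair counts as 2 reads, and nothing is filtered out.
    """
    # total = merged + 2 * (n_pairs - merged)  =>  merged = 2 * n_pairs - total
    merged = 2 * n_pairs - reported_total
    if not 0 <= merged <= n_pairs:
        raise ValueError("reported total is not reachable under this assumption")
    return merged


n_pairs = 249583    # reads in each of the R1 and R2 files
reported = 402734   # total shown on the taxonomy page

merged = merged_pairs_needed(n_pairs, reported)
unmerged = n_pairs - merged
print(merged, unmerged, merged + 2 * unmerged)
# prints: 96432 153151 402734
```

Under these assumptions the reported total would correspond to roughly 39% of the pairs being merged. This only shows that the number is arithmetically possible, not that this is what actually happened.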
12 months ago by
Istvan Albert ♦♦ 77k (University Park, USA) wrote:

I think this might be a reporting issue (or inconsistency).

100 paired-end reads do correspond to 200 measurements, although the two measurements within each pair are not independent. The reads of a pair come from the same DNA fragment, but they may still cover different regions of it.

Depending on the method used to perform the classification, the two non-independent reads of a pair may still be used and classified separately. Hence each read may support the classification at a taxonomic level, so it makes sense to report them independently even though the reads are linked pairwise.
