Question

EBI-metagenome - Unequal number of reads in introduction and taxonomy

0

8.0 years ago

agata88 ▴ 870

Hi all!

I downloaded and tested a metagenome sample stored at EBI Metagenomics.

Here is the Introduction:

https://www.ebi.ac.uk/ena/data/view/ERR1298503

And here is the taxonomy:

https://www.ebi.ac.uk/metagenomics/projects/ERP014408/samples/ERS1069635/runs/ERR1298503/results/versions/3.0

The sample name is ERS1069635, run ID: ERR1298503 and the title of experiment: 16s rRNA gene amplicon sequencing of 50 week-old mouse gut microbiota as performed on Illumina MiSeq and Oxford Nanopore MinION sequencer. (ERP014408).

During the analysis I saw that the total number of raw reads in the FASTQ files (PE, paired-end) is 249583 in the R1 file and 249583 in the R2 file. When viewing the taxonomy results stored in the database for this sample, I saw that the total number of raw reads is 402734, and that number is then divided among the taxonomy levels in further steps.
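For anyone who wants to reproduce the per-file counts, here is a minimal sketch of how the reads in each FASTQ file could be counted. It assumes strict 4-line FASTQ records (no wrapped sequence lines); the function name `count_fastq_reads` and the file names in the usage comment are hypothetical, not taken from the ENA download.

```python
import gzip


def count_fastq_reads(path):
    """Count FASTQ records, assuming exactly 4 lines per record."""
    # Plain-text and gzip-compressed files are both handled.
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: line count {n_lines} is not a multiple of 4")
    return n_lines // 4


# Hypothetical usage (file names are assumptions):
# r1 = count_fastq_reads("ERR1298503_1.fastq.gz")
# r2 = count_fastq_reads("ERR1298503_2.fastq.gz")
# For intact paired-end data, r1 should equal r2.
```

If both files give the same count, the pairing is intact and that count is the number of read pairs (fragments), not the number of individual reads.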

I have no idea how 249583 became 402734. Is this an error? Could anyone have a look at this experiment and give me a tip? Maybe it is something that needs to be reported ...

I would appreciate any help.

Best regards,

Agata

metagenom 16S EBI • 2.0k views

ADD COMMENT • link updated 8.0 years ago by Istvan Albert 102k • written 8.0 years ago by agata88 ▴ 870

0

A complete guess, but the pipeline description (https://www.ebi.ac.uk/metagenomics/pipelines/3.0) says that overlapping reads are first merged and then fed into QC analysis. Therefore the number of initial reads is less than 2*249583.

ADD REPLY • link 8.0 years ago by microfuge ★ 2.0k

0

But since the reads are merged, there should NOT be more than 249583 reads in total to process ... that's my opinion. A read from R1 is merged with its mate from R2, and that is not 2 reads but 1 merged read...

ADD REPLY • link 8.0 years ago by agata88 ▴ 870

0

Again my assumption, but not all pairs get merged. Only those which have overlaps get merged. So the output could be pair1+pair2+merged. But as Istvan says, it could be a reporting issue as well.
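As a rough consistency check of this guess, the arithmetic can be sketched as follows. It assumes, purely hypothetically, that each merged pair is counted once and each unmerged pair is counted as two reads; nothing here is confirmed by the pipeline documentation. The helper name `merged_pairs_implied` is made up for illustration.

```python
def merged_pairs_implied(n_pairs, reported_total):
    """Number of merged pairs implied by the reported total.

    Assumption (not confirmed by the pipeline docs): a merged pair counts
    as 1 read and an unmerged pair counts as 2 reads, so
    reported_total = merged + 2 * (n_pairs - merged) = 2 * n_pairs - merged.
    """
    merged = 2 * n_pairs - reported_total
    if not 0 <= merged <= n_pairs:
        raise ValueError("reported total is not consistent with this counting model")
    return merged


n_pairs = 249583          # reads in each of R1 and R2
reported_total = 402734   # total reported on the taxonomy page

merged = merged_pairs_implied(n_pairs, reported_total)
unmerged_pairs = n_pairs - merged
print(merged, unmerged_pairs)  # 96432 153151
```

Under this model the reported 402734 would correspond to 96432 merged pairs plus 153151 unmerged pairs counted twice. The numbers being consistent does not prove the model; it only shows the total falls in the possible range between 249583 and 2*249583.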

ADD REPLY • link 8.0 years ago by microfuge ★ 2.0k

score 0 · Answer 1 · 2017-07-10

I think this might be a reporting issue (or inconsistency).

100 paired-end reads correspond to 200 measurements, where the measurements are not independent pairwise. The two reads of a pair correspond to the same DNA fragment, but they may still cover different regions of it.

Depending on the method used to perform the classification, the two non-independent reads of a pair may still be used and classified separately. Hence each read may support the classification at a taxonomic level, so it makes sense to report them independently even though these reads are linked pairwise.