This is taken from http://vegetablepharm.blogspot.com/2015/09/ubiome-data-analysis-using-mg-rast.html
where Daniel Almonacid replied to the blog post:
> ...At uBiome we amplify
> the V4 region of 16S rRNA which is on average 292bp (base pairs) long,
> and read with the Illumina machine 145-147bp from each end. When you
> consider each forward and reverse read from the same lane as
> independent reads, then you have sequences of only 145-147bp to map to
> known sequences, which may lead to several alternative genuses to
> which annotate a sequence to. Instead, if you use both reads from a
> lane as one single biological entity, the number of 16S sequences to
> which it maps it will be substantially reduced and thus more accurate.
> In some experiments we have performed, we have seen that annotating
> the same sample using single reads vs pair-end reads can lead to
> dramatically different phylogenetic annotations.
See https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5414997/ for a paper by
Almonacid where the company's pipeline is described in Methods.
However, I find that the read pairs you get from them have zero overlap
(in some cases there is even a 1bp gap between them; I checked three
publicly available datasets), so it's sufficient to reverse-complement
the second read and concatenate it to the first, after some QC and
removal of a 12bp prefix from the first read. See for example:
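A minimal sketch of the joining step described above, not uBiome's actual pipeline. It assumes the QC has already been done elsewhere and that each pair is available as sequence and quality strings. The function names and the `prefix_len` parameter are hypothetical; only the 12bp prefix length comes from the observation above. Note that the quality string of the second read has to be reversed along with its sequence.

```python
# Complement table for DNA bases, including N and lowercase (soft-masked) bases.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def join_pair(seq1: str, qual1: str, seq2: str, qual2: str,
              prefix_len: int = 12) -> tuple[str, str]:
    """Join a non-overlapping read pair into one sequence.

    Drops the first `prefix_len` bases of read 1 (12 bp here, an
    assumption taken from the post), reverse-complements read 2 and
    appends it. Qualities are trimmed and reversed to match.
    """
    joined_seq = seq1[prefix_len:] + reverse_complement(seq2)
    joined_qual = qual1[prefix_len:] + qual2[::-1]
    return joined_seq, joined_qual


if __name__ == "__main__":
    # Toy pair: 12 bp prefix plus 4 bp of read 1, and a 4 bp read 2.
    seq, qual = join_pair("A" * 12 + "ACGT", "I" * 12 + "ABCD", "AACC", "EFGH")
    print(seq)
    print(qual)
```

If a pair really has a 1bp gap, the joined sequence is simply missing that base, so whatever you map it against has to tolerate a single deletion at the junction.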
modified 8 months ago
8 months ago by
gtrwst9 • 0