Question: Microbiome 16S sequencing raw data analysis
2.0 years ago by
aidpranculis0 wrote:


I've received my microbiome sequencing results from uBiome and I am trying to analyze the raw reads using QIIME 1.9.1. The files are as follows:


I have tried merging the R1 files and the R2 files using cat and then running the script, but only a fraction of the reads get joined. Could anyone advise me on the proper sequence of steps to analyze this data and perform OTU picking? Perhaps someone has a script that runs the analysis as an automated pipeline?


qiime fastq next-gen 16s microbiome • 1.7k views
modified 8 months ago by gtrwst90 • written 2.0 years ago by aidpranculis0

People usually "merge" paired-end reads using tools like FLASH.

QIIME provides detailed documentation for OTU picking and downstream analysis, e.g.,

written 2.0 years ago by shenwei3564.5k

The tutorial does not deal with multi-lane, paired-end, demultiplexed samples. I am unsure if the script performs well on this data.

written 2.0 years ago by aidpranculis0

Which platform were the reads generated on? You do not have to cat read 1 and read 2 together since, according to the, you have to do the following: -f $PWD/forward_reads.fastq -r $PWD/reverse_reads.fastq -o $PWD/fastq-join_joined

The script merges the paired-end reads. I think this way you will see the merged reads together. Run the QIIME pipeline and see if you get good OTUs. If the results are not good, there are different troubleshooting steps to try.
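For multi-lane data, one way to avoid concatenating first is to join each lane's R1/R2 pair separately. The sketch below only pairs the files and builds one argument list per lane; nothing is executed. It assumes the script meant here is QIIME 1.9.1's join_paired_ends.py (the post does not name it, so this is a guess) and that the files follow an Illumina-style naming such as sample_L001_R1_001.fastq. The function names, file names and output folder are hypothetical.

```python
import re
from collections import defaultdict

# Assumed Illumina-style name: <sample>_L<lane>_R<1|2>_<chunk>.fastq(.gz)
LANE_READ = re.compile(
    r"^(?P<prefix>.+_L\d{3})_R(?P<read>[12])_(?P<suffix>\d{3}\.fastq(?:\.gz)?)$"
)


def pair_lanes(filenames):
    """Group R1/R2 files that share the same sample, lane and chunk."""
    groups = defaultdict(dict)
    for name in filenames:
        m = LANE_READ.match(name)
        if m is None:
            continue  # skip files that do not follow the assumed pattern
        key = (m.group("prefix"), m.group("suffix"))
        groups[key]["R" + m.group("read")] = name
    # Keep only complete pairs, sorted by sample/lane for a stable order
    return [
        (g["R1"], g["R2"])
        for _, g in sorted(groups.items())
        if "R1" in g and "R2" in g
    ]


def join_commands(pairs, out_root="joined"):
    """Build one join_paired_ends.py argument list per pair (not executed)."""
    cmds = []
    for i, (r1, r2) in enumerate(pairs, start=1):
        cmds.append(
            ["join_paired_ends.py", "-f", r1, "-r", r2, "-o", f"{out_root}/lane{i}"]
        )
    return cmds
```

Each returned list could be passed to subprocess.run inside an activated QIIME environment. Pairing by lane keeps every forward file matched with its own reverse file.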

written 2.0 years ago by sridhar56100

I have tried using the script, yet most of the reads are left unjoined. Any ideas on how to overcome this?

written 2.0 years ago by aidpranculis0

Have you tried other merge options?

This post seems to have many such options:

A link on Biostars

written 2.0 years ago by sridhar56100

All join*.py scripts produce the same results. Only the produces a single .fna file, but it is a demultiplexing script, so I am not sure how it deals with paired-end and multi-lane .fastq files from a single sample.

modified 2.0 years ago • written 2.0 years ago by aidpranculis0
8 months ago by
gtrwst90 wrote:

This is taken from where Daniel Almonacid replied to a blog post:

> ...At uBiome we amplify
> the V4 region of 16S rRNA which is on average 292bp (base pairs) long,
> and read with the Illumina machine 145-147bp from each end. When you
> consider each forward and reverse read from the same lane as
> independent reads, then you have sequences of only 145-147bp to map to
> known sequences, which may lead to several alternative genuses to
> which annotate a sequence to. Instead, if you use both reads from a
> lane as one single biological entity, the number of 16S sequences to
> which it maps it will be substantially reduced and thus more accurate.
> In some experiments we have performed, we have seen that annotating
> the same sample using single reads vs pair-end reads can lead to
> dramatically different phylogenetic annotations.

See for a paper by Almonacid where the company's pipeline is described in Methods.

However, I find that the reads you get from them have zero overlap (in some cases there is even a 1 bp gap; I checked three publicly available datasets), so it is sufficient to reverse-complement the second read and cat it to the first, after some QC and removal of a 12 bp prefix from the first. See for example:
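Separately from the linked example, here is a minimal sketch of the stitching step described above. With a roughly 292 bp amplicon and 145-147 bp reads, the two reads add up to about 290-294 bp, so an overlap-based joiner has little or nothing to align, which may be why most reads stay unjoined. The sketch assumes plain sequence and quality strings for one read pair; FASTQ parsing and the QC step are left out, and the function names and the default prefix_len=12 parameter are illustrative choices, not part of any existing tool.

```python
# Complement table for DNA including N; other IUPAC codes are left unchanged.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def stitch_pair(r1_seq, r1_qual, r2_seq, r2_qual, prefix_len=12):
    """Trim a fixed-length prefix from R1, then append reverse-complemented R2.

    The R2 quality string is reversed so that each score stays aligned
    with its base after the reverse complement.
    """
    if len(r1_seq) != len(r1_qual) or len(r2_seq) != len(r2_qual):
        raise ValueError("sequence and quality strings must have equal length")
    seq = r1_seq[prefix_len:] + reverse_complement(r2_seq)
    qual = r1_qual[prefix_len:] + r2_qual[::-1]
    return seq, qual
```

Since nothing is aligned, a stitched read is only as long as the two trimmed reads together, and any bases missing between them are simply absent from the result.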

modified 8 months ago • written 8 months ago by gtrwst90