Question

Microbiome 16S sequencing raw data analysis

7.0 years ago

aidpranculis • 0

Hi,

I've received my microbiome sequencing results from uBiome and I am trying to analyze the raw reads using QIIME 1.9.1. The files are as follows:

ssr_100__R1__L001.fastq.gz, 
ssr_100__R1__L002.fastq.gz, 
ssr_100__R1__L003.fastq.gz, 
ssr_100__R1__L004.fastq.gz, 
ssr_100__R2__L001.fastq.gz, 
ssr_100__R2__L002.fastq.gz, 
ssr_100__R2__L003.fastq.gz, 
ssr_100__R2__L004.fastq.gz.

I have tried concatenating the R1 files and the R2 files with cat and then running the join_paired_ends.py script, but only a fraction of the reads get joined. Could anyone advise me on the proper sequence of steps for analyzing this data up to OTU picking? Perhaps someone has a script that runs the analysis as an automated pipeline?

Thanks,

microbiome 16s QIIME next-gen fastq • 4.2k views

ADD COMMENT • link updated 5.7 years ago by gtrwst9 • 0 • written 7.0 years ago by aidpranculis • 0

People usually merge paired-end reads using tools like FLASH.

QIIME provides detailed documentation for OTU picking and downstream analysis, e.g., http://nbviewer.jupyter.org/github/biocore/qiime/blob/1.9.1/examples/ipynb/illumina_overview_tutorial.ipynb

ADD REPLY • link 7.0 years ago by shenwei356 8.4k

The tutorial does not deal with multi-lane, paired-end, demultiplexed samples. I am unsure whether the multiple_split_libraries_fastq.py script performs well on this data.

ADD REPLY • link 7.0 years ago by aidpranculis • 0

Which platform were the reads generated on? You do not have to cat the read 1 and read 2 files, since according to the join_paired_ends.py documentation you run the script as follows:

join_paired_ends.py -f $PWD/forward_reads.fastq -r $PWD/reverse_reads.fastq -o $PWD/fastq-join_joined

The script merges the paired-end reads. I think that this way you will get the merged reads. Then run the QIIME pipeline and see whether you get good OTUs. If the results are not good, there are various troubleshooting steps to try.
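
For files split across lanes, one way to follow this advice is to run the joiner once per lane, pairing each R1 file with the R2 file from the same lane. The following is a minimal Python sketch that only builds the command lines. The file-name pattern is inferred from the names in the question, the helper names `pair_by_lane` and `join_commands` and the output directory names are hypothetical, and only the `-f`, `-r` and `-o` options shown above are used. Whether join_paired_ends.py accepts gzipped input directly is not confirmed here, so the files may need decompressing first.

```python
import re
from collections import defaultdict

# Assumed naming scheme, inferred from the file names in the question:
#   <sample>__R<1|2>__L<lane>.fastq.gz
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)__R(?P<read>[12])__L(?P<lane>\d+)\.fastq\.gz$"
)


def pair_by_lane(filenames):
    """Return (sample, lane, r1_file, r2_file) tuples, sorted by sample and lane.

    Files that do not match the assumed pattern, and lanes that lack
    either the R1 or the R2 file, are skipped.
    """
    lanes = defaultdict(dict)
    for name in filenames:
        m = FASTQ_NAME.match(name)
        if m is None:
            continue
        lanes[(m["sample"], m["lane"])][m["read"]] = name

    pairs = []
    for (sample, lane), reads in sorted(lanes.items()):
        if "1" in reads and "2" in reads:
            pairs.append((sample, lane, reads["1"], reads["2"]))
    return pairs


def join_commands(filenames, out_prefix="joined"):
    """Build one join_paired_ends.py argument list per lane (nothing is run)."""
    return [
        ["join_paired_ends.py", "-f", r1, "-r", r2,
         "-o", f"{out_prefix}_{sample}_L{lane}"]
        for sample, lane, r1, r2 in pair_by_lane(filenames)
    ]


if __name__ == "__main__":
    files = [f"ssr_100__R{r}__L00{n}.fastq.gz" for r in (1, 2) for n in (1, 2, 3, 4)]
    for cmd in join_commands(files):
        print(" ".join(cmd))
```

Each list can be passed to `subprocess.run` once the paths have been checked. Note that this only keeps mates from the same lane together; it does not by itself raise the fraction of reads that join.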

ADD REPLY • link 7.0 years ago by sridhar56 ▴ 110

I have tried using the script, yet most of the reads are left unjoined. Any ideas on how to overcome this?

ADD REPLY • link 7.0 years ago by aidpranculis • 0

Have you tried other merge options?

This post seems to list many such options:

A link on Biostars

ADD REPLY • link 7.0 years ago by sridhar56 ▴ 110

All the join*.py scripts produce the same results. Only multiple_split_libraries_fastq.py produces a single .fna file, but it is a demultiplexing script, so I am not sure how it deals with paired-end, multi-lane .fastq files from a single sample.

ADD REPLY • link 7.0 years ago by aidpranculis • 0

score 0 · Answer 1 · 2018-07-31

This is taken from http://vegetablepharm.blogspot.com/2015/09/ubiome-data-analysis-using-mg-rast.html, where Daniel Almonacid replied to a blog post:

> ...At uBiome we amplify
> the V4 region of 16S rRNA which is on average 292bp (base pairs) long,
> and read with the Illumina machine 145-147bp from each end. When you
> consider each forward and reverse read from the same lane as
> independent reads, then you have sequences of only 145-147bp to map to
> known sequences, which may lead to several alternative genuses to
> which annotate a sequence to. Instead, if you use both reads from a
> lane as one single biological entity, the number of 16S sequences to
> which it maps it will be substantially reduced and thus more accurate.
> In some experiments we have performed, we have seen that annotating
> the same sample using single reads vs pair-end reads can lead to
> dramatically different phylogenetic annotations.

See https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5414997/ for a paper by Almonacid where the company's pipeline is described in Methods.
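
Taking the figures in the quote at face value (an average amplicon of 292 bp, read from each end for 145-147 bp), a quick calculation shows how little overlap the two reads can share. This is a rough sketch only: it ignores length variation between taxa and any non-amplicon bases in the reads, and the function name is made up for illustration.

```python
def expected_overlap(amplicon_len, read_len):
    """Bases shared by two reads sequenced inward from both ends of an amplicon.

    A negative value means the reads do not meet and leave a gap of that size.
    """
    return 2 * read_len - amplicon_len


if __name__ == "__main__":
    for read_len in (145, 146, 147):
        print(read_len, expected_overlap(292, read_len))
```

With these inputs the result ranges from a 2 bp gap to a 2 bp overlap, which is too little for overlap-based joining to work with.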

However, I find that the reads you get from them have zero overlap (in some cases there is even a 1 bp gap between them; I checked three publicly available datasets), so it is sufficient to reverse-complement the second read and concatenate it to the first, after some QC and removal of a 12 bp prefix from the first read. See for example:

https://github.com/rwst/process-ubiome-16S
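
The reverse-complement-and-concatenate step described above can be sketched in a few lines of Python. This is an illustration of the idea, not the code from the linked repository: the function names are hypothetical, the QC step is left out, and the 12 bp prefix length is taken from the description above.

```python
# Translation table for complementing bases; N stays N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def stitch_pair(r1_seq, r1_qual, r2_seq, r2_qual, prefix_len=12):
    """Concatenate read 1 (minus its prefix) with reverse-complemented read 2.

    The quality string of read 2 is reversed so that it stays aligned
    with the reverse-complemented bases.
    """
    seq = r1_seq[prefix_len:] + reverse_complement(r2_seq)
    qual = r1_qual[prefix_len:] + r2_qual[::-1]
    return seq, qual


if __name__ == "__main__":
    # Toy reads: a 12-base prefix followed by 4 bases, and a 4-base mate.
    seq, qual = stitch_pair("A" * 12 + "ACGT", "I" * 16, "AACC", "ABCD")
    print(seq, qual)
```

Because the reads do not overlap, no alignment is attempted; a 1 bp gap, where present, is simply ignored in this sketch.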