Question: Pre-processing MiSeq Paired End data
gaiusjaugustus (United States), 2.9 years ago, wrote:

I have MiSeq data (fastq.gz format) that I am trying to preprocess for microbiome analyses.

The workflow I've come up with is the following:

  1. Join paired end reads
  2. Trim sequences to remove primers & barcodes
  3. Demultiplex
  4. Quality Filter

I have tools for steps 3 and 4 above (QIIME or mothur), but I can't get anything to work for steps 1 and 2.

I have two questions:

1) Is the workflow above in the correct order?

2) Is there a reasonably straightforward tool (with decent documentation and workflow examples) to join my paired reads and trim the sequences? So far, I've tried fastq-join, but I haven't been able to figure out how to use it. Mothur has a trim.seqs command, but I've been having issues with that, too.

If these are the best tools, I'll start a different topic for trying to get them to work.  I just don't want to spend hours trying to get something to work if it isn't the best way for a beginner to do it.  Thanks in advance.

igor (United States), 2.9 years ago, wrote:

QIIME provides a script, join_paired_ends.py, specifically for joining paired-end reads.

This script takes forward and reverse Illumina reads and joins them using the method chosen. Will optionally create an updated index reads file containing index reads for the surviving joined paired end reads.
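
As a rough sketch of how the script is usually called, the Python snippet below only assembles and prints a command line; it does not run QIIME. The flags -f, -r, -o and -b are the ones documented for QIIME 1's join_paired_ends.py. The file names are placeholders, and the helper build_join_command is a made-up name used purely for illustration.

```python
import shlex

def build_join_command(forward, reverse, out_dir, index_reads=None):
    """Assemble a join_paired_ends.py command line (hypothetical helper)."""
    cmd = ["join_paired_ends.py", "-f", forward, "-r", reverse, "-o", out_dir]
    if index_reads is not None:
        # -b passes the index (barcode) reads so that the script can write
        # an updated index file for the pairs that were joined.
        cmd += ["-b", index_reads]
    return cmd

# Placeholder file names; substitute your own forward, reverse and index reads.
cmd = build_join_command(
    "forward_reads.fastq", "reverse_reads.fastq", "joined",
    index_reads="barcodes.fastq",
)
print(shlex.join(cmd))
```

The printed line can be pasted into a shell where QIIME 1 is installed. Passing the index reads matters if you demultiplex afterwards, because the barcode file then has to match the reads that survived joining.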


I was able to run join_paired_ends.py with little trouble. Do I need to trim primer or barcode sequences before moving on? If so, does QIIME have a script for this? (I see trimming is built into split_libraries.py, but it doesn't seem to be built into split_libraries_fastq.py.) If not, is there anything easy and straightforward you can recommend?

Reply written 2.9 years ago by gaiusjaugustus

If you are trimming, you should trim before you join. If the reads have adapter sequence at their ends, they should not join successfully. However, if this is 16S, your fragments should be long enough that the reads do not run into the adapter regions. If a read does contain adapter, there is a problem with that fragment and it should probably be discarded anyway.

Reply written 2.9 years ago by igor
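
The point about fragment size in the reply above comes down to simple arithmetic, sketched here in Python. A read runs into adapter only when the fragment is shorter than the read, and two reads overlap only when the fragment is shorter than their combined length. The function name and the example lengths are illustrative assumptions, not values from this thread.

```python
def paired_end_geometry(fragment_len, read_len):
    """Return (overlap, read_through) in bases for one sequenced fragment."""
    # Bases covered by both reads; this can never exceed the fragment itself.
    overlap = max(0, min(fragment_len, 2 * read_len - fragment_len))
    # Bases each read continues past the end of the fragment into adapter.
    read_through = max(0, read_len - fragment_len)
    return overlap, read_through

# Hypothetical fragment lengths, all sequenced with 2 x 250 bp reads.
for fragment_len in (200, 400, 600):
    overlap, read_through = paired_end_geometry(fragment_len, 250)
    print(fragment_len, overlap, read_through)
```

A fragment shorter than the read length gives a non-zero read-through, which is the case where adapter trimming is needed before joining. A fragment longer than twice the read length gives zero overlap, so the pair cannot be joined at all.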
Gabriel R. (Center for Geogenetik, Københavns Universitet), 2.9 years ago, wrote:

I would recommend leeHom, which does both read overlap (merging) and adapter trimming:
http://nar.oxfordjournals.org/content/42/18/e141

Then you can use deML to demultiplex:

http://bioinformatics.oxfordjournals.org/content/31/5/770.long

You should run leeHom, then deML. I wrote both, so let me know if it works out. Also, for sanity's sake, I would convert the FASTQ files to BAM: https://github.com/grenaud/BCL2BAM2FASTQ/tree/master/fastq2bam

For quality filtering, you could use:

https://github.com/grenaud/aLib/blob/master/pipeline/filterReads.cpp

We filter on the expected number of mismatches given by the quality scores. We can also filter out low-complexity sequences.
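
To illustrate the idea of an expected number of mismatches, here is a minimal Python sketch using the standard Phred relationship, where a quality score Q corresponds to an error probability of 10^(-Q/10). It is not taken from filterReads.cpp and may differ from what that program does. The function name, the Phred+33 offset (the usual encoding for current Illumina FASTQ files) and the threshold are assumptions for illustration.

```python
def expected_errors(quality_string, phred_offset=33):
    """Sum the per-base error probabilities encoded in a FASTQ quality string."""
    return sum(10 ** (-(ord(ch) - phred_offset) / 10) for ch in quality_string)

def passes_filter(quality_string, max_expected_errors=1.0):
    """Keep a read only if its expected error count is at or below the threshold."""
    return expected_errors(quality_string) <= max_expected_errors

# 'I' encodes Q40 and '+' encodes Q10 under the Phred+33 offset.
print(expected_errors("IIII"))
print(passes_filter("+" * 20))
```

Summing probabilities penalises a read for many mediocre bases as well as for a few very bad ones, which a simple mean-quality cutoff does not do.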

Good luck, have fun!

karl.stamm (United States), 2.9 years ago, wrote:

Don't join them. You don't know how many bases lie between the two reads of a pair.


So, if I understand correctly, you are saying to run the samples (through QIIME or mothur or whatever) all as single reads?

What if the reads are supposed to be overlapping?

Reply written 2.9 years ago (modified 2.9 years ago) by gaiusjaugustus

Paired-end sequencing reads a single fragment from both ends. Sometimes the two reads overlap, often they don't. These distinctions matter for quantification and for mapping, so it is better to use tools that know how to handle paired-end data. I don't know about microbiome analyses, so I don't know whether you should be trying to merge the paired ends. In genome and transcriptome sequencing you would leave them as two reads, in separate files, so the tools can use them together.

Reply written 2.9 years ago by karl.stamm

QIIME does not work with paired-end reads. If you are doing 16S sequencing, your paired-end reads are expected to overlap, so you should be able to join them into single reads.

Reply written 2.9 years ago by igor
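
For readers wondering what joining overlapping reads means in practice, here is a deliberately naive Python sketch. It reverse-complements the second read and looks for the longest suffix of read 1 that matches a prefix of it. Real joiners (fastq-join, SeqPrep, leeHom) also use the quality scores, so treat this only as an illustration. The function names and default thresholds are assumptions.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]

def naive_merge(read1, read2, min_overlap=10, max_mismatch_frac=0.1):
    """Join read1 with the reverse complement of read2, or return None."""
    rc2 = reverse_complement(read2)
    # Try the longest candidate overlap first and shrink until one fits.
    for overlap in range(min(len(read1), len(rc2)), min_overlap - 1, -1):
        tail, head = read1[-overlap:], rc2[:overlap]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch_frac * overlap:
            # Keep read1 whole and append the part of rc2 beyond the overlap.
            return read1 + rc2[overlap:]
    return None  # no acceptable overlap, so the pair cannot be joined

# Toy 40-base fragment read 25 bases from each end, leaving a 10-base overlap.
fragment = "A" * 15 + "GATTGCATGC" + "C" * 15
read1 = fragment[:25]
read2 = reverse_complement(fragment[15:])
print(naive_merge(read1, read2, max_mismatch_frac=0.0) == fragment)
```

When the fragment is longer than the two reads combined, no overlap exists and the function returns None, which is the situation described above for genome and transcriptome libraries.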