Pre-processing MiSeq Paired End data
Asked 8.8 years ago

I have MiSeq data (fastq.gz format) that I am trying to preprocess for microbiome analyses.

The workflow I've come up with is the following:

  1. Join paired end reads
  2. Trim sequences to remove primers & barcodes
  3. Demultiplex
  4. Quality Filter

I have tools for steps 3 & 4 above (QIIME or mothur), but I can't seem to get anything to work for steps 1 & 2.

I have two questions:

  1. Is the workflow above in the correct order?
  2. Is there a reasonably straightforward tool (with decent documentation/workflow examples) to join my paired reads and trim sequences? So far I've tried fastq-join but haven't been able to figure out how to use it, and mothur's trim.seqs function has given me issues, too.

If these are the best tools, I'll start a different topic for trying to get them to work. I just don't want to spend hours trying to get something to work if it isn't the best way for a beginner to do it. Thanks in advance.

Answer (igor, 8.8 years ago):

QIIME provides a script, join_paired_ends.py, specifically for joining reads. From its documentation:

    This script takes forward and reverse Illumina reads and joins them using the method chosen. Will optionally create an updated index reads file containing index reads for the surviving joined paired end reads.
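
For example, a minimal sketch (file names are placeholders; check join_paired_ends.py -h for the flags available in your QIIME version):

    join_paired_ends.py \
        -f forward_reads.fastq \
        -r reverse_reads.fastq \
        -b barcodes.fastq \
        -o joined_output/

The -b option produces the updated index file mentioned above, keeping only the barcodes of read pairs that joined successfully; you will want it if you demultiplex afterwards.
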
Comment:

I was able to run join_paired_ends.py with little issue. Do I need to trim primer or barcode sequences before moving on? If so, does QIIME have a script for this? (I see it's built into split_libraries.py, but it doesn't seem to be built into split_libraries_fastq.py.) If not, is there anything easy/straightforward you can recommend?

Reply:

If you are trimming, you should trim before you join. If the reads have adapters on the end, they should not join successfully. However, if this is 16S, your fragments should be big enough and the reads should not run into adapter regions. If you have adapters, there is a problem with that fragment and it probably should be eliminated anyway.
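
Since you mentioned mothur's trim.seqs, here is a minimal sketch of primer/barcode removal using an oligos file (the primer and barcode sequences below are hypothetical placeholders; check the mothur wiki for the exact oligos format for your design):

    # oligos.txt -- hypothetical example:
    #   forward  GTGCCAGCMGCCGCGGTAA
    #   barcode  ACGTACGT  sample1
    mothur "#trim.seqs(fasta=seqs.fasta, oligos=oligos.txt, pdiffs=2, bdiffs=1, maxambig=0)"

Note that trim.seqs also demultiplexes when barcodes are listed in the oligos file, which would cover steps 2 and 3 of your workflow together.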

Answer (Gabriel R., 8.8 years ago):

I would recommend leeHom, which does both read merging (overlap) and adapter trimming: http://nar.oxfordjournals.org/content/42/18/e141
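
A minimal sketch of a leeHom call (flag names as in its README; verify with leeHom --help, and note the file names are placeholders):

    leeHom -fq1 reads_R1.fastq.gz -fq2 reads_R2.fastq.gz -fqo merged_out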

Then you can use deML to demultiplex: http://bioinformatics.oxfordjournals.org/content/31/5/770.long
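
Something like the following (a hedged sketch; consult the deML README for the exact flags and the index-file format, which maps barcode sequences to sample names):

    deML -i index.txt \
         -f reads_R1.fastq.gz -r reads_R2.fastq.gz \
         -if1 index_R1.fastq.gz -if2 index_R2.fastq.gz \
         -o demultiplexed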

You should run leeHom first, then deML. I wrote both, so let me know if it works out. Also, for sanity's sake, I would convert the FASTQ files to BAM: https://github.com/grenaud/BCL2BAM2FASTQ/tree/master/fastq2bam

For QC filtering, you could use https://github.com/grenaud/aLib/blob/master/pipeline/filterReads.cpp

We filter on the expected number of mismatches implied by the quality scores: each base contributes an error probability of 10^(-Q/10), summed over the read. We can also filter out low-complexity sequences.

Good luck, have fun!

Answer (8.8 years ago):

Don't join them. You don't know how many bases lie between the two reads of a pair.

Comment:

So, if I understand correctly, you are saying to run the samples (through QIIME or mothur or whatever) all as single reads?

What if the reads are supposed to be overlapping?

Reply:

Paired-end sequencing gives two single reads from the same fragment, one from each end. Sometimes they overlap; often they don't. These distinctions matter for quantification and for mapping, so it is better to use tools that know how to handle paired-end data. I don't know microbiome analyses, so I can't say whether you should merge the pairs; in genome and transcriptome sequencing you would leave them as two reads, in separate files, so the tools can use them together.

Reply:

QIIME does not work with paired-end reads directly. If you are doing 16S sequencing, your paired-end reads are expected to overlap, so you should be able to merge them into single-end reads first.
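
Since fastq-join was already mentioned above, a minimal sketch of that merge step (the % in the output template is replaced by "join", "un1", and "un2"; file names are placeholders):

    fastq-join reads_R1.fastq reads_R2.fastq -o merged.%.fastq
    # merged.join.fastq  -- successfully joined pairs
    # merged.un1.fastq / merged.un2.fastq -- pairs that did not join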
