Question: De Novo Assembly of Paired-End and Mate-Pair Reads
sebabiokr wrote (6.5 years ago):

I have metagenomic Illumina data (HiSeq, 101 bp reads): one paired-end library, one 180 bp overlapping paired-end library and two mate-pair libraries (2-5 kb). Can someone suggest or describe the best approach or a pipeline for de novo assembly?

Thanks for all the suggestions. Yes, it is whole-genome "shotgun" metagenomic data from Illumina with 101 bp paired reads. I have three libraries:
1. 180 bp overlapping paired-end library
2. 2 kb mate-pair library
3. 5 kb mate-pair library
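For the 180 bp overlapping library, 101 bp reads taken from both ends of a 180 bp fragment overlap by 2 × 101 − 180 = 22 bp, so each pair can be merged into one longer read before assembly. The Python sketch below shows the idea under simplifying assumptions: fragments are longer than the reads (no adapter read-through), and only mismatches are counted, whereas dedicated read-merging tools also weigh base qualities. `merge_pair` and its thresholds are illustrative names, not part of any existing tool.

```python
# Minimal sketch of merging an overlapping read pair (e.g. 101 bp reads from a
# 180 bp fragment). Assumes fragment length >= read length; ignores qualities.

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(r1: str, r2: str, min_overlap: int = 10, max_mismatch_frac: float = 0.1):
    """Merge read 1 with the reverse complement of read 2.

    Tries the longest overlap first and returns the merged fragment, or None
    if no overlap of at least `min_overlap` bases passes the mismatch limit.
    """
    r2_rc = reverse_complement(r2)
    max_overlap = min(len(r1), len(r2_rc))
    for overlap in range(max_overlap, min_overlap - 1, -1):
        tail = r1[-overlap:]     # 3' end of read 1
        head = r2_rc[:overlap]   # start of reverse-complemented read 2
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch_frac * overlap:
            return r1 + r2_rc[overlap:]
    return None
```

With 101 bp reads and 180 bp fragments, a successful merge yields a 180 bp read; pairs that return `None` can still be used as ordinary pairs.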

I appreciate your suggestions.

Thank you

Tags: assembly, metagenomics, denovo

Is it transcriptome data or whole-genome sequencing data?

(reply by biorepine, 6.5 years ago)

Sounds like this is "shotgun" metagenomic data, unless there is some confusion here; metatranscriptomic usually means transcriptome data. @sebabiokr, can you clarify whether you have one or two libraries?

(reply by Josh Herr, 6.5 years ago)

The software NxTrim was recently released to remove Nextera mate-pair adapters and to categorise reads according to the orientation implied by the adapter location:

https://github.com/sequencing/NxTrim

(reply by 141341254653464453.5k, 4.8 years ago)
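To illustrate what adapter-based categorisation means, here is a much-simplified Python sketch; it is not NxTrim's algorithm. It searches each read for an exact copy of the first half of the Nextera junction adapter (assumed here to be `CTGTCTCTTATACACATCT`; its reverse complement, `AGATGTGTATAAGAGACAG`, forms the second half, so a read crossing the junction from either side meets this half first), trims from that point and labels the pair. The category names and `min_len` are hypothetical; NxTrim itself tolerates mismatches and partial adapters and has its own output categories.

```python
# Hypothetical, simplified junction-adapter trimmer in the spirit of NxTrim.
# Assumption: exact matches only; the adapter sequence is a parameter.

JUNCTION_ADAPTER = "CTGTCTCTTATACACATCT"  # assumed first half of the Nextera junction


def trim_at_adapter(read: str, adapter: str = JUNCTION_ADAPTER):
    """Return (trimmed_read, adapter_found); bases from the adapter onward are removed."""
    pos = read.find(adapter)
    if pos == -1:
        return read, False
    return read[:pos], True


def classify_pair(r1: str, r2: str, min_len: int = 25):
    """Trim both reads and assign a coarse (hypothetical) category.

    "mate_pair": adapter seen in at least one read and both trimmed reads usable.
    "single":    adapter seen, but only one trimmed read is long enough.
    "discard":   adapter seen, but neither trimmed read is long enough.
    "unknown":   no adapter seen, so it says nothing about orientation.
    """
    t1, found1 = trim_at_adapter(r1)
    t2, found2 = trim_at_adapter(r2)
    if not (found1 or found2):
        return "unknown", t1, t2
    long_enough = [t for t in (t1, t2) if len(t) >= min_len]
    if len(long_enough) == 2:
        return "mate_pair", t1, t2
    if len(long_enough) == 1:
        return "single", long_enough[0], ""
    return "discard", "", ""
```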
Rahul Sharma (Germany) wrote (6.5 years ago):

Hi,

I would first assemble the reads using Velvet or SOAPdenovo, align the contigs with BLAST, and then use MEGAN to see the percentage of contigs assigned to different genomes. There are also many good assemblers for metagenomic studies: MetaVelvet, MetAMOS and MAP (http://bioinfo.ctb.pku.edu.cn/MAP/). Please go through the literature; you will find many articles benchmarking the performance of these tools.

Best, Rahul
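The contig-level summary described above can be approximated, very roughly, by counting best BLAST hits. This Python sketch assumes BLAST tabular output (`-outfmt 6`, where the 12th column is the bit score) and that each subject ID stands for one genome. It is a best-hit count, not MEGAN's lowest-common-ancestor assignment, and contigs without any hit are simply left out of the percentages.

```python
# Sketch: percentage of contigs whose best BLAST hit (highest bit score)
# falls on each subject sequence. Assumption: one subject ID per genome.
from collections import Counter


def best_hits(blast_lines):
    """Return {query_id: subject_id}, keeping the highest-bitscore hit per query."""
    best = {}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {q: s for q, (s, _) in best.items()}


def percent_by_subject(blast_lines):
    """Percentage of contigs with hits assigned (by best hit) to each subject."""
    counts = Counter(best_hits(blast_lines).values())
    total = sum(counts.values())
    return {subject: 100.0 * n / total for subject, n in counts.items()}
```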


I'm not calling you out by any means, but metagenomic assembly is difficult both to do and to interpret, so I think the first step after mating the paired ends is to identify individual reads before assembling them. There are many ways to identify reads. Assembly can come next, but depending on where the samples come from, it can be hard to know how many contigs to expect; estimating the total microbial diversity through read identification is a good start.

(reply by Josh Herr, 6.5 years ago)
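As a rough first look at diversity from read identifications, one option is the Shannon index, H' = -sum(p_i * ln(p_i)) over taxon proportions p_i, or its exponential, the "effective number" of equally abundant taxa. The Python sketch assumes you already have one taxon label per identified read (for example from best BLAST hits); unclassified reads are excluded, and the result is only an indicator of complexity, not a prediction of how many contigs an assembly will give.

```python
# Sketch: Shannon diversity from per-read taxon labels (labels are an input
# assumption; producing them is the read-identification step itself).
import math
from collections import Counter


def shannon_index(assignments):
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxa, from per-read labels."""
    counts = Counter(assignments)
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values())


def effective_number_of_taxa(assignments):
    """exp(H'): the number of equally abundant taxa that would give the same H'."""
    return math.exp(shannon_index(assignments))
```

For example, two taxa with equal read counts give H' = ln 2 (about 0.693) and an effective number of 2.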
Josh Herr (University of Nebraska) wrote (6.5 years ago):

See this thread: Assembly Illumina Paired End Reads. I assume you're just "pairing" your mate pairs and not assembling the metagenomic data into larger contigs. I think you should initially work with individual reads; I would avoid assembling your metagenomic data into contigs until you have a better idea of which organisms your data represent. Next you'll want to identify the reads, using BLAST or any of a number of other platforms for metagenomic analysis. "Mapping" in the transcriptome sense doesn't work well for metagenomic data.
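BLAST takes FASTA rather than FASTQ input, so read-level identification usually starts with a conversion step. A minimal Python sketch, assuming standard four-line (unwrapped) FASTQ records as produced by Illumina pipelines; base qualities are discarded.

```python
# Sketch: convert 4-line FASTQ records to FASTA lines for BLAST input.
from itertools import islice
from typing import Iterable, Iterator


def fastq_to_fasta(fastq_lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA header and sequence lines; quality strings are dropped."""
    it = iter(fastq_lines)
    while True:
        record = list(islice(it, 4))  # header, sequence, '+', qualities
        if not record:
            return
        if len(record) < 4 or not record[0].startswith("@"):
            raise ValueError("malformed FASTQ record: %r" % (record,))
        yield ">" + record[0].rstrip("\n")[1:]
        yield record[1].rstrip("\n")
```

Reading a file with `open("reads.fastq")` (a placeholder name) and writing each yielded line plus a newline gives a FASTA file.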

Philipp Bayer (Australia/Perth/UWA) wrote (6.5 years ago):

This paper might be of benefit to you: "Assembling large, complex environmental metagenomes".

The authors don't really incorporate mate-pair data, so I think it might be best to use some of their grouping/pre-assembly steps and then switch over to ALLPATHS-LG, which uses short Illumina reads to generate contigs and then uses mate-pair data to order the assembled contigs and join them into scaffolds.
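The second stage described above (using mate pairs to join contigs) starts from mate-pair links: read pairs whose two ends map to different contigs. The Python sketch below only counts such links and drops weakly supported contig pairs. It assumes the reads have already been mapped to contigs, and it is not ALLPATHS-LG's method; real scaffolders, ALLPATHS-LG included, also use read orientation and the expected insert size (here 2 or 5 kb) to order contigs and estimate gaps. `count_contig_links` and `min_links` are hypothetical names.

```python
# Sketch: count mate-pair links between contigs as scaffolding evidence.
from collections import Counter
from typing import Dict, Iterable, Tuple


def count_contig_links(
    mate_placements: Iterable[Tuple[str, str]], min_links: int = 3
) -> Dict[Tuple[str, str], int]:
    """Count mate pairs whose two reads map to different contigs.

    `mate_placements` holds (contig_of_read1, contig_of_read2) for each mapped
    mate pair; pairs within one contig carry no joining signal. Contig pairs
    supported by fewer than `min_links` mate pairs are dropped, since a lone
    link may come from a chimeric pair or a mis-mapped read.
    """
    links = Counter()
    for c1, c2 in mate_placements:
        if c1 == c2:
            continue
        links[tuple(sorted((c1, c2)))] += 1  # undirected: (a, b) == (b, a)
    return {pair: n for pair, n in links.items() if n >= min_links}
```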


Thank you, this is really useful information for my data set. I will try this approach for the assembly.

(reply by sebabiokr, 6.5 years ago)
biorepine (Spain) wrote (6.5 years ago):

If I were you, I would start with this pipeline: "Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks". This Bowtie-TopHat-Cufflinks pipeline in particular is well suited to Illumina sequencing data in general.


The OP doesn't specify that they have RNA-Seq data...

(reply by Dan Gaston, 6.5 years ago)
Powered by Biostar version 2.3.0