Question: Assembly Illumina Paired End Reads
6.4 years ago by
vijay1.5k wrote:

Which would be the best tool to assemble paired end reads generated by Illumina?


next-gen • 6.4k views
modified 3.1 years ago by jigarnt30 • written 6.4 years ago by vijay1.5k

People can help you better if you give some more information. DNA or RNA? Which species? How much RAM do you have? Which is more important, contig accuracy or contiguity? What are the read length, insert size, and total number of reads? Do you suspect DNA contamination from other species?

written 6.4 years ago by Irsan6.8k

This is a metagenome sample, so I can't be sure of the number of species, and I have yet to receive my sequence data. I just wanted to know this in preparation. I would need contiguity since this is a metagenome. Read length would be approximately 150 bp.

written 6.4 years ago by vijay1.5k
6.4 years ago by
Josh Herr5.6k
University of Nebraska
Josh Herr5.6k wrote:

This is for metagenomics and we're not assembling the reads yet, correct? Is this amplicon data? You didn't tell us. If there's no need to assemble the reads yet, are you just looking to mate the paired-end sequences from your library? I find the terminology confusing: I prefer "mate" for combining the two reads of a pair, and "assembly" for the later step, after your paired-end data is matched up, when you build contig sequences from your data. If you have amplicon data (16S, 18S, ITS, etc.) then you can make consensus sequences, but this is not assembly in my opinion.

You didn't give us any information on the technology, but I am assuming from the 150 bp size that this is Illumina data and in FASTQ format?

Here's a previous SEQanswers thread and the post "Best Way To Preprocess Barcoded Illumina Paired-End Data" on this topic. There are a couple of options for mating Illumina paired-end data: I have used FastqJoin, PANDAseq, and CLC bio, but I am sure there are many other options out there.

6.4 years ago by
Philipp Bayer6.0k wrote:

There are some papers comparing different assemblers. I'd look at their results tables and choose what fits your data best (it's hard to tell from here).



And for fun, here's another review: "Assembly of large genomes using second-generation sequencing".

3.1 years ago by
jigarnt30 wrote:

I am getting an error with this command:

/Users/lindakohn/Desktop/tools/SPAdes-3.7.1-Darwin/bin/ -k 21,33,55,77 --careful --only-assembler --pe<#>-12 <euro_plasmid_r1_paired.fastq euro_plasmid_r2_paired.fastq> --pe<#>-s1 <euro_plasmid_r1_unpaired.fastq> --pe<#>-s2 <euro_plasmid_r2_unpaired.fastq> -o Euro_plasmid_spades_output

-bash: syntax error near unexpected token `newline'

What is wrong with the command?
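
A likely cause, offered as a sketch, not a confirmed diagnosis: the `<#>` and `<file>` placeholders from the SPAdes usage text appear to have been typed literally. Bash reads `<` as an input redirection and the `#` after it as the start of a comment, so it reaches the end of the line without a file name, which would produce the `newline` error. The Python snippet below only builds and prints a command with the placeholders filled in. It assumes the executable is `spades.py` (the path in the post stops at `bin/`), that this is library number 1, that the two paired files are separate (hence `--pe1-1`/`--pe1-2`, not the interleaved `--pe1-12`), and that both unpaired files can be passed with repeated `--pe1-s`. Check those flags against the manual for your SPAdes version.

```python
import shlex


def build_spades_command(spades, r1, r2, unpaired, outdir, kmers=(21, 33, 55, 77)):
    """Return the SPAdes argument list with placeholders replaced by real values.

    `spades` is the path to the launcher script (assumed to be spades.py).
    The library number 1 in --pe1-* is an assumption for a single library.
    """
    cmd = [
        spades,
        "-k", ",".join(str(k) for k in kmers),
        "--careful",
        "--only-assembler",
        "--pe1-1", r1,  # forward reads of paired-end library 1
        "--pe1-2", r2,  # reverse reads of paired-end library 1
    ]
    for path in unpaired:
        # unpaired reads belonging to the same library (assumed repeatable)
        cmd += ["--pe1-s", path]
    cmd += ["-o", outdir]
    return cmd


cmd = build_spades_command(
    "spades.py",  # assumed script name; adjust to your install path
    "euro_plasmid_r1_paired.fastq",
    "euro_plasmid_r2_paired.fastq",
    ["euro_plasmid_r1_unpaired.fastq", "euro_plasmid_r2_unpaired.fastq"],
    "Euro_plasmid_spades_output",
)

# Print a shell-safe version of the command to paste into a terminal.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Building the command as a list keeps each argument separate, so no angle brackets or other shell metacharacters end up in the line that is finally run.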


