Question: Megahit assembly after merging files with PEAR
10 months ago, mattze73 wrote:

Hi everyone,

I'm working on a study of soil metagenomics and, needless to say, I'm relatively new to all of this. I read in multiple sources that it is recommended to merge the paired-end files (MiSeq amplicon sequencing) with PEAR before assembling them with megahit.

I successfully merged the files with PEAR and got the following output:

  • Sample.assembled.fastq
  • Sample.discarded.fastq (always empty)
  • Sample.unassembled.forward.fastq
  • Sample.unassembled.reverse.fastq

So that is basically one single-end file plus a pair of paired-end files, correct? Should I work with all three of them, or can I discard the unassembled files?

The megahit syntax is the following:

megahit -1 a1.fq,b1.fq,c1.fq -2 a2.fq,b2.fq,c2.fq -r se1.fq,se2.fq -o out # 3 paired-end libraries + 2 SE libraries

So in my case:

megahit -r Sample.assembled.fastq -o out

Or should I include all 3 files?

megahit -1 Sample.unassembled.forward.fastq -2 Sample.unassembled.reverse.fastq -r Sample.assembled.fastq -o out

I would highly appreciate it if someone could shed some light on it. Thank you very much.

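Before deciding which PEAR outputs to keep, it helps to know how many reads ended up in each file. The following is a minimal shell sketch, assuming standard four-line FASTQ records (no wrapped sequence lines), the `Sample.*` file names from the question, and an empty `discarded` file; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Minimal sketch: count reads in PEAR output files and report the merge rate.
# Assumes 4-line FASTQ records and an empty *.discarded.fastq file.
# Function names are hypothetical, not part of PEAR.

# Print the number of reads in a FASTQ file (line count divided by 4).
count_fastq_reads() {
    awk 'END { print NR / 4 }' "$1"
}

# Report merged vs. unmerged pairs for a PEAR output prefix (e.g. "Sample").
pear_merge_report() {
    local prefix="$1"
    local merged unmerged
    merged=$(count_fastq_reads "${prefix}.assembled.fastq")
    # Each unmerged pair has one record in the forward file,
    # so merged + unmerged = total input pairs (discarded reads aside).
    unmerged=$(count_fastq_reads "${prefix}.unassembled.forward.fastq")
    awk -v m="$merged" -v u="$unmerged" 'BEGIN {
        total = m + u
        printf "merged pairs: %d\nunmerged pairs: %d\n", m, u
        if (total > 0) printf "merge rate: %.1f%%\n", 100 * m / total
    }'
}
```

For example, `pear_merge_report Sample` prints the merged and unmerged pair counts and the fraction of pairs PEAR was able to merge.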
  1. Where did you read it? In my experience, it doesn't matter whether you use the paired-end reads or join them first.
  2. "(MiSeq amplicon sequencing)": Amplicon? Of what? Why would you assemble amplicon sequences?
  3. How many reads do you have? Soil assembly is very, very complex and requires billions of reads and a lot of resources.

Written 10 months ago by Asaf.

They're 16S/18S/ITS amplicons with >10,000 reads.

I'm following a workflow chart my supervisor gave me, and I kept wondering how the assembly step would make sense. So I guess I don't need an assembly for microbial diversity profiling?

So far I have trimmed the adapters and merged the forward and reverse reads with PEAR. Can I start clustering the sequences into OTUs from there?

Written 10 months ago by mattze73.

Absolutely. I recommend (and have used) vsearch and swarm for clustering. You can compare your workflow with Torbjørn Rognes' example workflow.

Written 10 months ago by Carambakaracho.
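In a vsearch/swarm workflow, merged reads are usually dereplicated before clustering: identical sequences are collapsed into a single entry that carries its abundance. vsearch's `--derep_fulllength` command (with `--sizeout`) is the proper tool for this; the awk sketch below only illustrates what that step produces, assuming four-line FASTQ input. The function name and the `Uniq` labels are hypothetical; the `;size=N` annotation is the abundance format swarm accepts with its `-z` option.

```shell
#!/usr/bin/env bash
# Minimal sketch of dereplication: collapse identical merged reads into unique
# sequences with an abundance annotation, most abundant first.
# Illustration only; vsearch --derep_fulllength does this properly.
# Assumes 4-line FASTQ records. Function name and "Uniq" labels are hypothetical.

dereplicate_fastq() {
    # Count each sequence line (2nd line of every 4-line record), case-insensitively.
    awk 'NR % 4 == 2 { counts[toupper($0)]++ }
         END { for (s in counts) print counts[s] "\t" s }' "$1" |
        # Sort by abundance (descending), then by sequence for stable output.
        sort -t "$(printf '\t')" -k1,1nr -k2,2 |
        # Emit FASTA with ";size=N" abundance labels.
        awk -F '\t' '{ printf ">Uniq%d;size=%d\n%s\n", NR, $1, $2 }'
}
```

For example, `dereplicate_fastq Sample.assembled.fastq > derep.fasta` writes one FASTA entry per unique merged sequence.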

Although I fully agree that this answers the original question, I want to add something to help the OP continue their research: call it metabarcoding instead of metagenomics (as the name implies, metagenomics deals with genomes, and you don't have any).

Also take a look at the UNOISE algorithm (implemented in VSEARCH and USEARCH) and DADA2. Working with an OTU table can also make things easier.

Written 10 months ago by gb.

I second DADA2; it also handles paired-end reads, so there's no need for PEAR.

Written 10 months ago by Asaf.

DADA2 has its own merging step, so I think it doesn't matter whether you use that one or PEAR.

Written 10 months ago by gb.