Hi everyone,

I'm working on a study about soil metagenomics and, needless to say, I'm relatively new to all of this. I read in multiple sources that it is recommended to merge the paired-end files (MiSeq amplicon sequencing) with PEAR before assembling them with megahit.

I successfully merged the files with PEAR and got the following output:

  • Sample.assembled.fastq
  • Sample.discarded.fastq (always empty)
  • Sample.unassembled.forward.fastq
  • Sample.unassembled.reverse.fastq

So those are basically one single-end file and two paired-end files, correct? Should I work with all three of them, or can I discard the unassembled files?

The megahit syntax is the following:

megahit -1 a1.fq,b1.fq,c1.fq -2 a2.fq,b2.fq,c2.fq -r se1.fq,se2.fq -o out # 3 paired-end libraries + 2 SE libraries

So in my case:

megahit -r Sample.assembled.fastq -o out

Or should I include all 3 files?

megahit -1 Sample.unassembled.forward.fastq -2 Sample.unassembled.reverse.fastq -r Sample.assembled.fastq -o out

I would highly appreciate it if someone could shed some light on it. Thank you very much.

  1. Where did you read it? In my experience, using the paired-end reads or joining them doesn't matter.
  2. "(MiSeq amplicon sequencing)": Amplicon? Of what? Why would amplicon sequences assemble?
  3. How many reads do you have? Soil assembly is very, very complex and requires billions of reads and a lot of resources.
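For the read-count question, a rough per-file count can be sketched as below. This is a minimal sketch that assumes uncompressed FASTQ with standard four-line records; `count_fastq_reads` is a hypothetical helper name, and the file names are the PEAR outputs listed in the question.

```shell
#!/bin/sh
# Hypothetical helper: count reads in an uncompressed FASTQ file.
# Assumes four lines per record (no wrapped sequence or quality lines).
count_fastq_reads() {
    awk 'END { print int(NR / 4) }' "$1"
}

# Report counts for the PEAR output files named in the question, if present.
for f in Sample.assembled.fastq \
         Sample.unassembled.forward.fastq \
         Sample.unassembled.reverse.fastq; do
    if [ -f "$f" ]; then
        printf '%s\t%s\n' "$f" "$(count_fastq_reads "$f")"
    fi
done
```

Comparing the assembled count with the unassembled counts gives a quick sense of how much data would be lost by dropping the unassembled files.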

written 10 months ago by Asaf8.5k

It's 16S/18S/ITS amplicons with >10,000 reads.

I'm following a workflow chart my supervisor gave me, and I kept wondering about the assembly and how it would make sense. So I guess I don't need the assembly for microbial diversity profiling?

What I've done so far is trim the adapters and merge the forward and reverse reads using PEAR. Can I already start clustering the sequences into OTUs with that?

written 10 months ago by mattze7310

Absolutely. I recommend (and have used) vsearch and swarm for clustering. You can compare your workflow with Torbjørn Rognes' example workflow.
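As a rough illustration of what OTU clustering of PEAR-merged reads with vsearch might look like (this is not Rognes' workflow itself): the sketch below assumes a recent vsearch 2.x on the PATH. The `run` and `cluster_otus` helpers, the `DRY_RUN` switch, the output prefix, and the thresholds (maximum expected error 1.0, 97% identity) are all illustrative assumptions to adapt to your data.

```shell
#!/bin/sh
# Sketch only: helper names, file names and thresholds are assumptions.
# With DRY_RUN=1 the commands are printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

cluster_otus() {
    merged=$1   # merged reads from PEAR, e.g. Sample.assembled.fastq
    prefix=$2   # prefix for output files (hypothetical)

    # Quality-filter the merged reads and convert them to FASTA.
    run vsearch --fastq_filter "$merged" --fastq_maxee 1.0 \
        --fastaout "$prefix.filtered.fasta"

    # Dereplicate, recording abundances in the sequence headers.
    run vsearch --derep_fulllength "$prefix.filtered.fasta" \
        --sizeout --output "$prefix.derep.fasta"

    # Cluster at 97% identity, processing abundant sequences first.
    run vsearch --cluster_size "$prefix.derep.fasta" --id 0.97 \
        --sizein --sizeout --centroids "$prefix.otus.fasta"
}

# Example (prints the commands without running them):
#   DRY_RUN=1 cluster_otus Sample.assembled.fastq Sample
```

Chimera removal and primer trimming are left out here to keep the sketch short; a full workflow would normally include them.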

written 10 months ago by Carambakaracho2.2k

Although I very much agree that this is the answer to the original question, I want to add something for the OP to continue his/her research. Call it metabarcoding instead of metagenomics (as the name implies, metagenomics is about genomes, and you don't have genomes).

Also take a look at the UNOISE algorithm (implemented in VSEARCH and USEARCH) and DADA2. Making use of an OTU table can also make things easier.
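For the UNOISE route in VSEARCH, a minimal sketch might look like the following. It assumes a dereplicated FASTA with `;size=` abundance annotations as input and a recent vsearch 2.x. The `run` and `denoise_unoise` helpers, the `DRY_RUN` switch, and the file names are hypothetical, and `--minsize 8` is just one possible abundance cutoff.

```shell
#!/bin/sh
# Sketch only: helper names and file names are assumptions.
# With DRY_RUN=1 the commands are printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

denoise_unoise() {
    derep=$1    # dereplicated FASTA with ;size= annotations
    prefix=$2   # prefix for output files (hypothetical)

    # Denoise with UNOISE, ignoring sequences below the abundance cutoff.
    run vsearch --cluster_unoise "$derep" --minsize 8 \
        --sizein --sizeout --centroids "$prefix.zotus.fasta"

    # Remove chimeras from the denoised sequences.
    run vsearch --uchime3_denovo "$prefix.zotus.fasta" \
        --nonchimeras "$prefix.zotus.nochim.fasta"
}

# Example (prints the commands without running them):
#   DRY_RUN=1 denoise_unoise Sample.derep.fasta Sample
```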

modified 10 months ago • written 10 months ago by gb1.9k

I second DADA2; plus, it handles paired-end reads itself, so there's no need for PEAR.

written 10 months ago by Asaf8.5k

DADA2 also has a merging step, so I think it doesn't matter whether you use the one from DADA2 or PEAR.

written 10 months ago by gb1.9k
