Question

Megahit assembly after merging files with PEAR

0


4.2 years ago

mattze731 ▴ 20

Hi everyone,

I'm working on a soil metagenomics study and, needless to say, I'm relatively new to all of this. I have read in multiple sources that it is recommended to merge the paired-end files (MiSeq amplicon sequencing) with PEAR before assembling them with megahit.

I successfully merged the files with PEAR and got the following output:

Sample.assembled.fastq
Sample.discarded.fastq (always empty)
Sample.unassembled.forward.fastq
Sample.unassembled.reverse.fastq

So those are basically 1 single end and two paired end files, correct? Should I work with all three of them or can I discard the unassembled files?
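One way to judge how much the unassembled files matter is to count the reads in each PEAR output. This is a minimal sketch, assuming uncompressed FASTQ with exactly four lines per record and the file names listed above; `count_reads` is a hypothetical helper, not part of PEAR or megahit.

```shell
#!/bin/sh
# Hypothetical helper: number of reads in an uncompressed FASTQ file,
# assuming the standard four lines per record (no wrapped sequences).
count_reads() {
    lines=$(wc -l < "$1")
    echo $((lines / 4))
}

# Report the read count for each PEAR output file that is present.
for f in Sample.assembled.fastq \
         Sample.unassembled.forward.fastq \
         Sample.unassembled.reverse.fastq; do
    if [ -f "$f" ]; then
        printf '%s\t%s\n' "$f" "$(count_reads "$f")"
    fi
done
```

If nearly all pairs end up in the assembled file, the unassembled reads are a small fraction of the data.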

The megahit syntax is the following:

megahit -1 a1.fq,b1.fq,c1.fq -2 a2.fq,b2.fq,c2.fq -r se1.fq,se2.fq -o out # 3 paired-end libraries + 2 SE libraries

So in my case:

megahit -r Sample.assembled.fastq -o out

Or should I include all 3 files?

megahit -1 Sample.unassembled.forward.fastq -2 Sample.unassembled.reverse.fastq -r Sample.assembled.fastq -o out

I would highly appreciate it if someone could shed some light on it. Thank you very much.

sequencing Assembly next-gen • 1.8k views

ADD COMMENT • link 4.2 years ago by mattze731 ▴ 20

1


Where did you read it? In my experience using the paired end or joining the reads doesn't matter.

(MiSeq amplicon sequencing)

Amplicon? Of what? Why would amplicon sequences assemble? How many reads do you have? Soil assembly is very complex and requires billions of reads and a lot of resources.

ADD REPLY • link 4.2 years ago by Asaf 10k

0


It's 16S/18S/ITS amplicons with >10,000 reads.

I follow a workflow chart my supervisor gave me and I kept wondering about the assembly and how it would make sense. So I guess for microbial diversity profiling I don't need the assembly?!

What I did so far was to cut the adapters and merge the forward and reverse reads using PEAR. Can I already start clustering genes to OTUs with that?

ADD REPLY • link 4.2 years ago by mattze731 ▴ 20

2


Absolutely. I recommend (and have used) vsearch and swarm for clustering. You can compare your workflow with Torbjørn Rognes' example workflow.
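As a rough illustration of what clustering the merged reads with vsearch might look like, here is a minimal sketch. It is not the example workflow mentioned above: the function name `cluster_otus`, the output file names, the expected-error cutoff of 1.0 and the 97% identity threshold are all assumptions chosen for illustration and should be adapted to the marker and data.

```shell
#!/bin/sh
# Sketch only: file names and thresholds below are illustrative assumptions.
# Usage: cluster_otus <merged.fastq> <output_prefix>
cluster_otus() {
    merged=$1   # merged reads, e.g. Sample.assembled.fastq from PEAR
    prefix=$2   # hypothetical prefix for all output files

    # 1. Quality-filter the merged reads and write them as FASTA.
    vsearch --fastq_filter "$merged" --fastq_maxee 1.0 \
            --fastaout "$prefix.filtered.fasta" &&
    # 2. Dereplicate identical sequences, recording their abundance.
    vsearch --derep_fulllength "$prefix.filtered.fasta" --sizeout \
            --output "$prefix.derep.fasta" &&
    # 3. Cluster into OTUs at 97% identity, most abundant sequences first.
    vsearch --cluster_size "$prefix.derep.fasta" --id 0.97 \
            --sizein --sizeout --centroids "$prefix.otus.fasta" &&
    # 4. Remove de novo chimeras from the OTU centroids.
    vsearch --uchime_denovo "$prefix.otus.fasta" \
            --nonchimeras "$prefix.otus.nochim.fasta" &&
    # 5. Map the filtered reads back to the OTUs to build an OTU table.
    vsearch --usearch_global "$prefix.filtered.fasta" \
            --db "$prefix.otus.nochim.fasta" --id 0.97 \
            --otutabout "$prefix.otutab.txt"
}
```

It could be called as `cluster_otus Sample.assembled.fastq Sample`, run once per marker (16S, 18S, ITS), since the thresholds that make sense may differ between them.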

ADD REPLY • link 4.2 years ago by Carambakaracho ★ 3.2k

1


Although I very much agree that this is the answer to the original question, I want to add something so the OP can continue his/her research: call it metabarcoding instead of metagenomics (as the name implies, you don't have genomes).

Also take a look at the UNOISE algorithm (implemented in VSEARCH and USEARCH) and DADA2. Working from an OTU table can also make things easier.

ADD REPLY • link 4.2 years ago by gb ★ 2.2k

0


I second DADA2, plus it takes care of the paired-end reads, so there is no need for PEAR.

ADD REPLY • link 4.2 years ago by Asaf 10k

0


DADA2 also has a merging step, so I think it does not matter whether you use that one or PEAR.

ADD REPLY • link 4.2 years ago by gb ★ 2.2k