Megahit assembly after merging files with PEAR
mattze731 • 23 months ago

Hi everyone,

I'm working on a soil metagenomics study and, needless to say, I'm relatively new to all of this. I have read in multiple sources that it is recommended to merge the paired-end files (MiSeq amplicon sequencing) with PEAR before assembling them with megahit.

I successfully merged the files with PEAR and got the following output:

  • Sample.assembled.fastq
  • Sample.discarded.fastq (always empty)
  • Sample.unassembled.forward.fastq
  • Sample.unassembled.reverse.fastq

So that is basically one single-end file and two paired-end files, correct? Should I work with all three of them, or can I discard the unassembled files?
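For context on where the three files come from: PEAR tries to overlap each forward read with the reverse complement of its mate; pairs with a good overlap go to `assembled`, the rest to the two `unassembled` files. The minimal Python sketch below illustrates only that idea, under simplifying assumptions (the amplicon is longer than each read, a fixed mismatch limit, no base qualities). It is not PEAR's actual algorithm, which scores overlaps statistically; `merge_pair`, `min_overlap` and `max_mismatch` are hypothetical names.

```python
from typing import Optional

# Complement table for uppercase DNA; N maps to itself.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(fwd: str, rev: str, min_overlap: int = 10,
               max_mismatch: int = 1) -> Optional[str]:
    """Merge a forward read with the reverse complement of its mate.

    Assumes the amplicon is longer than each read, so the end of `fwd`
    overlaps the start of reverse_complement(rev). Overlap lengths are tried
    from longest to shortest; the first one with at most `max_mismatch`
    mismatches wins. Returns None if no overlap qualifies, i.e. the pair
    would end up in the "unassembled" files.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    rc = reverse_complement(rev)
    for olen in range(min(len(fwd), len(rc)), min_overlap - 1, -1):
        tail, head = fwd[-olen:], rc[:olen]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch:
            # No quality model: forward-read bases are kept in the overlap.
            return fwd + rc[olen:]
    return None
```

A pair that returns `None` here corresponds, loosely, to a pair PEAR leaves in `Sample.unassembled.forward.fastq` / `Sample.unassembled.reverse.fastq`.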

The megahit syntax is the following:

megahit -1 a1.fq,b1.fq,c1.fq -2 a2.fq,b2.fq,c2.fq -r se1.fq,se2.fq -o out # 3 paired-end libraries + 2 SE libraries

So in my case:

megahit -r Sample.assembled.fastq -o out

Or should I include all 3 files?

megahit -1 Sample.unassembled.forward.fastq -2 Sample.unassembled.reverse.fastq -r Sample.assembled.fastq -o out
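If you do go the "all three files" route, each library list follows the comma-separated syntax quoted above. A small hypothetical Python helper (`build_megahit_cmd` is an illustrative name, not part of megahit) that only builds the argument list; actually running it would be up to you, e.g. with `subprocess.run(cmd, check=True)`:

```python
from typing import List, Sequence, Tuple


def build_megahit_cmd(out_dir: str,
                      single_end: Sequence[str] = (),
                      pairs: Sequence[Tuple[str, str]] = ()) -> List[str]:
    """Build a megahit argument list (hypothetical helper).

    Paired libraries go to -1/-2 and single-end files to -r, each as a
    comma-separated list, mirroring
    `megahit -1 a1.fq,b1.fq -2 a2.fq,b2.fq -r se1.fq,se2.fq -o out`.
    """
    if not single_end and not pairs:
        raise ValueError("at least one input library is required")
    cmd = ["megahit"]
    if pairs:
        cmd += ["-1", ",".join(r1 for r1, _ in pairs),
                "-2", ",".join(r2 for _, r2 in pairs)]
    if single_end:
        cmd += ["-r", ",".join(single_end)]
    cmd += ["-o", out_dir]
    return cmd


# The third option above: merged reads as SE plus the unmerged pairs.
cmd = build_megahit_cmd(
    "out",
    single_end=["Sample.assembled.fastq"],
    pairs=[("Sample.unassembled.forward.fastq",
            "Sample.unassembled.reverse.fastq")],
)
print(" ".join(cmd))
```

With the file names from the question, the printed line is the same as the third megahit command above.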

I would highly appreciate it if someone could shed some light on it. Thank you very much.

sequencing Assembly next-gen • 946 views
  1. Where did you read that? In my experience, it doesn't matter whether you use the paired-end reads or join them first.
  2. Regarding "(MiSeq amplicon sequencing)": Amplicon? Of what? Why would amplicon sequences assemble?
  3. How many reads do you have? Soil assembly is very, very complex and requires billions of reads and a lot of resources.
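To answer the read-count question, a FASTQ file can be counted by lines, assuming the usual four-line records with unwrapped sequences. A minimal Python sketch; `count_fastq_records`, `count_fastq_file` and `merged_fraction` are hypothetical helpers, and `merged_fraction` assumes each unmerged pair contributes exactly one record to the `unassembled.forward` file.

```python
import gzip
from typing import Iterable


def count_fastq_records(lines: Iterable[str]) -> int:
    """Count FASTQ records, assuming 4 lines per record (no wrapped sequences)."""
    n_lines = sum(1 for _ in lines)
    if n_lines % 4 != 0:
        raise ValueError(f"{n_lines} lines is not a multiple of 4; not 4-line FASTQ?")
    return n_lines // 4


def count_fastq_file(path: str) -> int:
    """Open a plain or gzip-compressed FASTQ file and count its records."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_fastq_records(handle)


def merged_fraction(n_assembled: int, n_unassembled_pairs: int) -> float:
    """Fraction of read pairs that were merged (one unmerged pair counts once)."""
    total = n_assembled + n_unassembled_pairs
    if total == 0:
        raise ValueError("no reads counted")
    return n_assembled / total
```

For example, `merged_fraction(count_fastq_file("Sample.assembled.fastq"), count_fastq_file("Sample.unassembled.forward.fastq"))` shows how much would be lost by discarding the unassembled files.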


They are 16S/18S/ITS amplicons with >10,000 reads.

I'm following a workflow chart my supervisor gave me, and I kept wondering about the assembly step and whether it made sense. So I guess for microbial diversity profiling I don't need the assembly?!

What I have done so far is cut the adapters and merge the forward and reverse reads using PEAR. Can I already start clustering the sequences into OTUs with that?


Absolutely. I recommend (and have used) vsearch and swarm for clustering. You can compare your workflow with Torbjørn Rognes' example workflow.
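The dereplicate-then-cluster idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions (greedy centroid clustering of abundance-sorted unique sequences, with identity defined as 1 - edit distance / length of the longer sequence); it is not how vsearch or swarm actually align or cluster, and swarm in particular does not use a global identity threshold. All function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def dereplicate(reads: Iterable[str]) -> List[Tuple[str, int]]:
    """Collapse identical reads into (sequence, count), most abundant first.
    Ties are broken alphabetically so the output is deterministic."""
    return sorted(Counter(reads).items(), key=lambda kv: (-kv[1], kv[0]))


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb)))  # match/substitution
        prev = cur
    return prev[-1]


def identity(a: str, b: str) -> float:
    """Simplified identity: 1 - edit distance / length of the longer sequence."""
    return 1.0 - edit_distance(a, b) / max(len(a), len(b), 1)


def greedy_cluster(reads: Iterable[str], min_id: float = 0.97) -> Dict[str, int]:
    """Greedy centroid clustering on dereplicated, abundance-sorted reads.

    Each unique sequence joins the first existing centroid (in order of
    creation) with identity >= min_id; otherwise it founds a new cluster.
    Returns {centroid: total read count}.
    """
    sizes: Dict[str, int] = {}
    for seq, count in dereplicate(reads):
        for centroid in sizes:
            if identity(seq, centroid) >= min_id:
                sizes[centroid] += count
                break
        else:
            sizes[seq] = count
    return sizes
```

Sorting by abundance first means the most common sequences become centroids, which is the usual rationale for dereplicating before clustering.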


Although I very much agree that this answers the original question, I want to add something for the OP's further research: call it metabarcoding instead of metagenomics (as the name says, you don't have genomes).

Also take a look at the UNOISE algorithm (implemented in VSEARCH and USEARCH) and DADA2. Working with an OTU table can also make things easier.
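As we understand Edgar's UNOISE description, a less abundant unique sequence is treated as an error variant of a more abundant one when their abundance ratio (skew) is at most β(d) = 1 / 2^(α·d + 1), where d is the number of differences and α is usually 2; unique sequences below a minimum abundance (8 by default in UNOISE3, to our knowledge) are dropped first. The Python sketch below implements only that skew rule on dereplicated (sequence, abundance) pairs. The function names and the plain Levenshtein distance are assumptions, and real pipelines (USEARCH, VSEARCH's `--cluster_unoise`) involve further steps such as chimera removal.

```python
from typing import Dict, List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance via the standard two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def unoise_beta(d: int, alpha: float = 2.0) -> float:
    """Maximum skew at which a sequence d differences away counts as an error."""
    return 1.0 / 2 ** (alpha * d + 1)


def unoise_denoise(uniques: List[Tuple[str, int]], alpha: float = 2.0,
                   min_size: int = 8) -> Dict[str, int]:
    """Apply a UNOISE-style skew rule to (sequence, abundance) pairs.

    Returns {denoised sequence: abundance plus reads absorbed from its
    error variants}. No chimera filtering is done here.
    """
    kept = sorted((u for u in uniques if u[1] >= min_size),
                  key=lambda kv: (-kv[1], kv[0]))
    centroid_size: Dict[str, int] = {}  # original abundance of each denoised sequence
    totals: Dict[str, int] = {}         # abundance after absorbing error variants
    for seq, size in kept:
        parent = None
        for centroid, c_size in centroid_size.items():
            skew = size / c_size
            if skew <= unoise_beta(levenshtein(seq, centroid), alpha):
                parent = centroid
                break
        if parent is None:
            centroid_size[seq] = size
            totals[seq] = size
        else:
            totals[parent] += size
    return totals
```

Run per sample, the returned counts amount to one sample's column of an OTU (zOTU) table.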


I second DADA2; it also handles the paired-end merging, so there is no need for PEAR.


DADA2 also has a merging step, so I think it doesn't matter whether you use DADA2's merging or PEAR.

