Hi everyone,
I'm working on a soil metagenomics study and, needless to say, I'm relatively new to all of this. I have read in multiple sources that it is recommended to merge the paired-end files (MiSeq amplicon sequencing) with PEAR before assembling them with megahit.
I successfully merged the files with PEAR and got the following output:
- Sample.assembled.fastq
- Sample.discarded.fastq (always empty)
- Sample.unassembled.forward.fastq
- Sample.unassembled.reverse.fastq
So those are basically one single-end file and two paired-end files, correct? Should I work with all three of them, or can I discard the unassembled files?
The megahit syntax is the following:
megahit -1 a1.fq,b1.fq,c1.fq -2 a2.fq,b2.fq,c2.fq -r se1.fq,se2.fq -o out # 3 paired-end libraries + 2 SE libraries
So in my case:
megahit -r Sample.assembled.fastq -o out
Or should I include all 3 files?
megahit -1 Sample.unassembled.forward.fastq -2 Sample.unassembled.reverse.fastq -r Sample.assembled.fastq -o out
I would highly appreciate it if someone could shed some light on it. Thank you very much.
1. Amplicon? Of what?
2. Why would you assemble amplicon sequences?
3. How many reads do you have? Soil assembly is very complex and requires billions of reads and a lot of resources.
It's 16S/18S/ITS amplicons with >10,000 reads.
I am following a workflow chart my supervisor gave me, and I kept wondering how the assembly step would make sense. So I guess I don't need the assembly for microbial diversity profiling?!
What I have done so far is trim the adapters and merge the forward and reverse reads using PEAR. Can I already start clustering the merged reads into OTUs with that?
Absolutely. I recommend (and have used) vsearch and swarm for clustering. You can compare your workflow with Torbjørn Rognes' example workflow.
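To make the clustering step a bit more concrete, here is a minimal sketch of how merged reads could be taken to OTUs with vsearch. It is not the example workflow mentioned above, just one common ordering of steps. The function name cluster_otus, the output file names, the expected-error cutoff of 1.0 and the 97% identity threshold are all assumptions for illustration and should be adapted to the marker and data at hand.

```shell
#!/usr/bin/env bash
# Sketch only: function name, output file names and thresholds are illustrative.
# Setting VSEARCH=echo prints the commands instead of running them (dry run).

cluster_otus() {
    local merged="$1"    # merged reads, e.g. Sample.assembled.fastq from PEAR
    local prefix="$2"    # prefix for the output files (hypothetical naming)
    local vs="${VSEARCH:-vsearch}"

    # 1. Quality filter: drop reads with more than 1 expected error, write FASTA.
    "$vs" --fastq_filter "$merged" --fastq_maxee 1.0 \
          --fastaout "$prefix.filtered.fasta" || return 1

    # 2. Dereplicate identical sequences and record abundances as ;size=N.
    "$vs" --derep_fulllength "$prefix.filtered.fasta" --sizeout \
          --output "$prefix.derep.fasta" || return 1

    # 3. Remove chimeras de novo, using the abundance annotations.
    "$vs" --uchime_denovo "$prefix.derep.fasta" --sizein --sizeout \
          --nonchimeras "$prefix.nochim.fasta" || return 1

    # 4. Cluster at 97% identity (assumed threshold); centroids are the OTUs.
    "$vs" --cluster_size "$prefix.nochim.fasta" --id 0.97 --sizein --sizeout \
          --centroids "$prefix.otus.fasta" || return 1
}

# Example call (commented out so that sourcing this file runs nothing):
# cluster_otus Sample.assembled.fastq Sample
```

Running it once with VSEARCH=echo is a cheap way to check the four commands before committing to a real run.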
Although I very much agree that this is the answer to the original question, I want to add something just for the OP to continue his/her research. Call it metabarcoding instead of metagenomics (as the name suggests, metagenomics deals with genomes, and you don't have genomes).
Also take a look at the UNOISE algorithm (implemented in VSEARCH and USEARCH) and DADA2. Making use of an OTU table can also make things easier.
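For the UNOISE route in VSEARCH, a rough sketch could look like the following. It assumes the input is a dereplicated FASTA file with ;size= abundance annotations (for example from vsearch --derep_fulllength with --sizeout). The function name denoise_unoise and the output file names are made up for illustration, and the minimum abundance of 8 is simply what I understand the vsearch default to be.

```shell
#!/usr/bin/env bash
# Sketch only: function and file names are illustrative, not a fixed recipe.
# Setting VSEARCH=echo prints the commands instead of running them (dry run).

denoise_unoise() {
    local derep="$1"     # dereplicated FASTA with ;size=N annotations
    local prefix="$2"    # prefix for the output files (hypothetical naming)
    local vs="${VSEARCH:-vsearch}"

    # 1. Denoise: sequences with abundance below --minsize are not used as centroids.
    "$vs" --cluster_unoise "$derep" --minsize 8 --sizein --sizeout \
          --centroids "$prefix.denoised.fasta" || return 1

    # 2. Chimera removal with the UCHIME3 variant intended for denoised sequences.
    "$vs" --uchime3_denovo "$prefix.denoised.fasta" --sizein \
          --nonchimeras "$prefix.zotus.fasta" || return 1
}

# Example call (commented out so that sourcing this file runs nothing):
# denoise_unoise Sample.derep.fasta Sample
```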
I second DADA2. It also takes care of merging the paired-end reads, so there is no need for PEAR.
DADA2 also has a merging step, so I think it does not matter whether you use that one or PEAR.
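Whichever merger is used, it may be worth checking what share of the read pairs actually merged, which also speaks to the original question about the unassembled files. Below is a small sketch that counts FASTQ records and reports the merged percentage. It assumes standard four-line FASTQ records (no wrapped sequence lines), and the function names are hypothetical helpers, not part of PEAR or DADA2.

```shell
#!/usr/bin/env bash
# Sketch only: assumes 4 lines per FASTQ record; helper names are illustrative.

count_fastq_reads() {
    # Number of records = number of lines / 4 (rounded down).
    awk 'END { print int(NR / 4) }' "$1"
}

merged_percent() {
    local merged unmerged total
    merged=$(count_fastq_reads "$1") || return 1     # e.g. Sample.assembled.fastq
    unmerged=$(count_fastq_reads "$2") || return 1   # e.g. Sample.unassembled.forward.fastq
    total=$(( merged + unmerged ))
    if [ "$total" -eq 0 ]; then
        echo 0
        return 0
    fi
    # Integer percentage of read pairs that were merged.
    echo $(( 100 * merged / total ))
}

# Example call (commented out so that sourcing this file runs nothing):
# merged_percent Sample.assembled.fastq Sample.unassembled.forward.fastq
```

Only the forward unassembled file is counted, because each unmerged pair contributes one record to the forward file and one to the reverse file.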