I would like to use Diamond and MEGAN to process metagenomics data.
The data is not 16S rRNA but all genes.
I wonder if you could help me with the following three questions:
1) I have paired-end data. Is it good practice to merge the two reads of each pair before running blastx via Diamond, or should I run Diamond on each read file separately?
2) If I run Diamond on them separately, what is the best way to feed the results into daa2rma or daa-meganizer? I notice that both tools have options for paired-end data, but I am not sure I understand how to use them, in particular the parameter "-ps (--pairedSuffixLength)".
For R1 file:
diamond blastx --db nr --query sample1_R1.fq --threads 24 --outfmt 100 --out sample1.R1.daa
For R2 file:
diamond blastx --db nr --query sample1_R2.fq --threads 24 --outfmt 100 --out sample1.R2.daa
This gives two DAA files, sample1.R1.daa and sample1.R2.daa. What is the appropriate way to provide them to daa2rma or daa-meganizer?
I have tried converting those two DAA files into BLAST tabular format, and I noticed that the resulting tabular files do not distinguish read 1 from read 2 of the same pair: both reads are given exactly the same name. I therefore suspect that the DAA files themselves do not distinguish the two reads of a pair by read name.
3) Would you recommend daa2rma or daa-meganizer for processing the data before loading it into MEGAN?
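
To illustrate the naming issue from question 2, here is a minimal Python sketch. It assumes (I have not confirmed this from the Diamond documentation) that the query name written to the output is the FASTQ header truncated at the first whitespace. Under that assumption, typical Illumina headers, where the mate number appears only after a space, would collapse to the same name for both mates. The function names and the sample headers are made up for illustration.

```python
import io


def read_id(header):
    """Return the read name: the header without '@', cut at the first whitespace.

    Assumption: this mirrors how the aligner derives the query name.
    """
    text = header.strip()
    if text.startswith("@"):
        text = text[1:]
    parts = text.split()
    return parts[0] if parts else ""


def fastq_ids(handle):
    """Yield the read name of every record in a 4-line-per-record FASTQ stream."""
    for index, line in enumerate(handle):
        if index % 4 == 0:  # header line of each record
            yield read_id(line)


def count_shared_ids(r1_handle, r2_handle):
    """Count read names that occur in both the R1 and the R2 stream."""
    return len(set(fastq_ids(r1_handle)) & set(fastq_ids(r2_handle)))


if __name__ == "__main__":
    # Hypothetical headers: the mate number (1 or 2) sits after the space.
    r1 = io.StringIO("@M01:1:FC:1:1101:100:200 1:N:0:1\nACGT\n+\nIIII\n")
    r2 = io.StringIO("@M01:1:FC:1:1101:100:200 2:N:0:1\nTTGA\n+\nIIII\n")
    print("names shared between R1 and R2:", count_shared_ids(r1, r2))
```

On real data, the in-memory strings would be replaced by open handles on sample1_R1.fq and sample1_R2.fq. If every name is shared, that would be consistent with the identical names I see in the tabular output.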