Hello community! I need to perform protein family profiling. My professor recommended that I use MMseqs2 for this, but it is not clear to me how to do it.
In brief, I have several environmental metagenomic samples, each one represented by paired-end sequencing reads.
I need to study the gene content and abundance of each sample, so I created a protein database from UniProtKB entries, and this is what I have in mind:
mmseqs createdb my_protein_target_db.fasta target_db
mmseqs createindex target_db tmp
The issue is that I don't know how to run the profiling so that the metagenomic samples are processed in a paired-end-aware way.
Should I create a database for each query metagenomic sample, like this?
cat sample1_reads_1.fastq sample1_reads_2.fastq > reads_sample1.fastq
mmseqs createdb reads_sample1.fastq query_reads_sample1
And after that, run the alignment with mmseqs search for each sample's read set?
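To make the question concrete, here is a dry-run sketch of the per-sample loop I have in mind. It only prints the commands rather than running them. The sample names, the result_ prefix and the .m8 output name are placeholders I made up. I pass both mate files to createdb directly instead of concatenating them first, and I am assuming each read is then searched as an independent query, which is exactly the part I am unsure about.

```shell
# Dry-run sketch: prints the mmseqs commands for one sample instead of executing them.
# Sample names and the result/output naming are placeholders, not a recommended layout.
print_sample_cmds() {
    sample="$1"
    query_db="query_reads_${sample}"
    result_db="result_${sample}"
    # createdb takes several input files, so both mate files go in without cat
    echo "mmseqs createdb ${sample}_reads_1.fastq ${sample}_reads_2.fastq ${query_db}"
    # nucleotide reads against the protein target database built above
    echo "mmseqs search ${query_db} target_db ${result_db} tmp"
    # convert the result database to a tab-separated hit table
    echo "mmseqs convertalis ${query_db} target_db ${result_db} ${result_db}.m8"
}

for s in sample1 sample2; do
    print_sample_cmds "$s"
done
```

If this is roughly right, I would then count hits per target in each .m8 file to get the abundances, but I don't know whether the two mates of a pair should be counted once or twice.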