Select human protein-coding transcripts in DIAMOND
13 months ago
bart ▴ 20

Hi,

I'm trying to select short DNA reads that align to human protein-coding transcripts using DIAMOND. My problem is that DIAMOND does not select human reads by default, so I want to build a database with the diamond makedb tool. However, I'm not sure which FASTA file I need to pass to the --in <file> option. It needs a protein reference database, so would this be the NCBI nr database?

13 months ago
GenoMax 123k

You should consider getting the MANE Select (LINK) proteins. See the project description, then download the protein sequence (.faa) file from the NCBI FTP site. This will contain one entry per gene.
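As a rough sketch of how that would look in practice: the small Python helpers below just assemble the `diamond makedb` and `diamond blastx` command lines and pull the read IDs out of the tabular output. The file names (`mane_select.faa`, `reads.fasta`, `hits.tsv`), the function names, and the e-value cutoff are placeholders I made up, so adjust them to your data.

```python
import csv

def makedb_cmd(protein_fasta, db_prefix):
    """argv for building a DIAMOND database from a protein FASTA (.faa) file."""
    return ["diamond", "makedb", "--in", protein_fasta, "--db", db_prefix]

def blastx_cmd(db_prefix, reads_fasta, out_tsv, evalue=1e-5):
    """argv for a translated search of DNA reads against the protein database.

    --outfmt 6 requests BLAST tabular output; the e-value cutoff is an
    arbitrary illustrative choice, not a recommendation.
    """
    return ["diamond", "blastx", "--db", db_prefix, "--query", reads_fasta,
            "--out", out_tsv, "--outfmt", "6", "--evalue", str(evalue)]

def reads_with_hits(tsv_lines):
    """Collect read IDs that have at least one hit.

    In BLAST tabular output the first column is the query (read) ID.
    """
    return {row[0] for row in csv.reader(tsv_lines, delimiter="\t") if row}
```

Each list can be passed to `subprocess.run(cmd, check=True)`, e.g. `makedb_cmd("mane_select.faa", "mane_select")` first and then `blastx_cmd("mane_select", "reads.fasta", "hits.tsv")`. Once the search has finished, `reads_with_hits(open("hits.tsv"))` gives you the set of reads that aligned to a human protein.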

A second option would be to download the curated human proteome files from UniProt (LINK). This set will be redundant and will contain isoforms etc.

so would this be the NCBI nr database?

That can be a third option. You could get the nr indexes (the latest DIAMOND can now use BLAST indexes) and do the search. It may also support filtering based on taxID (which would be 9606 for human). BBuchfink was considering that request from someone.
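If taxon filtering is available in your setup, the search could look roughly like the sketch below. This assumes a DIAMOND build that accepts `--taxonlist` and a database that carries taxonomy information; check `diamond help` for your version before relying on it. The function name and file names are hypothetical.

```python
def blastx_taxon_cmd(db_prefix, reads_fasta, out_tsv, taxid=9606):
    """argv for a blastx search restricted to one NCBI taxon (9606 = human).

    Assumption: the installed DIAMOND supports --taxonlist and the database
    was prepared with taxonomy information.
    """
    return ["diamond", "blastx", "--db", db_prefix, "--query", reads_fasta,
            "--out", out_tsv, "--outfmt", "6", "--taxonlist", str(taxid)]
```

For example, `blastx_taxon_cmd("nr", "reads.fasta", "hits_human.tsv")` would limit the reported hits to human entries in nr.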