Question: Predicted Proteins: Mapping Reads to ORFs
12 months ago
Longshotx wrote:

Hi all, I have soil metagenomes from a number of samples. I assembled the metagenomes and predicted ORFs using Prodigal, then used hmmscan and some additional tools to scan the ORFs against a custom protein database and look for antibiotic-like ORFs. I ran a blastp search against the nr database using DIAMOND and annotated these antibiotic-like ORFs using MEGAN 6 to get the bacterial taxonomies.

Question: I would like to determine which bacterial hosts contain these antibiotic-like ORFs, and how many sequencing reads map to each of them (something like a coverage matrix across all samples and predicted proteins). I thought I could map the original sequencing reads to the annotated antibiotic-like ORFs using bowtie2, but realized it is not equipped to map nucleotide reads against protein sequences.

I'm looking for suggestions for the best approach to my particular study. Thanks for your input!

modified 12 months ago
12 months ago
Hannover Medical School
colindaven wrote:

There are amino acid to DNA mappers out there (e.g. protein2genome, plus one or two others).

However, the ORFs were DNA originally, so why not just fish out the nucleotide sequences of the ORFs that have BLAST hits (did you keep the same headers?) and use these as reference sequences for read mapping? You could also re-BLAST the nucleotide ORFs using blastx vs nr instead of blastp.
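A minimal sketch of the "fishing" step, assuming the nucleotide ORFs are in a FASTA file whose record IDs (the first whitespace-delimited token of each header) match the IDs of the protein ORFs with hits. The function names and the demo records are hypothetical, made up for illustration; Prodigal can write nucleotide and protein ORFs with matching headers (its `-d` and `-a` outputs), but check that this holds for your own files.

```python
from typing import Iterable, Iterator, List, Set, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header without '>', sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def subset_orfs(fasta_lines: Iterable[str], wanted_ids: Set[str]) -> List[Tuple[str, str]]:
    """Keep records whose ID (first token of the header) is in wanted_ids."""
    kept = []
    for header, seq in read_fasta(fasta_lines):
        tokens = header.split()
        if tokens and tokens[0] in wanted_ids:
            kept.append((header, seq))
    return kept


def format_fasta(records: List[Tuple[str, str]]) -> str:
    """Render records back to FASTA text, one sequence line per record."""
    return "".join(f">{header}\n{seq}\n" for header, seq in records)


# Hypothetical in-memory example; in practice, pass an open file handle
# for the nucleotide ORF FASTA and the set of IDs that had hits.
demo_lines = [
    ">contig_1_1 # 3 # 98 # 1",
    "ATGAAA",
    "TTTTAA",
    ">contig_1_2 # 120 # 300 # -1",
    "ATGCCCTGA",
]
hit_ids = {"contig_1_1"}
reference_fasta = format_fasta(subset_orfs(demo_lines, hit_ids))
```

The resulting FASTA text can be written to a file and indexed as a nucleotide reference for a read mapper such as bowtie2.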

You'll have fun with antibiotic protein families that are misassembled and co-assembled, though.
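Once the reads from each sample are mapped to the nucleotide ORFs, the per-sample counts can be combined into the coverage matrix asked about in the question. The following is only a sketch: it assumes the counts come from `samtools idxstats` run on each sample's sorted, indexed BAM file (tab-separated columns: reference name, length, mapped reads, unmapped reads), and the function names, sample names, and numbers are hypothetical.

```python
from typing import Dict, List, Tuple


def parse_idxstats(text: str) -> Dict[str, int]:
    """Map reference (ORF) name to mapped-read count from idxstats-style text."""
    counts: Dict[str, int] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, _length, mapped = line.split("\t")[:3]
        if name == "*":  # summary line for reads with no reference
            continue
        counts[name] = int(mapped)
    return counts


def build_matrix(
    per_sample: Dict[str, Dict[str, int]]
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Return (samples, orfs, rows) with one row per ORF and one column per sample."""
    samples = sorted(per_sample)
    orfs = sorted({orf for counts in per_sample.values() for orf in counts})
    rows = [[per_sample[s].get(orf, 0) for s in samples] for orf in orfs]
    return samples, orfs, rows


def to_tsv(samples: List[str], orfs: List[str], rows: List[List[int]]) -> str:
    """Render the matrix as tab-separated text with a header row."""
    out = ["ORF\t" + "\t".join(samples)]
    for orf, row in zip(orfs, rows):
        out.append(orf + "\t" + "\t".join(str(v) for v in row))
    return "\n".join(out)


# Hypothetical idxstats output for two samples.
sample_a = "orf1\t300\t12\t0\norf2\t150\t0\t0\n*\t0\t0\t5\n"
sample_b = "orf1\t300\t3\t0\norf2\t150\t7\t0\n*\t0\t0\t2\n"

per_sample = {"A": parse_idxstats(sample_a), "B": parse_idxstats(sample_b)}
samples, orfs, rows = build_matrix(per_sample)
matrix_tsv = to_tsv(samples, orfs, rows)
```

These are raw read counts; they are not normalized for ORF length or sequencing depth, so some normalization is usually needed before comparing across ORFs or samples.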

written 12 months ago by colindaven

Thank you! That was very helpful.

written 11 months ago by Longshotx
