How would you create a table for multiple organisms vs presence of multiple genes using command line blast? [image in description]
8.1 years ago
Tom ▴ 20

Here's the situation.

I have proteome files for a bunch of strains. Each strain has its own fasta proteome (strain1.faa, strain2.faa, strain3.faa).

I also have a FASTA file of amino acid sequences, and I want to know whether they are present in these strains. That "query" file looks like this:

>gene 1

MKGMF...*

>gene 2

MQWAEA...*

etc...

What I want in the end is a matrix with the strains in the first column and the genes in the first row. I DON'T want to do a manual blast for every cell because that's impractical. I just want the information. The value in each cell is the %identity of that gene in that strain. It will look like this:

[image]

What is the most parsimonious way to go about this project? I have a lot of strains and hundreds of genes to test, but I'm okay with outputting a CSV for now. It's such a large task that I'm unsure how to start.

blast command line blastp • 2.2k views
8.1 years ago
5heikki 11k

When you have the CSV, load it into R (maybe RStudio) and plot it with ggplot2 like here. One very fast way to get a distance matrix is to use the cool new mash algorithm. I think it should work with proteomes too.

p.s. I don't really understand your picture. How is strain X Y percent some gene?


It's not the heatmap I want. It's just the raw information. I don't want to have to individually do a blast search manually for each cell.

I can't find another google image picture that depicts this very type of project.

8.1 years ago
Michael 54k

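You don't need more than one blast run to do this. Put all the reference sequences or genomes (the y-axis) into one blast database. Put all the query sequences (the x-axis) into the query fasta. Run the right blast command (blastp if the references are proteomes, tblastn if they are nucleotide genomes), and you are done. The tabular output then has one row per hit, which you can pivot into the matrix.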
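To make the single-run idea concrete, here is a minimal Python sketch of one way it could go; it is not the only way, and the details are assumptions rather than anything stated in the thread. It assumes each protein header is prefixed with its strain name (taken from the file name) plus a `__` separator before the proteomes are merged, so every hit can be traced back to its strain. It also assumes BLAST is run with `-outfmt 6`, whose default columns start with query ID, subject ID and percent identity. The file names `all_strains.faa`, `genes.faa` and `hits.tsv` are hypothetical. Note that BLAST uses only the first whitespace-delimited token of a header as the ID, so `>gene 1` and `>gene 2` would both be reported as `gene`; rename them (e.g. `>gene_1`) first.

```python
import csv
from pathlib import Path

# Assumed workflow (all file names are hypothetical):
#   1. merge_proteomes(["strain1.faa", "strain2.faa"], "all_strains.faa")
#   2. makeblastdb -in all_strains.faa -dbtype prot -out all_strains
#   3. blastp -query genes.faa -db all_strains -outfmt 6 -evalue 1e-5 -out hits.tsv
#   4. build_matrix() on hits.tsv, then write_matrix_csv()

SEP = "__"  # assumed separator between strain name and original protein ID


def merge_proteomes(faa_paths, merged_path):
    """Concatenate proteomes, prefixing every header with the strain name."""
    with open(merged_path, "w") as out:
        for path in faa_paths:
            strain = Path(path).stem  # "strain1.faa" -> "strain1"
            with open(path) as handle:
                for line in handle:
                    if line.startswith(">"):
                        line = ">" + strain + SEP + line[1:]
                    out.write(line)


def build_matrix(hit_lines):
    """Return {strain: {gene: best % identity}} from outfmt 6 lines."""
    matrix = {}
    for line in hit_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        gene, subject, pident = fields[0], fields[1], float(fields[2])
        strain = subject.split(SEP, 1)[0]
        row = matrix.setdefault(strain, {})
        # Keep the highest identity when a gene hits a strain more than once.
        if pident > row.get(gene, -1.0):
            row[gene] = pident
    return matrix


def write_matrix_csv(matrix, out_handle, strains=None, missing="0"):
    """Write strains as rows and genes as columns; no hit becomes `missing`."""
    if strains is None:
        strains = sorted(matrix)
    genes = sorted({g for row in matrix.values() for g in row})
    writer = csv.writer(out_handle)
    writer.writerow(["strain"] + genes)
    for strain in strains:
        row = matrix.get(strain, {})
        writer.writerow([strain] + [row.get(g, missing) for g in genes])
```

Passing the full strain list to `write_matrix_csv` keeps strains with no hits at all in the table. Genes that hit nothing in any strain will not appear as columns unless you add them yourself. Taking the best percent identity per cell is a simplification: a short local hit can score 100%, so you may want to filter on alignment length or query coverage before pivoting.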
