Question: Presence/absence of proteins blastp nr database
0
gravatar for freekwilly
20 months ago by
freekwilly0
freekwilly0 wrote:

Hi all,

I am a molecular biologist in the first place, but came into touch with bioinformatic challenges more often recently. For simple tasks I've been using command line blast+ already and now I am wondering whether it is possible to do the following with blast+:

Let's say I have a random protein sequence in a file named RP1.faa and I want to check each bacterial strain in the refseq nr database on whether RP1 ist present or absent by an e-value threshold of e.g. 1e-05. And I want a table which shows me only the best hit for each strain in the first column and e.g. e-value, bit-score and so on in the following columns. And in each row I want the next strain to be shown, even if no blast hit was found In the end I am imagining a matrix which shows me whether a certain protein is present or absent.

Do you know if it is possible? Thanks a lot!

blast alignment genome • 683 views
ADD COMMENTlink written 20 months ago by freekwilly0
1

You can show the presence of each straing by, doing a tab output with -outfmt 6, and provide the e-value threshold to the blast command, as wells as use -max_target_seqs 1 to show the best alignment for each strain. What you will have to do is, conclude how against many/which strains RP1 is absent. Someone may have a better approach.

ADD REPLYlink modified 20 months ago • written 20 months ago by st.ph.n2.4k

In addition you will have to filter/restrict the results for bacteria (in theory, taxid:2).

ADD REPLYlink written 20 months ago by genomax65k

I am trying to obtain a similar output but using the rpsblast function in the command line. Do you know if there is a way to not only display just the best hit but also only "specific hits" as is given as a result in the web-based version?

ADD REPLYlink written 20 months ago by lauren.manck0

Hey, thanks for the fast answers. But unfortunately it is not yet what I was looking for. When using -max_target_seqs 1 I'll get only one hit in total, not one hit for every entry in my database.

ADD REPLYlink written 20 months ago by freekwilly0

not one hit for every entry in my database

Is the reference to nr or to your own sequence data? You can always flip the search around. It is not necessary that there will be a hit found for every sequence being searched in both of those cases.

ADD REPLYlink modified 20 months ago • written 20 months ago by genomax65k

For now it still refers to the "nr" database. I'm am currently trying to subset it into a more precise and useful database

ADD REPLYlink written 20 months ago by freekwilly0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 727 users visited in the last hour