Question: blastall vs bl2seq, and blastall fasta output
thomas.welch0 wrote (2.3 years ago):

Hi there, I hope one of you guys can help me with this quite basic question.

First of all, I am trying to pick out homologs for phylogenetic analysis using standalone BLAST in the Unix terminal. I am using a single gene in FASTA format as the query against a downloaded genome formatted with formatdb. This gives me the output I want, but is there a way to get my highest-scoring hit written out as a FASTA file?

Secondly, when I run the same search with the bl2seq command, I get very different output: the hits are much shorter (and clearly noise), and when I apply the same E-value cutoff I use for my blastall search, I get no hits at all.

Tags: blast+, blast, bl2seq, blastall, fasta

You can use the blastdbcmd utility to retrieve FASTA-formatted sequences for the hits you are interested in (see the thread "NCBI Blast locally: filter by accession number and NOT by GI number"). You may have to switch to a tabular output format and parse out the accession numbers you need.
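A minimal Python sketch of the tabular-output route, assuming the standard 12-column layout written by `blastall -m 8` (or BLAST+ `-outfmt 6`). It keeps the highest-bitscore row per query and prints the subject IDs, one per line. The script and file names are hypothetical. The resulting ID list could then be passed to something like `blastdbcmd -db mygenome -entry_batch ids.txt`, which generally requires the database to have been built with parsed sequence IDs (`formatdb -o T`, or `makeblastdb -parse_seqids` in BLAST+).

```python
import sys

# Assumed 12-column tabular layout (blastall -m 8 / BLAST+ -outfmt 6):
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
def best_hits(lines):
    """Return {query_id: row}, keeping the highest-bitscore row per query."""
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines (-m 9 / -outfmt 7)
        row = line.rstrip("\n").split("\t")
        qid, bitscore = row[0], float(row[11])
        if qid not in best or bitscore > float(best[qid][11]):
            best[qid] = row
    return best

if __name__ == "__main__" and len(sys.argv) > 1:
    # Print one subject ID per line, e.g. for blastdbcmd -entry_batch.
    with open(sys.argv[1]) as handle:
        for row in best_hits(handle).values():
            print(row[1])
```

Usage (hypothetical names): `python best_hit.py hits.tsv > ids.txt`.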

BLAST results depend heavily on the size of the database being searched; in particular, E-values scale with the size of the search space. A regular BLAST search against a database and a pairwise comparison of two sequences (bl2seq) are therefore significantly different.
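For reference, the Karlin-Altschul statistics behind BLAST E-values make the dependence on search space explicit. In simplified form (ignoring BLAST's effective-length corrections), for an alignment with raw score $S$ between a query of length $m$ and a database of total length $n$, with $K$ and $\lambda$ set by the scoring system:

$$
E = K\,m\,n\,e^{-\lambda S}, \qquad S' = \frac{\lambda S - \ln K}{\ln 2}, \qquad E = m\,n\,2^{-S'}
$$

For a fixed score, $E$ is proportional to $m \cdot n$, so searching a whole genome (large $n$) and a single subject sequence with bl2seq (small $n$) assign different E-values to the same alignment, and a cutoff tuned for one search does not transfer directly to the other.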

written 2.3 years ago by genomax62k

Thank you for your help. Unfortunately, this command does not give me the hit. The genome I ran formatdb on is a FASTA file of whole-genome shotgun sequence runs. While blastdbcmd gives me a FASTA file of the accession that contains the hit, it does not give me the hit itself.

written 2.3 years ago by thomas.welch0

Since you are now providing this additional information, the solution changes. You can convert the BLAST hit coordinates into BED format (chr, start, stop) and then use bedtools getfasta to retrieve the sequences you need.

modified 2.3 years ago • written 2.3 years ago by genomax62k

Thank you, but I don't think this will work either. The file for the organism I am investigating does not have chromosome coordinates (although there are, of course, query start and stop coordinates). It looks like I will simply have to write a script to extract the correct (query) lines from the blastall output file and then stick them together.

written 2.3 years ago by thomas.welch0
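In this case, `chr` is simply whatever name you have for the subject sequence (its ID in the BLAST output), and start/stop are the hit's coordinates on that subject.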
modified 2.3 years ago • written 2.3 years ago by genomax62k
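Combining the BED suggestion with the `chr` = subject name tip, here is a minimal Python sketch that turns 12-column tabular BLAST output (`-m 8` / `-outfmt 6`) into 6-column BED. Two details matter. BLAST coordinates are 1-based and inclusive while BED is 0-based and half-open, so 1 is subtracted from the start. Minus-strand hits are reported with sstart > send, so the pair is swapped and the strand recorded. The script name and the name/score columns are illustrative choices, not part of the original answer.

```python
import sys

def blast_row_to_bed(row):
    """Convert one 12-column tabular BLAST row into a 6-column BED line.

    Columns used: qseqid (0), sseqid (1), sstart (8), send (9), bitscore (11).
    """
    sseqid = row[1]
    sstart, send = int(row[8]), int(row[9])
    # Minus-strand hits are reported with sstart > send.
    strand = "+" if sstart <= send else "-"
    # BLAST: 1-based, inclusive. BED: 0-based start, exclusive end.
    start, end = min(sstart, send) - 1, max(sstart, send)
    name = f"{row[0]}_vs_{sseqid}"  # illustrative name column
    return f"{sseqid}\t{start}\t{end}\t{name}\t{row[11].strip()}\t{strand}"

if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        for line in handle:
            if line.strip() and not line.startswith("#"):
                print(blast_row_to_bed(line.rstrip("\n").split("\t")))
```

Usage (hypothetical names): `python blast2bed.py hits.tsv > hits.bed`, then `bedtools getfasta -fi genome.fa -bed hits.bed -s -fo hits.fa`, where `genome.fa` is the same FASTA file given to formatdb and `-s` reverse-complements minus-strand intervals. The subject IDs in the BLAST output have to match the FASTA header names exactly; if formatdb rewrote them, they need adjusting first.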

Got it. This works perfectly. Thank you very much.

written 2.3 years ago by thomas.welch0

Is there any particular reason you are running blastall rather than BLAST+? blastall is the legacy version of BLAST.
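For comparison, here is a minimal sketch of the BLAST+ route, where most of the extraction step disappears: BLAST+ tabular output accepts custom columns, including `sseq` (the aligned part of the subject sequence). It assumes a run such as `blastn -query gene.fa -db mygenome -outfmt "6 qseqid sseqid sstart send bitscore sseq"` (file and database names hypothetical), keeps the highest-bitscore row per query and writes it as FASTA with gaps removed. In BLAST+, the bl2seq-style comparison is also `blastn`, with `-subject` in place of `-db`.

```python
import sys

# Assumed column order, matching the hypothetical -outfmt string above:
# qseqid sseqid sstart send bitscore sseq
def best_hit_fasta(lines):
    """Return FASTA text with the best (highest-bitscore) hit per query."""
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        qseqid, sseqid, sstart, send, bitscore, sseq = line.rstrip("\n").split("\t")
        if qseqid not in best or float(bitscore) > best[qseqid][0]:
            best[qseqid] = (float(bitscore), sseqid, sstart, send, sseq)
    records = []
    for qseqid, (_, sseqid, sstart, send, sseq) in best.items():
        # Header records subject ID and hit coordinates; gaps are stripped.
        records.append(f">{sseqid}:{sstart}-{send} best hit for {qseqid}\n"
                       f"{sseq.replace('-', '')}")
    return "\n".join(records)

if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        print(best_hit_fasta(handle))
```

Usage (hypothetical names): `python best_hit_fasta.py hits.tsv > best_hits.fa`.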

written 2.3 years ago by Jenez510