Question: How To Get Sequence Around Blast Result?
dustar1986 (USA) wrote, 7.7 years ago:

Hi,

I BLASTed a series of query sequences against a pre-built BLAST database using blastall, and parsed the XML result with Biopython.

This gave me the genomic coordinates of each query sequence.

Now I also want the 500 bp of sequence upstream and downstream of each query sequence.

I know Biopython can do this by extracting sequences from a FASTA file of each chromosome or from an online database.

But I don't want to do this because I don't have enough disk space to keep a FASTA file of each chromosome locally, and online fetching is too slow when the query file is large.

Is it possible to fetch such sequences by genomic coordinates from the pre-built BLAST database itself (so that I only need to keep the database files on disk for each species)?

biopython • 5.9k views
modified 7.7 years ago by Leszek • written 7.7 years ago by dustar1986
a.zielezinski wrote, 7.7 years ago:

Of course, it's very easy using BLAST+, so install it right away!

First, make sure you use the -parse_seqids parameter when creating the BLAST database:

makeblastdb -in tair10.fa -dbtype prot -parse_seqids

Then, use blastdbcmd to fetch a specific range of a sequence:

blastdbcmd -db tair10.fa -dbtype prot -entry AT1G50920.1 -range 1-10

The output is:

>AT1G50920.1:1-10 | Symbols:  | Nucleolar GTP-binding protein | chr1:18870555-18872570 FORWARD LENGTH=671
MVQYNFKRIT
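
For the original question (500 bp flanks around a hit in a genome database), the same -range option can be given widened coordinates. The following is a minimal Python sketch, not part of the original answer. The helper names flanked_range, blastdbcmd_args and fetch, the database name genome_db and the example coordinates are all hypothetical. It assumes a nucleotide database built with -parse_seqids and blastdbcmd available on the PATH. The start is clamped at 1; the end is clamped only if you pass the subject length (for example, the hit length reported in the BLAST XML).

```python
import subprocess


def flanked_range(start, end, flank=500, subject_length=None):
    """Return a 1-based inclusive (start, end) widened by `flank` on each side."""
    # BLAST reports minus-strand hits with start > end, so normalise first.
    lo, hi = min(start, end), max(start, end)
    lo = max(1, lo - flank)          # never go below the first base
    hi = hi + flank
    if subject_length is not None:   # clamp only when the length is known
        hi = min(hi, subject_length)
    return lo, hi


def blastdbcmd_args(db, entry, start, end, strand="plus"):
    """Build the blastdbcmd argument list for one range of one entry."""
    return ["blastdbcmd", "-db", db, "-dbtype", "nucl",
            "-entry", entry, "-range", f"{start}-{end}", "-strand", strand]


def fetch(db, entry, start, end, strand="plus"):
    """Run blastdbcmd and return its FASTA output as text."""
    result = subprocess.run(blastdbcmd_args(db, entry, start, end, strand),
                            capture_output=True, text=True, check=True)
    return result.stdout


# Hypothetical usage: a hit on chr1 at 18870555-18872570 with 500 bp flanks.
# lo, hi = flanked_range(18870555, 18872570, flank=500)
# print(fetch("genome_db", "chr1", lo, hi))
```

Normalising the coordinates before widening keeps the range valid for minus-strand hits; pass strand="minus" to fetch if you want the reverse complement.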
modified 7.7 years ago • written 7.7 years ago by a.zielezinski

Thanks Zielezinski. This is really helpful.

written 7.7 years ago by dustar1986

I'm glad I could help!

written 7.7 years ago by a.zielezinski

What if we have a query with 100 sequences? Manual parsing may not be feasible.

ADD REPLYlink written 14 months ago by adhirajnath1430

If you have a query with multiple accession numbers, you can provide them, one per line, in a text file (e.g., query.txt):

AT1G50920.1
AT1G50930.1
AT1G50940.1
AT1G50950.1

and you run the command:

blastdbcmd -db tair10.fa -dbtype prot -entry_batch query.txt

You will get FASTA sequences for the accession numbers listed in query.txt. However, when you use the -entry_batch argument, you can't also use the -range argument.
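
Command-line -range aside, the blastdbcmd help text in recent BLAST+ releases describes the batch file as one entry per line, with the sequence ID optionally followed by space-delimited specifiers such as a range and a strand. Check `blastdbcmd -help` for your version before relying on this. A minimal Python sketch under that assumption, with hypothetical helper names and made-up nucleotide hits given as (id, start, end):

```python
def batch_lines(hits, flank=50):
    """Build -entry_batch lines of the form 'ID start-end strand'.

    `hits` is an iterable of (seq_id, start, end) in 1-based coordinates.
    The upper bound is NOT clamped to the subject length here.
    """
    lines = []
    for seq_id, start, end in hits:
        # A hit reported with start > end lies on the minus strand.
        strand = "plus" if start <= end else "minus"
        lo = max(1, min(start, end) - flank)
        hi = max(start, end) + flank
        lines.append(f"{seq_id} {lo}-{hi} {strand}")
    return lines


def write_batch_file(path, hits, flank=50):
    """Write the batch lines to `path`, one entry per line."""
    with open(path, "w") as handle:
        handle.write("\n".join(batch_lines(hits, flank)) + "\n")


# Hypothetical usage, then:
#   blastdbcmd -db genome_db -dbtype nucl -entry_batch query_ranges.txt
# write_batch_file("query_ranges.txt", [("chr1", 1000, 1200), ("chr2", 30, 10)])
```

The strand specifier only makes sense for nucleotide databases; for a protein database such as the tair10.fa example, only the ID and range would apply.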

written 14 months ago by a.zielezinski

Thank you so much. What if we want to extract 50 nt flanks together with the aligned sequence?

ADD REPLYlink written 14 months ago by adhirajnath1430