Question

retrieve plasmid sequences from NCBI

1

4.2 years ago

t4192 ▴ 20

I have recently sequenced a 45 kb bacterial plasmid. I want to use NCBI BLAST to find the complete sequences of similar plasmids in the 40-60 kb range.

My requirements are:

(1) they should be plasmids from pathogenic bacteria such as Klebsiella pneumoniae and Escherichia coli

(2) they should carry the bla-NDM gene, which confers antibiotic resistance

(3) the alignment should cover at least 90% of my 45 kb plasmid

(4) I want to get their specific taxonomy information

(5) I want to download their complete circular sequences in GenBank (gbk) format

Please tell me how to run such a search with NCBI BLAST. I am comfortable with the Linux command line, so I can either run commands or do the task on the NCBI website.

Thank you for your help!

genome • 1.1k views

updated 4.2 years ago by Cornel ▴ 50 • written 4.2 years ago by t4192 ▴ 20

0

they should have the coverage of alignment with my 45kb plasmid at least 90%

That condition can't be tested prior to/during sequence retrieval.
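
Coverage only exists once an alignment has been computed, so it has to be a post-BLAST filter. As a minimal sketch of that step, assuming hits in tabular form with 1-based inclusive query coordinates (as in the `qstart`/`qend` columns of BLAST `-outfmt 6`), the following merges overlapping HSP intervals for one subject and reports the fraction of the query they cover. The function name `query_coverage` and the example coordinates are made up for illustration.

```python
def query_coverage(hsps, query_length):
    """Return the fraction of the query covered by the union of HSP intervals.

    hsps: iterable of (qstart, qend) pairs, 1-based and inclusive.
    query_length: length of the query sequence in bp.
    """
    # Normalise so start <= end, then sort by start position.
    intervals = sorted((min(s, e), max(s, e)) for s, e in hsps)
    covered = 0
    cur_start = cur_end = None
    for start, end in intervals:
        if cur_end is None or start > cur_end + 1:
            # Disjoint from the current merged block: close it, open a new one.
            if cur_end is not None:
                covered += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start + 1
    return covered / query_length


# Illustrative input: two overlapping HSPs and one separate HSP on a 45,000 bp query.
hsps = [(1, 20000), (15000, 30000), (32001, 45000)]
print(f"{query_coverage(hsps, 45000):.3f}")  # prints 0.956
```

A subject would pass requirement (3) when this value is at least 0.90. BLAST+ can also report a per-subject query coverage directly through the `qcovs` output column, which avoids computing it by hand.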

You have two options.

(1) The simpler route is probably to take the sequence you have and BLAST it at NCBI. You can limit the search to the two species you are interested in.

(2) Download the sequences and search them locally. Keep in mind that plasmid sequences may be included in larger whole-genome sequence files. If you choose to go this route, use Kai Blin's ncbi-genome-download tool.
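
For either option it may help to narrow the candidate set with an Entrez search term first. The sketch below only builds the term and an E-utilities `esearch` URL; it does not contact NCBI. The helper names are hypothetical, and the choice of `plasmid[Filter]`, the `[SLEN]` length range, and the free-text gene term `bla-NDM` are assumptions about how the target records are indexed, so check the hit count on the NCBI website before relying on it.

```python
from urllib.parse import urlencode


def build_entrez_term(organisms, gene, min_len, max_len):
    """Build a nuccore search term (hypothetical helper, not an NCBI API)."""
    # One [Organism] clause per species, joined with OR.
    org_clause = " OR ".join(f'"{name}"[Organism]' for name in organisms)
    # Assumption: plasmid records carry the plasmid filter and the gene name
    # appears somewhere in the record text.
    return (
        f"({org_clause}) AND {gene} "
        f"AND plasmid[Filter] AND {min_len}:{max_len}[SLEN]"
    )


def esearch_url(term, retmax=500):
    """Return an E-utilities esearch URL for the nuccore database."""
    base = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
    return base + "?" + urlencode({"db": "nuccore", "term": term, "retmax": retmax})


term = build_entrez_term(
    ["Klebsiella pneumoniae", "Escherichia coli"], "bla-NDM", 40000, 60000
)
print(term)
print(esearch_url(term))
```

The same term can be tried in the nucleotide search box on the NCBI website, and in principle in the Entrez query field of the web BLAST form, to restrict hits to these records.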

4.2 years ago by GenoMax 141k

0

I haven't checked this database, but you may be able to download it, filter it, and use it as a BLAST database: https://datadryad.org/stash/dataset/doi:10.15146/R33X2J

4.2 years ago by Fatima ▴ 1000

score 1 · Answer 1 · 2020-02-19

Get the data from NCBI, perhaps with a search like this:

https://www.ncbi.nlm.nih.gov/nuccore/?term=bla-NDM+and+(%22Klebsiella+pneumoniae%22%5BOrganism%5D+or+%22Escherichia+coli%22%5BOrganism%5D)+and+genome

Collect all the accession numbers, retrieve the records for them in GenBank format, extract the sequences in FASTA format, build a BLAST database, BLAST your plasmid against it, and filter the hits for at least 90% alignment coverage.
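
The steps above could be wired together roughly as follows. This is a sketch under stated assumptions, not a tested pipeline: the file names and helper names are hypothetical, the code only builds the `efetch` URL and the `makeblastdb`/`blastn` command lines without running them, and the filter assumes the tabular output was written with exactly the columns listed in `FIELDS`. `qcovs` is the BLAST+ per-subject query coverage in percent.

```python
import shlex
from urllib.parse import urlencode

# Columns requested from blastn (assumed order of the tabular output).
FIELDS = "qseqid sseqid pident length evalue bitscore qcovs slen"


def efetch_genbank_url(accessions):
    """E-utilities URL returning full GenBank records for the accessions."""
    base = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
    params = {
        "db": "nuccore",
        "id": ",".join(accessions),
        "rettype": "gbwithparts",  # GenBank flat file including the sequence
        "retmode": "text",
    }
    return base + "?" + urlencode(params)


def makeblastdb_cmd(fasta, db_name):
    """Command line to build a nucleotide BLAST database from a FASTA file."""
    return ["makeblastdb", "-in", fasta, "-dbtype", "nucl", "-out", db_name]


def blastn_cmd(query, db_name, out_path):
    """Command line to BLAST the query plasmid against the local database."""
    return ["blastn", "-query", query, "-db", db_name,
            "-outfmt", f"6 {FIELDS}", "-out", out_path]


def filter_hits(lines, min_qcov=90.0, min_len=40000, max_len=60000):
    """Return {subject_id: qcovs} for subjects passing coverage and length cuts."""
    names = FIELDS.split()
    keep = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        row = dict(zip(names, line.split("\t")))
        qcov = float(row["qcovs"])
        slen = int(row["slen"])
        if qcov >= min_qcov and min_len <= slen <= max_len:
            keep[row["sseqid"]] = qcov
    return keep


# Hypothetical file names; print the commands instead of executing them.
print(shlex.join(makeblastdb_cmd("candidates.fasta", "candidates_db")))
print(shlex.join(blastn_cmd("my_plasmid.fasta", "candidates_db", "hits.tsv")))
```

The GenBank records fetched this way include the organism name and taxonomic lineage in their SOURCE/ORGANISM lines, which should cover the taxonomy and gbk requirements for the subjects that survive `filter_hits`.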

Perhaps this set of tools will help you: https://github.com/cghiban/custom-blast-db