Question: retrieve plasmid sequences from NCBI
t4192 wrote (3 months ago):

I recently sequenced a bacterial plasmid of about 45 kb. I want to use NCBI BLAST to find the complete sequences of similar plasmids in the 40-60 kb size range.

My requirements are

(1) they should be plasmids from pathogenic bacteria such as Klebsiella pneumoniae and Escherichia coli

(2) they should carry the bla-NDM gene, which means the host can be resistant to antibiotics

(3) their alignment with my 45 kb plasmid should have a coverage of at least 90%

(4) I want to get their specific taxonomy information

(5) I want to download their complete circular sequences in gbk file format

Please tell me how to do this kind of NCBI BLAST retrieval. I am comfortable with the Linux command line, so I can either run commands or work on the NCBI website.

Thank you for your help!


"they should have the coverage of alignment with my 45kb plasmid at least 90%"

That condition can't be tested before or during sequence retrieval; it can only be checked after the candidate sequences have been aligned against your plasmid.

You have two options.

  1. It may be simpler just to take the sequence you have and BLAST it at NCBI. You can look into limiting the search to the two species you are interested in.
  2. Download the sequences locally. Keep in mind that plasmid sequences may be included in larger whole-genome sequence files. If you choose this route, use Kai Blin's ncbi-genome-download tool.
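Option 1 can also be run from the command line. The following is only a sketch, assuming NCBI BLAST+ is installed: it builds a remote `blastn` command whose `-entrez_query` restricts hits to the two species. The file names `my_plasmid.fasta` and `hits.tsv` and the chosen output columns are placeholders, not something given in this thread.

```python
import shlex


def organism_filter(species):
    """Build an Entrez expression that limits results to the given organisms."""
    return " OR ".join(f'"{name}"[Organism]' for name in species)


def remote_blastn_args(query_fasta, species, out_path):
    """Return the argument list for a remote blastn search against nt.

    -entrez_query is only honoured together with -remote.
    """
    return [
        "blastn",
        "-remote",
        "-db", "nt",
        "-query", query_fasta,
        "-entrez_query", organism_filter(species),
        # Tabular output; the column choice here is an assumption.
        "-outfmt", "6 qseqid sseqid qstart qend qlen slen",
        "-out", out_path,
    ]


if __name__ == "__main__":
    args = remote_blastn_args(
        "my_plasmid.fasta",  # hypothetical query file
        ["Klebsiella pneumoniae", "Escherichia coli"],
        "hits.tsv",  # hypothetical output file
    )
    # Print the command for inspection; run it with subprocess.run(args, check=True).
    print(shlex.join(args))
```

The `slen` column makes it possible to keep only subjects in the 40-60 kb range afterwards.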
Reply written 3 months ago by genomax

I haven't checked this database, but you may be able to download it, filter it, and use it as a BLAST database: https://datadryad.org/stash/dataset/doi:10.15146/R33X2J

Reply written 3 months ago by Fatima590
Cornel wrote (3 months ago):

Get the data from NCBI, perhaps with a search like this:

https://www.ncbi.nlm.nih.gov/nuccore/?term=bla-NDM+and+(%22Klebsiella+pneumoniae%22%5BOrganism%5D+or+%22Escherichia+coli%22%5BOrganism%5D)+and+genome

Get all the accession numbers, retrieve the records for them in GenBank format, extract the sequences in FASTA format, build a BLAST database, BLAST your plasmid against it, and filter the hits for at least 90% alignment coverage.
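The retrieval part of these steps could be sketched with the NCBI E-utilities and the Python standard library, as below. This is an illustration, not a tested pipeline: the `plasmid[Filter]` and `40000:60000[SLEN]` terms are additions that do not appear in the search above and should be checked on the website first, and the output file name is a placeholder.

```python
import json
import time
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

# Assumed search term; the plasmid filter and length range are additions to verify.
TERM = (
    'bla-NDM AND ("Klebsiella pneumoniae"[Organism] OR "Escherichia coli"[Organism]) '
    "AND plasmid[Filter] AND 40000:60000[SLEN]"
)


def esearch_url(term, retmax=500):
    """URL that searches nuccore and returns matching record IDs as JSON."""
    params = {"db": "nuccore", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def efetch_url(ids, rettype="gbwithparts"):
    """URL that fetches records as text; use rettype='fasta' for FASTA."""
    params = {"db": "nuccore", "id": ",".join(ids), "rettype": rettype, "retmode": "text"}
    return f"{EUTILS}/efetch.fcgi?{urlencode(params)}"


def fetch_records(term, out_path, rettype="gbwithparts", batch_size=50):
    """Download all records matching term into out_path; return the record count."""
    with urlopen(esearch_url(term)) as response:
        ids = json.load(response)["esearchresult"]["idlist"]
    with open(out_path, "w") as out:
        for i in range(0, len(ids), batch_size):
            with urlopen(efetch_url(ids[i:i + batch_size], rettype)) as response:
                out.write(response.read().decode("utf-8"))
            time.sleep(0.4)  # stay under NCBI's limit of 3 requests per second
    return len(ids)


if __name__ == "__main__":
    print(fetch_records(TERM, "candidates.gbk"))  # hypothetical output file
```

Each GenBank record carries the taxonomy in its ORGANISM lines. Calling `fetch_records` again with `rettype="fasta"` gives the FASTA file for `makeblastdb -in candidates.fasta -dbtype nucl -out candidates`.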

Perhaps this set of tools will help you: https://github.com/cghiban/custom-blast-db
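For the last step of the workflow above, filtering by 90% coverage, a minimal sketch follows. It assumes the local search was run with `-outfmt "6 qseqid sseqid qstart qend qlen"`; that column layout is an assumption, not something specified in this thread. Overlapping HSPs are merged so that no stretch of the query is counted twice.

```python
from collections import defaultdict


def query_coverage(lines):
    """Return {subject_id: fraction of the query covered by its merged HSPs}.

    Expects tab-separated lines whose first five columns are
    qseqid, sseqid, qstart, qend, qlen (assumed layout).
    """
    spans = defaultdict(list)
    query_length = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        _qseqid, sseqid, qstart, qend, qlen = line.split("\t")[:5]
        start, end = sorted((int(qstart), int(qend)))
        spans[sseqid].append((start, end))
        query_length[sseqid] = int(qlen)

    coverage = {}
    for sseqid, intervals in spans.items():
        intervals.sort()
        covered = 0
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end + 1:
                # Overlapping or adjacent HSP: extend the current block.
                cur_end = max(cur_end, end)
            else:
                covered += cur_end - cur_start + 1
                cur_start, cur_end = start, end
        covered += cur_end - cur_start + 1
        coverage[sseqid] = covered / query_length[sseqid]
    return coverage


def filter_by_coverage(lines, min_coverage=0.90):
    """Return the sorted subject IDs whose query coverage meets the threshold."""
    return sorted(s for s, c in query_coverage(lines).items() if c >= min_coverage)
```

For example, `filter_by_coverage(open("hits.tsv"))` would list the subjects that cover at least 90% of the query, where `hits.tsv` is a hypothetical results file. BLAST+ can also report a per-subject query coverage directly through the `qcovs` output column.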
