Question

GI chromosome list ncbi

8.7 years ago

TEman ▴ 10

Hi,

I am using Entrez.efetch in Biopython to fetch sequences from genomic coordinates (chr, start, stop, strand):

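from Bio import Entrez, SeqIO

Entrez.email = "you@example.com"  # NCBI asks for a contact address; replace with your own

# gi_number, start, stop and strand are placeholders for your own values;
# efetch expects strand=1 (plus) or strand=2 (minus)
handle = Entrez.efetch(db="nucleotide",
                       id=gi_number,
                       rettype="fasta",
                       strand=strand,
                       seq_start=start,
                       seq_stop=stop)
record = SeqIO.read(handle, "fasta")
handle.close()
print(record.seq)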

The package requires the GI of each chromosome, but I don't know how to find these. I tried searching the nucleotide database at NCBI manually, but this returns many different GIs for each chromosome.

I am looking for the most recent Mus musculus genomic sequences.

Any suggestions on how to get these GIs, or another way to work around this problem?

Best

Per

genome sequence • 2.7k views

updated 18 months ago by Ram 43k • written 8.7 years ago by TEman ▴ 10

Ram · Answer 1 · 2015-08-21

One way to do this is to grab the RefSeq IDs for the chromosomes from the genome page: http://www.ncbi.nlm.nih.gov/genome/52

NC_000067.6
NC_000068.7
NC_000069.6
NC_000070.6
NC_000071.6
NC_000072.6
NC_000073.6
NC_000074.6
NC_000075.6
NC_000076.6
NC_000077.6
NC_000078.6
NC_000079.6
NC_000080.6
NC_000081.6
NC_000082.6
NC_000083.6
NC_000084.6
NC_000085.6
NC_000086.7
NC_000087.7
NC_005089.1

Then use the blastdbcmd utility with the BLAST indexes for refseq_genomic:

$ blastdbcmd -entry_batch file_with_refseq_ID -db /path_to/refseq_genomic -outfmt "%g"

to get the GI numbers.
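
As a possible shortcut, efetch also accepts accession.version identifiers, so the RefSeq IDs listed above can be passed as `id` directly without looking up GIs first. Below is a minimal sketch of that idea, not a tested pipeline. It assumes the list is ordered as chromosomes 1-19, then X, Y and the mitochondrial genome, and that strand is given as "+" or "-". The names `CHROM_TO_ACCESSION`, `efetch_params` and `fetch_region` are made up for illustration.

```python
# RefSeq accessions copied from the list above.
ACCESSIONS = """
NC_000067.6 NC_000068.7 NC_000069.6 NC_000070.6 NC_000071.6 NC_000072.6
NC_000073.6 NC_000074.6 NC_000075.6 NC_000076.6 NC_000077.6 NC_000078.6
NC_000079.6 NC_000080.6 NC_000081.6 NC_000082.6 NC_000083.6 NC_000084.6
NC_000085.6 NC_000086.7 NC_000087.7 NC_005089.1
""".split()

# Assumption: the list is ordered as chromosomes 1-19, X, Y, MT.
CHROM_NAMES = [str(n) for n in range(1, 20)] + ["X", "Y", "MT"]
CHROM_TO_ACCESSION = dict(zip(CHROM_NAMES, ACCESSIONS))


def efetch_params(chrom, start, stop, strand):
    """Build keyword arguments for Entrez.efetch (hypothetical helper)."""
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    return {
        "db": "nucleotide",
        "id": CHROM_TO_ACCESSION[chrom],      # accession.version instead of a GI
        "rettype": "fasta",
        "strand": 1 if strand == "+" else 2,  # efetch: 1 = plus, 2 = minus
        "seq_start": start,                   # 1-based coordinates
        "seq_stop": stop,
    }


def fetch_region(chrom, start, stop, strand, email):
    """Fetch one region as a SeqRecord; needs Biopython and network access."""
    from Bio import Entrez, SeqIO  # imported here so the mapping works without Biopython

    Entrez.email = email
    handle = Entrez.efetch(**efetch_params(chrom, start, stop, strand))
    try:
        return SeqIO.read(handle, "fasta")
    finally:
        handle.close()
```

Something like `fetch_region("1", 3000000, 3000100, "+", "you@example.com").seq` would then return the sequence for that window; the coordinates and e-mail address here are placeholders.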