GI chromosome list ncbi
8.7 years ago
TEman ▴ 10

Hi,

I am using Entrez.efetch from Biopython to fetch sequences by genomic coordinates (chr, start, stop, strand):

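from Bio import Entrez, SeqIO

Entrez.email = "you@example.com"  # placeholder; NCBI asks for a contact address

handle = Entrez.efetch(db="nucleotide",
                       id=gi_number,      # GI of the chromosome (the value I am missing)
                       rettype="fasta",
                       strand=strand,     # 1 = plus strand, 2 = minus strand
                       seq_start=start,
                       seq_stop=stop)
record = SeqIO.read(handle, "fasta")
handle.close()
print(record.seq)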

The package requires the GI of each chromosome, but I don't know how to find these. I tried searching the nucleotide database at NCBI manually, but this returns many different GIs for each chromosome.

I am looking for the most recent Mus musculus genomic sequences.

Any suggestions on how to get these GIs, or on another way around this problem?

Best

Per

genome sequence • 2.7k views
8.7 years ago
GenoMax 141k

One way to do this is to grab the RefSeq IDs for the chromosomes from the genome page: http://www.ncbi.nlm.nih.gov/genome/52

NC_000067.6
NC_000068.7
NC_000069.6
NC_000070.6
NC_000071.6
NC_000072.6
NC_000073.6
NC_000074.6
NC_000075.6
NC_000076.6
NC_000077.6
NC_000078.6
NC_000079.6
NC_000080.6
NC_000081.6
NC_000082.6
NC_000083.6
NC_000084.6
NC_000085.6
NC_000086.7
NC_000087.7
NC_005089.1
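
As an aside, Entrez.efetch also accepts accession.version identifiers such as these as `id`, which may avoid the GI lookup altogether. A minimal sketch, assuming the list above is in karyotype order (chromosomes 1 to 19, X, Y, then the mitochondrial genome). `MOUSE_REFSEQ`, `efetch_params` and `fetch_region` are hypothetical names introduced for illustration, and the e-mail address is a placeholder.

```python
# RefSeq accessions copied from the list above; assumed order:
# chr1..chr19, X, Y, MT.
ACCESSIONS = """
NC_000067.6 NC_000068.7 NC_000069.6 NC_000070.6 NC_000071.6 NC_000072.6
NC_000073.6 NC_000074.6 NC_000075.6 NC_000076.6 NC_000077.6 NC_000078.6
NC_000079.6 NC_000080.6 NC_000081.6 NC_000082.6 NC_000083.6 NC_000084.6
NC_000085.6 NC_000086.7 NC_000087.7 NC_005089.1
""".split()

CHROMS = [str(i) for i in range(1, 20)] + ["X", "Y", "MT"]
MOUSE_REFSEQ = dict(zip(CHROMS, ACCESSIONS))


def efetch_params(chrom, start, stop, strand="+"):
    """Build keyword arguments for Entrez.efetch (hypothetical helper).

    Coordinates are passed through unchanged; efetch treats seq_start and
    seq_stop as 1-based, inclusive positions.
    """
    name = chrom[3:] if chrom.startswith("chr") else chrom
    return {
        "db": "nucleotide",
        "id": MOUSE_REFSEQ[name],
        "rettype": "fasta",
        "strand": 1 if strand == "+" else 2,  # efetch: 1 = plus, 2 = minus
        "seq_start": start,
        "seq_stop": stop,
    }


def fetch_region(chrom, start, stop, strand="+", email="you@example.com"):
    """Fetch one region as a SeqRecord; needs Biopython and network access."""
    from Bio import Entrez, SeqIO  # imported here so the mapping works without Biopython

    Entrez.email = email  # placeholder address
    handle = Entrez.efetch(**efetch_params(chrom, start, stop, strand))
    try:
        return SeqIO.read(handle, "fasta")
    finally:
        handle.close()
```

For example, `fetch_region("chr1", 3000000, 3000100, "-").seq` would request 101 bases from the minus strand of chromosome 1; the coordinates here are arbitrary.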

Then use the blastdbcmd utility with the BLAST indexes for refseq_genomic

$ blastdbcmd -entry_batch file_with_refseq_ID -db /path_to/refseq_genomic -outfmt "%g"

to get the GI numbers:

372099109
372099108
372099107
372099106
372099105
372099104
372099103
372099102
372099101
372099100
372099099
372099098
372099097
372099096
372099095
372099094
372099093
372099092
372099091
372099090
372099089
34538597
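
To keep each GI paired with its accession rather than relying on output order, blastdbcmd can print both fields on one line with -outfmt "%a %g". A rough Python sketch of that idea follows. The function names are hypothetical, the file and database paths are the same placeholders as in the command above, and the exact form of the accession printed by %a (with or without the version suffix) is an assumption worth checking against your BLAST+ installation.

```python
import subprocess


def parse_accession_gi(text):
    """Parse 'accession gi' lines into a dict {accession: gi}."""
    mapping = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or unexpected lines
        accession, gi = fields
        if not gi.isdigit():
            continue  # skip entries where the GI field is not numeric
        mapping[accession] = int(gi)
    return mapping


def accession_to_gi(id_file, db):
    """Run blastdbcmd on a file of RefSeq IDs and return {accession: gi}.

    Requires BLAST+ on the PATH and a local refseq_genomic database.
    """
    cmd = ["blastdbcmd", "-entry_batch", id_file,
           "-db", db, "-outfmt", "%a %g"]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_accession_gi(result.stdout)
```

Calling `accession_to_gi("file_with_refseq_ID", "/path_to/refseq_genomic")` would then return a dictionary that can be looked up by accession before calling Entrez.efetch.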