I have a genbank file and a list of addresses. I'd like to pull the exact nucleotide that corresponds to that address in biopython. How would I do that?
8.5 years ago
Tom ▴ 20

I can't figure out how to do this specific task. I have a .gbk assembly file and a simple Excel sheet containing about 1000 numbers. The numbers correspond to SNP sites I can find in the GenBank file, but what I really want in the end is a list of the nucleotides at those positions in the assembly. What commands do I use to view those nucleotides given a list of addresses?

If Biopython does not have this option, what does?

CLCBio BioPython • 2.0k views

These could be informative:

  1. Bioconductor : Load local genbank file
  2. BioPython : Converting GenBank files to FASTA format with Biopython
8.5 years ago
Peter 6.0k

Something like this:

from Bio import SeqIO

my_snp_list = [100, 1234]  # Zero-based (Python-style) positions
# SeqIO.read needs the file format and expects exactly one record (single contig).
record = SeqIO.read("single_contig.gbk", "genbank")
for snp in my_snp_list:
    print("Position %i is nucleotide %s" % (snp, record.seq[snp]))

This uses Python's indexing to pull a single base out of the sequence. Note that you may need to convert your SNP coordinates to Python-style zero-based counting by subtracting one: GenBank coordinates are one-based, so position 1 in the file is index 0 in Python.
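
To make the coordinate conversion concrete, here is a minimal sketch of a lookup helper. The function name `bases_at` and its `one_based` flag are hypothetical, not part of Biopython. It assumes the positions are one-based unless told otherwise, and it checks bounds explicitly, because a one-based position of 0 would otherwise become index -1 and silently return the last base. A toy string stands in for `record.seq` so the example runs without an input file.

```python
def bases_at(seq, positions, one_based=True):
    """Return (position, base) pairs for each requested position.

    seq can be a plain str or any object supporting len() and integer
    indexing, such as a Biopython Seq (record.seq).
    """
    offset = 1 if one_based else 0
    results = []
    for pos in positions:
        index = pos - offset
        # Reject out-of-range positions instead of letting a negative
        # index wrap around to the end of the sequence.
        if not 0 <= index < len(seq):
            raise IndexError(
                "Position %d is outside the sequence (length %d)" % (pos, len(seq))
            )
        results.append((pos, str(seq[index])))
    return results


# Toy sequence standing in for record.seq (an assumption for illustration).
toy_seq = "ACGTTGCA"
for pos, base in bases_at(toy_seq, [1, 4, 8]):
    print("Position %d is nucleotide %s" % (pos, base))
```

With a real file, passing `record.seq` from `SeqIO.read("single_contig.gbk", "genbank")` in place of `toy_seq` should work the same way, since a `Seq` supports the same integer indexing.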
