Question: Import All Protein Sequences In A Chromosome
0
gravatar for User 3206
7.7 years ago by
User 32060
User 32060 wrote:

Hi guys I have managed to write up the code below in python that accesses a file with protein ids and import their sequences from genbank. I now wanted to write one that would import all the protein sequences in a given chromosome since I don't have all their ids. i.e number. to import the entire protein sequences given the chromosome number.

Any suggestions would be appreciated!

Thank you

<h6>#</h6>

from numpy import * z=genfromtxt('C:\Users\Mohammed\Desktop\ProteinIDs.txt', dtype='S12', delimiter=',', usecols=[0],unpack=True) exit

for i in range (500):

prot= '"%s"' %((z)[i])

print prot

from Bio import Entrez , SeqIO

Entrez.email = 'me@uga.edu'

handle = Entrez.efetch(db="protein", id="prot", rettype="fasta",retmode="text")

record = SeqIO.read(handle,"fasta")

String=str(record)

f= open('C:\Users\Mohammed\Desktop\protein_seqs\%s.txt' % (z)[i], 'w') for i in range (1): SeqIO.write(record, f, "fasta") print record f.close()

biopython • 1.7k views
ADD COMMENTlink modified 7.7 years ago by Alex1.5k • written 7.7 years ago by User 32060

Can you indent your code? Put 4 spaces in front of it (this will make it a code block), and make sure that the for loops are indented correctly. This makes it easier for us to evaluate your code.

ADD REPLYlink written 7.7 years ago by Niek De Klein2.5k
2
gravatar for Alex
7.7 years ago by
Alex1.5k
Theodosius Dobzhansky Center for Genome Bioinformatics
Alex1.5k wrote:

Take a look at the following code that sorts proteins by chromosome, the list of all proteins you can get for example from NCBI's ftp site.

from Bio import Entrez , SeqIO

protein_ids = [61677879, 61677880, 61677881, 60637879, 60637579]

Entrez.email = 'me@uga.edu'

chr_to_proteins = {"unknown":[]}

for pid in protein_ids:

    handle = Entrez.efetch(db="protein", id=pid, rettype="gb", retmode="text")
    record = SeqIO.read(handle, "gb")
    feature_found = False
    for feature in record.features:
        if hasattr(feature, "qualifiers"):
            if "chromosome" in feature.qualifiers:
                for chr in feature.qualifiers["chromosome"]:
                    if not feature_found:
                        chr_to_proteins.setdefault(chr, [])
                        chr_to_proteins[chr].append(record)
                        feature_found = True

    if not feature_found:
        chr_to_proteins["unknown"].append(record)

for chr, proteins in chr_to_proteins.items():
    print chr, len(proteins)
ADD COMMENTlink modified 7.7 years ago • written 7.7 years ago by Alex1.5k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 838 users visited in the last hour