Question: Extracting Specific Regions In Genbank File
0
gravatar for pixie@bioinfo
7.9 years ago by
pixie@bioinfo1.4k
Université Paris, Saclay
pixie@bioinfo1.4k wrote:

I have the following file format: http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord#FeaturesA

and I want to extract the information contained under the CDS regions (there can be many CDS regions and they need not be consecutive)

CDS               <1..206
                 /codon_start=3
                 /product="TCP1-beta"
                 /protein_id="AAA98665.1"
                 /db_xref="GI:1293614"
                 /translation="SSIYNGISTSGLDLNNGTIADMRQLGIVESYKLKRAVVSSASEA
                 AEVLLRVDNIIRARPRTANRQHM"
 gene            687..3158
                 /gene="AXL2"
 CDS             687..3158
                 /gene="AXL2"
                 /note="plasma membrane glycoprotein"
                 /codon_start=1
                 /function="required for axial budding pattern of S.
                 cerevisiae"
                 /product="Axl2p"
                 /protein_id="AAA98666.1"
                 /db_xref="GI:1293615"

I have already read the file and have kept each line in a list

l_cds=[]   #list to hold the info under cds
for index in range(len(list_eachline)):
    if list_eachline[index].startswith("CDS"):
        l_cds.append(list_eachline[index])

this is extracting only the first line of the CDS region..how do modify the code ?...thanks (I am not supposed to use BioPython)

python • 3.8k views
ADD COMMENTlink modified 7.9 years ago by a.zielezinski9.3k • written 7.9 years ago by pixie@bioinfo1.4k

Can we assume this to be a homework question, since you are "not supposed to use BioPython"?

ADD REPLYlink written 7.9 years ago by Neilfws49k

Its just a part of a project..and it was suggested not to use BioPython because it would become too simple. I have coded for a similar thing in Perl. And its not that I am asking for a readymade code..I have checked through the StackOverflow as well and everywhre they talk about BioPython. Hence I asked it here !

ADD REPLYlink written 7.9 years ago by pixie@bioinfo1.4k
5
gravatar for a.zielezinski
7.9 years ago by
a.zielezinski9.3k
a.zielezinski9.3k wrote:

Modify your code a little bit:

l_cds=[]   #list to hold the info under cds
isCDS = False
for index in range(len(list_eachline)):
    if list_eachline[index].startswith("CDS"):
        isCDS = True
    elif not list_eachline[index].startswith(" "):
        isCDS = False
    if isCDS:
        l_cds.append(list_eachline[index])
ADD COMMENTlink written 7.9 years ago by a.zielezinski9.3k

Thanks so much !

ADD REPLYlink written 7.9 years ago by pixie@bioinfo1.4k
1
gravatar for Martin A Hansen
7.9 years ago by
Martin A Hansen3.0k
Denmark
Martin A Hansen3.0k wrote:

If your Genbank files are straight forward (i.e. not linking to other Genbank entries) you can use Biopieces www.biopieces.org) to create a table:

read_genbank -i test.gb -k AC -f CDS | write_tab -k ACCESSION,S_BEG,S_END,STRAND,SEQ,TRANSLATION -x
ADD COMMENTlink written 7.9 years ago by Martin A Hansen3.0k

thank you for the suggestion...but I am just curious to know if there is a way to do it with just Python

ADD REPLYlink written 7.9 years ago by pixie@bioinfo1.4k

I am sure Biopython can do it. Read the docs :o). If you want to roll your own GenBank parser: good luck.

ADD REPLYlink modified 7.9 years ago • written 7.9 years ago by Martin A Hansen3.0k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2329 users visited in the last hour