Extracting information from a genbank file
4.1 years ago
l.souza ▴ 70

Hello,

How can I export some features (such as country and date) from the sequences in a GenBank file?

Thank you in advance!

genbank file sequence features • 3.4k views

Did you go through this biostar question?


You can use this Python program to extract different fields from a GenBank file: https://github.com/dewshr/NCBI-Genbank-file-parser

4.1 years ago
l.souza ▴ 70

I was able to get it using the Python script in this question.


Please upvote the script provider in that question.

4.1 years ago

The "LOCUS" line typically ends with the date, as shown below:

LOCUS       NM_001204686             968 bp    mRNA    linear   INV 28-MAY-2017

This could be extracted like this:

grep "^LOCUS" your_genbank_file.txt | awk '{print $NF}'
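When the file holds more than one record, bare dates are hard to match back to their sequences. Here is a minimal sketch that prints the locus name next to each date. It assumes the second field of every LOCUS line is the locus name and the last field is the date. The file name `sample.gb` and its two stub records are made up for illustration; only the LOCUS lines come from this thread.

```shell
# Build a tiny two-record sample file (hypothetical stand-in for your GenBank file).
cat > sample.gb <<'EOF'
LOCUS       NM_001204686             968 bp    mRNA    linear   INV 28-MAY-2017
//
LOCUS       KT968663                8103 bp    RNA     linear   VRL 29-MAR-2016
//
EOF

# For every line starting with LOCUS, print field 2 (locus name)
# and the last field (date), separated by a tab.
locus_dates=$(awk '/^LOCUS/ {print $2 "\t" $NF}' sample.gb)
printf '%s\n' "$locus_dates"
```

Anchoring the pattern with `^` avoids picking up lines that merely mention the word LOCUS somewhere else in the record.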

I am not sure what you mean by "country" information. Please elaborate or provide an example.


I mean where the sequences come from, as in this example:

LOCUS       KT968663                8103 bp    RNA     linear   VRL 29-MAR-2016 
DEFINITION  Foot-and-mouth disease virus - type A isolate A/HY/CHA/2013,
        complete genome. 
ACCESSION   KT968663 
VERSION     KT968663.1 
KEYWORDS    .
SOURCE      Foot-and-mouth disease virus - type A (FMDV-A)
  ORGANISM  Foot-and-mouth disease virus - type A
            Viruses; ssRNA viruses; ssRNA positive-strand viruses, no DNA
            stage; Picornavirales; Picornaviridae; Aphthovirus. 
REFERENCE   1  (bases 1 to 8103)   
  AUTHORS   Yang,X., Yang,J., Wang,H. and Zeng,F.   
  TITLE     Direct Submission   
  JOURNAL   Submitted (30-OCT-2015) College of Life Science, Sichuan
            University, Wangjiang Road 29#, Chengdu, Sichuan 610064, China 
COMMENT     ##Assembly-Data-START##
            Sequencing Technology :: Sanger dideoxy sequencing
            ##Assembly-Data-END## 
FEATURES             Location/Qualifiers
 source          1..8103
                 /organism="Foot-and-mouth disease virus - type A"
                 /mol_type="genomic RNA"
                 /serotype="A"
                 /isolate="A/HY/CHA/2013"
                 /host="Bos grunniens"
                 /db_xref="taxon:12111"
                 /country="China"
                 /collection_date="15-Aug-2013"
                 /note="subtype: Sea97"

In a shell script, say "run.sh":

grep "^LOCUS" gb.txt | awk '{print $NF}'
grep '/country=' gb.txt | cut -d '=' -f2 | tr -d '"'
  • gb.txt = your GenBank file

Run it as:

$ sh run.sh

Output:

[user@desktop]$ sh run.sh
29-MAR-2016
China
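With many records in one file, the two greps above give two separate lists that may not line up if some record has no country. Below is a rough sketch that prints one tab-separated row per record instead. It assumes every record ends with a `//` line and that each qualifier value fits on one line; values that wrap onto a continuation line would be cut short. The file `sample.gb`, the helper name `qual`, and the "NA" placeholder are my own choices for illustration.

```shell
# Hypothetical sample: one record with source qualifiers, one without.
cat > sample.gb <<'EOF'
LOCUS       KT968663                8103 bp    RNA     linear   VRL 29-MAR-2016
FEATURES             Location/Qualifiers
     source          1..8103
                     /host="Bos grunniens"
                     /country="China"
                     /collection_date="15-Aug-2013"
//
LOCUS       NM_001204686             968 bp    mRNA    linear   INV 28-MAY-2017
//
EOF

# qual(s) returns the text between the first pair of double quotes in s.
# A LOCUS line starts a record and resets the fields to "NA";
# a "//" line ends the record and prints the collected row.
summary=$(awk '
function qual(s) { sub(/^[^"]*"/, "", s); sub(/".*$/, "", s); return s }
/^LOCUS/                 { acc = $2; date = $NF; country = "NA"; collected = "NA" }
/^ +\/country="/         { country = qual($0) }
/^ +\/collection_date="/ { collected = qual($0) }
/^\/\//                  { print acc "\t" date "\t" country "\t" collected }
' sample.gb)
printf '%s\n' "$summary"
```

For the sample this prints the KT968663 row with China and 15-Aug-2013, and the NM_001204686 row with NA in both qualifier columns. Country names with spaces are kept intact because only the quotes are stripped.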