Extract Intergenic Sequences from GenBank File with Perl
1
0
Entering edit mode
6.9 years ago
Diesel • 0

Hi,

I need help; I need to use perl, possibly Bioperl, to extract the intergenic regions from a genbank file (.gb). The organism is B.subtilis strain 168.

Is there something on Bioperl to do this?

I am competent with R but very rusty with Perl, I would need a bit of advice on how to do this task. I have perl and Bioperl installed.

The genbank file also has the full DNA sequence at the end, after all the information.

perl • 2.5k views
ADD COMMENT
3
Entering edit mode
6.9 years ago

Extract the gene positions using efetch and eutils, generated a bed file. Something like:

 $ curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=NC_018520.1&rettype=gbwithparts&retmode=txt" |\
    grep "^     gene "  | \
    head -n 50 |\
    sed 's/complement(\([^)]*\))/\1/' |\
    cut -c 22- | sed 's/\.\./ /' |\
    awk '{B=int($1);E=int($2); printf("NC_018520\t%d\t%d\n", (B < E?B:E)-1,(B < E?E:B));}'
    NC_018520   409 1750
            NC_018520   1938    3075
            NC_018520   3205    3421
            NC_018520   3436    4549
            NC_018520   4566    4812
            NC_018520   4866    6783
            NC_018520   6993    9459
            NC_018520   9809    11365
            NC_018520   11463   11540
            NC_018520   11551   11627
            NC_018520   11707   14640
            NC_018520   14691   14807
            NC_018520   14846   15794

Use http://bedtools.readthedocs.io/en/latest/content/tools/complement.html to extract the complement intervals and http://bedtools.readthedocs.io/en/latest/content/tools/getfasta.html to extract the intergenic sequences.

ADD COMMENT
0
Entering edit mode

Thanks for your answer Pierre however I tried using this method but was unable to install bedtools as I have not administrator privileges on the machine I am working on; I have tried using other methods, in R, and Perl, each could not fully get me where I want to; finally I have tried downloading and installing GFF-Ex; it did work, however I am not too happy about the output. Thanks for taking the time.

ADD REPLY
0
Entering edit mode

but was unable to install bedtools.

You don't have to be admin. You can install any software in your private directory.

ADD REPLY
0
Entering edit mode

I will try again tomorrow if that is possible; I read the instruction page and at some point it goes:

"At this point, one should copy the binaries in ./bin/ to either usr/local/bin/ or some other repository for commonly used UNIX tools in your environment. You will typically require administrator (e.g. “root” or “sudo”) privileges to copy to usr/local/bin/."

That is why I tried sorting the problem out differently. I am a beginner and I thought that was not possible, having read that statement about copying binaries.

ADD REPLY
1
Entering edit mode

create a bin directory in your home

mkdir -p ${HOME}/bin

copy the binaries in ${HOME}/bin

add ${HOME}/bin to your path:

export PATH=${PATH}:${HOME}/bin
ADD REPLY
0
Entering edit mode

Pierre, Can you update the curl command, the awk expression is cut !!!!

ADD REPLY
0
Entering edit mode

My guess is that the awk expression is:

awk '{B=int($1);E=int($2);printf("NC_018520\t%d\t%d\n",B,E)}'

ADD REPLY

Login before adding your answer.

Traffic: 1940 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6