How To Retrieve Mrna Split Locations From Genbank Flatfile?
9.0 years ago
mluypaert ▴ 10

Hi all,

I'm having some trouble parsing the GenBank flatfile format that NCBI uses for data export. I have a GenBank flatfile containing genomic regions with mRNA features in it, which I am parsing with Perl (and BioPerl). I retrieve the mRNA features with the get_SeqFeatures() function, and I can retrieve all information about each mRNA using the get_all_tags() and get_tag_values() functions from BioPerl, but I also need the genomic location of each exon in the mRNA. For that I need to find the genomic location of the gene it belongs to (which doesn't seem to be in the flatfiles I downloaded) but, more importantly, I need to be able to get the split locations for each exon from an mRNA line like:

 mRNA            complement(join(4468..4717,4801..4940,6511..6767,
6933..7071,9260..9344,9478..9593))


How can I retrieve this bit of information (from the SeqFeature object I am using in BioPerl)?

genbank ncbi parsing perl bioperl
9.0 years ago
mluypaert ▴ 10

I found the answer myself after some browsing in the BioPerl manuals. The following chunk of Perl code solved my problem:

    # location of the mRNA feature (a Bio::LocationI object)
    my $location_obj = $feat_object->location();

    # retrieve split location: one sub-location per exon
    my @sub_locations;
    if ($location_obj->isa('Bio::Location::Split')) {
        @sub_locations = $location_obj->sub_Location();
    } else {
        # e.g. Bio::Location::Simple: a single-exon feature
        @sub_locations = ($location_obj);
    }
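
For anyone who wants to try this without a flatfile at hand, here is a minimal sketch that builds the location from the feature-table string in the question and lists the exon coordinates. It assumes BioPerl is installed; the helper name exon_ranges is my own (hypothetical, not part of BioPerl), and the location built with Bio::Factory::FTLocationFactory only stands in for what $feat_object->location() would return. Sorting by start is my choice, so the output does not depend on the order in which BioPerl returns the sub-locations.

```perl
use strict;
use warnings;
use Bio::Factory::FTLocationFactory;

# Hypothetical helper: return one [start, end, strand] triple per
# sub-location, sorted by start coordinate.
sub exon_ranges {
    my ($location) = @_;
    # A split (join) location holds several sub-locations;
    # anything else is treated as a single range.
    my @subs = $location->isa('Bio::Location::SplitLocationI')
        ? $location->sub_Location()
        : ($location);
    return map  { [ $_->start, $_->end, $_->strand ] }
           sort { $a->start <=> $b->start } @subs;
}

# Stand-in for $feat_object->location(): parse the location string
# from the mRNA line in the question.
my $factory  = Bio::Factory::FTLocationFactory->new();
my $location = $factory->from_string(
    'complement(join(4468..4717,4801..4940,6511..6767,6933..7071,9260..9344,9478..9593))'
);

for my $exon (exon_ranges($location)) {
    my ($start, $end, $strand) = @$exon;
    print "exon $start..$end strand $strand\n";
}
```

With a real flatfile you would pass $feat_object->location() to the helper instead. Keep in mind that these coordinates are relative to the sequence in the GenBank record (the contig), not to the chromosome.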


I opened a separate question, "Genomic Region For Ncbi Transcript(/Gene) Accessions", about retrieving the genomic location instead of the contig locations (which are what you get directly from the GenBank flatfiles in this case).