How to extract 3' and 5' UTR sequences from GenBank records using Python, Perl, or R?
4.3 years ago

Hello all, I have 1000 GenBank records. I want to extract only the 3' UTR and 5' UTR sequences from them and store the results in Excel format. Please share your ideas and suggestions (using Perl, Python, or R code).

UTR • 2.0k views

Hi, please post a sample gbk file and define the headers that you want to see in your output file (e.g. seqID, locusTag, sequence ...).


Please take a look at the Biopython Tutorial and Cookbook.
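As a rough illustration of the Biopython route, here is a minimal sketch. It assumes the records actually carry `5'UTR` and `3'UTR` feature annotations (many GenBank records do not), and it writes CSV, which Excel opens directly. The function names, column names, and file names are made up for this example.

```python
import csv

from Bio import SeqIO

# GenBank feature keys for untranslated regions.
UTR_TYPES = ("5'UTR", "3'UTR")
# Column names are illustrative; rename them to suit your spreadsheet.
FIELDS = ["seq_id", "utr_type", "start", "end", "strand", "sequence"]


def utr_rows(records):
    """Yield one dict per annotated UTR feature in the given SeqRecords."""
    for record in records:
        for feature in record.features:
            if feature.type not in UTR_TYPES:
                continue
            yield {
                "seq_id": record.id,
                "utr_type": feature.type,
                # Biopython locations are 0-based, end-exclusive;
                # report 1-based inclusive coordinates as GenBank does.
                "start": int(feature.location.start) + 1,
                "end": int(feature.location.end),
                "strand": feature.location.strand,
                # extract() reverse-complements minus-strand features.
                "sequence": str(feature.extract(record.seq)),
            }


def write_utr_csv(genbank_path, csv_path):
    """Parse a (multi-record) GenBank file and write its UTRs to CSV."""
    records = SeqIO.parse(genbank_path, "genbank")
    with open(csv_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(utr_rows(records))
```

Something like `write_utr_csv("records.gb", "utrs.csv")` would then produce one row per annotated UTR; both file names here are placeholders.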

4.3 years ago
padwalmk ▴ 140

Hi, it's unclear whether you have a GFF file or a FASTA file.

You can look at the following post:

Extract coordinates of upstream region up to closest coding region in R

4.3 years ago
zubenel ▴ 120

If you have a GFF file, you might try gff2fasta.pl with the -feature option set to "five_prime_UTR" or "three_prime_UTR", or whatever feature type your file uses. You may also want to read up on how to get the sequences of specific features with BioPerl.
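For comparison, the same idea can be sketched in plain Python without gff2fasta.pl. This assumes a tab-separated GFF3 file whose third column uses the types `five_prime_UTR` / `three_prime_UTR`, plus a FASTA file whose sequence names match the first GFF column. All function names are hypothetical, and each GFF line is treated as an independent segment, so multi-exon UTRs are not joined.

```python
UTR_TYPES = {"five_prime_UTR", "three_prime_UTR"}
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(lines):
    """Return {sequence_name: sequence} from an iterable of FASTA lines."""
    chunks = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the sequence name.
            name = line[1:].split()[0]
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line)
    return {key: "".join(parts) for key, parts in chunks.items()}


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def utrs_from_gff(gff_lines, genome):
    """Return (seqid, type, start, end, strand, sequence) for each UTR line."""
    results = []
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence section: no more feature lines
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] not in UTR_TYPES:
            continue
        seqid, ftype, strand = cols[0], cols[2], cols[6]
        start, end = int(cols[3]), int(cols[4])
        # GFF coordinates are 1-based and inclusive.
        seq = genome[seqid][start - 1:end]
        if strand == "-":
            seq = reverse_complement(seq)
        results.append((seqid, ftype, start, end, strand, seq))
    return results
```

The resulting tuples can be written out with the standard `csv` module if a spreadsheet-friendly file is needed.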
