Question: Split a file containing around 10K FASTA sequences into individual sequences and check each one for the CDS region
6.3 years ago by
vivek.arora156930 wrote:

Hello All,

I am working on a database of non-AUG start codons, and I have a FASTA file that I downloaded from RefSeq with an automated Perl script. Now I want to check each sequence in that file individually for a CDS region that has alternate translation initiation, and print the accession number from the header line of every sequence that does.


To check for alternate initiation we need to check the Kozak context: position -3 should be a purine and position +4 a guanine.

e.g. AATAACTGGTTA. Counting from the C, -3 is a purine and +4 is a G!


I'm new to Perl programming. Can anyone help me?

And I don't want to use BioPerl!


Thanks in advance!

parsing perl • 1.4k views
6.3 years ago by
United States
Woa2.8k wrote:

To extract the sequences from the RefSeq file, I think it is better to use a dedicated parser such as the one in BioPerl rather than writing something ad hoc.
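If BioPerl is not an option, as the question asks, a plain-Perl reader is not much code. The following is only a minimal sketch, not a full RefSeq parser: the names read_fasta and accession_of are made up for illustration, the accession is assumed to be the first whitespace-delimited token of the header (or the field after "ref|" in older gi-style headers), and the two demo records are invented.

```perl
use strict;
use warnings;

# Read FASTA records from a filehandle and return a list of
# [header, sequence] pairs. Wrapped sequence lines are joined and uppercased.
sub read_fasta {
    my ($fh) = @_;
    my @records;
    my ($header, $seq);
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;            # strip Unix or Windows line endings
        next if $line =~ /^\s*$/;        # skip blank lines
        if ($line =~ /^>(.*)/) {
            # A new header closes the previous record, if there was one.
            push @records, [$header, $seq] if defined $header;
            ($header, $seq) = ($1, '');
        }
        elsif (defined $header) {
            $line =~ s/\s+//g;           # drop stray whitespace inside the sequence
            $seq .= uc $line;
        }
    }
    push @records, [$header, $seq] if defined $header;   # last record
    return @records;
}

# Assumption: the accession is the first token of the header line,
# or the field following "ref|" in headers like "gi|...|ref|ACCESSION|".
sub accession_of {
    my ($header) = @_;
    my ($id) = $header =~ /^(\S+)/;
    return '' unless defined $id;
    return $1 if $id =~ /\bref\|([^|]+)/;
    return $id;
}

# Demo on an in-memory file with two invented records.
my $demo = ">NM_000001.1 hypothetical record one\nAATAAC\nTGGTTA\n"
         . ">NM_000002.1 hypothetical record two\nGGGCCC\n";
open my $fh, '<', \$demo or die "cannot open in-memory file: $!";
for my $rec (read_fasta($fh)) {
    my ($header, $seq) = @$rec;
    printf "%s\t%d\n", accession_of($header), length $seq;
}
close $fh;
```

For a real file, open the FASTA file itself instead of the in-memory string. Around 10K sequences fit comfortably in memory this way.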

You can capture all the regex matches from the sequence string with something like this:

my @all_matches= ( $sq_str =~ /[AT]CTG[AG]/gi );
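The one-liner above returns only the matched substrings, and it tests the bases directly flanking CTG. If the offsets are needed as well, or if the rule from the question (purine at -3, G at +4) is what should be tested, something along these lines may help. It is a sketch under stated assumptions: kozak_hits is a made-up name, CTG is taken as the candidate codon as in the regex above, and the first base of the codon is counted as +1 with no position 0. It scans the whole string, so restricting it to the annotated CDS start is left to the caller.

```perl
use strict;
use warnings;

# Return the 0-based offsets of every candidate start codon in $seq that has
# a purine (A/G) at -3 and a G at +4.
# Assumption: the codon's first base is +1 and there is no position 0, so -3
# is three bases upstream of the codon and +4 is the base right after it.
sub kozak_hits {
    my ($seq, $codon) = @_;
    $codon = 'CTG' unless defined $codon;    # same codon as the regex above
    my @hits;
    # The lookahead keeps the match zero-width, so overlapping contexts
    # are all reported; pos() is then the offset of the -3 base.
    while ($seq =~ /(?=[AG][ACGTN]{2}\Q$codon\EG)/gi) {
        push @hits, pos($seq) + 3;            # offset of the codon's first base
    }
    return @hits;
}

# Demo on an invented sequence.
my $seq = 'ttGAACTGGcc';
for my $off (kozak_hits($seq)) {
    printf "codon at offset %d, context %s\n", $off, substr($seq, $off - 3, 7);
}
```

Passing a second argument, for example kozak_hits($seq, 'GTG'), checks a different non-AUG codon with the same context rule.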