Question: Split the sequences of a file containing around 10K fasta sequences and check each sequence for the CDS region
vivek.arora15693 (India) wrote, 4.9 years ago:

Hello All,

I am working on a database of non-AUG start codons, and I have a FASTA file that I downloaded from RefSeq with an automated Perl script. Now I want to check every sequence in that file individually for a CDS region that has alternative translation initiation, and print the accession number from the header line of each such sequence.

 

To check for alternative initiation we need to check the Kozak context: position -3 should be a purine and position +4 a guanine.

E.g. AATAACTGGTTA: counting from the C of CTG, -3 is a purine and +4 is a G!

 

I'm new to Perl programming. Can anyone help me?

Also, I don't want to use BioPerl.

 

Thanks in advance!

Tags: parsing, perl
Woa (United States) wrote, 4.9 years ago:

To extract the sequences from the RefSeq file, I think it is better to use a dedicated parser such as the one in BioPerl, rather than writing something ad hoc.

You can capture all the regex matches from the sequence string with something like this:
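If BioPerl really is off the table, splitting a FASTA file into records can be sketched in plain Perl by changing the input record separator. This is only a minimal sketch: `read_fasta` is a hypothetical helper name, the accession is assumed to be the first whitespace-delimited token of each header line, and the record IDs in the example are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: read FASTA text from a filehandle and return a list
# of [accession, sequence] pairs. Assumption: the accession is the first
# whitespace-delimited token of the header line.
sub read_fasta {
    my ($fh) = @_;
    my @records;
    local $/ = "\n>";                 # read one FASTA record per iteration
    while ( my $chunk = <$fh> ) {
        chomp $chunk;                 # strip the trailing "\n>" separator
        $chunk =~ s/^>//;             # the first record keeps its leading ">"
        my ( $header, @lines ) = split /\r?\n/, $chunk;
        next unless defined $header && length $header;
        my ($accession) = split ' ', $header;
        my $seq = uc join '', @lines; # join wrapped sequence lines
        $seq =~ s/\s+//g;             # drop stray whitespace / carriage returns
        push @records, [ $accession, $seq ];
    }
    return @records;
}

# Usage with a small in-memory example (made-up record IDs)
my $fasta = ">SEQ1 example one\nAATAACTGG\nTTA\n>SEQ2 example two\nGGGCCC\n";
open my $fh, '<', \$fasta or die "Cannot open in-memory FASTA: $!";
my @records = read_fasta($fh);
close $fh;
print "$_->[0]\t", length( $_->[1] ), "\n" for @records;
```

For a real file, open it with `open my $fh, '<', $path` instead of the in-memory string. Note that a plain FASTA record carries no CDS coordinates, so this only gives you the whole sequence per accession.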

my @all_matches= ( $sq_str =~ /[AT]CTG[AG]/gi );
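If you want the rule exactly as stated in the question (purine at -3, G at +4), a variation on the regex could look like the sketch below. The assumptions are mine: `kozak_hits` is a hypothetical helper, the first base of the codon is numbered +1 with no position 0, CTG is the default codon only because it appears in the regex above, and the whole sequence is scanned rather than an annotated CDS. If you number positions differently, adjust the `{2}` gap.

```perl
use strict;
use warnings;

# Hypothetical helper: return the 0-based offsets of every candidate start
# codon in $seq that has a purine (A or G) at -3 and a G at +4.
# Numbering assumption: the first base of the codon is +1 and the base just
# before it is -1 (no position 0).
sub kozak_hits {
    my ( $seq, @codons ) = @_;
    @codons = ('CTG') unless @codons;       # default taken from the regex above
    my $alt = join '|', map { quotemeta($_) } @codons;
    my @hits;
    # Zero-width lookahead so that overlapping contexts are also reported.
    while ( $seq =~ /(?=[AG][ACGT]{2}(?:$alt)G)/gi ) {
        push @hits, pos($seq) + 3;          # skip -3, -2, -1 to reach +1
    }
    return @hits;
}

# Usage: G at -3, CTG at offset 5, G at +4
my @hits = kozak_hits('TTGCCCTGGAA');
print "hit at offset $_\n" for @hits;       # prints "hit at offset 5"
```

Other non-AUG codons can be passed as extra arguments, e.g. `kozak_hits($seq, 'CTG', 'GTG', 'TTG')`. Printing the accession for each sequence with at least one hit is then a matter of looping over the parsed records.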

 
