Extract codon between two flanking sequences from FASTA
4.4 years ago
renyulb • 0

Hi all, given two 10 bp flanking sequences, I would like to extract the codon that lies between them from every sample in a FASTA file. For example:

Flank1:
CAGGCATGCC
Flank2:
TCATCGCTGG

FASTA

>sample1
GCGCACCATGGTCAGGCATGCCTCCTCATCGCTGGGCACAGCCCAGAGGGT
>sample2
GGCAGAACCCGCGCACCATGGTCAGGCATGCCACCTCATCGCTGGGCACAGCCCAGA
>sample3
GGCAGATTCCCCGCACCATGGTCAGGCATGCCACTTCATCGCTGGGCACA

Output

>sample1
TCC
>sample2
ACC
>sample3
ACT

I have done the opposite of this, extracting the flanking sequences based on coordinates, and I have also extracted the sequence between two coordinates using bedtools getfasta, but I am struggling to extract a sequence based on its flanking nucleotide sequences. Thanks for any help!

genome sequencing • 854 views

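In Perl (with the /x flag, whitespace inside the pattern is ignored; the outer group captures one or more whole codons between the flanks):

print "$1\n" if $seq =~ /CAGGCATGCC ((?:[ATGC]{3})+) TCATCGCTGG/x;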

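This works as long as each sequence sits on a single line (i.e. the FASTA is not wrapped); $x holds the previous line, which is the header of the matching sequence.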

perl -n -e '{if($_ =~ /CAGGCATGCC(.+)TCATCGCTGG/){print $x, $1, "\n" };$x=$_}' Test.fa
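If the FASTA file is wrapped, the sequence lines have to be joined before matching. Below is a minimal Python sketch of that idea, not a definitive implementation. The helper names `read_fasta` and `extract_codons` are made up for illustration, and it assumes exactly one codon (three A/C/G/T bases) sits between the two flanks from the question.

```python
import re

# Flanking sequences taken from the question.
FLANK1 = "CAGGCATGCC"
FLANK2 = "TCATCGCTGG"

# Assumption: exactly one codon (3 bases) lies between the flanks.
PATTERN = re.compile(FLANK1 + r"([ACGT]{3})" + FLANK2)


def read_fasta(lines):
    """Yield (name, sequence) pairs, joining wrapped sequence lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def extract_codons(lines):
    """Return (name, codon) for every record in which both flanks are found."""
    hits = []
    for name, seq in read_fasta(lines):
        match = PATTERN.search(seq)
        if match:
            hits.append((name, match.group(1)))
    return hits


# The three records from the question.
example = """>sample1
GCGCACCATGGTCAGGCATGCCTCCTCATCGCTGGGCACAGCCCAGAGGGT
>sample2
GGCAGAACCCGCGCACCATGGTCAGGCATGCCACCTCATCGCTGGGCACAGCCCAGA
>sample3
GGCAGATTCCCCGCACCATGGTCAGGCATGCCACTTCATCGCTGGGCACA
""".splitlines()

for name, codon in extract_codons(example):
    print(f">{name}\n{codon}")
```

To run it on a real file, pass an open file handle instead of the example list, e.g. `extract_codons(open("Test.fa"))`. Records in which the flanks are not found are skipped silently.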
