How to extract the first and last 2 nt from multiple sequences in a FASTA file?
2.7 years ago
praasu ▴ 40

Hi everyone, I have a FASTA file containing multiple intron sequences. I would like to extract the first and last 2 nucleotides of each sequence in that file. Please let me know if you have any suggestions or a short script.

perl awk fasta genome sequence • 973 views
2.7 years ago

Try seqkit subseq

# example
$ cat t.fasta
>s1
acntg
>s2
ACNNTG

# first 2 bases
$ seqkit subseq -r 1:2 t.fasta 
>s1
ac
>s2
AC

# last 2 bases
$ seqkit subseq -r -2:-1 t.fasta 
>s1
tg
>s2
TG

# first 2 + last 2, although this is very strange
$ seqkit concat <(seqkit subseq -r 1:2 t.fasta) <(seqkit subseq -r -2:-1 t.fasta)
>s1
actg
>s2
ACTG
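
If seqkit is not installed, the same result can probably be reproduced with plain awk, which is one of the tags on the question. The following is only a minimal sketch and not part of the original answer: the function name `first_last_2nt` is hypothetical, and it assumes every sequence is at least 4 nt long (shorter sequences would yield overlapping or repeated bases). It joins wrapped sequence lines before taking the two ends.

```shell
# Hypothetical helper (not from seqkit): print, for each FASTA record,
# the header followed by the first 2 and last 2 nucleotides joined together.
# Assumption: each sequence is at least 4 nt long.
first_last_2nt() {
  awk '
    # Print the record collected so far, if there is one.
    function emit(   n) {
      if (name != "") {
        n = length(seq)
        print name
        print substr(seq, 1, 2) substr(seq, n - 1, 2)
      }
    }
    # Header line: finish the previous record and start a new one.
    /^>/ { emit(); name = $0; seq = ""; next }
    # Sequence line: append, so wrapped (multi-line) sequences are handled.
    { seq = seq $0 }
    END { emit() }
  ' "$1"
}
```

Calling `first_last_2nt t.fasta` on the example file above should give the same four lines as the `seqkit concat` command.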

Thank you very much

Accept this answer to mark the post as solved, and please don't ask the same question again.
