Question: Add nucleotides at the beginning of FASTA sequences
amitpande74 wrote, 5 weeks ago:

Hi, I want to add 2 nucleotides at the beginning of each sequence line in a FASTA file. For example, for this input:

> 
GCATAGGC

the desired output is:

>
TAGCATAGGC

Can someone help?

(question last modified 5 weeks ago by shenwei356)

What have you tried? This can be done with a sed command that matches the first character of each sequence line and replaces the line-beginning anchor with TA.

(reply by _r_am, 5 weeks ago)

sed -i 's/^/TA/' file.fasta

(reply by amitpande74, 5 weeks ago)

That does not match the first character of each line. The ^ anchor matches the empty position at the start of every line, so TA is added to the header lines too, in front of the >, which corrupts the FASTA file.

Also, don't use -i until you're 100% sure the command is exactly what you want.
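To see the problem without touching the original file, a small sketch like the following writes both variants to temporary files and previews them with diff instead of editing in place with -i. The two records are made-up example data, and the header-safe sed expression is the one posted later in this thread.

```shell
#!/usr/bin/env bash
# Build a tiny two-record FASTA file in a temporary directory (example data).
tmp=$(mktemp -d)
printf '>seq1\nGCATAGGC\n>seq2\nAAACCC\n' > "$tmp/in.fa"

# Naive substitution: ^ matches at the start of EVERY line, headers included.
sed 's/^/TA/' "$tmp/in.fa" > "$tmp/naive.fa"

# Header-safe substitution: skip lines that start with '>'.
sed '/^>/! s/^/TA/' "$tmp/in.fa" > "$tmp/safe.fa"

# Preview both results before ever reaching for sed -i.
# diff exits with status 1 when the files differ, so ignore its status here.
diff "$tmp/in.fa" "$tmp/naive.fa" || true
diff "$tmp/in.fa" "$tmp/safe.fa" || true
```

Only once the preview looks right is it worth rerunning the chosen command with -i (or redirecting to a new file).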

(reply by _r_am, 5 weeks ago)

Yes, it does add TA to the headers. What exactly should the command be, then?

(reply by amitpande74, 5 weeks ago)

amitpande74, please accept all answers that solve your question.


(reply by _r_am, 5 weeks ago)

See the answer in the thread "A: Fasta file edition" and replace "ACTG" with "TA".

(reply by genomax, 5 weeks ago)
Answer by shenwei356 (China), 5 weeks ago:

seqkit mutate can edit FASTA sequences (point mutations, insertions, deletions). Please use v0.14.0rc1 or later, which fixes a bug in insertion.

seqkit mutate -i supports inserting bases at any position. For example, with two sequences, the second of which spans multiple lines:

$ cat seqs.fa 
>seq1
GCATAGGC
>seq2
AAACCC
GGGTTT

1) At the beginning:

$ cat seqs.fa | seqkit mutate -i 0:TA
>seq1
TAGCATAGGC
>seq2
TAAAACCCGGGTTT

2) At the end:

$ cat seqs.fa | seqkit mutate -i -1:TA
>seq1
GCATAGGCTA
>seq2
AAACCCGGGTTTTA

3) After the 5th base:

$ cat seqs.fa | seqkit mutate -i 5:TA
>seq1
GCATATAGGC
>seq2
AAACCTACGGGTTT

Nice solution, great to know; this certainly simplifies the task.

(reply by Istvan Albert, 5 weeks ago)
Answer by Fatima (United States), 5 weeks ago:

If each sequence occupies exactly one line and is written in capital letters, this works for both nucleotide and amino acid sequences (replace [A-Z] with [ATGC] if you want to be more specific):

sed '/^[A-Z]/s/^/TA/' file.fasta > output.fasta

If some sequences span multiple lines, first convert them to single-line sequences with this command:

awk '/^>/ {if (NR>1) printf("\n"); printf("%s\n",$0); next} {printf("%s",$0)} END {printf("\n")}' input.fasta > file.fasta
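The linearizing step and the prefixing step can also be folded into a single awk pass. This is a minimal sketch, assuming headers start with '>' and sequence lines need no other cleanup; the function name prefix_linear is hypothetical.

```shell
#!/usr/bin/env bash
# prefix_linear (hypothetical helper): joins each record's sequence onto one
# line and puts the given prefix in front of it, reading FASTA on stdin.
prefix_linear() {
  local prefix=$1
  awk -v p="$prefix" '
    /^>/ {
      if (started) print ""   # end the previous record sequence line
      print                   # header is copied unchanged
      printf "%s", p          # prefix goes before the first residue
      started = 1
      next
    }
    { printf "%s", $0 }       # append this sequence chunk without a newline
    END { if (started) print "" }
  '
}
```

Usage: prefix_linear TA < input.fasta > output.fasta. A record with an empty sequence comes out as just the prefix.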
Answer by cpad0112 (Hyderabad, India), 5 weeks ago:

Close. Exclude the header lines from the substitution (assuming each sequence is on a single line):

$ sed '/^>/! s/^/TA/' test.fa

Or, since every second line is then a sequence line (the first~step address is a GNU sed extension):

$ sed '0~2 s/^/TA/' test.fa

With awk:

$ awk -v OFS="\n" '/^>/ {getline seq; print $0,"TA"seq}' test.fa
$ awk '{print ((NR%2)? "":"TA") $0}' test.fa
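The single-line assumption can be dropped without joining lines: only the first sequence line after each header needs the prefix. A sketch in awk, where the function name prefix_first_line is hypothetical:

```shell
#!/usr/bin/env bash
# prefix_first_line (hypothetical helper): adds a prefix to the first sequence
# line of every record; later lines of a wrapped record pass through untouched.
prefix_first_line() {
  awk -v p="$1" '
    /^>/ { print; want = 1; next }       # header: flag the next line
    want { print p $0; want = 0; next }  # first sequence line: prefix it
         { print }                       # any other line: copy as-is
  '
}
```

Usage: prefix_first_line TA < test.fa. The original line breaks are kept, so the first line of each record ends up two characters longer than the others; if a uniform width matters, the sequence has to be rewrapped afterwards.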
Answer by Istvan Albert (University Park, USA), 5 weeks ago:

When sequences may span multiple lines, and the resulting FASTA should stay well-formed (wrapped at a consistent line length), one needs to chain more commands.

My best bet makes use of both bioawk and seqkit (both are installable with bioconda):

cat foo.fa | bioawk -v prefix="TATA" -c fastx '{ printf(">%s\n%s%s\n", $name, prefix, $seq) }' | seqkit seq

prints

>foo
TATAATGGACTCTCGTCCTCAGAAAGTCTGGATGACGCCGAGTCTCACTGAATCTGACAT
GGATTACCACAAGATCTTGACAGCAGGTCTGTCCGTTCAACAGGGGGTTGTTCGGCAAAG
AGTCATCCCAGTGTATCAAGTAAACAATCTTGAGATCCCAGTGTATCAAGTAAACAATCT
TGAGATCCCAGTGTATCAAGTAAACAATCTTGAGATCCCAGTGTATCAAGTAAACAATCT
TGAGATCCCAGTGTATCAAGTAAACAATCTTGAG

Uses the trick shown in A: Fasta file edition
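If bioawk is not available, the prefix-then-rewrap idea can be approximated with plain POSIX awk. This is a swapped-in alternative, not the bioawk/seqkit pipeline above; the function name prefix_rewrap and its default width of 60 (chosen to match the output shown) are assumptions.

```shell
#!/usr/bin/env bash
# prefix_rewrap (hypothetical helper): prepends a prefix to each sequence and
# rewraps it at a fixed width (default 60), reading FASTA on stdin.
prefix_rewrap() {
  local prefix=$1 width=${2:-60}
  awk -v p="$prefix" -v w="$width" '
    # Print the buffered sequence in chunks of w characters.
    function flush(   i, n) {
      if (!have) return
      n = length(seq)
      for (i = 1; i <= n; i += w) print substr(seq, i, w)
    }
    /^>/ { flush(); print; seq = p; have = 1; next }  # new record starts
    { seq = seq $0 }                                  # collect sequence lines
    END { flush() }
  '
}
```

Usage: prefix_rewrap TA 60 < foo.fa > out.fa. Each record's sequence is held in memory while it is rewrapped, which is fine for typical files but worth keeping in mind for chromosome-sized records.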

Powered by Biostar version 2.3.0