Remove text and add a string in odd rows in a fasta file using awk
2.6 years ago
Mgs3 ▴ 30

I have a file organized as such:

>Prevalence_Sequence_ID:13|ARO_Name:AxyX|ARO:3004143|Detection_Model:Protein Homolog Model
ATGAAGCAAAGAGTCCCTCTACGCACGTTCGTCCTATCTGCCGTATTAATTCTTATTACTGGTTGCTCGAAACCGGAAACCCAACCAGCCG
>Prevalence_Sequence_ID:14|ARO_Name:adeF|ARO:3004143|Detection_Model:Protein Homolog Model
ATGAATATCTCGAAATTCTTCATCGACCGGCCGATCTTCGCCGGCGTGCTTTCGATCCTGGTGTTGCTGGCGGGCATACTGGCCATGTTCC

For every odd row (the header lines), I need to keep only the first and third '|'-separated columns and append the text "|kraken:taxid|32630" at the end. Example below:

>Prevalence_Sequence_ID:13|ARO:3004143|kraken:taxid|32630
ATGAAGCAAAGAGTCCCTCTACGCACGTTCGTCCTATCTGCCGTATTAATTCTTATTACTGGTTGCTCGAAACCGGAAACCCAACCAGCCG
>Prevalence_Sequence_ID:14|ARO:3004143|kraken:taxid|32630
ATGAATATCTCGAAATTCTTCATCGACCGGCCGATCTTCGCCGGCGTGCTTTCGATCCTGGTGTTGCTGGCGGGCATACTGGCCATGTTCC

Is there a simple awk script that I can use? Alternatively, I could keep only the first column if that is easier.

$ awk -F '|' '/^>/ {print $1 "|" $3 "|kraken:taxid|32630"; next} 1' input.fa
2.6 years ago
Sam ★ 4.7k

I guess something like

awk -F "|" '{if($1 ~/^>/) {print $1"|"$3"|kraken:taxid|32630"}else{print $0}}' test

This should work (assuming you have a FASTA file and you are aiming to change the header lines, not literally every odd row).
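To illustrate the difference between matching headers and matching odd rows: a test on `^>` targets header lines wherever they fall, while `NR % 2 == 1` targets literally every odd row, and the two only agree when each sequence sits on a single line. Below is a minimal sketch, assuming a hypothetical two-record file named `demo.fa` with shortened sequences (the file names are made up for illustration). It also shows the first-column-only variant that the question mentions as an alternative.

```shell
# Build a small test file (demo.fa is an assumed name, not from the thread).
cat > demo.fa <<'EOF'
>Prevalence_Sequence_ID:13|ARO_Name:AxyX|ARO:3004143|Detection_Model:Protein Homolog Model
ATGAAGCAAAGAGTC
>Prevalence_Sequence_ID:14|ARO_Name:adeF|ARO:3004143|Detection_Model:Protein Homolog Model
ATGAATATCTCGAAA
EOF

# Variant A: rewrite literally every odd row.
# Only safe when every sequence occupies exactly one line.
awk -F '|' 'NR % 2 == 1 {print $1 "|" $3 "|kraken:taxid|32630"; next} {print}' demo.fa > odd_rows.fa

# Variant B: keep only the first column of each header line.
awk -F '|' '/^>/ {print $1 "|kraken:taxid|32630"; next} {print}' demo.fa > first_col.fa

cat odd_rows.fa first_col.fa
```

If the sequences are wrapped over several lines, the odd-row variant would also rewrite sequence lines, so the `^>` test is probably the safer default.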

2.6 years ago

Is there a simple awk script that I can use?

yes: awk -F '|' '/^>/ {printf(TODO);next;} {print;}'

I leave the TODO part as an exercise.
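One possible way to fill in the TODO, offered as a sketch and not necessarily what the poster had in mind: `printf` takes a format string followed by the values to substitute, so the first and third fields can be passed as `%s` arguments and the fixed suffix written into the format itself. The file names `demo2.fa` and `renamed.fa` are assumptions for illustration.

```shell
# Small test input (hypothetical file name, shortened sequences).
cat > demo2.fa <<'EOF'
>Prevalence_Sequence_ID:13|ARO_Name:AxyX|ARO:3004143|Detection_Model:Protein Homolog Model
ATGAAGCAAAGAGTC
>Prevalence_Sequence_ID:14|ARO_Name:adeF|ARO:3004143|Detection_Model:Protein Homolog Model
ATGAATATCTCGAAA
EOF

# Header lines: print field 1, field 3, then the fixed kraken suffix.
# printf does not add a newline by itself, hence the explicit \n.
# All other lines are printed unchanged.
awk -F '|' '/^>/ {printf("%s|%s|kraken:taxid|32630\n", $1, $3); next;} {print;}' demo2.fa > renamed.fa

cat renamed.fa
```

Unlike `print`, `printf` ignores the output field and record separators, so the `|` delimiters and the trailing newline have to be spelled out in the format string.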
