Batch replacement of fasta header
8.4 years ago
mbk0asis ▴ 680

Hello

I'm trying to append some text to the existing FASTA headers.

The FASTA file contains 100 mRNA sequences, and the additional information is saved in a separate file.

Input 1:

>NM_001133328
ggccattacggccggggctcccgctcgccctgaacctagtacctgcagccatg

Input 2:

NM_001133328    chr2b    +    107030810    107070838    ATIC

Output:

>NM_001133328    chr2b    +    107030810    107070838    ATIC
ggccattacggccggggctcccgctcgccctgaacctagtacctgcagccatg

I tried using 'awk' and 'while' commands, but couldn't figure it out.

Here's the command:

cat Input2 | while read line; do awk '/^>/ {$0=$0 "'$line'"}1' Input1 ; done | head

and the error message:

awk: cmd. line:1:             ^ unterminated string
awk: cmd. line:1: /^>/ {$0=$0 " NM_001168566
awk: cmd. line:1:             ^ syntax error

Can anyone help me?

Thank you!

fasta • 1.5k views
8.4 years ago
Sam ★ 4.7k

awk can handle two files at once:

awk -v fastaLine=1 'FNR==NR {arrayStore[NR] = $0; next} {if ($1 ~ /^>/) {print ">" arrayStore[fastaLine]; fastaLine = fastaLine + 1} else {print}}' Input2 Input1 > output

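The command above pairs headers and annotation lines by position, so both files must be in the same order. You can also use a hash map keyed on the accession, which should be similar.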
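For the hash-map variant, here is a minimal sketch rather than a definitive solution. It assumes the accession is the first whitespace-separated column of Input2 and the first word of each FASTA header. The function name `annotate_fasta` and the array name `info` are made up for illustration. Headers with no matching line in the table are printed unchanged.

```shell
# Hypothetical helper: annotate_fasta ANNOTATION_TABLE FASTA_FILE
annotate_fasta() {
    awk '
        # First file (annotation table): keep each whole line, keyed by column 1.
        # Assumption: the table is not empty, otherwise FNR == NR also matches the FASTA file.
        FNR == NR { info[$1] = $0; next }

        # Second file (FASTA): on header lines, look up the accession after ">".
        /^>/ {
            id = substr($1, 2)
            if (id in info) { print ">" info[id] } else { print }
            next
        }

        # Sequence lines pass through unchanged.
        { print }
    ' "$1" "$2"
}
```

With the files from the question it would be called as `annotate_fasta Input2 Input1 > output`. Since the lookup is by accession, the order of the lines in Input2 no longer matters.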
