Hi,
I have two files, as follows:
$ cat file_1.fas
>CHROM-g19-B-0001-66906-67533
ATTTGATTTCTCATGCTAAACATTTATTGGTG
>CHROM-g19-B-0010-143637-144790
TCTGTCGACGGCAACTGTGAAACTTATCAGTG
>CHROM-g19-B-0010-147754-150523
GCACCCTGAGCCGAACTGAATTCCTTGTGAT
$ cat file_2.txt
A00120 CHROM-g19-B-0001-66906-67533
A00122 CHROM-g19-B-0010-143637-144790
A00124 CHROM-g19-B-0010-145875-146742
A00125 CHROM-g19-B-0010-147754-150523
I need to rename the entries in file_1.fas with their corresponding IDs in file_2.txt, to get the following:
$ cat file_3.fas
>A00120
ATTTGATTTCTCATGCTAAACATTTATTGGTG
>A00122
TCTGTCGACGGCAACTGTGAAACTTATCAGTG
>A00125
GCACCCTGAGCCGAACTGAATTCCTTGTGAT
NOTES:
In my real data, file_2.txt has some extra IDs that cannot be found in file_1.fas. I don't need them, because there are no matching entries in file_1.fas to be replaced. An example is A00124 CHROM-g19-B-0010-145875-146742 in file_2.txt.
Thank you for helping me on this post.
Hossein
What have you tried? What programming language(s) do you know?
I'm still a beginner at scripting. I know a bit of shell and Perl.
If you're doing this with Perl or Python, you'll want to read the contents of `file_2.txt` into a "hash" or "dictionary" data structure, keyed by the original header name. Then, as you loop through the `file_1.fas` contents, you can identify the header lines (the ones starting with `>`) and use each header as a "key" to return the associated "value", which is the new ID.
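To make the dictionary approach concrete, here is a minimal Python sketch. It is only one way to do it, and it makes some assumptions that are not stated in the question: the function names `load_id_map` and `rename_headers` are made up for illustration, `file_2.txt` is assumed to be whitespace-separated with the new ID in the first column, and a header with no match in `file_2.txt` is left unchanged rather than dropped.

```python
import sys


def load_id_map(lines):
    """Build a dict mapping each original header name to its new ID.

    Assumes two whitespace-separated columns per line, as in file_2.txt:
    the new ID first, then the original header name.
    """
    id_map = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        new_id, old_name = fields[0], fields[1]
        id_map[old_name] = new_id
    return id_map


def rename_headers(fasta_lines, id_map):
    """Yield FASTA lines with header names replaced by their mapped IDs."""
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            old_name = line[1:].strip()
            # Assumption: keep the original name when no mapping exists.
            yield ">" + id_map.get(old_name, old_name)
        else:
            yield line  # sequence lines pass through untouched


if __name__ == "__main__" and len(sys.argv) == 3:
    # Expects two arguments: the ID table, then the FASTA file.
    map_path, fasta_path = sys.argv[1], sys.argv[2]
    with open(map_path) as map_file:
        id_map = load_id_map(map_file)
    with open(fasta_path) as fasta_file:
        for out_line in rename_headers(fasta_file, id_map):
            print(out_line)
```

Saved under a name of your choice (say `rename_fasta.py`, a hypothetical name), it could be run as `python rename_fasta.py file_2.txt file_1.fas > file_3.fas`. Extra rows in `file_2.txt`, such as A00124, simply sit in the dictionary unused, so they never appear in the output.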