Here's a solution using the replace subcommand (usage) of csvtk. Just download the .tar.gz file, decompress it, and you can run it right away :)
First, prepare a mapping file. This is a plain tab-separated text file, which you can easily create and export with spreadsheet software.
$ more mapping.tsv
chrA 1
chrB 2
chrC 3
I assume the SNP data file is tab-delimited too. Here's a dummy one:
$ more data.tsv
chrA A z
chrA A x
chrA A c
chrB B v
chrB B d
chrC C tx
chrC C t
chrC C x
chrC C z
Then use csvtk to edit the SNP data file:
$ ./csvtk -H -t replace -f 1 -p '(.+)' -r '{kv}' -k mapping.tsv data.tsv
[INFO] read key-value file: mapping.tsv
[INFO] 3 pairs of key-value loaded
1 A z
1 A x
1 A c
2 B v
2 B d
3 C tx
3 C t
3 C x
3 C z
The long-option version would be easier to understand:
./csvtk --no-header-row --tabs replace --fields 1 --pattern '(.+)' --replacement '{kv}' --kv-file mapping.tsv data.tsv
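For readers who want to see what the key-value lookup amounts to, here is a rough sketch of the same idea in plain awk. It is not how csvtk is implemented, only an illustration: it recreates a shortened copy of the example files in a scratch directory (the `workdir` and `result` names are made up for this sketch) and, in this sketch, leaves any first-column value that is missing from the mapping file unchanged.

```shell
# Recreate shortened versions of the example files in a scratch directory.
workdir=$(mktemp -d)
printf 'chrA\t1\nchrB\t2\nchrC\t3\n' > "$workdir/mapping.tsv"
printf 'chrA\tA\tz\nchrB\tB\tv\nchrC\tC\ttx\n' > "$workdir/data.tsv"

# While reading the first file (NR == FNR), store key -> value pairs in m.
# While reading the second file, swap column 1 if it is a known key,
# then print the line whether or not it was changed.
result=$(awk 'BEGIN { FS = OFS = "\t" }
     NR == FNR { m[$1] = $2; next }
     $1 in m   { $1 = m[$1] }
     { print }' "$workdir/mapping.tsv" "$workdir/data.tsv")

printf '%s\n' "$result"
rm -rf "$workdir"
```

The first column of each printed line should now hold 1, 2 or 3 instead of chrA, chrB or chrC, with the other columns untouched.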
PS: This is a general method, not limited to this case. sed is good for a single replacement, while csvtk replace -k handles multiple replacements well. csvtk is written in Go and performs well.
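To make the contrast concrete, a single replacement with sed might look like the sketch below. It assumes tab-separated input like the dummy data above; the `tab` and `single` variable names are made up for this example. Each further chromosome would need another expression of its own, which is where a mapping file becomes more convenient.

```shell
# Keep a literal tab in a variable so the pattern works in any POSIX sed.
tab=$(printf '\t')

# Replace only chrA in the first column; the trailing tab in the pattern
# stops it from also matching longer names that merely start with chrA.
single=$(printf 'chrA\tA\tz\nchrB\tB\tv\n' | sed "s/^chrA${tab}/1${tab}/")

printf '%s\n' "$single"
```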
PS2: seqkit has exactly the same function to handle FASTA/Q files.
This solution works perfectly, and it is extremely quick, thank you very much!