Change sample names in vcf
3.6 years ago
leeandroid ▴ 110

Hi everyone.

I want to simplify the sample names in a VCF file by removing the suffix that was added during sequencing, but I have not been successful. Right now my samples are named like this:

123_4:A7TT6AKXX:1:1000000001 or ABC100:A7TT6AKXX:1:1000000001


In the first case I want to keep only the number before the underscore and in the second everything before the first colon (:). How can I do it?

Here's how the header is organized in my vcf file:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  102_3:C6RL4ANXX:1:250529518     206E:C6RL4ANXX:2:250529583      AN07:C6RL4ANXX:2:250529585     B103:C6RL4ANXX:2:250529647      CA:C6RL4A


vcf sample name • 14k views

but i am not being successful.

What have you tried? People will be more willing to help if you show the effort you have put into solving your question. We would rather point out mistakes than give a full answer (which in the long run doesn't teach you much).


It seems I failed to explain myself in the original post. The goal is to change the names of my samples like this:

123_4:A7TT6AKXX:1:1000000001 > 123

ABC100:A7TT6AKXX:1:1000000001 > ABC100

Eventually I was able to change them using bcftools reheader*, as Pierre suggested, but I had to extract the sample names from the original VCF and edit them manually.

Is there no way to change them directly in the VCF, maybe with grep or sed commands?

* bcftools reheader -s new_names.txt old.vcf > new.vcf

Thank you.
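
The manual editing step described above could probably be scripted. Here is a minimal sketch, assuming every sample name should be cut at its first underscore or colon (which matches both examples given); `simplify_sample_names` is a hypothetical helper name, not part of bcftools.

```shell
# Hypothetical helper: read sample names on stdin, one per line, and
# drop everything from the first "_" or ":" onward.
simplify_sample_names() {
    sed -E 's/[_:].*$//'
}

# Demo with the two example names from the question.
printf '%s\n' '123_4:A7TT6AKXX:1:1000000001' 'ABC100:A7TT6AKXX:1:1000000001' \
    | simplify_sample_names

# Possible use with bcftools (assumes bcftools is installed):
#   bcftools query -l old.vcf | simplify_sample_names > new_names.txt
#   bcftools reheader -s new_names.txt old.vcf > new.vcf
```

Shortened names can collide (for example, 123_4 and 123_5 would both become 123), so it is worth checking new_names.txt for duplicates with `sort new_names.txt | uniq -d` before reheadering.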


123_4:A7TT6AKXX:1:1000000001 > 123

ABC100:A7TT6AKXX:1:1000000001 > ABC100

$ echo 123_4:A7TT6AKXX:1:1000000001 | awk 'gsub("_.*","")'
123
$ echo ABC100:A7TT6AKXX:1:1000000001 | awk 'sub(":.*","")'
ABC100


I asked you to post your VCF header (with samples). With the information provided in the original post:

input (two tab-separated columns with case 1 and case 2):

$ cat test.txt
123_4:A7TT6AKXX:1:1000000001	ABC100:A7TT6AKXX:1:1000000001

output:

$ awk '{gsub("_.*\t","\t"); gsub(":.*","\t"); print}' test.txt
123	ABC100


However, for the code to work on your VCF, I need to see the VCF header line with a few example sample names.


I'll try your solution and post if it worked. Thank you!


Input:

CHROM   POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  102_3:C6RL4ANXX:1:250529518 206E:C6RL4ANXX:2:250529583  AN07:C6RL4ANXX:2:250529585  B103:C6RL4ANXX:2:250529647  CA:C6RL4A


code:

$ awk -F'[:_]' 'NR==1{$1=$1}1' OFS=_ test.txt | awk '{for(N=1; N<=NF; N++) sub(/_.*/, "",$N)}1'


output:

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 102 206E AN07 B103 CA
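
The pipeline above prints the header with spaces between columns, while a VCF header line is tab-delimited. A variant that keeps the tabs and only touches the sample columns (column 10 onward) of the #CHROM line could look like the sketch below. It assumes the same rule as above (cut each name at its first underscore or colon), and `rename_vcf_header` is a hypothetical function name.

```shell
# Hypothetical helper: rewrite sample names on the #CHROM line only,
# leaving meta lines and data records untouched and tab-separated.
rename_vcf_header() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^#CHROM/ {
             # Columns 1-9 are the fixed VCF columns; samples start at 10.
             for (i = 10; i <= NF; i++) sub(/[_:].*/, "", $i)
         }
         { print }' "$@"
}

# Demo on a tiny made-up VCF (the data line is illustrative only).
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t102_3:C6RL4ANXX:1:250529518\t206E:C6RL4ANXX:2:250529583\n1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\t1/1\n' \
    | rename_vcf_header
```

On an uncompressed file this could be run as `rename_vcf_header old.vcf > new.vcf`. For bgzipped files, bcftools reheader is likely the safer route.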


Thanks. Next time, please post example input and expected output. This would help the forum answer your queries faster.

10
Entering edit mode
3.6 years ago

-s, --samples FILE

new sample names, one name per line, in the same order as they appear in the VCF file. Alternatively, only samples which need to be renamed can be listed as "old_name new_name\n" pairs separated by whitespaces, each on a separate line. If a sample name contains spaces, the spaces can be escaped using the backslash character, for example "Not\ a\ good\ sample\ name".
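
The "old_name new_name" form described above can be generated rather than typed. A small sketch follows, assuming sample names contain no whitespace and should be cut at the first underscore or colon, as in the question; `make_rename_pairs` is a hypothetical helper name.

```shell
# Hypothetical helper: read old sample names on stdin (one per line)
# and print "old_name new_name" pairs for bcftools reheader -s.
make_rename_pairs() {
    awk '{ new = $1; sub(/[_:].*/, "", new); print $1, new }'
}

# Demo with two names from the question's header.
printf '%s\n' '102_3:C6RL4ANXX:1:250529518' '206E:C6RL4ANXX:2:250529583' \
    | make_rename_pairs

# Possible use (assumes bcftools is installed):
#   bcftools query -l old.vcf | make_rename_pairs > rename_pairs.txt
#   bcftools reheader -s rename_pairs.txt old.vcf > new.vcf
```

The pairs file makes each rename explicit, so it is easy to review before applying it.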

2.2 years ago
Phoenix Mu ▴ 40

It seems that bcftools reheader --samples <your file> -o <output> <your input file> would be helpful.