Question: Change sample names in vcf
leeandroid wrote:

Hi everyone.

I want to simplify the sample names in a VCF file by removing the suffix that was added during sequencing, but I am not being successful. Right now my samples are named like this:

123_4:A7TT6AKXX:1:1000000001 or ABC100:A7TT6AKXX:1:1000000001

In the first case I want to keep only the number before the underscore and in the second everything before the first colon (:). How can I do it?

Here's how the header is organized in my vcf file:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  102_3:C6RL4ANXX:1:250529518     206E:C6RL4ANXX:2:250529583      AN07:C6RL4ANXX:2:250529585     B103:C6RL4ANXX:2:250529647      CA:C6RL4A

Thanks in advance.

modified 12 months ago by Phoenix Mu • written 2.4 years ago by leeandroid

but i am not being successful.

What have you tried? People will be more willing to help if you show the effort you have put into solving your question. We would rather point out mistakes than give a full answer (which in the long run doesn't teach you much).

written 2.4 years ago by WouterDeCoster

Could you please post an example VCF line containing your sample names?

written 2.4 years ago by cpad0112

Thank you all for replying!

It seems that I failed to explain myself in the original post. The goal is to rename my samples like this:

123_4:A7TT6AKXX:1:1000000001 > 123

ABC100:A7TT6AKXX:1:1000000001 > ABC100

Eventually I was able to change them using bcftools reheader*, as Pierre suggested, but I had to extract the sample names from the original VCF and edit them manually.

Is there no way to change them directly in the VCF, maybe with grep or sed?

* bcftools reheader -s new_names.txt old.vcf > new.vcf

Thank you.

written 2.4 years ago by leeandroid

123_4:A7TT6AKXX:1:1000000001 > 123
ABC100:A7TT6AKXX:1:1000000001 > ABC100

$ echo 123_4:A7TT6AKXX:1:1000000001 |awk 'gsub ("_.*","")'
123
$ echo ABC100:A7TT6AKXX:1:1000000001 |awk 'sub (":.*","")'
ABC100

I asked you to post your VCF header (with samples). Working only from the information provided in the original post:

input (two columns with case1 and case2):

$ cat test.txt 
123_4:A7TT6AKXX:1:1000000001    ABC100:A7TT6AKXX:1:1000000001

output:

$ awk '{gsub("_.*\t","\t"); gsub(":.*","\t"); print}' test.txt 
123 ABC100

However, for the code to work on your VCF, I need to see the VCF header line with a few example sample names.

modified 2.4 years ago • written 2.4 years ago by cpad0112

I added the header, but I'm not sure it is in the right format.

I'll try your solution and post if it worked. Thank you!

written 2.4 years ago by leeandroid

Input:

CHROM   POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  102_3:C6RL4ANXX:1:250529518 206E:C6RL4ANXX:2:250529583  AN07:C6RL4ANXX:2:250529585  B103:C6RL4ANXX:2:250529647  CA:C6RL4A

code:

$ awk -F'[:_]' 'NR==1{$1=$1}1' OFS=_ test.txt  | awk '{for(N=1; N<=NF; N++) sub(/_.*/, "", $N)}1'

output:

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 102 206E AN07 B103 CA
modified 2.4 years ago • written 2.4 years ago by cpad0112
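
The pipeline above was tested on a single header line. On a complete VCF it would probably misbehave: the #CHROM line is usually not line 1, the header gets rejoined with spaces instead of tabs, and the second awk can truncate data fields that contain an underscore. Here is a minimal sketch of a variant that only touches the #CHROM line and keeps the tabs. It assumes an uncompressed, tab-delimited VCF whose sample columns start at field 10, and that everything from the first "_" or ":" onward is the unwanted suffix. The function name shorten_vcf_samples and the file names are made up for illustration.

```shell
# Shorten sample names on the #CHROM header line of a tab-delimited VCF.
# Assumptions: sample columns start at field 10, and everything from the
# first "_" or ":" onward in a sample name is the suffix to drop.
shorten_vcf_samples() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^#CHROM/ {
             # Only sample columns are edited; CHROM..FORMAT stay untouched.
             for (i = 10; i <= NF; i++) sub(/[_:].*/, "", $i)
         }
         { print }' "$1"
}

# Usage (old.vcf and new.vcf are placeholder names):
#   shorten_vcf_samples old.vcf > new.vcf
```

Meta lines (##...) and data lines are printed unchanged, so genotype fields such as 0/1:5 keep their colons. If two samples shorten to the same name (for example 123_4 and 123_5), the result would contain duplicate sample names, so it is worth checking for that before using the output.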

It works, cpad0112! Thank you for your precious help.

modified 2.4 years ago • written 2.4 years ago by leeandroid

Thanks. Next time, please post example input and expected output. That helps the forum answer your queries faster.

written 2.4 years ago by cpad0112
Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote:

bcftools reheader https://samtools.github.io/bcftools/bcftools.html

-s, --samples FILE

new sample names, one name per line, in the same order as they appear in the VCF file. Alternatively, only samples which need to be renamed can be listed as "old_name new_name\n" pairs separated by whitespaces, each on a separate line. If a sample name contains spaces, the spaces can be escaped using the backslash character, for example "Not\ a\ good\ sample\ name".

modified 2.4 years ago • written 2.4 years ago by Pierre Lindenbaum
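
To avoid editing the names file by hand, the one-name-per-line file described above could be generated from the header itself. This is a sketch under the same assumptions as the original question (uncompressed, tab-delimited VCF; the suffix starts at the first "_" or ":"). The function name short_sample_names and the file names are hypothetical.

```shell
# Print shortened sample names, one per line, in VCF column order.
# Assumes a tab-delimited, uncompressed VCF with samples from field 10.
short_sample_names() {
    awk -F'\t' '/^#CHROM/ {
        for (i = 10; i <= NF; i++) {
            name = $i
            sub(/[_:].*/, "", name)   # drop everything from the first _ or :
            print name
        }
        exit                          # header found, no need to read further
    }' "$1"
}

# Usage (old.vcf, new.vcf and new_names.txt are placeholder names):
#   short_sample_names old.vcf > new_names.txt
#   sort new_names.txt | uniq -d      # prints any duplicated names
#   bcftools reheader -s new_names.txt -o new.vcf old.vcf
```

The uniq -d check is there because two different original names can shorten to the same string, which would leave the VCF with duplicate sample names.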
Phoenix Mu wrote:

It seems that bcftools reheader --samples <your file> -o <output> <your input file> would be helpful.

written 12 months ago by Phoenix Mu
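
The --samples file can also hold "old_name new_name" pairs, as quoted from the bcftools manual earlier in this thread. Here is a rough sketch that writes such pairs only for samples whose name would actually change. It assumes an uncompressed, tab-delimited VCF, sample names without spaces, and the suffix rule from the question. The function name sample_rename_pairs and the file names are invented for this example.

```shell
# Print "old_name new_name" pairs for samples whose name would change.
# Assumes sample names contain no whitespace (otherwise bcftools expects
# the spaces to be backslash-escaped).
sample_rename_pairs() {
    awk -F'\t' '/^#CHROM/ {
        for (i = 10; i <= NF; i++) {
            new = $i
            sub(/[_:].*/, "", new)
            if (new != $i) print $i, new   # skip names that are already short
        }
        exit
    }' "$1"
}

# Usage (placeholder file names):
#   sample_rename_pairs old.vcf > rename_pairs.txt
#   bcftools reheader --samples rename_pairs.txt -o new.vcf old.vcf
```

The pairs file is easy to review before running reheader, since each line shows exactly which name is replaced by which.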