BCFtools view -R function
1
0
Entering edit mode
2.6 years ago

I wondered if anyone could help me please? (apologies I can't give screen shots - as I work on a secure server).

I want to extract all the sample IDs that relate to a specific variant. I have a multisample VCF file for each chromosome.

I used to be able to extract this information using the code below and get all the IDs seperated by columns with the relevant VCF info (qual, filter, info, format, format_ouput). My variant text file (tab delimited) contains: Chr Pos Ref Alt

bcftools view -R "variants.txt" "mymultisample.vcf" > "newoutput.vcf"

However, all I get now is a few columns with lots of mixed mish mash of information that doesn't make sense. I also tried bcftools view -t, bcftools view -r, bcftools view -T, different formatting of the variant.txt file but no luck.

Thank you so so much for your help in advance.

VCF view BCFtools • 5.1k views
ADD COMMENT
0
Entering edit mode

It would help if you showed a concrete example. Try finding a position in the variants.txt file for which you manually found a match in the VCF but the operation does now show expected output. This process will help you figure out why you're not seeing what you think you should be seeing.

ADD REPLY
0
Entering edit mode

Thank you Ram, just saw your response. I will try that. When you say "manually found", is that inspecting the file with the grep command or viewing it for example in excel?

ADD REPLY
0
Entering edit mode

Yes, I mean finding a match manually with a piece of software that can do plain text searching. grep would be the way to go. Excel brings its own complications with it.

ADD REPLY
0
Entering edit mode

Ok great, thank you Ram. I will try that. Here is a screen shot of what i used to get. This is a slightly made up example, as the real output is on the secure server. VCFoutput

ADD REPLY
0
Entering edit mode

I'd recommend splitting multi-allelics and left-aligning the VCF before doing the operation.

Your data looks transposed to me - is that something you did on purpose? Can you describe why you call this a "mish-mash" I'm assuming you abstracted the values in the ID fields.

ADD REPLY
0
Entering edit mode

Hello Ram,

i have taken some screen shots from the ouput as I get it to explain it properly (sorry the other file was somehwat transposed).

This is the mish mash example I now get using bcftools view -R genotype.txt myfile.vcf.gz > output.vcf MIsh mash

This is the good example that I used to get with the same command (bcftools view -R genotype.txt myfile.vcf.gz > output.vcf) Good example

EGAN is the IDs that correspond to the patients where I need to retrieve the phenotype. That is the the bit I don't get with the tabix file. But the format of the columns is correct, just missing the EGAN IDs.

Thank you

ADD REPLY
0
Entering edit mode

I mean, could I assume that the order is the exact same one as the list I get when i do:

bcftools query -l myvcf.vcf > sample_list_ids.txt

That order seems to be mantained in the original files where I was still getting the EGAN IDs. I just don't want to assume.

This how the current ouput looks like.

new_ouput

ADD REPLY
0
Entering edit mode

i have just tried:

bcftools view -H -R genotype.txt myvcf.vcf.gz > output.vcf

This give me the nice column output but again no EGAN IDs.

ADD REPLY
0
Entering edit mode

Your "mishmash" file seems to be have the content just with different delimiters. Please stop using Excel and use grep and head instead to show content in plain text.

How does the same command give different output? Are you using different input files? Maybe a different version of bcftools?

ADD REPLY
0
Entering edit mode

Thank you Ram. The chromosomes files are all split into seperate chromosome files. That might the problem why the same command gives different output.

Noted, will aim to use grep and head. The bcftool version is the same.

ADD REPLY
0
Entering edit mode

Please see if you can find a linux pro near you. This looks like a problem that needs to be eyeballed and probed on your machine (as in there are too many variables to be accounted for and an online discussion might go on forever). I can't help you with this right now, sorry.

You may need to examine EOF characters, specify options explicitly, and try a bunch of things.

ADD REPLY
0
Entering edit mode

Ok, thank you very much Ram. Really appreciate your help. I will try and find someon locally. Thank you, Julia

ADD REPLY
1
Entering edit mode

randomly, this code made it work:

tabix -h -R genotype.txt myfile.vcf.gz > ouput.vcf
ADD REPLY
0
Entering edit mode

Glad you found something that works. Now for the fun part of figuring out why something else didn't work as it should have.

ADD REPLY
0
Entering edit mode
2.6 years ago

I have just worked out if I tabix the file

tabix -p vcf file.vcf.gz

and then use

tabix -R genotype.txt file.vcf.gz > output.vcf

Then I get the output I want but the only thing I am missing is the corresponding IDs.

Can anyone help with that?

Thank you,

ADD COMMENT

Login before adding your answer.

Traffic: 2517 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6