BCFtools view -R function
1
0
Entering edit mode
28 days ago

I wondered if anyone could help me please? (apologies I can't give screen shots - as I work on a secure server).

I want to extract all the sample IDs that relate to a specific variant. I have a multisample VCF file for each chromosome.

I used to be able to extract this information using the code below and get all the IDs seperated by columns with the relevant VCF info (qual, filter, info, format, format_ouput). My variant text file (tab delimited) contains: Chr Pos Ref Alt

bcftools view -R "variants.txt" "mymultisample.vcf" > "newoutput.vcf"


However, all I get now is a few columns with lots of mixed mish mash of information that doesn't make sense. I also tried bcftools view -t, bcftools view -r, bcftools view -T, different formatting of the variant.txt file but no luck.

VCF view BCFtools • 840 views
0
Entering edit mode

It would help if you showed a concrete example. Try finding a position in the variants.txt file for which you manually found a match in the VCF but the operation does now show expected output. This process will help you figure out why you're not seeing what you think you should be seeing.

0
Entering edit mode

Thank you Ram, just saw your response. I will try that. When you say "manually found", is that inspecting the file with the grep command or viewing it for example in excel?

0
Entering edit mode

Yes, I mean finding a match manually with a piece of software that can do plain text searching. grep would be the way to go. Excel brings its own complications with it.

0
Entering edit mode

Ok great, thank you Ram. I will try that. Here is a screen shot of what i used to get. This is a slightly made up example, as the real output is on the secure server.

0
Entering edit mode

I'd recommend splitting multi-allelics and left-aligning the VCF before doing the operation.

Your data looks transposed to me - is that something you did on purpose? Can you describe why you call this a "mish-mash" I'm assuming you abstracted the values in the ID fields.

0
Entering edit mode

Hello Ram,

i have taken some screen shots from the ouput as I get it to explain it properly (sorry the other file was somehwat transposed).

This is the mish mash example I now get using bcftools view -R genotype.txt myfile.vcf.gz > output.vcf

This is the good example that I used to get with the same command (bcftools view -R genotype.txt myfile.vcf.gz > output.vcf)

EGAN is the IDs that correspond to the patients where I need to retrieve the phenotype. That is the the bit I don't get with the tabix file. But the format of the columns is correct, just missing the EGAN IDs.

Thank you

0
Entering edit mode

I mean, could I assume that the order is the exact same one as the list I get when i do:

bcftools query -l myvcf.vcf > sample_list_ids.txt


That order seems to be mantained in the original files where I was still getting the EGAN IDs. I just don't want to assume.

This how the current ouput looks like.

0
Entering edit mode

i have just tried:

bcftools view -H -R genotype.txt myvcf.vcf.gz > output.vcf


This give me the nice column output but again no EGAN IDs.

0
Entering edit mode

Your "mishmash" file seems to be have the content just with different delimiters. Please stop using Excel and use grep and head instead to show content in plain text.

How does the same command give different output? Are you using different input files? Maybe a different version of bcftools?

0
Entering edit mode

Thank you Ram. The chromosomes files are all split into seperate chromosome files. That might the problem why the same command gives different output.

Noted, will aim to use grep and head. The bcftool version is the same.

0
Entering edit mode

Please see if you can find a linux pro near you. This looks like a problem that needs to be eyeballed and probed on your machine (as in there are too many variables to be accounted for and an online discussion might go on forever). I can't help you with this right now, sorry.

You may need to examine EOF characters, specify options explicitly, and try a bunch of things.

0
Entering edit mode

Ok, thank you very much Ram. Really appreciate your help. I will try and find someon locally. Thank you, Julia

1
Entering edit mode

randomly, this code made it work:

tabix -h -R genotype.txt myfile.vcf.gz > ouput.vcf

0
Entering edit mode

Glad you found something that works. Now for the fun part of figuring out why something else didn't work as it should have.

0
Entering edit mode
27 days ago

I have just worked out if I tabix the file

tabix -p vcf file.vcf.gz


and then use

tabix -R genotype.txt file.vcf.gz > output.vcf


Then I get the output I want but the only thing I am missing is the corresponding IDs.

Can anyone help with that?

Thank you,