Question: Extract a list of SNPs from a BCF file
23 months ago by
Famf20
United States
Famf20 wrote:

Hi there, I have a very large BCF file from which I want to extract a list of SNPs (the list is also very large). The SNPs of interest are in a tab-delimited file with two columns: chromosome and position. I have tried bcftools view with the -T and -R options, as shown below, but I haven't had success.

bcftools view -T mylist.txt file.bcf -Ou -o filteredfile.bcf

Thanks in advance

snp bcf • 1.1k views
modified 23 months ago by genomax85k • written 23 months ago by Famf20

"but I haven't had success"

What does that mean? What is the exact problem you're facing? Can you also paste the first few lines of your mylist.txt file?

written 23 months ago by RamRS27k

Hi Ram,

Can you please tell me what format the mylist.txt file should have?

written 12 weeks ago by anamaria100

Is it OK to do something like this:

vcftools --bcf gokind.bcf --snps mySNPs.txt --recode --recode-INFO-all --out SNPs_only

where mySNPs.txt looks like this:

rs12121
rs242343
rs2348724
modified 12 weeks ago • written 12 weeks ago by anamaria100

What do your questions have to do with my comment?

"Is it ok to do something like this"

Did you try it? Did it work? Did the results match your expectations?

written 12 weeks ago by RamRS27k

Hi,

I tried it and this is what I got:

Outputting VCF file...
Error: Expected type 7 for string. Found type 9.
Error: Expected type 7 for string. Found type 0.
Error: Expected type 7 for string. Found type 0.
Error: Expected type 7 for string. Found type 0.
Error: Expected type 7 for string. Found type 0.
Error: Expected type 7 for string. Found type 0.
After filtering, kept 0 out of a possible 3 Sites
No data left for analysis!
written 12 weeks ago by anamaria100
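
Independently of what causes the vcftools errors above, selecting records by ID can be illustrated without any BCF reader. The following is a minimal sketch, not a tested fix for this thread: it builds a toy VCF (the records are invented; only the three rsIDs come from the post above) and keeps the records whose ID column appears in mySNPs.txt, using plain awk. The closing comment shows a bcftools form of the same selection, which assumes a bcftools build that supports the ID=@file filter expression.

```shell
#!/bin/sh
# Toy inputs in a scratch directory. The rsIDs are the ones quoted above;
# the VCF records themselves are invented for illustration.
tmp=$(mktemp -d)
printf 'rs12121\nrs242343\nrs2348724\n' > "$tmp/mySNPs.txt"
{
    printf '##fileformat=VCFv4.2\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
    printf '1\t1000\trs12121\tA\tG\t.\tPASS\t.\n'
    printf '1\t2000\trs999\tC\tT\t.\tPASS\t.\n'
    printf '2\t3000\trs242343\tG\tA\t.\tPASS\t.\n'
} > "$tmp/toy.vcf"

# First pass (NR == FNR) loads the wanted IDs; second pass keeps header
# lines and records whose ID column (field 3) is in the list.
awk -F '\t' '
    NR == FNR { want[$1] = 1; next }
    /^#/ || ($3 in want)
' "$tmp/mySNPs.txt" "$tmp/toy.vcf" > "$tmp/kept.vcf"

kept=$(grep -vc '^#' "$tmp/kept.vcf")
echo "kept $kept records"

# On a real BCF the same selection could be attempted directly with
# bcftools (assumes a build that supports the ID=@file expression):
#   bcftools view -i 'ID=@mySNPs.txt' gokind.bcf -Ob -o SNPs_only.bcf
```

The awk match is exact on the whole ID field, so a record whose ID column holds several identifiers separated by semicolons would not be picked up by this sketch.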

I also tried this:

bcftools view -T mylist.txt gokind2.bcf -Ou -o filteredfile.bcf

where mylist.txt was a tab separated file:

20  33371323
12  73950313
1   216957281

but I got this error:

[E::bcf_sr_regions_init] Could not parse the file mylist.txt, using the columns 1,2[,-1]
Failed to read the targets: mylist.txt

Can you please advise how mylist.txt should be formatted?

Thanks

modified 12 weeks ago • written 12 weeks ago by anamaria100

Does using -R mylist.txt (instead of -T mylist.txt) work? What is the output of cat -te mylist.txt?

written 12 weeks ago by RamRS27k
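
For context, the bcftools manual describes -R and -T as taking the same tab-delimited CHROM, POS list; -R uses the index of the input file to jump to each position, while -T streams through the whole file and filters as it goes. The sketch below is illustrative only. It tidies a scratch copy of the list from this thread (one line is repeated on purpose to show deduplication) and then, only if bcftools and gokind2.bcf are actually present, runs both variants. Sorting is not claimed to fix the parse error reported above; it is a harmless tidy-up and is what tabix expects if the list is later compressed and indexed.

```shell
#!/bin/sh
# Scratch copy of the positions list quoted in the thread; the last line
# is a deliberate duplicate added for this example.
dir=$(mktemp -d)
tab=$(printf '\t')
printf '20\t33371323\n12\t73950313\n1\t216957281\n12\t73950313\n' > "$dir/mylist.txt"

# Sort by chromosome name, then numerically by position, dropping duplicates.
LC_ALL=C sort -t "$tab" -k1,1 -k2,2n -u "$dir/mylist.txt" > "$dir/mylist.sorted.txt"
cat "$dir/mylist.sorted.txt"

# The bcftools calls only run when the tool and the input BCF exist
# (gokind2.bcf is the file name used in the thread).
if command -v bcftools >/dev/null 2>&1 && [ -f gokind2.bcf ]; then
    # -T streams through the whole file and needs no index.
    bcftools view -T "$dir/mylist.sorted.txt" -Ob -o "$dir/by_targets.bcf" gokind2.bcf

    # -R jumps via the index, so the (compressed) BCF must be indexed first.
    [ -f gokind2.bcf.csi ] || bcftools index gokind2.bcf
    bcftools view -R "$dir/mylist.sorted.txt" -Ob -o "$dir/by_regions.bcf" gokind2.bcf
fi
```

Note that -Ob writes compressed BCF, whereas the -Ou used earlier in the thread writes uncompressed BCF, which is mainly intended for piping between bcftools commands.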

I tried what you suggested:

$ bcftools view -R mylist.txt gokind2.bcf -Ou -o filteredfile.bcf
[E::bcf_sr_regions_init] Could not parse the file mylist.txt, using the columns 1,2[,-1]
Failed to read the regions: mylist.txt

and:

 cat -te mylist.txt

gives me:

20^I33371323$
12^I73950313$
1^I216957281$
5^I174820027$
...
written 12 weeks ago by anamaria100

Your mylist.txt looks fine. What is the bcftools version you're using? bcftools --version should give you the version info.

written 12 weeks ago by RamRS27k
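
Because only the first few lines of the list were posted, a check over the whole file can complement cat -te. The following is a minimal sketch under stated assumptions: it recreates the posted lines in a scratch directory, and check_list is a made-up helper name, not part of bcftools. It reports lines that do not have exactly two tab-separated fields (which includes empty lines), positions that are not positive integers, and Windows-style carriage returns.

```shell
#!/bin/sh
# Recreate the example list from the thread in a scratch directory
# (assumption: the real file looks like the cat -te output above).
workdir=$(mktemp -d)
printf '20\t33371323\n12\t73950313\n1\t216957281\n' > "$workdir/mylist.txt"

# check_list is a hypothetical helper name, not part of bcftools.
# It prints each offending line number and returns non-zero if any are found.
check_list() {
    awk -F '\t' '
        /\r$/                 { printf "line %d: carriage return (CRLF line ending)\n", NR; bad = 1; next }
        NF != 2               { printf "line %d: expected 2 tab-separated fields, found %d\n", NR, NF; bad = 1; next }
        $2 !~ /^[1-9][0-9]*$/ { printf "line %d: position is not a positive integer: %s\n", NR, $2; bad = 1 }
        END                   { exit bad }
    ' "$1"
}

if check_list "$workdir/mylist.txt"; then
    list_status=ok
else
    list_status=bad
fi
echo "mylist.txt: $list_status"
```

To run the same check on a real list, call check_list with the path to that file; a clean result only rules out these three formatting problems and says nothing about whether the chromosome names match those in the BCF header.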

It's this one:

bcftools --version
bcftools 1.10.2-32-ge677391
Using htslib 1.10.2-46-g9a10355
Copyright (C) 2019 Genome Research Ltd.
written 12 weeks ago by anamaria100

That's a new version. I have no idea what's going on here. Is there any chance you could go back to bcftools 1.9 and try this? It should not make a difference but just in case.

written 12 weeks ago by RamRS27k

I am doing this on a cluster, so... but anyway, thank you so much for the debugging tips!

written 12 weeks ago by anamaria100