I wondered if anyone could help me, please? (Apologies, I can't provide screenshots, as I work on a secure server.)
I want to extract all the sample IDs that relate to a specific variant. I have a multisample VCF file for each chromosome.
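To make the goal concrete, here is a rough sketch of it on made-up data, using only awk on an uncompressed VCF. The file name, sample names (S1 to S3) and the variant are hypothetical, and it assumes a biallelic record with GT as the first FORMAT field, so any non-zero allele index is counted as carrying the variant.

```shell
#!/bin/sh
# Toy data only: everything below is invented for illustration.
tmp=$(mktemp -d)
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\n'
  printf '1\t100\trs1\tA\tG\t50\tPASS\t.\tGT\t0/1\t0/0\t1/1\n'
  printf '1\t200\trs2\tC\tT\t50\tPASS\t.\tGT\t0/0\t0/1\t0/0\n'
} > "$tmp/toy.vcf"

# Print the IDs of samples whose genotype contains a non-reference allele
# at the requested Chr/Pos/Ref/Alt.
carriers=$(awk -F'\t' -v chr=1 -v pos=100 -v ref=A -v alt=G '
  /^#CHROM/ { for (i = 10; i <= NF; i++) name[i] = $i; next }  # remember sample names
  /^#/      { next }                                           # skip meta lines
  $1 == chr && $2 == pos && $4 == ref && $5 == alt {
    for (i = 10; i <= NF; i++) {
      split($i, f, ":")                    # GT assumed to be the first subfield
      if (f[1] ~ /[1-9]/) print name[i]    # any non-zero allele index
    }
  }
' "$tmp/toy.vcf")

echo "$carriers"
```

On this toy file the variant 1:100 A>G is carried by S1 (0/1) and S3 (1/1), so those two IDs are printed, one per line.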
I used to be able to extract this information using the code below, which gave me all the IDs separated into columns along with the relevant VCF info (qual, filter, info, format, format_output). My variant text file (tab-delimited) contains: Chr Pos Ref Alt
bcftools view -R "variants.txt" "mymultisample.vcf" > "newoutput.vcf"
However, all I get now is a few columns containing a mishmash of information that doesn't make sense. I also tried bcftools view -t, bcftools view -r, and different formatting of the variants.txt file, but no luck.
I also tried the above code provided by Ram (link to post):
bcftools view -O v -R "$variants" "$vcf_in" \
| grep -Ef <(awk 'BEGIN{FS=OFS="\t";print "#"};{print "^"$1,$2,"[^\t]+",$3,$4"\t"}' "$variants") \
> "$vcf_out"
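For reference, the awk/grep part of that command can be tried on its own, without bcftools. The sketch below uses made-up toy files and writes the patterns to a file instead of using the <(...) form; the awk logic is the same: one "#" pattern that keeps header lines, then one anchored pattern per variant matching CHROM, POS, any ID, REF and ALT.

```shell
#!/bin/sh
# Toy data only: file names and contents are invented for illustration.
tmp=$(mktemp -d)

# variants.txt: Chr, Pos, Ref, Alt (tab-delimited, no header line)
printf '1\t100\tA\tG\n' > "$tmp/variants.txt"

# A tiny VCF with two records at the same position but different ALT alleles
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n'
  printf '1\t100\trs1\tA\tG\t50\tPASS\t.\tGT\t0/1\t0/0\n'
  printf '1\t100\trs2\tA\tT\t50\tPASS\t.\tGT\t0/0\t1/1\n'
} > "$tmp/toy.vcf"

# Build the pattern list: "#" first, then one regex per variant
awk 'BEGIN { FS = OFS = "\t"; print "#" }
     { print "^" $1, $2, "[^\t]+", $3, $4 "\t" }' \
  "$tmp/variants.txt" > "$tmp/patterns.txt"

# Keep header lines plus records whose CHROM/POS/REF/ALT all match
grep -Ef "$tmp/patterns.txt" "$tmp/toy.vcf" > "$tmp/filtered.vcf"
cat "$tmp/filtered.vcf"
```

On the toy input this keeps both header lines and the rs1 record, and drops rs2 because its ALT allele differs. The output is still a tab-delimited VCF, so every sample column should survive this step.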
However, I only get a long line of IDs within one of the Excel cells, but none of the other VCF information that I was previously getting.
Could the VCF file itself be the problem?
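One way to look into that, sketched here on a made-up file, is to compare the number of tab-separated columns on the #CHROM line with the number on each record. This assumes an uncompressed VCF; the toy file deliberately contains one record that uses spaces instead of tabs.

```shell
#!/bin/sh
# Toy data only: the second record is space-separated on purpose.
tmp=$(mktemp -d)
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n'
  printf '1\t100\trs1\tA\tG\t50\tPASS\t.\tGT\t0/1\t0/0\n'
  printf '1 100 rs2 A T 50 PASS . GT 0/0 1/1\n'
} > "$tmp/toy.vcf"

# Report the header column count and any record that disagrees with it
awk -F'\t' '
  /^##/     { next }                                    # skip meta lines
  /^#CHROM/ { n = NF; print "header columns: " n; next }
  NF != n   { bad++; print "line " NR ": " NF " columns" }
  END       { print "records with wrong column count: " (bad + 0) }
' "$tmp/toy.vcf" > "$tmp/report.txt"

cat "$tmp/report.txt"
```

If the real file reports zero mismatches, the VCF is at least consistently tab-delimited, and the odd columns are more likely to come from how the output is opened or imported than from the file itself.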
Many thanks, Julia
Side note: The code is from Sean, not me.
Oh sorry Ram. Apologies!