Question

Annotating file using bcftools

5 months ago

kl ▴ 10

Hi all,

I am trying to annotate my imputed VCF file using bcftools and then convert it to PLINK format.

bcftools index -t ro_imputed_hrcgrch37.R2_0.3.vcf.gz
bcftools annotate \
-a $DATADIR/ro_imputed_hrcgrch37.R2_0.3.vcf.gz \
-c ID $REF/All_20180423.vcf.gz \
--output-type z \
-o $DATADIR/ro_imputed_hrcgrch37.R2_0.3.vcf_dbSNP151.vcf.gz

This seems to work but my samples are removed. Consequently, when I try to convert to binary plink files, it doesn't work because it says I have no samples. Can anyone give advice on what I've done wrong?

Many thanks

annotation plink bcftools • 652 views

ADD COMMENT • link 5 months ago by kl ▴ 10

Pierre Lindenbaum · Answer 1 · 2024-05-15

1

5 months ago

Pierre Lindenbaum 164k

I think you're annotating $REF/All_20180423.vcf.gz (dbSNP, isn't it? That file has no genotypes) using your own VCF, ro_imputed_hrcgrch37.R2_0.3.vcf.gz, as the annotation database, but you want the reverse: annotate your VCF with dbSNP.

bcftools annotate \
-a $REF/All_20180423.vcf.gz  \
-c ID $DATADIR/ro_imputed_hrcgrch37.R2_0.3.vcf.gz \
--output-type z \
-o $DATADIR/ro_imputed_hrcgrch37.R2_0.3.vcf_dbSNP151.vcf.gz

ADD COMMENT • link 5 months ago by Pierre Lindenbaum 164k
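
One way to confirm that the corrected argument order kept the genotypes is to count the samples in the output before converting to PLINK. The sketch below is only illustrative: the function name `count_samples` is hypothetical, and it assumes bcftools is on the PATH. `bcftools query -l` lists one sample name per line, so a sites-only file such as dbSNP should report 0.

```shell
# Hypothetical helper: print the number of samples in a VCF/BCF file.
# Assumes bcftools is installed and on the PATH.
count_samples() {
    # 'bcftools query -l' prints one sample name per line
    bcftools query -l "$1" | wc -l | tr -d ' '
}

# Usage (path taken from the thread):
# count_samples "$DATADIR/ro_imputed_hrcgrch37.R2_0.3.vcf_dbSNP151.vcf.gz"
```

If this prints 0 for the annotated output, the input and annotation files were probably swapped, as described in the answer above.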

Thanks, I corrected it, but it doesn't seem to annotate. I converted to binary PLINK afterwards, and the result is shown below. It is not the output I want: I wanted the second column to be the rsID extracted from the All_20180423.vcf.gz file, matched on chromosome, position and allele. I would appreciate any suggestions. I haven't used bcftools before.

22 22:51218224:C:A 0 51218224 A C
22 22:51218377:G:C 0 51218377 C G
22 22:51219006:G:A 0 51219006 A G
22 22:51219387:T:C 0 51219387 C T
22 22:51221190:G:A 0 51221190 A G
22 22:51221731:T:C 0 51221731 C T
22 22:51222100:G:T 0 51222100 T G
22 22:51223637:G:A 0 51223637 A G
22 22:51229805:T:C 0 51229805 C T
22 22:51237063:T:C 0 51237063 C T

Thanks

ADD REPLY • link 5 months ago by kl ▴ 10
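
A quick check of whether the annotation took effect is to count how many variant IDs in the second column start with `rs`. This is a minimal sketch, assuming the listing from the reply above is saved one variant per line in a whitespace-delimited file; the function name `count_rsids` and the file name in the usage line are hypothetical.

```shell
# Hypothetical helper: count variants whose ID (column 2) starts with "rs".
# Expects a whitespace-delimited file with one variant per line.
count_rsids() {
    # n + 0 forces a numeric 0 when no line matches
    awk '$2 ~ /^rs/ { n++ } END { print n + 0 }' "$1"
}

# Usage (hypothetical file name):
# count_rsids annotated.bim
```

For the ten variants shown in that reply, every ID is still in chr:pos:ref:alt form, so the count would be 0.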

It worked with this:

bcftools index -t $DATADIR/cpro_imputed_hrcgrch37.R2_0.3.vcf.gz

bcftools annotate \
-a $RefGenomes/All_20180423.vcf.gz \
-c CHROM,FROM,TO,ID $DATADIR/ro_imputed_hrcgrch37.R2_0.3.vcf.gz \
--output-type z \
-o $DATADIR/ro_imputed_hrcgrch37.R2_0.3.vcf_dbSNP151.vcf.gz

ADD REPLY • link updated 5 months ago by Pierre Lindenbaum 164k • written 5 months ago by kl ▴ 10

Do you know if there is a way to keep chr:pos as the ID when there is no matching rsID based on chromosome and position? The number of variants has been reduced by half.

ADD REPLY • link 5 months ago by kl ▴ 10
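
One possible approach to the question above, offered as a sketch rather than a tested answer for this dataset: `bcftools annotate` has a `--set-id` option, and prefixing the format string with `+` sets the ID only where it is missing (`.`), leaving existing IDs such as rsIDs untouched. The function name `fill_missing_ids` is hypothetical, and the chr:pos:ref:alt format is an assumption chosen to match the IDs already in the imputed file. This does not by itself explain why the variant count halved; that may come from a different step.

```shell
# Hypothetical helper: give variants without an ID a chr:pos:ref:alt ID.
# The leading '+' tells bcftools to set the ID only when it is missing.
# Assumes bcftools is installed and on the PATH.
fill_missing_ids() {
    # $1 = input VCF, $2 = output VCF (bgzip-compressed)
    bcftools annotate \
        --set-id +'%CHROM:%POS:%REF:%ALT' \
        --output-type z \
        -o "$2" \
        "$1"
}

# Usage (output file name is hypothetical):
# fill_missing_ids "$DATADIR/ro_imputed_hrcgrch37.R2_0.3.vcf_dbSNP151.vcf.gz" \
#     "$DATADIR/with_ids.vcf.gz"
```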