Question: Annotate genomic positions with dbSNP rsIds
Jimbou (Germany) wrote 6 months ago:

Although I have already found some ways to annotate genomic positions with rsIDs, e.g. using the UCSC Table Browser, I'm not happy with them: I want an all-in-one Linux script that also takes strand issues into account (flipped alleles, A-T vs. T-A, or switched reference alleles).

What I have:

chr position ref alt
10  169560   G   T
10  171117   G   A
10  171126   G   A
10  172995   A   C
10  178499   C   T

What I want:

chr position ref alt rsID
10  169560   G   T   rsXXX
10  171117   G   A   rsXXX, rsXXX
10  171126   G   A   rsXXX
10  172995   A   C   rsXXX
10  178499   C   T   rsXXX

Thanks

Jimbou (Germany) wrote 6 months ago:

I will write down my solution as an answer for documentation purposes. I started as Pierre recommended, but then used bcftools instead of GATK.

First, I created a header file (header.txt) for the custom VCF file:

##fileformat=VCFv4.0
##fileDate=09052019
##source=allchr_allvsall_sex_adjusted
##reference=GRCh37.p13
##phasing=partial
##INFO=<ID=AA,Number=1,Type=String,Description="Ancestral Allele">
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO

Then I used awk to generate the VCF data lines according to the specification (8 columns), setting ID to "." (missing), QUAL to 100, FILTER to PASS, and INFO to AA=<reference allele> for all positions. Note that my_chr_pos_alt_ref.out.gz contains only autosomal SNVs.

zcat my_chr_pos_alt_ref.out.gz | awk -v OFS='\t' '{print $1, $2, ".", $3, $4, 100, "PASS", "AA="$3}' > tmp.vcf
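
As a quick check of this conversion step, here is a minimal sketch that runs the same kind of awk call on the five example positions from the question and counts lines that do not have 8 tab-separated fields. It emits the columns in VCF order (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO). The function name to_vcf_body and the inline sample are only illustrative, and the input is assumed to be chr, position, ref, alt without a header line.

```shell
#!/bin/sh
# Hypothetical helper: turn "chr pos ref alt" lines (stdin) into VCF body lines.
to_vcf_body() {
  awk -v OFS='\t' '{ print $1, $2, ".", $3, $4, 100, "PASS", "AA=" $3 }'
}

# Example positions from the question (whitespace-separated, no header line).
sample='10 169560 G T
10 171117 G A
10 171126 G A
10 172995 A C
10 178499 C T'

body=$(printf '%s\n' "$sample" | to_vcf_body)
printf '%s\n' "$body"

# Count lines that do NOT have exactly 8 tab-separated fields.
bad=$(printf '%s\n' "$body" | awk -F'\t' 'NF != 8 { n++ } END { print n + 0 }')
echo "lines with wrong field count: $bad"
```

A non-zero count would point to a missing or extra column in the input, which is worth catching before the header is prepended.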

Then I added the header:

cat header.txt tmp.vcf > mydata.vcf
rm tmp.vcf

Then I compressed and indexed the file:

bgzip mydata.vcf
tabix -p vcf mydata.vcf.gz

Finally, I annotated the rsIDs using:

bcftools annotate \
  -a 00-common_all.vcf.gz \
  -c ID \
  --output-type z \
  -o mydata_dbSNP151.vcf.gz \
  mydata.vcf.gz
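
To get back to the flat table asked for in the question (chr, position, ref, alt, rsID), the annotated VCF can be reduced to those five columns. The sketch below does this with plain awk on the decompressed VCF text; the function name vcf_to_table and the output name mydata_rsids.tsv are hypothetical, and rs0000 in the demo is a made-up placeholder, not a real ID.

```shell
#!/bin/sh
# Hypothetical helper: reduce VCF text (stdin) to "chr pos ref alt rsID".
vcf_to_table() {
  awk -F'\t' -v OFS='\t' '
    /^#/ { next }                  # skip meta and header lines
    { print $1, $2, $4, $5, $3 }   # CHROM, POS, REF, ALT, ID
  '
}

# Usage on the annotated file (output name is only an example):
#   zcat mydata_dbSNP151.vcf.gz | vcf_to_table > mydata_rsids.tsv
# With bcftools, an equivalent is:
#   bcftools query -f '%CHROM\t%POS\t%REF\t%ALT\t%ID\n' mydata_dbSNP151.vcf.gz

# Small self-contained demo with one placeholder record.
printf '##fileformat=VCFv4.0\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n10\t169560\trs0000\tG\tT\t100\tPASS\tAA=G\n' | vcf_to_table
```

Positions without a matching dbSNP record keep "." in the ID column, so they remain visible in the table.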

The dbSNP files are from ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b151_GRCh37p13/VCF/
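
The steps above do not address the strand issues raised in the question. As a rough pre-check only (it is not part of the workflow described here), the sketch below appends the complemented alleles to each input line and flags strand-ambiguous SNVs (A/T and C/G pairs), where complementing gives back the same allele pair, so a strand flip cannot be detected from the alleles alone. The function name flag_strand is hypothetical, the input is assumed to be chr, position, ref, alt with single-base alleles, and the second demo line is made up.

```shell
#!/bin/sh
# Hypothetical helper: add complemented alleles and a strand-ambiguity flag.
flag_strand() {
  awk -v OFS='\t' '
    BEGIN { comp["A"] = "T"; comp["T"] = "A"; comp["C"] = "G"; comp["G"] = "C" }
    {
      ref = toupper($3); alt = toupper($4)
      # A/T and C/G SNVs look identical on both strands.
      status = (comp[ref] == alt) ? "ambiguous" : "ok"
      print $1, $2, ref, alt, comp[ref], comp[alt], status
    }
  '
}

# First line is from the question; the second is a made-up A/T example.
printf '10 169560 G T\n10 999 A T\n' | flag_strand
```

Lines flagged as ambiguous would need allele frequencies or another source of strand information before they can be matched safely.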

Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote 6 months ago:

Use awk to convert the table to VCF, and then use GATK VariantAnnotator (https://software.broadinstitute.org/gatk/documentation/tooldocs/3.8-0/org_broadinstitute_gatk_tools_walkers_annotator_VariantAnnotator.php) with --dbsnp.


Jimbou replied 6 months ago:

Thanks a lot. I started as you recommended, but switched to bcftools in the end.