How to find SNP locations in assemblies other than hg38
1
0
Entering edit mode
12 weeks ago
Bosberg ▴ 50

I have the following dbSNP accessions from some EPIC array data:

   # e.g.: 
   rs2468330
   rs877309
   rs2857639
   ... # 65 in total

and I am able to obtain the coordinates of these locations on the hg38 assembly using rsnps::ncbi_snp_query( ). However I need to compare these against the hg19 genome, which apparently is not supported in rsnps. Illumina's sample Sheet doesn't seem to provide this anywhere, and I'm not confident that the Liftover tool would be sufficiently precise to give me the right single-base letter if I just lifted these hg38 coordinates over (I need single-base resolution to see what was actually in the reference genome at the precise position of this snp. Maybe Liftover actually is precise enough for that? I just don't know ).

There is a graphical web-interface available here, but it would be painfully tedious to have to paste the accession in by hand 65 times (and for sure I would make mistakes in the process) -so I'm looking for some scripting solution to this.

Thanks for any ideas you might have.

hg19 SNP dbsnp • 486 views
ADD COMMENT
0
Entering edit mode
ADD REPLY
0
Entering edit mode

I need single-base resolution to see what was actually in the reference genome at the precise position of this snp. Maybe Liftover actually is precise enough for that? I just don't know.

Yes, Liftover is very precise and it can be done pretty easily. Please check out LiftOver.

ADD REPLY
0
Entering edit mode

Thanks for the input, but I think I'd defer to the following advice from the UCSC website:

While the LiftOver tool exists to convert coordinates between assemblies, it is NOT recommended to use LiftOver to convert SNPs between two assembly versions. Using LiftOver to convert a very small region, especially a single base SNP, is not always a trivial task as the alignment may not get a high enough score to be considered a successful conversion.

ADD REPLY
0
Entering edit mode
11 weeks ago

First of all download some resources required to perform the liftover:

wget http://hgdownload.cse.ucsc.edu/goldenpath/hg38/liftOver/hg38ToHg19.over.chain.gz
wget http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.chromAlias.txt
wget https://hgdownload.cse.ucsc.edu/goldenpath/hg38/bigZips/hg38.fa.gz
wget https://hgdownload.cse.ucsc.edu/goldenpath/hg19/bigZips/hg19.fa.gz
gunzip hg38.fa.gz
samtools faidx hg38.fa
gunzip hg19.fa.gz
samtools faidx hg19.fa

Then create a VCF for the SNVs you are interested in using the JSON API (which gives hg38 coordinates at the time of the writing of this answer ... also notice the following approach most likely only works for SNVs and not indels):

(echo -e "##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO";
for id in 2468330 877309 2857639; do
  curl https://api.ncbi.nlm.nih.gov/variation/v0/refsnp/$id | \
  jq -r "[.refsnp_id] + [.primary_snapshot_data.placements_with_allele[0] | .seq_id, .alleles[0].allele.spdi.position + 1, .alleles[0].allele.spdi.deleted_sequence, .alleles[1:][].allele.spdi.inserted_sequence] | @tsv"
done | \
awk -F"\t" '{printf "%s\t%d\trs%d\t%s\t%s",$2,$3,$1,$4,$5; for (i=6; i<=NF; i++) printf ",%s",$i; printf "\t.\t.\t.\n"}') | \
bcftools view --no-version -Oz -o file.vcf.gz --write-index

Finally, use BCFtools/liftover to convert the coordinates from hg38 to hg19:

awk -F"\t" 'NR>1 {print $4"\t"$1}' hg38.chromAlias.txt | \
bcftools annotate --no-version -Ou --rename-chrs - file.vcf.gz | \
bcftools +liftover --no-version -- -s hg38.fa -f hg19.fa -c hg38ToHg19.over.chain.gz

You will get an answer as follows which should be close to what you want:

chr12   43198722    rs2468330   G   A,C .   .   .
chr1    11489678    rs877309    A   G   .   .   .
chr22   30055674    rs2857639   A   C,G,T   .   .   .
ADD COMMENT

Login before adding your answer.

Traffic: 1644 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6