Question: chromosome coordinate, rs
2 • 18 months ago • Ebtihal Kamal20 (Sudan) wrote:

Hello everyone, I am new to bioinformatics. I need to analyze my SNPs with SIFT, and the required input format is chromosome coordinates. I have rs numbers for my SNPs in FASTA format. How can I get the chromosome coordinates?

snp • 770 views
modified 18 months ago by Sasha Fokin80 • written 18 months ago by Ebtihal Kamal20

Could you post some example data here, Ebtihal Kamal?

modified 18 months ago • written 18 months ago by cpad011212k
rs756051270 
rs1001844648 
rs780156389 
rs749333159 
rs1247965121 
rs1165194803 
rs200324855 
rs1487182042 
rs778900428
modified 18 months ago by finswimmer13k • written 18 months ago by Ebtihal Kamal20

https://rest.ensembl.org/vep/human/id/rs56116432?content-type=application/json;refseq=1

The output gives you the SIFT and PolyPhen predictions for each ID in JSON format. Parse the JSON for rsID, coordinates, transcript, change, and prediction, Ebtihal Kamal. Since these SNPs are supposed to be clinically significant (as you are looking for SIFT scores), you should also try the ClinVar VCF for these IDs; the ClinVar VCF carries multiple levels of information for each ID.
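To illustrate, here is a minimal sketch of pulling the SIFT fields out of such a response. The response.json below is a hand-made, abbreviated stand-in for the endpoint's real output (the real response has many more fields), and the transcript ID in it is made up:

```shell
# Abbreviated, hand-made stand-in for a VEP REST JSON response;
# normally you would fetch it with e.g. curl from the URL above.
cat > response.json <<'EOF'
[{"id": "rs56116432",
  "transcript_consequences": [
    {"transcript_id": "NM_000000.0",
     "amino_acids": "D/N",
     "sift_prediction": "deleterious",
     "sift_score": 0.01}]}]
EOF

# Print one line per transcript consequence that carries a SIFT prediction.
python3 - <<'EOF'
import json

with open("response.json") as fh:
    records = json.load(fh)

for rec in records:
    for tc in rec.get("transcript_consequences", []):
        if "sift_prediction" in tc:
            print(rec["id"], tc["transcript_id"],
                  tc.get("amino_acids"), tc["sift_prediction"], tc["sift_score"])
EOF
# prints: rs56116432 NM_000000.0 D/N deleterious 0.01
```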

modified 18 months ago • written 18 months ago by cpad011212k
4 • 18 months ago • finswimmer13k (Germany) wrote:

Hello Ebtihal Kamal,

i have rs numbers for my SNPs in FASTA format

That's weird. rs numbers are just identifiers in dbSNP; FASTA describes a sequence.

To get the coordinates for a given set of rs numbers, use Ensembl's VEP (hg19, hg38). It also outputs the SIFT prediction.

fin swimmer

modified 18 months ago • written 18 months ago by finswimmer13k

Thank you for your help. I tried Ensembl's VEP (hg19, hg38), but there is a message telling me that the website is blocked.

written 18 months ago by Ebtihal Kamal20

Hm, is Ensembl itself reachable? What about BioMart?

written 18 months ago by finswimmer13k
3 • 18 months ago • Alex Reynolds29k (Seattle, WA USA) wrote:

One scalable way to do this is to parallelize searches of a dbSNP VCF build that has been converted to BED, split by chromosome, and written to compressed Starch archives. Compressed archives of the v151 SNP set take up about 10 GB.

Preparing files

Here's a makefile for preparing archive files:

SHELL := /bin/bash
CWD=$(shell pwd)

# cf. ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b151_GRCh38p7/VCF/
VCF=ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b151_GRCh38p7/VCF/All_20180418.vcf.gz

all: All_20180418 split_all_20180418_by_chr

#
# starch
#

All_20180418:
        module add bedops; \
        wget -qO- ${VCF} \
                | gunzip -c - \
                | convert2bed --input=vcf --sort-tmpdir=$(CWD) - \
                | awk '{ print "chr"$$0; }' - \
                | starch --omit-signature - \
                > All_20180418.starch

split_all_20180418_by_chr:
        for chr in `unstarch --list-chr All_20180418.starch`; do \
                echo $${chr}; \
                starchstrip --include $${chr} All_20180418.starch > All_20180418.$${chr}.starch; \
        done

Parallel searches

Given a file of rs IDs in query_snps.txt, you can write results to a subdirectory called parallel_grep_file_results:

$ ./parallel_grep_by_file_of_snp_ids.sh $(pwd)/query_snps.txt $(pwd)/parallel_grep_file_results email@example.com

At the end of the search, whether it ends in success or failure, an email is sent to email@example.com (change as needed). The output directory contains a BED file with all matches.

Parallel script

The parallel_grep_by_file_of_snp_ids.sh script might look something like the following. This script needs a SLURM job scheduler and an Environment Modules setup (for module add). Change rootStarchDir, partition, and other variables as needed, depending on where you store files, what partitions you have set up, etc.

#!/bin/bash

queryFn=$(realpath ${1})
resultsDir=$(realpath ${2})
contactEmail=${3}

function joinBy { local IFS="$1"; shift; echo "$*"; }

module add bedops

rootStarchDir=/net/seq/data/genomes/informatics/GRCh38/dbSNP/v151
rootStarchPrefix=${rootStarchDir}/All_20180418
rootStarchFn=${rootStarchPrefix}.starch
partition=queue0
memPerCPU=8000

if [ -d ${resultsDir} ]; then
    echo "Error: Results directory already exists!"
    exit 1
fi
mkdir -p ${resultsDir}

jobIds=()
for chr in `unstarch --list-chr ${rootStarchFn}`; do
    echo ${chr}
    jobId=`sbatch --mem-per-cpu=${memPerCPU} --parsable --partition=${partition} --job-name="parallel_grep_dbSNP_${chr}" --output=${resultsDir}/${chr}.out --error=${resultsDir}/${chr}.err --wrap="${rootStarchDir}/parallel_grep_file.sh ${queryFn} ${rootStarchPrefix}.${chr}.starch ${resultsDir}/${chr}.hits"`
    jobIds+=("${jobId}")
done

dependencies=$(joinBy : "${jobIds[@]}")
sbatch --mem-per-cpu=${memPerCPU} --mail-type=END --mail-user=${contactEmail} --partition=${partition} --job-name="parallel_grep_dbSNP_reduce" --output=${resultsDir}/reduce.out --error=${resultsDir}/reduce.err --dependency=afterok:${dependencies} --wrap="${rootStarchDir}/parallel_grep_file_reduce.sh ${rootStarchFn} ${resultsDir}"
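
The sbatch call above wraps a per-chromosome worker, parallel_grep_file.sh, which the post does not include. A minimal sketch of what it might do, with a plain BED file standing in for `unstarch` output so it runs without BEDOPS installed (all IDs, coordinates, and file names below are made up):

```shell
# Stand-in for `unstarch All_20180418.chr1.starch`: a small per-chromosome
# BED file where column 4 holds the rsID (as produced by convert2bed).
printf 'chr1\t100\t101\trs756051270\nchr1\t200\t201\trs999999999\n' > chr1.bed
printf 'rs756051270\n' > query_snps.txt

# Core of the worker: keep rows whose rsID matches any ID in the query file.
# -w whole-word match, -F fixed strings, -f read patterns from a file.
grep -wFf query_snps.txt chr1.bed > chr1.hits
cat chr1.hits
# prints: chr1	100	101	rs756051270
```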

The reduce script parallel_grep_file_reduce.sh:

#!/bin/bash

rootStarchFn=$(realpath ${1})
resultsDir=$(realpath ${2})
resultsFn=${resultsDir}/results.bed

module add bedops

for chr in `unstarch --list-chr ${rootStarchFn}`; do
    cat ${resultsDir}/${chr}.hits >> ${resultsFn}.tmp
done

sort-bed --max-mem 6G --tmpdir ${resultsDir} ${resultsFn}.tmp > ${resultsFn}

# cleanup

rm -f ${resultsDir}/*.out ${resultsDir}/*.err ${resultsDir}/*.hits ${resultsFn}.tmp

This is a "map-reduce" search across nuclear chromosomes. The per-chromosome search results are merged (reduced) at the end.

modified 18 months ago • written 18 months ago by Alex Reynolds29k
0 • 18 months ago • sangram_keshari230 wrote:

If you are talking about chromosome coordinates, you can get them either from a GTF file or from a BED file for the species you are working with, depending on whether its genome is completely sequenced and annotated.
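For example, a minimal sketch of pulling chromosome coordinates and gene IDs out of a GTF with awk. The one-record annotation.gtf below is a made-up example (real GTFs come from e.g. Ensembl or GENCODE); note that GTF is 1-based while BED is 0-based, hence the $4 - 1:

```shell
# One made-up GTF record: chrom, source, feature, start, end, score,
# strand, frame, attributes.
printf '1\thavana\tgene\t11869\t14409\t.\t+\t.\tgene_id "ENSG00000223972"; gene_name "DDX11L1";\n' > annotation.gtf

# Keep gene features; emit chrom, 0-based start, end, and the gene_id
# extracted from the attributes column.
awk -F'\t' 'BEGIN { OFS = "\t" }
$3 == "gene" {
    match($9, /gene_id "[^"]+"/)
    print $1, $4 - 1, $5, substr($9, RSTART + 9, RLENGTH - 10)
}' annotation.gtf > genes.bed
cat genes.bed
# prints: 1	11868	14409	ENSG00000223972
```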

written 18 months ago by sangram_keshari230
0 • 18 months ago • finswimmer13k (Germany) wrote:

Here's another solution:

0. Prerequisites

You need bcftools, GNU parallel, and BEDOPS (which provides convert2bed) installed.

1. Get dbSNP

For hg19:

$ wget ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b151_GRCh37p13/VCF/00-All.vcf.gz

For hg38:

$ wget ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b151_GRCh38p7/VCF/00-All.vcf.gz

2. Convert vcf to bcf

$ bcftools convert -Ob -o dbSNP_b151.bcf.gz 00-All.vcf.gz

3. Index bcf file

$ bcftools index dbSNP_b151.bcf.gz

4. Query bcf file for SNPs and convert output to bed

$ parallel -j 4 'bcftools filter -i "ID=@snps.txt" -r {} dbSNP_b151.bcf.gz' ::: {1..22} X Y MT | grep -v "^#" | convert2bed --input=vcf > result.bed

Adjust the value of -j to set how many jobs run in parallel.

snps.txt is the file that contains the rs IDs (one per line).

fin swimmer

modified 18 months ago • written 18 months ago by finswimmer13k
0 • 18 months ago • Sasha Fokin80 (Russia) wrote:

Hi, Ebtihal Kamal.

You can use a browser tool to convert your rs IDs to chromosomes and positions.

There are two assemblies (build 150):

  • HG19 / GRCh37 (234,756,361 SNPs)
  • HG38 / GRCh38 (335,322,261 SNPs)
written 18 months ago by Sasha Fokin80
Powered by Biostar version 2.3.0