Question: VCF: Replacing RefSeq ID to chr in #CHROM
1
gravatar for Hadi M
9 months ago by
Hadi M30
Hadi M30 wrote:

Hi everyone,

I recently downloaded the latest dbSNP VCF and when opening the file, I noticed the #CHROM column is filled with RefSeq ID instead of chr1, chr2 or so on. Here is how the VCF looks like:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO
NC_000001.11    10019   rs775809821     TA      T       .       .       RS=775809821;dbSNPBuildID=144;SSR=0;PSEUDOGENEINFO=DDX11L1:100287102;VC=INDEL
NC_000001.11    10039   rs978760828     A       C       .       .       RS=978760828;dbSNPBuildID=150;SSR=0;PSEUDOGENEINFO=DDX11L1:100287102;VC=SNV
NC_000001.11    10043   rs1008829651    T       A       .       .       RS=1008829651;dbSNPBuildID=150;SSR=0;PSEUDOGENEINFO=DDX11L1:100287102;VC=SNV

I would like to convert the RefSeq ID to its corresponding chromosome number:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO
chr1    10019   rs775809821     TA      T       .       .       RS=775809821;dbSNPBuildID=144;SSR=0;PSEUDOGENEINFO=DDX11L1:100287102;VC=INDEL
chr1   10039   rs978760828     A       C       .       .       RS=978760828;dbSNPBuildID=150;SSR=0;PSEUDOGENEINFO=DDX11L1:100287102;VC=SNV
chr1    10043   rs1008829651    T       A       .       .       RS=1008829651;dbSNPBuildID=150;SSR=0;PSEUDOGENEINFO=DDX11L1:100287102;VC=SNV

Is a tool or script available that can convert the RefSeq ID? Thank you in advance.

vcf • 425 views
ADD COMMENTlink modified 7 months ago by devarora230 • written 9 months ago by Hadi M30
1
gravatar for devarora
7 months ago by
devarora230
SouthKorea
devarora230 wrote:

Look for the specific position of ref sequence here : Reference sequence with chromosome position

Change each reference position with chr1, chr2 and so on :

awk '{gsub(/NC_000001.11/, "chr1"); gsub(/NC_000002.12/, "chr2") gsub(/NC_000003.13/, "chr3") .......); print;}' your.vcf > your_final.vcf

Here you go.

ADD COMMENTlink modified 7 months ago by genomax89k • written 7 months ago by devarora230
0
gravatar for rrbutleriii
7 months ago by
rrbutleriii80
US, Chicago
rrbutleriii80 wrote:

Yeah, that was a real kick in the pants, as they didn't seem to post anything in the release notes, which still list them by chr1, chr2, etc.

Best. documentation. EVAR.

The easy solution is to have a renaming file for all the chromosomes of the format old_name new_name\n per bcftools and use the --rename-chrs option:

bcftools annotate --rename-chrs rename_file.txt --threads num_threads -o new.vcf.gz -Oz original.vcf.gz

Update: NCBI Assembly to the rescue. Look up your assembly (I did both GRCh37.p13 and GRCh38.p13) and click "Download the full sequence report". Or if you copy the links:

# Assembly reports for renaming chromosomes
report_dir='ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405'
wget -N "${report_dir}/GCF_000001405.39_GRCh38.p13/GCF_000001405.39_GRCh38.p13_assembly_report.txt"
wget -N "${report_dir}/GCF_000001405.25_GRCh37.p13/GCF_000001405.25_GRCh37.p13_assembly_report.txt"

# Grab the useful columns
for k in *assembly_report.txt
  do
    out=$(echo $k | sed 's/.txt/.chrnames/')
    grep -e '^[^#]' $k | awk '{ print $7, $1 }' > $out
done

# Annotate
bcftools annotate \
  --rename-chrs GCF_000001405.25_GRCh37.p13_assembly_report.chrnames \
  --threads 10 -Oz \
  -o GRCh37.dbSNP153.vcf.gz \
  GCF_000001405.25.gz
bcftools annotate \
  --rename-chrs GCF_000001405.39_GRCh38.p13_assembly_report.chrnames \
  --threads 10 -Oz \
  -o GRCh38.dbSNP153.vcf.gz \
  GCF_000001405.38.gz
ADD COMMENTlink modified 7 months ago • written 7 months ago by rrbutleriii80
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1628 users visited in the last hour