Question: VCF: Replacing RefSeq ID to chr in #CHROM
1
gravatar for Hadi M
3 months ago by
Hadi M30
Hadi M30 wrote:

Hi everyone,

I recently downloaded the latest dbSNP VCF and when opening the file, I noticed the #CHROM column is filled with RefSeq ID instead of chr1, chr2 or so on. Here is how the VCF looks like:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO
NC_000001.11    10019   rs775809821     TA      T       .       .       RS=775809821;dbSNPBuildID=144;SSR=0;PSEUDOGENEINFO=DDX11L1:100287102;VC=INDEL
NC_000001.11    10039   rs978760828     A       C       .       .       RS=978760828;dbSNPBuildID=150;SSR=0;PSEUDOGENEINFO=DDX11L1:100287102;VC=SNV
NC_000001.11    10043   rs1008829651    T       A       .       .       RS=1008829651;dbSNPBuildID=150;SSR=0;PSEUDOGENEINFO=DDX11L1:100287102;VC=SNV

I would like to convert the RefSeq ID to its corresponding chromosome number:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO
chr1    10019   rs775809821     TA      T       .       .       RS=775809821;dbSNPBuildID=144;SSR=0;PSEUDOGENEINFO=DDX11L1:100287102;VC=INDEL
chr1   10039   rs978760828     A       C       .       .       RS=978760828;dbSNPBuildID=150;SSR=0;PSEUDOGENEINFO=DDX11L1:100287102;VC=SNV
chr1    10043   rs1008829651    T       A       .       .       RS=1008829651;dbSNPBuildID=150;SSR=0;PSEUDOGENEINFO=DDX11L1:100287102;VC=SNV

Is a tool or script available that can convert the RefSeq ID? Thank you in advance.

vcf • 168 views
ADD COMMENTlink modified 5 weeks ago by devarora110 • written 3 months ago by Hadi M30
1
gravatar for devarora
5 weeks ago by
devarora110
SouthKorea
devarora110 wrote:

Look for the specific position of ref sequence here : Reference sequence with chromosome position

Change each reference position with chr1, chr2 and so on :

awk '{gsub(/NC_000001.11/, "chr1"); gsub(/NC_000002.12/, "chr2") gsub(/NC_000003.13/, "chr3") .......); print;}' your.vcf > your_final.vcf

Here you go.

ADD COMMENTlink modified 5 weeks ago by genomax80k • written 5 weeks ago by devarora110
0
gravatar for rrbutleriii
5 weeks ago by
rrbutleriii70
US, Chicago
rrbutleriii70 wrote:

Yeah, that was a real kick in the pants, as they didn't seem to post anything in the release notes, which still list them by chr1, chr2, etc.

Best. documentation. EVAR.

The easy solution is to have a renaming file for all the chromosomes of the format old_name new_name\n per bcftools and use the --rename-chrs option:

bcftools annotate --rename-chrs rename_file.txt --threads num_threads -o new.vcf.gz -Oz original.vcf.gz

Update: NCBI Assembly to the rescue. Look up your assembly (I did both GRCh37.p13 and GRCh38.p13) and click "Download the full sequence report". Or if you copy the links:

# Assembly reports for renaming chromosomes
report_dir='ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405'
wget -N "${report_dir}/GCF_000001405.39_GRCh38.p13/GCF_000001405.39_GRCh38.p13_assembly_report.txt"
wget -N "${report_dir}/GCF_000001405.25_GRCh37.p13/GCF_000001405.25_GRCh37.p13_assembly_report.txt"

# Grab the useful columns
for k in *assembly_report.txt
  do
    out=$(echo $k | sed 's/.txt/.chrnames/')
    grep -e '^[^#]' $k | awk '{ print $7, $1 }' > $out
done

# Annotate
bcftools annotate \
  --rename-chrs GCF_000001405.25_GRCh37.p13_assembly_report.chrnames \
  --threads 10 -Oz \
  -o GRCh37.dbSNP153.vcf.gz \
  GCF_000001405.25.gz
bcftools annotate \
  --rename-chrs GCF_000001405.39_GRCh38.p13_assembly_report.chrnames \
  --threads 10 -Oz \
  -o GRCh38.dbSNP153.vcf.gz \
  GCF_000001405.38.gz
ADD COMMENTlink modified 5 weeks ago • written 5 weeks ago by rrbutleriii70
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 990 users visited in the last hour