NCBI Remap Issue
0
0
Entering edit mode
5.6 years ago

Hi everyone!

I'm new to the field and I've just started to work on GWAS data. I have some old data (both ped and VCF) that I need to move from hg18 to hg19 to then do imputation on the Michigan server. I've done the remapping using remap.api.pl and the vcf file I have for each chromosome.

perl remap_api.pl --mode asm-asm --from GCF_000001405.12 --dest GCF_000001405.13 --in_format vcf --annotation ../Cleaned_2.recode.vcf --annot_out remapped_2

However, for some SNPs in I have something like this:

> HSCHRUN_RANDOM_CTG42    27080   rs11090516      C       T   .      .    REMAP_ALIGN=SP       GT
> HSCHRUN_RANDOM_CTG42    31729   rs1062731       C       T   .      .    PR;REMAP_ALIGN=SP    GT
 >HSCHRUN_RANDOM_CTG42    33699   rs730647        C       T   .      .    REMAP_ALIGN=SP       GT

(this is for chr21)

I've looked up online and it seems those are unplaced scaffold (ftp://ftp.kobic.kr/Data/refseq/refseq/H_sapiens/annotation/GRCh37_latest/refseq_identifiers/GRCh37_latest_assembly_report.txt)

How should I treat them? Do I have to eliminate them, if so, how?

Best, Filippo

remapping vcftools GWAS genome SNP • 2.0k views
ADD COMMENT
0
Entering edit mode

Does your input vcf file have chromosome names (as in chr1, chr2, etc) or accessions (NC_000001.11, NC_000002.12, etc)? If it is the former, you may want to try converting your vcf first to use the accessions before remap.

ADD REPLY
0
Entering edit mode

Hi! Thanks for the answer! It is with chromosome name and the vcf was cretead from PLINK/1.9. Is there an easy way to change chromosome to accessions?

Just to understand, are the 'HSCHRUN' artifacts or errors of the program or are correct remapping which just need to be correctly mapped to the chromosome?

ADD REPLY
0
Entering edit mode

I have a python script that can do this. Clone this to your local disk (expects python3) https://github.com/vkkodali/cthreepo and run cthreepo.py as follows:

cthreepo -i <input.vcf> -m GCF_000001405.12_NCBI36_assembly_report.txt -if ucsc -it rs -f vcf -o <output.vcf>

You will need the GCF_000001405.12_NCBI36_assembly_report.txt file which can be downloaded from here: ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.12_NCBI36/GCF_000001405.12_NCBI36_assembly_report.txt

Here's some help text for the script:

$ cthreepo.py -h
usage: cthreepo.py [-h] [-i INFILE] [-o OUTFILE] [-m MAPFILE] [-if ID_FROM]
                   [-it ID_TO] [-ku] [-p] [-f FORMAT]

This script parses input file and converts the seq-id name from one kind to
the other

optional arguments:
  -h, --help            show this help message and exit
  -i INFILE, --infile INFILE
                        input file
  -o OUTFILE, --outfile OUTFILE
                        output file
  -m MAPFILE, --mapfile MAPFILE
                        NCBI style assembly_report file for mapping
  -if ID_FROM, --id_from ID_FROM
                        seq-id format in the input file; can be `ens`, `uc`,
                        `gb`, or `rs`; default is `uc`
  -it ID_TO, --id_to ID_TO
                        seq-id format in the output file; can be `ens`, `uc`,
                        `gb`, or `rs`; default is `rs`
  -ku, --keep_unmapped  keep lines that don't have seq-id matches
  -p, --primary         restrict to primary assembly only
  -f FORMAT, --format FORMAT
                        input file format; can be `gff3`, `gtf`, `bedgraph`
                        `bed`, `sam`, `vcf` or `wig`; default is `gff3`
ADD REPLY
0
Entering edit mode

Hi @vkkodali! Thanks a lot! I will give a try today and tell you the outcome :)

ADD REPLY

Login before adding your answer.

Traffic: 2289 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6