Question: Changing Chromosome Notation On Vcf
7.7 years ago by
Paris
Leandro Batista100 wrote:

Hey guys,

I'm having a problem using a mouse dbSNP VCF file for variant annotation with GATK's VariantAnnotator. The dbSNP VCF names its chromosomes chr1, chr2, ..., but my reference uses a different notation: 1, 2, 3, ... GATK fails with the following error:

Input files dbSNP.vcf and reference have incompatible contigs: No overlapping contigs found.

ERROR dbSNP.vcf contigs = [chr1, chr10, chr11, chr12, chr13, chr13random, chr14, chr15, chr16, chr17, chr17random, chr18, chr19, chr1random, chr2, chr3, chr3random, chr4, chr4random, chr5, chr5random, chr6, chr7, chr7random, chr8, chr8random, chr9, chr9random, chrM, chrUnrandom, chrX, chrXrandom, chrY, chrYrandom]
ERROR reference contigs = [1, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 2, 3, 4, 5, 6, 7, 8, 9, MT, X, Y, NT166325, NT166464, NT166452, NT166480, NT166448, NT166458, NT166443, NT166466, NT166476, NT166479, NT166478, NT166474, NT166471, NT166445, NT166465, NT166457, NT166470, NT166454, NT166472, NT166449, NT166481, NT166337, NT166459, NT166456, NT166473, NT166461, NT166475, NT166462, NT166444, NT166453, NT166446, NT166469, NT072868, NT166335, NT166467, NT166283, NT166338, NT166340, NT166442, NT166334, NT166286, NT166451, NT166336, NT166339, NT166290, NT053651, NT166450, NT166447, NT166468, NT166460, NT166477, NT166455, NT166291, NT166463, NT166433, NT166402, NT166327, NT166308, NT166309, NT109319, NT166282, NT166314, NT166303, NT112000, NT110857, NT166280, NT166375, NT166311, NT166307, NT166310, NT166323, NT166437, NT166374, NT166364, NT166439, NT166328, NT166438, NT166389, NT162750, NT166436, NT166372, NT166440, NT166326, NT166342, NT166333, NT166435, NT166434, NT166341, NT166376, NT166387, NT166281, NT166313, NT166380, NT166360, NT166441, NT166359, NT166386, NT166356, NT166357, NT166423, NT166384, NT161879, NT161928, NT166388, NT161919, NT166381, NT166367, NT166392, NT166406, NT166365, NT166379, NT166358, NT161913, NT166378, NT166382, NT161926, NT166345, NT166385, NT165789, NT166368, NT166405, NT166390, NT166373, NT166361, NT166348, NT166369, NT161898, NT166417, NT166410, NT166383, NT166362, NT165754, NT166366, NT166363, NT161868, NT166407, NT165793, NT166352, NT161925, NT166412, NT165792, NT161924, NT166422, NT165795, NT166354, NT166350, NT165796, NT161904, NT166370, NT165798, NT165791, NT161885, NT166424, NT166346, NT165794, NT166377, NT166418, NT161877, NT166351, NT166408, NT166349, NT161906, NT166391, NT161892, NT166415, NT165790, NT166420, NT166353, NT166344, NT166371, NT161895, NT166404, NT166413, NT166419, NT161916, NT166347, NT161875, NT161911, NT161897, NT161866, NT166409, NT161872, NT166403, NT161902, NT166414, NT166416, NT166421, NT161923, NT_161937]

Is there a simple way or script to change the chromosome notation in the dbSNP file so that it matches the reference?

Thanks

vcf gatk dbsnp chromosome • 6.6k views
written 7.7 years ago by Leandro Batista100

Hi Leandro,

I am sequencing mouse samples too. When I run GATK, I find that my dbSNP.vcf has a different chromosome order from the reference, and its chromosomes are named 1, 2, 3 while the reference uses chr1, chr2, chr3, the opposite of your situation. Where did you download dbSNP from? I downloaded dbSNP from Sanger and the reference from Illumina.

Also, did you index the reference sequence with BWA? I downloaded the reference from Illumina because it is already indexed.

written 6.1 years ago by Tonyzeng300
7.7 years ago by
brentp23k
Salt Lake City, UT
brentp23k wrote:

You can remove 'chr' from the start of the line like this:

perl -pe 's/^chr//' dbsnp.vcf > dbsnp.nochr.vcf

Though it might be faster to change the other VCF since dbsnp is quite large.
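A slightly more careful variant of the one-liner above, as a sketch only: it also renames the IDs in any ##contig header lines, and it maps chrM to MT, because the reference contig list in the question uses MT rather than M. The function name strip_chr and the file names in the usage line are illustrative assumptions. The "random" contigs simply lose their prefix and will still not match the NT contigs of the reference.

```shell
# strip_chr: hypothetical helper, reads VCF text on stdin and writes to stdout.
strip_chr() {
  awk 'BEGIN { FS = OFS = "\t" }
    /^##contig=<ID=chr/ {
      sub(/<ID=chr/, "<ID=")                # drop the prefix in header contig IDs
      sub(/<ID=M,/, "<ID=MT,")              # chrM -> MT (ID followed by more fields)
      sub(/<ID=M>/, "<ID=MT>")              # chrM -> MT (ID is the only field)
      print; next
    }
    /^#/ { print; next }                    # all other header lines pass through
    {
      sub(/^chr/, "", $1)                   # drop the prefix in the CHROM column
      if ($1 == "M") $1 = "MT"              # assumption: reference calls it MT
      print
    }'
}
```

Usage would look like: strip_chr < dbsnp.vcf > dbsnp.nochr.vcf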

written 7.7 years ago by brentp23k

Hi, brentp,

My problem is the opposite: I need to change the 1, 2, 3 at the start of each data line to chr1, chr2, chr3, because my VCF has no "chr" prefix but my reference does. How could I do that?

written 6.1 years ago by Tonyzeng300

Probably: perl -pe 's/^([^#])/chr$1/'
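For the reverse direction, a rough sketch that also updates ##contig header IDs and skips names that already carry the prefix. The function name add_chr is hypothetical, and the MT to chrM mapping is an assumption about the target reference's naming; check it against the reference's own contig list. This only renames contigs; it does not reorder records, so a chromosome-order mismatch would need to be fixed separately.

```shell
# add_chr: hypothetical helper, reads VCF text on stdin and writes to stdout.
add_chr() {
  awk 'BEGIN { FS = OFS = "\t" }
    NF == 0 { print; next }                 # leave blank lines untouched
    /^##contig=<ID=/ {
      if ($0 !~ /<ID=chr/) {                # skip IDs that are already prefixed
        sub(/<ID=MT,/, "<ID=M,")            # assumption: MT becomes chrM
        sub(/<ID=MT>/, "<ID=M>")
        sub(/<ID=/, "<ID=chr")
      }
      print; next
    }
    /^#/ { print; next }                    # all other header lines pass through
    {
      if ($1 == "MT") $1 = "M"              # assumption: MT becomes chrM
      if ($1 !~ /^chr/) $1 = "chr" $1       # prefix the CHROM column
      print
    }'
}
```

Usage would look like: add_chr < dbsnp.vcf > dbsnp.chr.vcf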

written 6.1 years ago by brentp23k