Changing Chromosome Notation On Vcf
10.2 years ago

Hey guys,

I'm having a problem using a mouse dbSNP VCF file for variant annotation with GATK's VariantAnnotator. The dbSNP VCF names its chromosomes chr1, chr2, ..., but my reference uses a different notation: 1, 2, 3, ... This makes GATK fail with the following error:

Input files dbSNP.vcf and reference have incompatible contigs: No overlapping contigs found.

##### ERROR dbSNP.vcf contigs = [chr1, chr10, chr11, chr12, chr13, chr13random, chr14, chr15, chr16, chr17, chr17random, chr18, chr19, chr1random, chr2, chr3, chr3random, chr4, chr4random, chr5, chr5random, chr6, chr7, chr7random, chr8, chr8random, chr9, chr9random, chrM, chrUnrandom, chrX, chrXrandom, chrY, chrYrandom]
##### ERROR reference contigs = [1, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 2, 3, 4, 5, 6, 7, 8, 9, MT, X, Y, NT166325, NT166464, NT166452, NT166480, NT166448, NT166458, NT166443, NT166466, NT166476, NT166479, NT166478, NT166474, NT166471, NT166445, NT166465, NT166457, NT166470, NT166454, NT166472, NT166449, NT166481, NT166337, NT166459, NT166456, NT166473, NT166461, NT166475, NT166462, NT166444, NT166453, NT166446, NT166469, NT072868, NT166335, NT166467, NT166283, NT166338, NT166340, NT166442, NT166334, NT166286, NT166451, NT166336, NT166339, NT166290, NT053651, NT166450, NT166447, NT166468, NT166460, NT166477, NT166455, NT166291, NT166463, NT166433, NT166402, NT166327, NT166308, NT166309, NT109319, NT166282, NT166314, NT166303, NT112000, NT110857, NT166280, NT166375, NT166311, NT166307, NT166310, NT166323, NT166437, NT166374, NT166364, NT166439, NT166328, NT166438, NT166389, NT162750, NT166436, NT166372, NT166440, NT166326, NT166342, NT166333, NT166435, NT166434, NT166341, NT166376, NT166387, NT166281, NT166313, NT166380, NT166360, NT166441, NT166359, NT166386, NT166356, NT166357, NT166423, NT166384, NT161879, NT161928, NT166388, NT161919, NT166381, NT166367, NT166392, NT166406, NT166365, NT166379, NT166358, NT161913, NT166378, NT166382, NT161926, NT166345, NT166385, NT165789, NT166368, NT166405, NT166390, NT166373, NT166361, NT166348, NT166369, NT161898, NT166417, NT166410, NT166383, NT166362, NT165754, NT166366, NT166363, NT161868, NT166407, NT165793, NT166352, NT161925, NT166412, NT165792, NT161924, NT166422, NT165795, NT166354, NT166350, NT165796, NT161904, NT166370, NT165798, NT165791, NT161885, NT166424, NT166346, NT165794, NT166377, NT166418, NT161877, NT166351, NT166408, NT166349, NT161906, NT166391, NT161892, NT166415, NT165790, NT166420, NT166353, NT166344, NT166371, NT161895, NT166404, NT166413, NT166419, NT161916, NT166347, NT161875, NT161911, NT161897, NT161866, NT166409, NT161872, NT166403, NT161902, NT166414, NT166416, NT166421, NT161923, NT_161937]

Is there a simple way or script to change the chromosome notation in the dbSNP file so that it matches the reference?

Thanks


Hi Leandro,

I am also sequencing mouse. When I run GATK, my dbSNP.vcf has a different chromosome order from the reference, and my VCF file names its chromosomes 1, 2, 3 while the reference uses chr1, chr2, chr3, the exact opposite of your situation. Where did you download dbSNP from? I downloaded dbSNP from Sanger and the reference from Illumina.

Also, did you index the reference sequence with BWA? I downloaded the reference from Illumina because it is already indexed.
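
Before renaming anything, it can help to see exactly which contig names and what order each file uses. The following is a minimal sketch, assuming a tab-separated VCF and a samtools-style `.fai` index for the reference; the file names and the tiny inputs are made up for illustration.

```shell
# Made-up inputs standing in for a real VCF and a reference .fai index
tmp=$(mktemp -d)
printf '##fileformat=VCFv4.1\n#CHROM\tPOS\tID\tREF\tALT\n1\t100\trs1\tA\tG\n2\t50\trs2\tC\tT\n' > "$tmp/snps.vcf"
printf 'chr1\t1000\t6\t60\t61\nchr2\t900\t1030\t60\t61\n' > "$tmp/ref.fa.fai"

# Contig names used by the VCF data lines, in order of first appearance
vcf_contigs=$(grep -v '^#' "$tmp/snps.vcf" | cut -f1 | awk '!seen[$0]++')

# Contig names in the reference, in index order
cut -f1 "$tmp/ref.fa.fai" > "$tmp/ref.names"

# Names present in both files; grep exits non-zero when nothing matches
shared=$(printf '%s\n' "$vcf_contigs" | grep -Fxf "$tmp/ref.names" || true)

echo "VCF contigs:    $(printf '%s' "$vcf_contigs" | tr '\n' ' ')"
echo "Shared contigs: ${shared:-none}"
```

An empty shared list corresponds to the "No overlapping contigs found" error above. Comparing the two name lists line by line also shows whether the chromosome order differs.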

8.6 years ago
brentp 24k

To add 'chr' to the start of every non-header line, probably:

perl -pe 's/^([^#])/chr$1/'
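
A slightly more defensive variant of the same idea is sketched below in awk. It rests on some assumptions that are not from the thread: the helper name `add_chr` is hypothetical, the VCF may declare contigs in `##contig=<ID=...>` header lines that should be renamed too, and the mitochondrial contig should be mapped from `MT` to `chrM`. The sample input is made up.

```shell
# Hypothetical helper: add a "chr" prefix to contig names in a VCF stream
add_chr() {
    awk -F'\t' -v OFS='\t' '
        # contig declarations in the header, e.g. ##contig=<ID=1,length=...>
        /^##contig=<ID=/ && !/^##contig=<ID=chr/ { sub(/<ID=/, "<ID=chr") }
        /^##contig=<ID=chrMT[,>]/                { sub(/<ID=chrMT/, "<ID=chrM") }
        # data lines: prefix the CHROM column, mapping MT to chrM
        /^[^#]/ && $1 !~ /^chr/ { $1 = ($1 == "MT") ? "chrM" : "chr" $1 }
        { print }'
}

# Tiny made-up VCF to show the effect
converted=$(printf '##fileformat=VCFv4.1\n##contig=<ID=1,length=1000>\n#CHROM\tPOS\tID\tREF\tALT\n1\t100\trs1\tA\tG\nMT\t5\trs2\tC\tT\n' | add_chr)
printf '%s\n' "$converted"
```

On a real file this would be used as `add_chr < in.vcf > out.vcf`. Lines that already start with `chr` are left alone, so running it twice should not double the prefix.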
10.2 years ago
brentp 24k

You can remove 'chr' from the start of each line like this:

perl -pe 's/^chr//' dbsnp.vcf > dbsnp.nochr.vcf

Though it might be faster to change the other VCF since dbsnp is quite large.
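
Note that in the error message above the dbSNP file uses `chrM` while the reference uses `MT`, so stripping the prefix alone would leave `M` unmatched. Below is a minimal awk sketch that also handles that case and any `##contig=<ID=...>` header lines. The helper name `strip_chr`, the presence of contig header lines, and the sample input are assumptions for illustration. The random and unplaced contigs (`chr13random`, `NT166325`, and so on) follow different naming schemes and would not be reconciled by this.

```shell
# Hypothetical helper: drop the "chr" prefix from contig names in a VCF stream
strip_chr() {
    awk -F'\t' -v OFS='\t' '
        # contig declarations in the header, e.g. ##contig=<ID=chr1,length=...>
        /^##contig=<ID=chr/   { sub(/<ID=chr/, "<ID=") }
        /^##contig=<ID=M[,>]/ { sub(/<ID=M/, "<ID=MT") }
        # data lines: strip the prefix from the CHROM column, mapping M to MT
        /^[^#]/ { sub(/^chr/, "", $1); if ($1 == "M") $1 = "MT" }
        { print }'
}

# Tiny made-up VCF to show the effect
stripped=$(printf '##fileformat=VCFv4.1\n##contig=<ID=chr1,length=1000>\n#CHROM\tPOS\tID\tREF\tALT\nchr1\t100\trs1\tA\tG\nchrM\t5\trs2\tC\tT\n' | strip_chr)
printf '%s\n' "$stripped"
```

On a real file this would be used as `strip_chr < dbsnp.vcf > dbsnp.nochr.vcf`.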


Hi brentp,

My problem is the exact opposite: I need to change the 1, 2, 3 at the start of each data line to chr1, chr2, chr3, because my VCF has no "chr" prefix but my reference does. How can I do that?
