GATK: Remove known variants from the sample vcf file
9.7 years ago
ravast ▴ 20

I have a test sample VCF file from which I have to select the unique variants. As an initial step I want to remove the known variants from my sample. I used the SelectVariants walker from GATK and got this error:

Input files /home///Mouse_ref/mgp.v3.snps.rsIDdbSNPv137.vcf and reference have incompatible contigs: Relative ordering of overlapping contigs differs, which is unsafe.
##### ERROR   /home/Mouse_ref/mgp.v3.snps.rsIDdbSNPv137.vcf contigs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, X]
##### ERROR   reference contigs = [1, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 2, 3, 4, 5, 6, 7, 8, 9, MT, X, Y, JH584295.1, JH584292.1, GL456368.1, GL456396.1, GL456359.1, GL456382.1, GL456392.1, GL456394.1, GL456390.1, GL456387.1, GL456381.1, GL456370.1, GL456372.1, GL456389.1, GL456378.1, GL456360.1, GL456385.1, GL456383.1, GL456213.1, GL456239.1, GL456367.1, GL456366.1, GL456393.1, GL456216.1, GL456379.1, JH584304.1, GL456212.1, JH584302.1, JH584303.1, GL456210.1, GL456219.1, JH584300.1, JH584298.1, JH584294.1, GL456354.1, JH584296.1, JH584297.1, GL456221.1, JH584293.1, GL456350.1, GL456211.1, JH584301.1, GL456233.1, JH584299.1]

I didn't quite get whether the problem was solved. I get the same error as ravast, and in the case of the mouse genome it's not about the order: those chromosomes/contigs are absent altogether from the header of the VCF file of known SNPs compiled by the Sanger Institute. I tried to work around it by adding the missing contigs to the header, but I get the same error. :(

9.7 years ago

This is a pretty common error. If you had searched this forum or online, you would have found the answer. GATK requires the chromosomes to be in the same order in both files. It is described in detail here. I think you used the cat chr*.fa command to concatenate the individual FASTA files (one per chromosome) into the reference file, and that messed up the order: the shell expands the wildcard alphabetically, so the files are joined as 1, 10, 11 ... rather than 1, 2, 3 .... You did nothing wrong, but this is how GATK works.
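
The check behind this message can be illustrated with a short Python sketch. It is not GATK's actual code: the function names are made up for illustration, and the contig lists are copied from the error above, with the reference list cut down to the chromosomes for brevity. Going by the wording of the message, only the contigs that both files share are compared, so the extra contigs in the reference (MT, Y, the GL/JH scaffolds) are not what triggers the error; the 1, 10, 11 ... ordering is.

```python
def shared_in_order(contigs, other):
    """Return the contigs that also occur in `other`, keeping their original order."""
    other_set = set(other)
    return [c for c in contigs if c in other_set]


def same_relative_order(vcf_contigs, ref_contigs):
    """True if the contigs common to both lists appear in the same order in each."""
    return shared_in_order(vcf_contigs, ref_contigs) == shared_in_order(ref_contigs, vcf_contigs)


# Contig order of the known-variants VCF, as reported in the error message.
vcf_contigs = [str(i) for i in range(1, 20)] + ["X"]

# Contig order of the reference (lexicographic), truncated to the chromosomes.
ref_contigs = (["1"] + [str(i) for i in range(10, 20)]
               + [str(i) for i in range(2, 10)] + ["MT", "X", "Y"])

print(same_relative_order(vcf_contigs, ref_contigs))  # False
```

A reference that merely contains extra contigs passes the same check: same_relative_order(["1", "2"], ["1", "MT", "2"]) returns True.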


Thank you, Ashutosh.


This should be a comment rather than an answer :-) Also, if the answer solved your problem, you should "accept" it so that your question is marked as solved.


Ashutosh, could you please let me know how I should rectify the error?


See "Karyotypically Ordered Hg19". You need to sort your reference FASTA file or create a new reference FASTA from scratch. Make sure it has all the chromosomes present in your VCF file, in the same order, i.e. 1, 2, 3, 4 ... X.
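
Going the other way is also possible in principle: leave the reference alone and re-sort the known-variants VCF so that it follows the reference's contig order. The Python sketch below illustrates that idea under some assumptions. The function names are hypothetical, the contig order is taken from the first column of a samtools-style .fai index, and the whole VCF is held in memory, which will not scale to a large file. Dedicated tools exist for this job and are the safer choice for real data. A contig that is in the VCF but not in the reference raises a KeyError here.

```python
import re

# Matches VCF header lines such as ##contig=<ID=1,length=195471971>
CONTIG_RE = re.compile(r"##contig=<ID=([^,>]+)")


def read_fai_order(fai_lines):
    """Map contig name -> rank, using the first column of a .fai index."""
    order = {}
    for line in fai_lines:
        name = line.split("\t", 1)[0].strip()
        if name:
            order[name] = len(order)
    return order


def sort_vcf_lines(vcf_lines, rank):
    """Return the VCF lines with contigs ordered by `rank`, then by position."""
    header, records = [], []
    for line in vcf_lines:
        if line.startswith("#"):
            header.append(line)
        elif line.strip():
            records.append(line)

    # Reorder any ##contig header lines in place so the header agrees too.
    slots = [i for i, h in enumerate(header) if CONTIG_RE.match(h)]
    contig_lines = sorted((header[i] for i in slots),
                          key=lambda h: rank[CONTIG_RE.match(h).group(1)])
    for i, h in zip(slots, contig_lines):
        header[i] = h

    def record_key(line):
        chrom, pos = line.split("\t", 2)[:2]
        return rank[chrom], int(pos)

    return header + sorted(records, key=record_key)


# Tiny made-up example: reference order is 1, 10, 2.
fai = ["1\t1000\t3\t60\t61\n", "10\t800\t1020\t60\t61\n", "2\t900\t1840\t60\t61\n"]
vcf = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\n",
    "2\t5\t.\tA\tG\n",
    "10\t7\t.\tC\tT\n",
    "1\t9\t.\tG\tA\n",
    "1\t3\t.\tT\tC\n",
]
sorted_vcf = sort_vcf_lines(vcf, read_fai_order(fai))
print("".join(sorted_vcf), end="")
```

With real files, the two lists would come from open("reference.fa.fai") and open("known.vcf"), where both file names are placeholders.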
