Question: Problem replacing contigs in a dbsnp vcf using Picard SortVcf
idedios (USA/Irvine/NeoGenomics Laboratories) wrote, 4.1 years ago:

Right now I'm trying to rename the contigs in the dbSNP VCF to match those of my reference genome (from 1, 2, ..., Y, MT to chrM, chr1, chr2, ...).

I'm currently using JDK 1.7 u79 for compatibility with MuTect 1.7.

I ran SortVcf as follows:

java -jar picard.jar SortVcf \
    INPUT=dbsnp.vcf \
    OUTPUT=dbsnp.fixed.vcf \
    SEQUENCE_DICTIONARY=hg19.dict

Here's SortVcf's output:

Exception in thread "main" java.lang.NullPointerException
    at htsjdk.variant.variantcontext.VariantContextComparator.compare(VariantContextComparator.java:84)
    at htsjdk.variant.variantcontext.VariantContextComparator.compare(VariantContextComparator.java:21)
    at java.util.TimSort.countRunAndMakeAscending(TimSort.java:324)
    at java.util.TimSort.sort(TimSort.java:203)
    at java.util.Arrays.sort(Arrays.java:727)
    at htsjdk.samtools.util.SortingCollection.spillToDisk(SortingCollection.java:218)
    at htsjdk.samtools.util.SortingCollection.add(SortingCollection.java:165)
    at picard.vcf.SortVcf.sortInputs(SortVcf.java:154)
    at picard.vcf.SortVcf.doWork(SortVcf.java:87)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:187)
    at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:95)
    at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:105)
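
One plausible cause of a NullPointerException inside VariantContextComparator is a record whose contig name is absent from the sequence dictionary. That is an assumption about this trace, not something the log itself confirms. A minimal sketch for checking it, assuming the dictionary uses the usual `@SQ` lines with an `SN:` field; the function name `missing_contigs` is made up for illustration:

```shell
# List contig names that appear in a VCF but not in a sequence dictionary.
# Usage: missing_contigs hg19.dict dbsnp.vcf
missing_contigs() {
    awk -F'\t' '
        NR == FNR {                        # first file: the .dict
            if ($1 == "@SQ")
                for (i = 2; i <= NF; i++)
                    if ($i ~ /^SN:/) known[substr($i, 4)] = 1
            next
        }
        /^#/ { next }                      # skip VCF header lines
        !($1 in known) && !seen[$1]++ { print $1 }   # report each unknown contig once
    ' "$1" "$2"
}
```

If this prints names such as 1, 2 or MT, the VCF and the dictionary disagree on contig naming, and the VCF needs renaming before SortVcf can order it.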

 


Have you already added the "chr" prefix to your dbSNP file and used the new file as input? I am not sure this is the problem, but try running the following command on your dbSNP file first:

awk '{if($0 !~ /^#/) print "chr"$0; else print $0}' dbSNP_old.vcf > dbSNP_new.vcf

You can now use the dbSNP_new.vcf file as input for Picard. I also have a Python script that sorts a VCF file by a given chromosome order; you can download it from here (https://github.com/ashutoshkpandey/SimplePrograms/blob/master/sort_vcf.py) and modify it as needed.
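
The awk command above prefixes only the data lines. A hedged sketch of a one-pass variant that also maps MT to chrM and updates any `##contig=<ID=...>` header lines, if the file has them; the function name `rename_contigs` is hypothetical, and the sketch assumes a tab-delimited VCF whose only non-numeric renaming exception is MT:

```shell
# Read a VCF on stdin, write it to stdout with contigs renamed
# (1 -> chr1, ..., Y -> chrY, MT -> chrM).
rename_contigs() {
    awk 'BEGIN { FS = OFS = "\t" }
        /^##contig=<ID=/ {
            rest = substr($0, 14)                  # text after "##contig=<ID="
            if (match(rest, /[,>]/)) {             # name ends at the first "," or ">"
                name = substr(rest, 1, RSTART - 1)
                tail = substr(rest, RSTART)
            } else {
                name = rest
                tail = ""
            }
            if (name == "MT") name = "M"
            print "##contig=<ID=chr" name tail
            next
        }
        /^#/ { print; next }                       # other header lines pass through unchanged
        {
            if ($1 == "MT") $1 = "M"               # so the prefix below yields chrM
            $1 = "chr" $1
            print
        }'
}
```

For example, `rename_contigs < dbSNP_old.vcf > dbSNP_new.vcf`. Only the CHROM column and the contig header IDs are touched, so an "MT" elsewhere on a line is left alone.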

modified 4.1 years ago • written 4.1 years ago by Ashutosh Pandey

On top of adding the prefix, I have to change the mitochondrial contig from MT to chrM. Then maybe SortVcf can reorder it.

Thanks!

written 4.1 years ago by idedios
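sed 's/^chrMT/chrM/' dbSNP_new.vcf > dbSNP_extranew.vcf :-)

(The pattern is anchored to the start of the line because the awk step has already turned MT into chrMT. An unanchored 's/MT/chrM/g' would produce "chrchrM" and would also rewrite any "MT" elsewhere on a line.)

OR

sed -i 's/^chrMT/chrM/' dbSNP_new.vcf

which will not produce a new file. The "-i" option edits the file in place.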

modified 4.1 years ago • written 4.1 years ago by Ashutosh Pandey

Thanks, I really need to learn to use awk and sed.

written 4.1 years ago by idedios

Yes. They are really handy and pretty fast when it comes to text manipulation. I strongly suggest learning basic awk one-liners.

written 4.1 years ago by Ashutosh Pandey