Question: [GATK] Weird error using Picard SortVcf on COSMIC file
4.0 years ago
umn_bist370 wrote:

I tried sorting my COSMIC VCF with Picard's SortVcf so that its contig order matches my reference dictionary. I did this after MuTect2 failed with an error and I followed the link it provided.


java -jar -Xmx32g /cm/shared/apps/picard/1.127/picard.jar SortVcf I=/work/gencode/CosmicCodingMuts.vcf O=/work/gencode/CosmicCodingMuts%.vcf SEQUENCE_DICTIONARY=/work/gencode/GRCh38.p5.genome.dict

Reference dictionary:

@HD    VN:1.4    SO:unsorted
@SQ    SN:chr1    LN:248956422    M5:2648ae1bacce4ec4b6cf337dcae37816
@SQ    SN:chr2    LN:242193529    M5:4bb4f82880a14111eb7327169ffb729b
@SQ    SN:chr3    LN:198295559    M5:a48af509898d3736ba95dc0912c0b461
@SQ    SN:chr4    LN:190214555    M5:3210fecf1eb92d5489da4346b3fddc6e
@SQ    SN:chr5    LN:181538259    M5:f7f05fb7ceea78cbc32ce652c540ff2d
@SQ    SN:chr6    LN:170805979    M5:6a48dfa97e854e3c6f186c8ff973f7dd
@SQ    SN:chr7    LN:159345973    M5:94eef2b96fd5a7c8db162c8c74378039
@SQ    SN:chr8    LN:145138636    M5:c67955b5f7815a9a1edfaa15893d3616
@SQ    SN:chr9    LN:138394717    M5:addd2795560986b7491c40b1faa3978a
@SQ    SN:chr10    LN:133797422    M5:907112d17fcb73bcab1ed1c72b97ce68
@SQ    SN:chr11    LN:135086622    M5:1511375dc2dd1b633af8cf439ae90cec
@SQ    SN:chr12    LN:133275309    M5:e81e16d3f44337034695a29b97708fce
@SQ    SN:chr13    LN:114364328    M5:17dab79b963ccd8e7377cef59a54fe1c
@SQ    SN:chr14    LN:107043718    M5:acbd9552c059d9b403e75ed26c1ce5bc
@SQ    SN:chr15    LN:101991189    M5:f036bd11158407596ca6bf3581454706
@SQ    SN:chr16    LN:90338345    M5:24e7cabfba3548a2bb4dff582b9ee870
@SQ    SN:chr17    LN:83257441    M5:a8499ca51d6fb77332c2d242923994eb
@SQ    SN:chr18    LN:80373285    M5:11eeaa801f6b0e2e36a1138616b8ee9a
@SQ    SN:chr19    LN:58617616    M5:b0eba2c7bb5c953d1e06a508b5e487de
@SQ    SN:chr20    LN:64444167    M5:b18e6c531b0bd70e949a7fc20859cb01
@SQ    SN:chr21    LN:46709983    M5:2f45a3455007b7e271509161e52954a9
@SQ    SN:chr22    LN:50818468    M5:221733a2a15e2de66d33e73d126c5109
@SQ    SN:chrX    LN:156040895    M5:49527016a48497d9d1cbd8e4a9049bd3
@SQ    SN:chrY    LN:57227415    M5:b2b7e6369564d89059e763cd6e736837
@SQ    SN:chrM    LN:16569    M5:c68f52674c9fb33aef52dcf399755519
@SQ    SN:GL000008.2    LN:209709    M5:a999388c587908f80406444cebe80ba3
@SQ    SN:GL000009.2    LN:201709    M5:862f555045546733591ff7ab15bcecbe
@SQ    SN:GL000194.1    LN:191469    M5:6ac8f815bf8e845bb3031b73f812c012
@SQ    SN:GL000195.1    LN:182896    M5:5d9ec007868d517e73543b005ba48535
@SQ    SN:GL000205.2    LN:185591    M5:458e71cd53dd1df4083dc7983a6c82c4

Interestingly, after sorting my COSMIC VCF, all of the primary contigs (chr1 through chrY and chrM) end up BEHIND all the unplaced contigs:

##### ERROR   cosmic contigs = [GL000008.2, GL000009.2, GL000194.1, GL000195.1, GL000205.2, GL000208.1, GL000209.2, GL000213.1, GL000214.1, GL000216.2, GL000218.1, GL000219.1, GL000220.1, GL000221.1, GL000224.1, GL000225.1, GL000226.1, GL000250.2, GL000251.2, GL000252.2, ... ... ... ... chr1, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr2, chr20, chr21, chr22, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chrM, chrX, chrY]

ERROR   reference contigs = [chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrX, chrY, chrM, GL000008.2, GL000009.2, GL000194.1, GL000195.1, GL000205.2,... ... ...
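
The COSMIC contig order in the error is consistent with a plain byte-wise (ASCII) sort, in which uppercase "GL..." names come before lowercase "chr..." names and chr10 comes before chr2. A minimal sketch that reproduces this ordering with a few contig names taken from the lists above (the helper name ascii_sort_contigs is made up for illustration and is not part of Picard or GATK):

```shell
# Sort contig names byte-wise, the way a plain lexicographic sort would.
# LC_ALL=C forces ASCII ordering regardless of the user's locale.
ascii_sort_contigs() {
    printf '%s\n' "$@" | LC_ALL=C sort
}

# Names are passed in reference-dictionary order; the output comes back in ASCII order.
ascii_sort_contigs chr1 chr2 chr10 chrX chrM GL000008.2
```

The GL contig is printed first, followed by chr1, chr10, chr2, chrM and chrX, which mirrors the "cosmic contigs" list rather than the "reference contigs" list.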

Could this be because my dictionary is unsorted (its header says SO:unsorted)? I was also advised to use Picard's LiftOver tool, but my COSMIC files are already for GRCh38, so the contigs should all match. As always, thank you for your time and help.
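
One way to check whether the records in the sorted VCF really follow the dictionary is to compare the two contig orders directly. The following is only a sketch: the function names are hypothetical, and it assumes a tab-separated .dict file with @SQ lines carrying an SN: field, plus an uncompressed VCF.

```shell
# Print contig names from a sequence dictionary, in file order.
dict_contigs() {
    awk '$1 == "@SQ" { for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) print substr($i, 4) }' "$1"
}

# Print contig names from the VCF records, in order of first appearance.
vcf_contigs() {
    awk '!/^#/ && $1 != prev { print $1; prev = $1 }' "$1"
}

# Succeed only if every VCF contig is in the dictionary and the
# contigs appear in the same relative order as in the dictionary.
vcf_follows_dict_order() {
    awk '
        FNR == NR {                      # first file: the dictionary
            if ($1 == "@SQ")
                for (i = 2; i <= NF; i++)
                    if ($i ~ /^SN:/) rank[substr($i, 4)] = ++n
            next
        }
        /^#/ { next }                    # skip VCF header lines
        !($1 in rank) || rank[$1] < last { bad = 1; exit }
        { last = rank[$1] }
        END { exit bad }
    ' "$1" "$2"
}
```

For example, `vcf_follows_dict_order /work/gencode/GRCh38.p5.genome.dict '/work/gencode/CosmicCodingMuts%.vcf' && echo "order matches"` prints the message only when the record order agrees with the dictionary. If it does agree and GATK still complains, the records themselves are probably not the source of the mismatch.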

mutect2 rna-seq picard gatk
geek_y10k wrote:

After sorting the VCF with Picard, delete the index file that Picard created for it. GATK will then build its own index, and this error may disappear.
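
A minimal sketch of that cleanup, assuming the index sits next to the sorted VCF and is named after it with an .idx suffix (or .tbi for a bgzipped VCF). Check the output directory for the actual file name first; the function name remove_vcf_index is made up for illustration.

```shell
# Remove index files sitting next to a VCF so the next tool rebuilds them.
# Assumption: the index is <vcf>.idx (uncompressed VCF) or <vcf>.tbi (bgzipped VCF).
# The VCF itself is left untouched; rm -f ignores index files that do not exist.
remove_vcf_index() {
    rm -f -- "$1.idx" "$1.tbi"
}
```

For the sorted file from the question this would be called as `remove_vcf_index '/work/gencode/CosmicCodingMuts%.vcf'` before rerunning MuTect2.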

4.0 years ago by geek_y10k

This is something I had overlooked. Rerunning now, will update you.

EDIT: So far so good. Thank you very very much.

4.0 years ago by umn_bist370
