GATK Exome PON
11 months ago
chimerajit • 0
1. I am performing exome analysis.
2. I did the alignment with bwa-mem.
3. Reference: GRCh38.p12 from GENCODE.
4. I performed deduplication using Picard.
5. I am using gatk4-4.1.4.1-0.

Now I am at the step where I need to use Mutect2 to build my PON (panel of normals). I downloaded the GATK resource file somatic-hg38_af-only-gnomad.hg38.vcf.gz and its .tbi index.

Now when I run Mutect2 it shows this error:

A USER ERROR has occurred: Input files reference and features have incompatible contigs: No overlapping contigs found

I checked further and found

reference contigs = [NC_000001.11, NT_187361.1, NT_187362.1, NT_187363.1


and

features contigs = [chr1, chr2, chr3


Does this mean that I used a different reference for the alignment? If so, which reference should I use for GATK? Or is there any other way to make my BAM file compatible? Please help.

SNP GATK Mutect2 • 408 views
Yes, it means the contig names in the alignment don't match those in the VCF resource. Can you check your alignment and ensure they have the right naming conventions? The NC_ names are NCBI reference IDs that should not exist in any proper reference file.
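A minimal sketch of that check, in Python. It assumes you have the text of the reference's .fai index (as written by samtools faidx) and the VCF header, and that the header declares its contigs in ##contig lines. The function names are made up for illustration, and the toy lengths are not real.

```python
import re

def fai_contigs(fai_text):
    """Return contig names from the text of a .fai index (first tab-separated column)."""
    return [line.split("\t")[0] for line in fai_text.splitlines() if line.strip()]

def vcf_header_contigs(header_text):
    """Return contig IDs declared in '##contig=<ID=...>' VCF header lines."""
    return re.findall(r"^##contig=<ID=([^,>]+)", header_text, flags=re.MULTILINE)

def contig_overlap(ref_names, feature_names):
    """Return the sorted contig names present in both inputs."""
    return sorted(set(ref_names) & set(feature_names))

# Toy inputs modelled on the names quoted in this thread (lengths are invented).
fai = "NC_000001.11\t1000\t14\t60\t61\nNT_187361.1\t500\t1050\t60\t61\n"
header = (
    "##fileformat=VCFv4.2\n"
    "##contig=<ID=chr1,length=1000>\n"
    "##contig=<ID=chr2,length=900>\n"
)

shared = contig_overlap(fai_contigs(fai), vcf_header_contigs(header))
print(shared or "No overlapping contigs")
```

An empty overlap here corresponds to the "No overlapping contigs found" situation in the error message.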

Yes, I checked my reference FASTA file and it has NC_000001.11 and similar contigs, whereas the UCSC hg38.fa has contig names like "chr1_KI270709v1_random". Now I either need to start over with the UCSC genome or find a VCF that supports the UCSC genome. Any recommendation?

I am unable to understand what you're saying. If the two contig sets are different, you'll either need to edit your VCF using bcftools or awk/sed, or redo the entire process, from alignment to calling, using one stable reference across the board. I'd recommend you use GRCh38.p13 from GENCODE and avoid all other "reference" resources.

Thanks for the comment; it is resolved.
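For the "edit your VCF" route, here is a rough Python sketch of what the renaming amounts to. It is not a replacement for bcftools (whose `annotate --rename-chrs` option does this from a two-column mapping file), and it only makes sense if the renamed contigs are the same sequences under different names. The function name and the two-entry mapping are illustrative assumptions.

```python
import re

def rename_vcf_contigs(lines, mapping):
    """Yield VCF lines with contig names translated through `mapping`.

    Names that are not keys of `mapping` are passed through unchanged.
    """
    for line in lines:
        if line.startswith("##contig=<ID="):
            # Header declaration: swap only the ID value.
            m = re.match(r"(##contig=<ID=)([^,>]+)(.*)", line, flags=re.DOTALL)
            if m is None:
                yield line
            else:
                yield m.group(1) + mapping.get(m.group(2), m.group(2)) + m.group(3)
        elif line.startswith("#"):
            # Other header lines are left as they are.
            yield line
        else:
            # Data line: CHROM is the first tab-separated field.
            chrom, sep, rest = line.partition("\t")
            yield mapping.get(chrom, chrom) + sep + rest

# Illustrative mapping from UCSC-style names to the RefSeq names in the reference.
mapping = {"chr1": "NC_000001.11", "chr2": "NC_000002.12"}

vcf = [
    "##fileformat=VCFv4.2\n",
    "##contig=<ID=chr1,length=1000>\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\t.\tA\tG\t.\t.\tAF=0.01\n",
    "chrUn_example\t5\t.\tC\tT\t.\t.\tAF=0.5\n",  # hypothetical unmapped contig
]

renamed = list(rename_vcf_contigs(vcf, mapping))
print("".join(renamed), end="")
```

After renaming, a bgzipped VCF would still need to be re-compressed and re-indexed before GATK can use it.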

11 months ago
chimerajit • 0

GATK resource files, such as the .vcf and other annotation files, are compatible with UCSC hg38 and not with GENCODE GRCh38. The problem occurs when you use GRCh38 as the reference and then use any of the GATK VCF files required either to generate your own panel of normals or to call somatic/germline variants.

A BAM generated with such a GRCh38 reference will have contig names like "NC_000001.11", whereas the GATK resource VCF will have "chr1", etc. The simple option is to use UCSC hg38 when you want to use the GATK resource files. Alternatively, you may generate a custom VCF for your purpose from dbSNP resources, but that is a bit tricky.
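To see how the two naming schemes line up, a lookup table between them can be built from an NCBI-style assembly report. The Python sketch below assumes a tab-separated report whose last comment line names the columns, including "RefSeq-Accn" and "UCSC-style-name"; the excerpt is trimmed toy data (its "unplaced_example" row is invented), and the function name is hypothetical.

```python
def ucsc_to_refseq(report_lines):
    """Build {UCSC-style name: RefSeq accession} from assembly-report-style lines."""
    columns = None
    mapping = {}
    for line in report_lines:
        line = line.rstrip("\n")
        if line.startswith("#"):
            # The column header is a comment line that names both columns we need.
            fields = line.lstrip("# ").split("\t")
            if "RefSeq-Accn" in fields and "UCSC-style-name" in fields:
                columns = fields
            continue
        if columns is None or not line:
            continue
        row = dict(zip(columns, line.split("\t")))
        ucsc, refseq = row.get("UCSC-style-name"), row.get("RefSeq-Accn")
        # Skip rows where either name is missing or marked "na".
        if ucsc and refseq and "na" not in (ucsc, refseq):
            mapping[ucsc] = refseq
    return mapping

# Trimmed toy excerpt; a real report has more columns and many more rows.
report = [
    "# Assembly name: toy excerpt\n",
    "# Sequence-Name\tRefSeq-Accn\tUCSC-style-name\n",
    "1\tNC_000001.11\tchr1\n",
    "2\tNC_000002.12\tchr2\n",
    "unplaced_example\tNT_000000.0\tna\n",  # hypothetical row with no UCSC name
]

table = ucsc_to_refseq(report)
print(table)
```

Reading the column positions from the header line, instead of hard-coding them, keeps the sketch from depending on the exact column order of the report.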

