I am building an NGS pipeline from scratch. FASTQ files have been aligned to the hg19 reference with BWA-MEM. Samtools was used for sorting and indexing. Picard was used to mark duplicates and estimate library complexity.
At this point, I want to run GATK BaseRecalibrator. However, I get this error message:
A USER ERROR has occurred: Input files reference and features have incompatible contigs: No overlapping contigs found. reference contigs = [NC_000001.10, NT_113878.1, NT_167207.1, NC_000002.11, NC_000003.11, NC_000004.11, NT_113885.1, NT_113888.1, NC_000005.9, NC_000006.11, NC_000007.13, NT_113901.1, NC_000008.10, NT_113909.1, NT_113907.1, NC_000009.11, NT_113914.1, NT_113916.2, NT_113915.1, NT_113911.1, NC_000010.10, NC_000011.9, NT_113921.2, NC_000012.11, NC_000013.10, NC_000014.8, NC_000015.9, NC_000016.9, NC_000017.10, NT_113941.1, NT_113943.1, NT_113930.1, NT_113945.1, NC_000018.9, NT_113947.1, NC_000019.9, NT_113948.1, NT_113949.1, NC_000020.10, NC_000021.8, NT_113950.2, NC_000022.10, NC_000023.10, NC_000024.9, NT_113961.1, NT_113923.1, NT_167208.1, NT_167209.1, NT_167210.1, NT_167211.1, NT_167212.1, NT_113889.1, NT_167213.1, NT_167214.1, NT_167215.1, NT_167216.1, NT_167217.1, NT_167218.1, NT_167219.1, NT_167220.1, NT_167221.1, NT_167222.1, NT_167223.1, NT_167224.1, NT_167225.1, NT_167226.1, NT_167227.1, NT_167228.1, NT_167229.1, NT_167230.1, NT_167231.1, NT_167232.1, NT_167233.1, NT_167234.1, NT_167235.1, NT_167236.1, NT_167237.1, NT_167238.1, NT_167239.1, NT_167240.1, NT_167241.1, NT_167242.1, NT_167243.1, NW_004070864.2, NW_003571030.1, NW_003871056.3, NW_003871055.3, NW_003315905.1, NW_003315906.1, NW_003315907.1, NW_004070863.1, NW_003871057.1, NW_004070865.1, NW_003315903.1, NW_003315904.1, NW_003315908.1, NW_004504299.1, NW_003571032.1, NW_003571033.2, NW_003315909.1, NW_003571031.1, NW_003871060.1, NW_003871059.1, NW_003315910.1, NW_004775426.1, NW_003315911.1, NW_003871058.1, NW_003315912.1, NW_003315913.1, NW_004775427.1, NW_003315915.1, NW_003315916.1, NW_003571035.1, NW_003315914.1, NW_003571034.1, NW_003315920.1, NW_003571036.1, NW_003315917.2, NW_003315918.1, NW_003871061.1, NW_004775428.1, NW_003315919.1, NW_004070866.1, NW_003871063.1, NW_003315921.1, NW_004504300.1, NW_003871062.1, NW_004775429.1, NW_004166862.1, NW_003571039.1, 
NW_003571038.1, NW_004775430.1, NW_003871064.1, NW_003571041.1, NW_003571037.1, NW_003871065.1, NW_003315922.2, NW_003571040.1, NW_003571042.1, NW_004775431.1, NW_003871066.2, NW_003315923.1, NW_003315924.1, NW_003315928.1, NW_003871067.1, NW_003315929.1, NW_003315930.1, NW_003315931.1, NW_004504301.1, NW_004070869.1, NW_003315925.1, NW_004070867.1, NW_004070868.1, NW_003315926.1, NW_003315927.1, NW_003571043.1, NW_003871071.1, NW_003315932.1, NW_003315934.1, NW_003315935.1, NW_003871068.1, NW_004504302.1, NW_003871070.1, NW_004775432.1, NW_003871069.1, NW_003315933.1, NW_004070870.1, NW_003871075.1, NW_003871082.1, NW_003315936.1, NW_003571045.1, NW_003871073.1, NW_003871074.1, NW_003571046.1, NW_004070871.1, NW_003871081.1, NW_003871079.1, NW_003871077.1, NW_003871080.1, NW_003871078.1, NW_003871072.2, NW_003871076.1, NW_003571048.1, NW_003571049.1, NW_003871083.2, NW_003571047.1, NW_003571050.1, NW_003315938.1, NW_003315939.1, NW_003315941.1, NW_003315942.2, NW_004504303.2, NW_003315940.1, NW_003315937.1, NW_003571051.1, NW_004166863.1, NW_003315943.1, NW_003315944.1, NW_003871084.1, NW_003315945.1, NW_003871085.1, NW_003315946.1, NW_004070872.2, NW_003315952.2, NW_003315951.1, NW_003315950.2, NW_004775433.1, NW_003871090.1, NW_004166864.2, NW_003315949.1, NW_003315948.2, NW_003871091.1, NW_003871093.1, NW_003871092.1, NW_003315953.1, NW_003571052.1, NW_003871086.1, NW_003315947.1, NW_003871088.1, NW_003315954.1, NW_003315955.1, NW_003871089.1, NW_003871087.1, NW_003315956.1, NW_003315959.1, NW_003315960.1, NW_003315957.1, NW_003315958.1, NW_003315961.1, NW_003871094.1, NW_003571053.2, NW_003315962.1, NW_003315964.2, NW_003315965.1, NW_003315963.1, NW_004775434.1, NW_004166865.1, NW_003571054.1, NW_003571055.1, NW_003571056.1, NW_003571057.1, NW_003571058.1, NW_003571059.1, NW_003571060.1, NW_003571061.1, NW_003315966.1, NW_003871095.1, NW_004504304.1, NW_003571063.2, NW_003315967.1, NW_003315968.1, NW_003315969.1, NW_003315970.1, NW_004775435.1, NW_004070874.1, 
NW_004070873.1, NW_004070875.1, NW_003871096.1, NW_003315972.1, NW_003315971.2, NW_004504305.1, NW_004070876.1, NW_003571064.2, NW_003871098.1, NW_003871099.1, NW_004070879.1, NW_004166866.1, NW_004070880.2, NW_004070877.1, NW_004070881.1, NW_004070882.1, NW_003871100.1, NW_003871101.3, NW_004070883.1, NW_004070884.1, NW_004070885.1, NW_003871102.1, NW_004070878.1, NW_004070891.1, NW_004070892.1, NW_004070893.1, NW_004070886.1, NW_004070887.1, NW_004070888.1, NW_004070889.1, NW_004070890.2, NW_003871103.3, NT_167244.1, NT_113891.2, NT_167245.1, NT_167246.1, NT_167247.1, NT_167248.1, NT_167249.1, NT_167250.1, NT_167251.1, NC_012920.1] features contigs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X, Y]
After running the GATK command the first time, I saw that it needed an additional index file, reference.dict. To create the file, I ran gatk CreateSequenceDictionary -R reference.fasta (as recommended on this page https://gatk.broadinstitute.org/hc/en-us/articles/360035531652-FASTA-Reference-genome-format) on the same reference file that was used for all previous analysis steps.
Previously, the reference file had only been processed with the bwa index reference.fasta command. I used the same reference.fasta file for the entire pipeline.
The reference files look fine to me; I assume the error arises from the chromosome labels (features contigs) in the gnomAD VCF files passed as --known-sites in the command:
gatk BaseRecalibrator -I sample.sorted.bam -R reference.fasta --known-sites gnomad.genomes.r2.1.1.sites.vcf --known-sites gnomad.exomes.r2.1.1.sites.vcf -O recal_data.table
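To double-check this assumption, the contig names in the reference and in a known-sites file can be listed and intersected. The following is only a minimal sketch, assuming uncompressed, plain-text FASTA and VCF files; the function name shared_contigs is hypothetical and not part of GATK or any other tool.

```shell
# Hypothetical helper: print contig names present in BOTH a FASTA and a VCF.
# Assumes both files are uncompressed plain text.
shared_contigs() {
    sc_fasta="$1"
    sc_vcf="$2"
    sc_ref_names=$(mktemp)
    sc_vcf_names=$(mktemp)

    # FASTA: take the first whitespace-delimited token after '>' on header lines
    grep '^>' "$sc_fasta" | sed 's/^>//' | awk '{print $1}' | sort -u > "$sc_ref_names"

    # VCF: take the CHROM column (field 1) of every non-header line
    grep -v '^#' "$sc_vcf" | cut -f1 | sort -u > "$sc_vcf_names"

    # comm -12 prints only the lines common to both sorted lists
    comm -12 "$sc_ref_names" "$sc_vcf_names"

    rm -f "$sc_ref_names" "$sc_vcf_names"
}
```

With the files from the command above, this could be called as shared_contigs reference.fasta gnomad.genomes.r2.1.1.sites.vcf. An empty result would agree with the "No overlapping contigs found" message. Scanning a full gnomAD VCF this way may be slow, since every data line is read.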
Am I supposed to edit these input files to match the contig labels? Do you recommend using other population VCF files? Any other ideas on how to fix this issue?
Any help would be appreciated.