Please add an explicit type tag :NAME
6.0 years ago

Dear all,

I am totally new to exome-seq data analysis and am currently stuck at the base recalibration step. When I use the following command:

java -jar GenomeAnalysisTK.jar -T BaseRecalibrator -R chr19.fa -I tumor.bam -knownSites no_headerchr19.vcf -o recal.table

I get this error:

##### ERROR MESSAGE: Invalid command line: No tribble type was provided on the command line and the type of the file 'no_headerchr19.vcf' could not be determined dynamically. Please add an explicit type tag :NAME listing the correct type from among the supported types:
##### ERROR          Name        FeatureType   Documentation
##### ERROR          BCF2     VariantContext   (this is an external codec and is not documented within GATK)
##### ERROR        BEAGLE      BeagleFeature   (this is an external codec and is not documented within GATK)
##### ERROR           BED         BEDFeature   (this is an external codec and is not documented within GATK)
##### ERROR      BEDTABLE       TableFeature   (this is an external codec and is not documented within GATK)
##### ERROR EXAMPLEBINARY            Feature   (this is an external codec and is not documented within GATK)
##### ERROR      GELITEXT    GeliTextFeature   (this is an external codec and is not documented within GATK)
##### ERROR     RAWHAPMAP   RawHapMapFeature   (this is an external codec and is not documented within GATK)
##### ERROR        REFSEQ      RefSeqFeature   (this is an external codec and is not documented within GATK)
##### ERROR     SAMPILEUP   SAMPileupFeature   (this is an external codec and is not documented within GATK)
##### ERROR       SAMREAD     SAMReadFeature   (this is an external codec and is not documented within GATK)
##### ERROR         TABLE       TableFeature   (this is an external codec and is not documented within GATK)
##### ERROR           VCF     VariantContext   (this is an external codec and is not documented within GATK)
##### ERROR          VCF3     VariantContext   (this is an external codec and is not documented within GATK)
##### ERROR ------------------------------------------------------------------------------------------

I have reviewed many pages discussing the same error, but none of them helped me solve this problem.

I was wondering if anyone could help me figure out this problem. Please keep it simple; I am not an expert.

Any ideas and suggestions would be appreciated.

Thanks.

software-error next-gen genome • 3.3k views

Hello,

GATK cannot determine what type of file your no_headerchr19.vcf is. I guess it looks at the header of the file to recognize it as a VCF file. Your filename implies that the file has no header. If so, the solution is to add a valid VCF header.
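A minimal sketch of a header check, assuming an uncompressed VCF. The helper name `check_vcf_header` is hypothetical. Per the VCF specification, the first line must be a `##fileformat=VCF...` line and the column header line starts with `#CHROM`; that GATK's type detection relies on exactly these lines is my assumption.

```shell
# Hypothetical helper: report whether a plain-text VCF starts with the
# mandatory ##fileformat line and contains the #CHROM column header line.
check_vcf_header() {
  local vcf="$1"
  local first_line
  first_line=$(head -n 1 "$vcf")
  case "$first_line" in
    '##fileformat=VCF'*) ;;
    *) echo "missing ##fileformat line"; return 1 ;;
  esac
  if ! grep -q '^#CHROM' "$vcf"; then
    echo "missing #CHROM header line"
    return 1
  fi
  echo "header looks OK"
}

# Usage (file name from the thread):
# check_vcf_header no_headerchr19.vcf
```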

fin swimmer


Actually, I downloaded my VCF file from this link: common_all_20170710.vcf.gz

Then I extracted the chromosome 19 variants with this command:

zcat common_all_20170710.vcf.gz | grep -w '^#\|^chr19' > chr19_CommonVariants.vcf

But when I use this VCF file with the following command, I get this error:

java -jar GenomeAnalysisTK.jar -T BaseRecalibrator -R chr19.fa -I tumor.bam -knownSites chr19_CommonVariants.vcf -o recal.table

ERROR MESSAGE: Your input file has a malformed header: We never saw the required CHROM header line (starting with one #) for the input VCF file

Therefore, I removed the header with this command:

egrep -v "^#" chr19_CommonVariants.vcf > no_headerchr19.vcf

The error message tells you that the "CHROM header line" is missing. This line looks like this:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO

So far I don't understand why your grep command doesn't fetch this line. Without -w it does.
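A likely explanation, based on GNU grep's documented -w behaviour: a -w match must be followed by the end of the line or a non-word character. In `##fileformat=...` the first `#` is followed by another `#`, so the line passes; in `#CHROM` it is followed by `C`, a word character, so that line is dropped, which fits the "never saw the required CHROM header line" error. Below, a sketch of both cases plus a grep-free alternative; `extract_contig` is a hypothetical helper that keeps every header line and the records whose first column equals the contig exactly.

```shell
# The same two header lines through grep with and without -w:
printf '##fileformat=VCFv4.1\n#CHROM\tPOS\n' | grep -w '^#'   # only the ##fileformat line
printf '##fileformat=VCFv4.1\n#CHROM\tPOS\n' | grep '^#'      # both lines

# Hypothetical helper: keep all header lines ('#...') plus the records whose
# CHROM column is exactly the requested contig (exact match, not a prefix).
extract_contig() {
  awk -F'\t' -v contig="$1" '/^#/ || $1 == contig'
}

# Usage (file names from the thread):
# zcat common_all_20170710.vcf.gz | extract_contig chr19 > chr19_CommonVariants.vcf
```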

Instead of using grep, try using tabix to extract your region. This is much faster.

You first need to index the database:

tabix -p vcf common_all_20170710.vcf.gz

Then you can query it this way:

tabix -h common_all_20170710.vcf.gz chr19

fin swimmer


Are you sure that there is a "chr19"? common_all_20170710.vcf.gz looks like dbSNP, where the chromosome names have no "chr" at the beginning. It's just "19".
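A quick way to see which naming convention a VCF actually uses is to list the distinct values of its first column. A small sketch with a hypothetical helper name; for a bgzipped, tabix-indexed file, `tabix -l` also lists the sequence names from the index without scanning the whole file.

```shell
# Hypothetical helper: print each distinct CHROM value of a VCF read from
# stdin, in order of first appearance, skipping header lines.
list_vcf_contigs() {
  awk -F'\t' '!/^#/ && !seen[$1]++ { print $1 }'
}

# Usage (file name from the thread; scanning all of dbSNP takes a while):
# zcat common_all_20170710.vcf.gz | list_vcf_contigs
```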


Thanks, fin! I extracted the chr19 variants from the reference VCF file again with your recommended command:

tabix -h common_all_20170710.vcf.gz chr19 > common_variants_chr19.vcf

But when I ran the base recalibration command, I got this error:

java -jar GenomeAnalysisTK.jar -T BaseRecalibrator -R chr19.fa -I tumor.bam -knownSites common_variants.vcf -o recal.table


##### ERROR MESSAGE: Input files /home/mahdi/Exome_seq/common_variants.vcf and reference have incompatible contigs. Please see https://www.broadinstitute.org/gatk/guide/article?id=63for more information. Error details: No overlapping contigs found.
##### ERROR   /home/mahdi/Exome_seq/common_variants.vcf contigs = [chr19]
##### ERROR   reference contigs = [19]

I tried to change the FASTA header with these commands:

mv chr19.fa Chr19.fa
sed 's/>19/\>chr19/g' Chr19.fa > chr19.fa

This is the header:

head chr19.fa
>chr19 dna:chromosome chromosome:GRCh38:19:1:58617616:1 REF

But I still get the same error:

MESSAGE: Input files /home/mahdi/Exome_seq/common_variants.vcf and reference have incompatible contigs.
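One thing worth checking, as an assumption not confirmed in this thread: GATK reads the reference contig names from the sequence dictionary (chr19.dict) and the FASTA index (chr19.fa.fai) next to the FASTA, not from the FASTA header itself. If those files were created before the rename, they would still list 19, which would fit the unchanged "reference contigs = [19]". Keep in mind that the BAM header has to match as well. The helper names below are hypothetical; they print the contig names from each source so they can be compared.

```shell
# Hypothetical helpers: print contig names from the FASTA headers, the
# .fai index (first column) and the .dict sequence dictionary (@SQ SN: field).
fasta_contigs() { grep '^>' "$1" | sed -e 's/^>//' -e 's/[[:space:]].*//'; }
fai_contigs()   { cut -f1 "$1"; }
dict_contigs()  { grep '^@SQ' "$1" | tr '\t' '\n' | sed -n 's/^SN://p'; }

# Usage (file names follow the thread's chr19.fa):
# fasta_contigs chr19.fa; fai_contigs chr19.fa.fai; dict_contigs chr19.dict
```

If the index or dictionary disagrees with the FASTA, deleting and regenerating them (for example with samtools faidx and Picard's CreateSequenceDictionary) should bring them back in sync.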

Hello mahdikhadem9,

Why are you writing this post three times?

Before renaming any contig, you should go through all the needed files and check which naming convention each of them uses.

The reference file:

$ grep ">" chr19.fa 
>chr19

Your BAM file:

$ samtools view -H tumor.bam|grep @SQ
@SQ SN:chrM LN:16571
@SQ SN:chr1 LN:249250621
@SQ SN:chr2 LN:243199373
@SQ SN:chr3 LN:198022430
@SQ SN:chr4 LN:191154276
@SQ SN:chr5 LN:180915260
@SQ SN:chr6 LN:171115067
@SQ SN:chr7 LN:159138663
@SQ SN:chr8 LN:146364022
@SQ SN:chr9 LN:141213431
@SQ SN:chr10    LN:135534747
@SQ SN:chr11    LN:135006516
@SQ SN:chr12    LN:133851895
@SQ SN:chr13    LN:115169878
@SQ SN:chr14    LN:107349540
@SQ SN:chr15    LN:102531392
@SQ SN:chr16    LN:90354753
@SQ SN:chr17    LN:81195210
@SQ SN:chr18    LN:78077248
@SQ SN:chr19    LN:59128983
@SQ SN:chr20    LN:63025520
@SQ SN:chr21    LN:48129895
@SQ SN:chr22    LN:51304566
@SQ SN:chrX LN:155270560
@SQ SN:chrY LN:59373566

The variant file from which you extract the variants:

$ tabix common_all_20170710.vcf.gz chr19|head -n 1
$ tabix common_all_20170710.vcf.gz 19|head -n 1
19  60360   rs111660247 C   G   .   .   RS=111660247;RSPOS=60360;dbSNPBuildID=132;SSR=0;SAO=0;VP=0x050000000005000002000100;WGT=1;VC=SNV;ASP

As you can see, in my case the reference file and the BAM file use chromosome names with "chr" and the VCF file uses names without it. The easiest way to get everything in sync here is to modify the VCF file after extracting all variants of chromosome 19 and prepend "chr" to each non-header line:

$ awk '{if($0 !~ /^#/) print "chr"$0; else print $0}' common_variants_chr19.vcf > with_chr.vcf

If your files are already in sync, then you don't have to modify anything.
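To check whether two files are in sync, one rough approach is to intersect their lists of contig names; an empty intersection roughly corresponds to GATK's "No overlapping contigs found". A sketch with a hypothetical helper and hypothetical list files; the usage assumes SN is the first tag on each @SQ line, as in the output above.

```shell
# Hypothetical helper: print contig names present in both name lists
# (one name per line). Empty output means nothing overlaps.
shared_contigs() {
  awk 'NR == FNR { seen[$1]; next } $1 in seen' "$1" "$2" | sort -u
}

# Usage (tumor.bam and the dbSNP file are from the thread; the .txt files are hypothetical):
# samtools view -H tumor.bam | awk -F'\t' '$1 == "@SQ" { sub(/^SN:/, "", $2); print $2 }' > bam_contigs.txt
# tabix -l common_all_20170710.vcf.gz > vcf_contigs.txt
# shared_contigs bam_contigs.txt vcf_contigs.txt
```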

fin swimmer


Hi fin swimmer

I apologize for sending this post three times. I had a problem with my internet connection, which made me think at first that it hadn't been sent.

These are the headers of my files:

tabix common_all_20170710.vcf.gz chr19|head -n 1
chr19   62935   rs534193774 CCT C   .   .   RS=534193774;RSPOS=62936;dbSNPBuildID=142;SSR=0;SAO=0;VP=0x050000080005040024000200;GENEINFO=WASH5P:375690;WGT=1;VC=DIV;INT;ASP;VLD;KGPhase3;CAF=0.9952,0.004792;COMMON=1


samtools view -H tumor.bam|grep @SQ
@SQ SN:19   LN:58617616


grep ">" chr19.fa
>19 dna:chromosome chromosome:GRCh38:19:1:58617616:1 REF

As you can see, my VCF file uses chr19 and my reference sequence file uses 19, so I changed the header with the following commands:

mv chr19.fa Chr19.fa
sed 's/>19/\>chr19/g' Chr19.fa > chr19.fa

grep ">" chr19.fa
>chr19 dna:chromosome chromosome:GRCh38:19:1:58617616:1 REF

Now the headers of my VCF and reference sequence files are the same (chr19). But when I use the following command, I get the same error as before.

java -jar GenomeAnalysisTK.jar -T BaseRecalibrator -R chr19.fa -I tumor.bam -knownSites common_variants.vcf -o recal.table

##### ERROR MESSAGE: Input files /home/mahdi/Exome_seq/common_variants.vcf and reference have incompatible contigs.
Error details: No overlapping contigs found.
##### ERROR   /home/mahdi/Exome_seq/common_variants.vcf contigs = [chr19]
##### ERROR   reference contigs = [19]

Thank you again, fin swimmer. I really appreciate your consideration and the time you devote.


It is easier to modify the VCF; otherwise you have to modify both the reference and your BAM file. To get rid of the "chr", use this:

sed 's/^chr//' common_variants.vcf > without_chr.vcf
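The sed command above only changes lines that start with "chr", i.e. the data records. If the VCF header also contains `##contig=<ID=chr19,...>` lines (whether this file has them is not shown in the thread), those IDs keep the prefix. A sketch, assuming that header form and using a hypothetical helper name, that strips the prefix in both places:

```shell
# Hypothetical helper: drop a leading "chr" from the CHROM column of records
# and from the ID of ##contig header lines; other header lines are unchanged.
strip_chr() {
  sed -e 's/^chr//' -e 's/^##contig=<ID=chr/##contig=<ID=/'
}

# Usage (file names from the thread):
# strip_chr < common_variants.vcf > without_chr.vcf
```

Note that this turns chrM into M, whereas Ensembl-style references such as this GRCh38 FASTA call the mitochondrion MT; that does not matter for chromosome 19.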

But three more general things:

  1. Your reference is hg38. Make sure the VCF file you use is also from this reference genome before continuing.
  2. If your BAM file only contains chromosome 19, I guess it isn't necessary to subset the initial VCF file.
  3. Have you checked the overall quality of your reads? If it is good, the benefit of BQSR might be negligible and you could omit this step.
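For point 1, one rough check is the length of chromosome 19, which differs between builds: the GRCh38 FASTA header in this thread reports 58617616, and the hg19 BAM header listed earlier shows 59128983. A sketch with a hypothetical helper that maps a chr19 length (e.g. the second column of the .fai) to a build; for the VCF, its ##reference or ##contig header lines, if present, may carry the same information.

```shell
# Hypothetical helper: name the build from the chromosome 19 length.
# Only the two lengths seen in this thread are known to it.
build_from_chr19_length() {
  case "$1" in
    58617616) echo "GRCh38" ;;
    59128983) echo "GRCh37/hg19" ;;
    *) echo "unknown"; return 1 ;;
  esac
}

# Usage (the .fai name follows the thread's chr19.fa):
# build_from_chr19_length "$(awk -F'\t' '$1 == "19" || $1 == "chr19" { print $2 }' chr19.fa.fai)"
```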

fin swimmer


Problem solved! As you proposed, I removed the "chr" prefix from the VCF file, and the command then ran without any problems.

Thank you for your help "fin swimmer"
