6.4 years ago
mahdikhadem95
Dear all,
I am totally new to exome-seq data analysis. I am currently stuck at the "base recalibration" step. When I run the following command:
java -jar GenomeAnalysisTK.jar -T BaseRecalibrator -R chr19.fa -I tumor.bam -knownSites no_headerchr19.vcf -o recal.table
I get this error:
##### ERROR MESSAGE: Invalid command line: No tribble type was provided on the command line and the type of the file 'no_headerchr19.vcf' could not be determined dynamically. Please add an explicit type tag :NAME listing the correct type from among the supported types:
##### ERROR Name FeatureType Documentation
##### ERROR BCF2 VariantContext (this is an external codec and is not documented within GATK)
##### ERROR BEAGLE BeagleFeature (this is an external codec and is not documented within GATK)
##### ERROR BED BEDFeature (this is an external codec and is not documented within GATK)
##### ERROR BEDTABLE TableFeature (this is an external codec and is not documented within GATK)
##### ERROR EXAMPLEBINARY Feature (this is an external codec and is not documented within GATK)
##### ERROR GELITEXT GeliTextFeature (this is an external codec and is not documented within GATK)
##### ERROR RAWHAPMAP RawHapMapFeature (this is an external codec and is not documented within GATK)
##### ERROR REFSEQ RefSeqFeature (this is an external codec and is not documented within GATK)
##### ERROR SAMPILEUP SAMPileupFeature (this is an external codec and is not documented within GATK)
##### ERROR SAMREAD SAMReadFeature (this is an external codec and is not documented within GATK)
##### ERROR TABLE TableFeature (this is an external codec and is not documented within GATK)
##### ERROR VCF VariantContext (this is an external codec and is not documented within GATK)
##### ERROR VCF3 VariantContext (this is an external codec and is not documented within GATK)
##### ERROR ------------------------------------------------------------------------------------------
I have reviewed many pages discussing the same error, but they did not help me solve the problem.
I was wondering if anyone could help me figure out this problem. Please keep it simple; I am not an expert.
Any ideas and suggestions would be appreciated.
Thanks.
Hello,
GATK cannot determine the file type of your no_headerchr19.vcf. I guess it looks at the header of the file to recognize it as a VCF file. Your filename implies that the file has no header. If so, the solution is to add a valid VCF header.
fin swimmer
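To check whether a file still carries a VCF header, something like the following sketch may help. It assumes a hypothetical demo file (demo_no_header.vcf, created here only for illustration) and relies on two facts from the VCF format: the first line starts with "##fileformat=VCF", and the column header line starts with "#CHROM".

```shell
# Hypothetical demo file standing in for a header-less VCF such as no_headerchr19.vcf
vcf=demo_no_header.vcf
printf '19\t90974\trsA\tC\tT\t.\t.\t.\n' > "$vcf"

# A valid VCF begins with a line such as "##fileformat=VCFv4.2"
first_line=$(head -n 1 "$vcf")
case "$first_line" in
  '##fileformat=VCF'*) header_status="present" ;;
  *)                   header_status="missing" ;;
esac
echo "fileformat line: $header_status"

# Count the column header line(s) starting with "#CHROM"
# (grep -c exits non-zero when the count is 0, hence "|| true")
chrom_lines=$(grep -c '^#CHROM' "$vcf" || true)
echo "#CHROM lines: $chrom_lines"
```

If either check reports a missing line, the header was lost somewhere during extraction, and GATK has nothing to recognize the file type from.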
Actually, I downloaded my VCF file from this link: common_all_20170710.vcf.gz
then I extracted chromosome 19 variants with this command:
but when I use this vcf file with the following command, I get this error:
therefore I removed the header with this command:
The error message tells you that the "CHROM header line" is missing. This line looks like this:
Until now I don't understand why your grep command doesn't fetch this line. Without -w it does.
Instead of using grep, try using tabix to extract your region. This is much faster. You first need to index the database.
Then you can query this way:
fin swimmer
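As an illustration of an extraction that keeps the header, here is a minimal sketch using plain awk on a hypothetical miniature file (demo_common_all.vcf, with made-up records). It is not the command from this thread. With tabix itself, the usual form is `tabix -p vcf file.vcf.gz` to build the index and `tabix -h file.vcf.gz 19` to query, where `-h` also prints the header lines; the awk version is slower but needs no index.

```shell
# Hypothetical miniature dbSNP-style VCF (contig names without "chr")
src=demo_common_all.vcf
{
  printf '##fileformat=VCFv4.0\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf '1\t10177\trsA\tA\tC\t.\t.\t.\n'
  printf '19\t90974\trsB\tC\tT\t.\t.\t.\n'
  printf '19\t91106\trsC\tG\tA\t.\t.\t.\n'
} > "$src"

# Keep every header line (starting with "#") plus the records
# whose first column is exactly "19"
awk -F'\t' '/^#/ || $1 == "19"' "$src" > demo_chr19.vcf

grep -c '^#' demo_chr19.vcf     # number of header lines kept
grep -vc '^#' demo_chr19.vcf    # number of records on contig 19
```

Because the header lines are passed through untouched, the resulting file still starts with the fileformat line and contains the #CHROM line.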
Are you sure that there is a "chr19"? common_all_20170710.vcf.gz looks like dbSNP. There the chromosome name has no "chr" at the beginning. It's just "19".
Thanks, fin! I extracted the chr19 variants from the reference VCF file again with the command you recommended:
but when I executed the base recalibration command, I got this error:
I tried to change the header file with this command:
this is the header:
but I still get the same error:
Hello mahdikhadem9,
why are you writing this post three times?
Before renaming any contig, you should go through all the files involved and check which naming convention each one uses.
The reference file:
Your bam file:
The variant file from which you extract the variants:
As you can see, in my case the reference file and the bam file use chromosome names with chr and the vcf file without. The easiest way to get everything in sync here is to modify the vcf file after extracting all variants of chromosome 19 and prepend chr to each line:
If your files are already in sync, then you don't have to modify anything.
fin swimmer
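The check and the rename described above could be sketched roughly as follows. The file names (demo_ref.fa, demo_19.vcf) and their contents are hypothetical stand-ins, and these are not the original commands from this post. Note that this simple rename only touches the records; if the header contains ##contig lines, they would still carry the old names.

```shell
# Hypothetical stand-ins for the reference and the extracted variants
printf '>chr19\nACGTACGT\n' > demo_ref.fa
{
  printf '##fileformat=VCFv4.0\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf '19\t90974\trsB\tC\tT\t.\t.\t.\n'
} > demo_19.vcf

# Contig names in the reference: FASTA header lines start with ">"
ref_names=$(grep '^>' demo_ref.fa | sed 's/^>//; s/[[:space:]].*//' | sort -u)
# Contig names used by the VCF records: first column of non-header lines
vcf_names=$(grep -v '^#' demo_19.vcf | cut -f1 | sort -u)
echo "reference: $ref_names"
echo "vcf:       $vcf_names"
# For a BAM file, `samtools view -H file.bam` prints @SQ lines
# containing the contig names (not run in this sketch)

# Prepend "chr" to every record, leaving header lines untouched
awk '/^#/ { print; next } { print "chr" $0 }' demo_19.vcf > demo_chr19_renamed.vcf
```

Comparing the two printed name lists before renaming anything shows directly which file is out of sync.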
Hi fin swimmer
I apologize for sending this post three times. I had a problem with my internet connection which made me think that I couldn't send it at first.
These are the headers of my files:
As you can see, my VCF file uses chr19 and my reference sequence file uses 19, so I changed the header with the following command:
Now the headers of my VCF and reference sequence files are the same (chr19). But when I use the following command, I get the same error as before.
Thank you again, fin swimmer. I highly appreciate your consideration and the time you devote.
It is easier to modify the vcf; otherwise you have to modify the ref and your bam file. To get rid of the "chr", use this:
But three more general things:
[...] vcf file you use is also from this reference genome before you continue.
[...] bam file only contains chromosome 19, I guess it isn't necessary to subset the initial vcf file.
fin swimmer
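Removing the "chr" prefix from the records could look roughly like the sketch below. It works on a hypothetical demo file (demo_with_chr.vcf) and is one possible way to do it, not necessarily the command given in this answer. As with prepending, any ##contig header lines would need to be adjusted separately.

```shell
# Hypothetical demo VCF whose records use "chr"-prefixed contig names
{
  printf '##fileformat=VCFv4.0\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf 'chr19\t90974\trsB\tC\tT\t.\t.\t.\n'
} > demo_with_chr.vcf

# Strip a leading "chr" from each line; header lines start with "#"
# and therefore never match the pattern
sed 's/^chr//' demo_with_chr.vcf > demo_without_chr.vcf

# Show the contig names that remain in the records
grep -v '^#' demo_without_chr.vcf | cut -f1 | sort -u
```

Anchoring the pattern at the start of the line keeps the substitution away from any "chr" that happens to occur inside other columns.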
The problem is solved! As you proposed, I removed the "chr" prefix from the VCF file, and the command ran without any problem afterward.
Thank you for your help, fin swimmer.