We have a WES data set that was generated with the Agilent Mouse exome capture library kit. When I downloaded the target files I got, similar to this post, a folder with several BED files (_AllTracks.bed, _Covered.bed, _Padded.bed, _Regions.bed) and a file named Targets.txt. I am not really sure what each of them is for, but that is not my main problem.

When I try to run the command

gatk BedToIntervalList \
-I input/S0276129_Covered.bed \
-O input/S0276129_Covered.intervals \
--SEQUENCE_DICTIONARY ../reference/mm10/mm10.dict

I get the following error:

picard.PicardException: Start on sequence 'chr1' was past the end: 195471971 < 196469947
        at picard.util.BedToIntervalList.doWork(
        at picard.cmdline.CommandLineProgram.instanceMain(
        at org.broadinstitute.hellbender.cmdline.PicardCommandLineProgramExecutor.instanceMain(
        at org.broadinstitute.hellbender.Main.runCommandLineProgram(
        at org.broadinstitute.hellbender.Main.mainEntry(
        at org.broadinstitute.hellbender.Main.main(

Based on the message, the BED file contains coordinates that lie beyond the end of chr1 as defined in the dict file.

This is true: when I look at chromosome 1 in the BED file, I see:

grep "chr1\s" input/S0276129_Covered.bed | tail
chr1    196986946       196987186       entg|Cr2,ens|ENSMUST00000082321,ref|NM_007...
chr1    196989335       196989485       entg|Cr2,ens|ENSMUST00000082321,ref|NM_007...

but the dict file shows:

less ../reference/mm10/Sequence/WholeGenomeFasta/genome.dict 
@HD     VN:1.0  SO:unsorted
@SQ     SN:chr1 LN:195471971    UR:file:/illumina/scratch/iGenomes/Mus_musculus/UCSC/mm10/Sequence/WholeGenomeFasta/genome.fa   M5:c4ec915e7348d42648eefc1534b71c99
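
To see how many intervals are affected beyond the two lines shown above, the comparison can be done for the whole file. The following is only a minimal sketch, not part of GATK or Picard: the function names are hypothetical, and it assumes a tab-separated dict file with SN and LN tags on each @SQ line and a BED file whose first three columns are chromosome, start and end.

```python
def read_dict_lengths(dict_path):
    """Return {sequence name: length} taken from the @SQ lines of a .dict file."""
    lengths = {}
    with open(dict_path) as handle:
        for line in handle:
            if not line.startswith("@SQ"):
                continue
            # Each field looks like TAG:value; the value itself may contain ':' (e.g. UR).
            tags = dict(
                field.split(":", 1)
                for field in line.rstrip("\n").split("\t")[1:]
                if ":" in field
            )
            lengths[tags["SN"]] = int(tags["LN"])
    return lengths


def out_of_range_intervals(bed_path, lengths):
    """Yield (chrom, start, end, reason) for BED lines that do not fit the dictionary."""
    with open(bed_path) as handle:
        for line in handle:
            # Skip blank lines and BED header lines.
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            chrom, start, end = line.split()[:3]
            start, end = int(start), int(end)
            if chrom not in lengths:
                yield chrom, start, end, "sequence not in dictionary"
            elif end > lengths[chrom]:
                # BED ends are exclusive, so end == length is still valid.
                yield chrom, start, end, f"past end of sequence ({lengths[chrom]})"
```

Calling read_dict_lengths on the dict file and passing the result to out_of_range_intervals together with input/S0276129_Covered.bed should list every offending interval. Under these assumptions, the two chr1 lines above would be reported, since their end coordinates exceed LN:195471971.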

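When I search for the gene Cr2, its coordinates are Chromosome 1: 195,136,811-195,176,716, which fits within the chr1 length in the dict file but does not match the BED coordinates (around 196.99 Mb).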

Is there something wrong with the bed file from Agilent? Any ideas what is happening?


