Question: Error in converting bed file to interval using Picard
Asked 11 months ago by Assa Yeroslaviz (Munich):

We have a WES data set that was generated with the Agilent mouse exome capture library kit. I wanted to download the target file and, as in this post, got a folder with several BED files (_AllTracks.bed, _Covered.bed, _Padded.bed, _Regions.bed) and a file named Targets.txt. I am not really sure what each of them is, but my actual problem goes beyond that.

When I try to run the command

gatk BedToIntervalList \
-I input/S0276129_Covered.bed \
-O input/S0276129_Covered.intervals \
--SEQUENCE_DICTIONARY ../reference/mm10/mm10.dict

I get the following error:

picard.PicardException: Start on sequence 'chr1' was past the end: 195471971 < 196469947
        at picard.util.BedToIntervalList.doWork(BedToIntervalList.java:143)
        at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:305)
        at org.broadinstitute.hellbender.cmdline.PicardCommandLineProgramExecutor.instanceMain(PicardCommandLineProgramExecutor.java:25)
        at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:163)
        at org.broadinstitute.hellbender.Main.mainEntry(Main.java:206)
        at org.broadinstitute.hellbender.Main.main(Main.java:292)

Based on the message, the BED file contains coordinates on chr1 that lie beyond the length given for chr1 in the dict file.

This is indeed the case. When I look at chromosome 1 in the BED file, I see:

grep "chr1\s" input/S0276129_Covered.bed | tail
chr1    196986946       196987186       entg|Cr2,ens|ENSMUST00000082321,ref|NM_007...
chr1    196989335       196989485       entg|Cr2,ens|ENSMUST00000082321,ref|NM_007...

but the dict file shows

less ../reference/mm10/Sequence/WholeGenomeFasta/genome.dict 
@HD     VN:1.0  SO:unsorted
...
@SQ     SN:chr1 LN:195471971    UR:file:/illumina/scratch/iGenomes/Mus_musculus/UCSC/mm10/Sequence/WholeGenomeFasta/genome.fa   M5:c4ec915e7348d42648eefc1534b71c99
...
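
To see how many intervals are affected, and not only the first one Picard stops at, a check along these lines might help. This is only a minimal sketch, not what BedToIntervalList does internally: the function names `read_dict_lengths` and `out_of_bounds` are made up for illustration, the first BED record in the demo is a hypothetical in-bounds interval, and the file paths in the usage note are the ones from the commands above. It assumes standard BED coordinates (0-based start, exclusive end), so an interval fits as long as its end does not exceed the LN value of the contig.

```python
import io


def read_dict_lengths(handle):
    """Return {contig: length} parsed from the @SQ lines of a sequence dictionary."""
    lengths = {}
    for line in handle:
        if not line.startswith("@SQ"):
            continue  # skip @HD and any other header lines
        tags = line.rstrip("\n").split("\t")[1:]
        # Split each TAG:VALUE pair on the first colon only (UR values contain colons).
        fields = dict(tag.split(":", 1) for tag in tags if ":" in tag)
        lengths[fields["SN"]] = int(fields["LN"])
    return lengths


def out_of_bounds(bed_handle, lengths):
    """Yield (contig, start, end, reason) for BED records that do not fit the dictionary."""
    for line in bed_handle:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines, comments and UCSC header lines
        parts = line.split()
        contig, start, end = parts[0], int(parts[1]), int(parts[2])
        if contig not in lengths:
            yield contig, start, end, "contig not in dictionary"
        elif end > lengths[contig]:
            yield contig, start, end, "past end of contig"


if __name__ == "__main__":
    # Demo using the chr1 length from the dict file shown above.
    demo_dict = io.StringIO("@HD\tVN:1.0\tSO:unsorted\n@SQ\tSN:chr1\tLN:195471971\n")
    demo_bed = io.StringIO(
        "chr1\t1000\t1200\thypothetical_in_bounds\n"  # made-up record for contrast
        "chr1\t196986946\t196987186\tentg|Cr2\n"       # start/end taken from the BED file
    )
    for record in out_of_bounds(demo_bed, read_dict_lengths(demo_dict)):
        print(*record, sep="\t")
```

For the real files, the two `io.StringIO` objects would be replaced with `open("../reference/mm10/mm10.dict")` and `open("input/S0276129_Covered.bed")`. If out-of-range records turn up on many chromosomes rather than only at the end of chr1, that would point to a systematic coordinate mismatch between the BED file and the reference rather than a few bad lines.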

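When I look up the gene Cr2, its coordinates are given as Chromosome 1: 195,136,811-195,176,716, which lies within the chr1 length in the dict file, unlike the BED intervals annotated with Cr2 above.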

Is there something wrong with the bed file from Agilent? Any ideas what is happening?

Thanks
