Bed to Interval list for CollectHsMetrics picard
1
0
Entering edit mode
2.2 years ago
aroso491 • 0

Hello everyone,

I am trying to run picard CollectHsMetrics to obtain the on-target percentage of some .bam files that I have processed with BS-SNPer. I have a .bed file with the targets and I am trying to use my bed file with the target regions to produce the interval list, which I will end up using as BAIT and TARGET parameters in CollectHsMetrics.

What I have done is using picard CreateSequenceDictionary to create a reference dictionary so that I can then produce the interval files with this command:

picard CreateSequenceDictionary R=/store2/arodriguez/git_environment/softlinks2/work/stage/2e/37a6fe25f4a0bda1f8354b56fa39e9/genome.fa O=reference.dict

This created a reference.dict file whose header you can see here:

(base) arodriguez@nucleus:/store2/arodriguez/PN0193_BSSNPer/last_run$ head /store2/arodriguez/PN0193_BSSNPer/last_run/reference.dict
@HD     VN:1.6
@SQ     SN:1    LN:249250621    M5:1b22b98cdeb4a9304cb5d48026a85128     UR:file:/store2/arodriguez/git_environment/softlinks2/work/stage/2e/37a6fe25f4a0bda1f8354b56fa39e9/genome.fa
@SQ     SN:2    LN:243199373    M5:a0d9851da00400dec1098a9255ac712e     UR:file:/store2/arodriguez/git_environment/softlinks2/work/stage/2e/37a6fe25f4a0bda1f8354b56fa39e9/genome.fa
@SQ     SN:3    LN:198022430    M5:fdfd811849cc2fadebc929bb925902e5     UR:file:/store2/arodriguez/git_environment/softlinks2/work/stage/2e/37a6fe25f4a0bda1f8354b56fa39e9/genome.fa
@SQ     SN:4    LN:191154276    M5:23dccd106897542ad87d2765d28a19a1     UR:file:/store2/arodriguez/git_environment/softlinks2/work/stage/2e/37a6fe25f4a0bda1f8354b56fa39e9/genome.fa
@SQ     SN:5    LN:180915260    M5:0740173db9ffd264d728f32784845cd7     UR:file:/store2/arodriguez/git_environment/softlinks2/work/stage/2e/37a6fe25f4a0bda1f8354b56fa39e9/genome.fa
@SQ     SN:6    LN:171115067    M5:1d3a93a248d92a729ee764823acbbc6b     UR:file:/store2/arodriguez/git_environment/softlinks2/work/stage/2e/37a6fe25f4a0bda1f8354b56fa39e9/genome.fa
@SQ     SN:7    LN:159138663    M5:618366e953d6aaad97dbe4777c29375e     UR:file:/store2/arodriguez/git_environment/softlinks2/work/stage/2e/37a6fe25f4a0bda1f8354b56fa39e9/genome.fa
@SQ     SN:8    LN:146364022    M5:96f514a9929e410c6651697bded59aec     UR:file:/store2/arodriguez/git_environment/softlinks2/work/stage/2e/37a6fe25f4a0bda1f8354b56fa39e9/genome.fa
@SQ     SN:9    LN:141213431    M5:3e273117f15e0a400f01055d9f393768     UR:file:/store2/arodriguez/git_environment/softlinks2/work/stage/2e/37a6fe25f4a0bda1f8354b56fa39e9/genome.fa

Aside from this, my bed file looks like this when opened:

(base) arodriguez@nucleus:/store2/arodriguez/PN0193_BSSNPer/last_run$ head /store2/arodriguez/git_environment/Das_EPI_hg19_22May2019_capture_targets.bed
chr1    28710   29840   chr1:28735-29810
chr1    135096  135272  chr1:135124-135563
chr1    135347  135588  chr1:135124-135563
chr1    327764  328012  chr1:327790-328229
chr1    437121  437720  chr1:437151-438164
chr1    544712  544811  chr1:544738-546649
chr1    546073  546680  chr1:544738-546649
chr1    713967  714377  chr1:713984-714547
chr1    714413  714578  chr1:713984-714547
chr1    742819  743373  chr1:742846-743356

Because I don't have a bait file I am using my bed file with the target regions to produce the interval list, and to do it I am executing this command:

picard BedToIntervalList I=/store2/arodriguez/git_environment/Das_EPI_hg19_22May2019_capture_targets.bed SD=reference.dict O=target.interval_list

But I am getting an error when executing this that I am being unable to understand - I am also quite new in this plus I have an amazingly tight deadline which doesn't help - and I was in the hopes that you could help me with it.

(base) arodriguez@nucleus:/store2/arodriguez/PN0193_BSSNPer/last_run$ picard BedToIntervalList I=/store2/arodriguez/git_environment/Das_EPI_hg19_22May2019_capture_targets.bed SD=reference.dict O=target.interval_list
INFO    2022-02-16 11:09:57     BedToIntervalList

********** NOTE: Picard's command line syntax is changing.
**********
********** For more information, please see:
********** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)
**********
********** The command line looks like this in the new syntax:
**********
**********    BedToIntervalList -I /store2/arodriguez/git_environment/Das_EPI_hg19_22May2019_capture_targets.bed -SD reference.dict -O target.interval_list
**********


11:09:57.691 WARN  LegacyCommandLineArgumentParser - Hidden annotation is not honored for --DROP_MISSING_CONTIGS
11:09:57.721 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/arodriguez/anaconda3/share/picard-2.26.3-0/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Wed Feb 16 11:09:57 GMT 2022] BedToIntervalList INPUT=/store2/arodriguez/git_environment/Das_EPI_hg19_22May2019_capture_targets.bed SEQUENCE_DICTIONARY=reference.dict OUTPUT=target.interval_list    SORT=true UNIQUE=false DROP_MISSING_CONTIGS=false VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Wed Feb 16 11:09:57 GMT 2022] Executing as arodriguez@nucleus on Linux 4.15.0-163-generic amd64; OpenJDK 64-Bit Server VM 1.8.0_152-release-1056-b12; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.26.3
[Wed Feb 16 11:09:57 GMT 2022] picard.util.BedToIntervalList done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=514850816
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" picard.PicardException: Sequence 'chr1' was not found in the sequence dictionary
        at picard.util.BedToIntervalList.doWork(BedToIntervalList.java:156)
        at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:308)
        at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
        at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)

I see that apparently my sequence dictionary "should" have something that references chr1 which makes me think that maybe the issue is got to do something with formatting issues but I really don't know how else could .bam, .bed and sequence dictionary files look like and I am very confused about how to fix it.

Thank you for any help that you could lend me.

Alejandra

collecthsmetrics bedtointervallist bisulfite picard • 2.4k views
ADD COMMENT
0
Entering edit mode

Hi,

I noticed I got the same issues. How did you solve it? Thanks

ADD REPLY
0
Entering edit mode
2.2 years ago

the error message is quite clear.

chr1' was not found in the sequence dictionary

the data starts with chr1 ...

while in the dictionary it's 1 not chr1...

@SQ     SN:1    LN:249250621    M5:1b22b98cdeb4a9304cb5d48026a85128     UR:file:/store2/arodriguez/git_environment/softlinks2/work/stage/2e/37a6fe25f4a0bda1f8354b56fa39e9/genome.fa
ADD COMMENT
0
Entering edit mode

Should I change it manually then? Is that a possibility? Maybe I can change the bed file, is that a better idea than changing the dictionary?

ADD REPLY
0
Entering edit mode

Hi, were you able to solve this? Have you changed the reference file and align the samples all over?

any help is appreciated thanks

ADD REPLY
0
Entering edit mode

Hi! I solved this issue by changing the reference file and it did work for me in the end - it seems like the .bed file was playing up, so I just re-downloaded it from my repository, ran everything again and it all worked for me. Sorry I couldn't be of more help!

ADD REPLY
0
Entering edit mode

can you share the reference? i have used ucsc and hg19 and still gives me contigs error.

ADD REPLY
0
Entering edit mode

Hi, sorry, I will try and share it - unless you've sorted the issue already?

ADD REPLY

Login before adding your answer.

Traffic: 2630 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6