Bed to Interval list for CollectHsMetrics picard
1
0
Entering edit mode
7 months ago
aroso491 • 0

Hello everyone,

I am trying to run picard CollectHsMetrics to obtain the on-target percentage of some .bam files that I have processed with BS-SNPer. I have a .bed file with the targets and I am trying to use my bed file with the target regions to produce the interval list, which I will end up using as BAIT and TARGET parameters in CollectHsMetrics.

What I have done is using picard CreateSequenceDictionary to create a reference dictionary so that I can then produce the interval files with this command:

picard CreateSequenceDictionary R=/store2/arodriguez/git_environment/softlinks2/work/stage/2e/37a6fe25f4a0bda1f8354b56fa39e9/genome.fa O=reference.dict


This created a reference.dict file whose header you can see here:

(base) arodriguez@nucleus:/store2/arodriguez/PN0193_BSSNPer/last_run$head /store2/arodriguez/PN0193_BSSNPer/last_run/reference.dict @HD VN:1.6 @SQ SN:1 LN:249250621 M5:1b22b98cdeb4a9304cb5d48026a85128 UR:file:/store2/arodriguez/git_environment/softlinks2/work/stage/2e/37a6fe25f4a0bda1f8354b56fa39e9/genome.fa @SQ SN:2 LN:243199373 M5:a0d9851da00400dec1098a9255ac712e UR:file:/store2/arodriguez/git_environment/softlinks2/work/stage/2e/37a6fe25f4a0bda1f8354b56fa39e9/genome.fa @SQ SN:3 LN:198022430 M5:fdfd811849cc2fadebc929bb925902e5 UR:file:/store2/arodriguez/git_environment/softlinks2/work/stage/2e/37a6fe25f4a0bda1f8354b56fa39e9/genome.fa @SQ SN:4 LN:191154276 M5:23dccd106897542ad87d2765d28a19a1 UR:file:/store2/arodriguez/git_environment/softlinks2/work/stage/2e/37a6fe25f4a0bda1f8354b56fa39e9/genome.fa @SQ SN:5 LN:180915260 M5:0740173db9ffd264d728f32784845cd7 UR:file:/store2/arodriguez/git_environment/softlinks2/work/stage/2e/37a6fe25f4a0bda1f8354b56fa39e9/genome.fa @SQ SN:6 LN:171115067 M5:1d3a93a248d92a729ee764823acbbc6b UR:file:/store2/arodriguez/git_environment/softlinks2/work/stage/2e/37a6fe25f4a0bda1f8354b56fa39e9/genome.fa @SQ SN:7 LN:159138663 M5:618366e953d6aaad97dbe4777c29375e UR:file:/store2/arodriguez/git_environment/softlinks2/work/stage/2e/37a6fe25f4a0bda1f8354b56fa39e9/genome.fa @SQ SN:8 LN:146364022 M5:96f514a9929e410c6651697bded59aec UR:file:/store2/arodriguez/git_environment/softlinks2/work/stage/2e/37a6fe25f4a0bda1f8354b56fa39e9/genome.fa @SQ SN:9 LN:141213431 M5:3e273117f15e0a400f01055d9f393768 UR:file:/store2/arodriguez/git_environment/softlinks2/work/stage/2e/37a6fe25f4a0bda1f8354b56fa39e9/genome.fa  Aside from this, my bed file looks like this when opened: (base) arodriguez@nucleus:/store2/arodriguez/PN0193_BSSNPer/last_run$ head /store2/arodriguez/git_environment/Das_EPI_hg19_22May2019_capture_targets.bed
chr1    28710   29840   chr1:28735-29810
chr1    135096  135272  chr1:135124-135563
chr1    135347  135588  chr1:135124-135563
chr1    327764  328012  chr1:327790-328229
chr1    437121  437720  chr1:437151-438164
chr1    544712  544811  chr1:544738-546649
chr1    546073  546680  chr1:544738-546649
chr1    713967  714377  chr1:713984-714547
chr1    714413  714578  chr1:713984-714547
chr1    742819  743373  chr1:742846-743356


Because I don't have a bait file I am using my bed file with the target regions to produce the interval list, and to do it I am executing this command:

picard BedToIntervalList I=/store2/arodriguez/git_environment/Das_EPI_hg19_22May2019_capture_targets.bed SD=reference.dict O=target.interval_list


But I am getting an error when executing this that I am being unable to understand - I am also quite new in this plus I have an amazingly tight deadline which doesn't help - and I was in the hopes that you could help me with it.

(base) arodriguez@nucleus:/store2/arodriguez/PN0193_BSSNPer/last_run\$ picard BedToIntervalList I=/store2/arodriguez/git_environment/Das_EPI_hg19_22May2019_capture_targets.bed SD=reference.dict O=target.interval_list
INFO    2022-02-16 11:09:57     BedToIntervalList

********** NOTE: Picard's command line syntax is changing.
**********
**********
********** The command line looks like this in the new syntax:
**********
**********    BedToIntervalList -I /store2/arodriguez/git_environment/Das_EPI_hg19_22May2019_capture_targets.bed -SD reference.dict -O target.interval_list
**********

11:09:57.691 WARN  LegacyCommandLineArgumentParser - Hidden annotation is not honored for --DROP_MISSING_CONTIGS
[Wed Feb 16 11:09:57 GMT 2022] BedToIntervalList INPUT=/store2/arodriguez/git_environment/Das_EPI_hg19_22May2019_capture_targets.bed SEQUENCE_DICTIONARY=reference.dict OUTPUT=target.interval_list    SORT=true UNIQUE=false DROP_MISSING_CONTIGS=false VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Wed Feb 16 11:09:57 GMT 2022] Executing as arodriguez@nucleus on Linux 4.15.0-163-generic amd64; OpenJDK 64-Bit Server VM 1.8.0_152-release-1056-b12; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.26.3
[Wed Feb 16 11:09:57 GMT 2022] picard.util.BedToIntervalList done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=514850816
at picard.util.BedToIntervalList.doWork(BedToIntervalList.java:156)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:308)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)


I see that apparently my sequence dictionary "should" have something that references chr1 which makes me think that maybe the issue is got to do something with formatting issues but I really don't know how else could .bam, .bed and sequence dictionary files look like and I am very confused about how to fix it.

Thank you for any help that you could lend me.

Alejandra

0
Entering edit mode

Hi,

I noticed I got the same issues. How did you solve it? Thanks

0
Entering edit mode
7 months ago

the error message is quite clear.

the data starts with chr1 ...

while in the dictionary it's 1 not chr1...

@SQ     SN:1    LN:249250621    M5:1b22b98cdeb4a9304cb5d48026a85128     UR:file:/store2/arodriguez/git_environment/softlinks2/work/stage/2e/37a6fe25f4a0bda1f8354b56fa39e9/genome.fa

0
Entering edit mode

Should I change it manually then? Is that a possibility? Maybe I can change the bed file, is that a better idea than changing the dictionary?

0
Entering edit mode

Hi, were you able to solve this? Have you changed the reference file and align the samples all over?

any help is appreciated thanks

0
Entering edit mode

Hi! I solved this issue by changing the reference file and it did work for me in the end - it seems like the .bed file was playing up, so I just re-downloaded it from my repository, ran everything again and it all worked for me. Sorry I couldn't be of more help!

0
Entering edit mode

can you share the reference? i have used ucsc and hg19 and still gives me contigs error.

0
Entering edit mode

Hi, sorry, I will try and share it - unless you've sorted the issue already?