Bait/Target files for Picard HsMetrics (Exome Sequencing)
8.7 years ago

Hi,

I used the Nextera Expanded Exome kit for my samples. Now I want to run Picard HsMetrics, which asks for a bait interval file and a target interval file.

On the Illumina website I can find the targets BED file for the kit. I can also find a targeted_regions file.

Are these the correct files for Picard HsMetrics? When I use them, I get strange efficiency values (>2).

Maybe the targets file is some sort of "expanded exome definition"? If so, how do I find this definition for the Illumina Expanded Exome? And if the targets file is this "exome definition", what is the bait file? (Would it be the targets file provided by Illumina?)

Sorry for the naive question, and thank you.

8.7 years ago
Zaag

From what I understand, you get a Targeted Regions Manifest and an Exome Probe Manifest. Use the regions from the Probe Manifest to create the bait (tiled regions) file for Picard, and the Targeted Regions Manifest to create the target file.
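To make the bait/target split concrete, here is a minimal Python sketch of what converting a BED file into Picard's interval_list format involves. In practice `picard BedToIntervalList` does this for you, taking the header from a sequence dictionary (.dict). Assumptions: the BED input is 0-based half-open, the @HD/@SQ header lines are passed in by the caller, and the `interval_<n>` fallback name is hypothetical.

```python
from typing import Iterable, List


def bed_to_interval_list(bed_lines: Iterable[str], dict_header: List[str]) -> List[str]:
    """Return interval_list lines: the SAM-style header, then one row per BED record."""
    out = list(dict_header)
    for n, line in enumerate(bed_lines, start=1):
        line = line.rstrip("\n")
        # Skip blank, comment and UCSC track/browser lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        # Hypothetical default name when the BED has no 4th column.
        name = fields[3] if len(fields) > 3 else f"interval_{n}"
        strand = fields[5] if len(fields) > 5 and fields[5] in ("+", "-") else "+"
        # BED is 0-based half-open; interval_list is 1-based with an inclusive end.
        out.append(f"{chrom}\t{start + 1}\t{end}\t{strand}\t{name}")
    return out
```

You would run this once on the probe BED (giving BAIT_INTERVALS) and once on the targeted regions BED (giving TARGET_INTERVALS).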

http://support.illumina.com/downloads/nextera-rapid-capture-expanded-exome-product-files.html

You could also make your own target file with all coding exons (from UCSC) or something else.

The problem with creating your own target file is that none of the exome capture kits capture all known/annotated exons. So you will throw off your on/off target statistics really badly if you do this.

Isn't that exactly what you want to know? I'm not interested in how efficient Illumina is at targeting the regions they say they target. I want to know how efficient Illumina is at targeting a list of annotated exons I trust from source X.

For this reason, I prefer to create my own target file and use it with the probe manifest from Illumina.

No, you need to know how efficient your capture is, and how many exons you have failed to capture in the targeted region. Knowing that you haven't captured things that you weren't meant to capture isn't very useful in the grand scheme of things. You should have checked your targets of interest were covered in the kit before you ran your experiment.
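One way to run that check is to measure what fraction of your own exon set falls inside the kit's targeted regions. The Python sketch below merges each interval set and counts shared bases with a two-pointer sweep. The coordinates are toy values; in practice you would load both sets from BED files.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[str, int, int]            # (chrom, start, end), 0-based half-open
Merged = Dict[str, List[Tuple[int, int]]]


def merge(intervals: Iterable[Interval]) -> Merged:
    """Sort and merge overlapping or touching intervals per chromosome."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))
    merged: Merged = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        runs = [list(ivs[0])]
        for s, e in ivs[1:]:
            if s <= runs[-1][1]:
                runs[-1][1] = max(runs[-1][1], e)
            else:
                runs.append([s, e])
        merged[chrom] = [(s, e) for s, e in runs]
    return merged


def territory(m: Merged) -> int:
    """Number of unique bases in a merged interval set."""
    return sum(e - s for ivs in m.values() for s, e in ivs)


def overlap_bases(a: Merged, b: Merged) -> int:
    """Bases shared by two merged interval sets (two-pointer sweep per chromosome)."""
    total = 0
    for chrom, ivs_a in a.items():
        ivs_b = b.get(chrom, [])
        i = j = 0
        while i < len(ivs_a) and j < len(ivs_b):
            s = max(ivs_a[i][0], ivs_b[j][0])
            e = min(ivs_a[i][1], ivs_b[j][1])
            total += max(0, e - s)
            # Advance whichever interval ends first.
            if ivs_a[i][1] < ivs_b[j][1]:
                i += 1
            else:
                j += 1
    return total


my_exons = merge([("1", 100, 200), ("1", 150, 300), ("2", 0, 50)])
kit_targets = merge([("1", 120, 250), ("2", 1000, 2000)])
fraction = overlap_bases(my_exons, kit_targets) / territory(my_exons)
print(f"{fraction:.0%} of my exon bases are in the kit's targeted regions")
```

Exons that fall outside the kit's targets will show up as poorly covered no matter how well the capture itself worked.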

I look at the on/off bait statistic for that. That tells you how efficient your capture is.

Whether you are able to cover your targets of interest also depends on fragment and read length.
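For reference, Picard splits aligned bases into on-bait, near-bait and off-bait, and reports PCT_SELECTED_BASES as the on-plus-near fraction. The Python sketch below is a simplified version of that idea: each aligned base is classified by its own distance to the nearest bait. It is not Picard's exact algorithm. The 250 bp default is assumed to mirror Picard's NEAR_DISTANCE default, so check it against your Picard version.

```python
import bisect
from typing import Dict, Iterable, List, Tuple

NEAR_DISTANCE = 250  # assumed to mirror Picard's NEAR_DISTANCE default (bp)


def classify_bases(
    bases: Iterable[Tuple[str, int]],
    baits: Dict[str, List[Tuple[int, int]]],
    near: int = NEAR_DISTANCE,
) -> Dict[str, int]:
    """Count aligned bases as on-, near- or off-bait.

    bases: (chrom, 0-based position) of each aligned base.
    baits: per-chromosome, sorted, non-overlapping (start, end) half-open intervals.
    """
    counts = {"on": 0, "near": 0, "off": 0}
    starts = {chrom: [s for s, _ in ivs] for chrom, ivs in baits.items()}
    for chrom, pos in bases:
        ivs = baits.get(chrom, [])
        # Index of the last bait starting at or before pos (-1 if none).
        i = bisect.bisect_right(starts.get(chrom, []), pos) - 1
        dist = float("inf")
        if i >= 0:
            _, end = ivs[i]
            if pos < end:
                counts["on"] += 1
                continue
            dist = pos - (end - 1)  # distance past the bait's last base
        if i + 1 < len(ivs):
            dist = min(dist, ivs[i + 1][0] - pos)  # distance to the next bait
        counts["near" if dist <= near else "off"] += 1
    return counts


def pct_selected_bases(counts: Dict[str, int]) -> float:
    """(on + near) / all classified bases, analogous to PCT_SELECTED_BASES."""
    total = sum(counts.values())
    return (counts["on"] + counts["near"]) / total if total else 0.0
```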

Hence my comment "So you will throw off your on/off target statistics really badly if you do this" ;) With Agilent SureSelect, the on/off bait and on/off target are generally for all intents and purposes identical...

OK, I didn't know that. We use NimbleGen, and for custom designs the probe region is sometimes 3 times the size of the target.

8.7 years ago

Hi @Zaag, you're correct: I get the targeted regions and probe manifests. Are you saying that the intervals file I can create from the "targeted regions" file should be used as "TARGET_INTERVALS" for Picard? Similarly, the intervals file created from the probe manifest should be used as "BAIT_INTERVALS" for Picard?

If so, I'm afraid I'm missing something, because the bait efficiency field (BAIT_DESIGN_EFFICIENCY) exceeds 100%. The probe manifest covers about 30 Mb, while the targeted regions cover about 60 Mb, so the efficiency is the ratio 60/30.

I'd like to use the definition of the exome as set by Illumina. It's not that I want to know "how efficient Illumina is at targeting the regions they say they target"; I want to know how far my experiment is from that definition. (Does this even make sense?)

I'm afraid my question may concern even simpler concepts: is "BAIT" (as defined by Picard) the same as "probe" (as defined by Illumina)?

Yes, that is what I'm saying. That efficiency is possible: if Illumina can target 60 Mb with 30 Mb of probes (remember that the captured fragments are longer than the probes), the design is very efficient.
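The ratio being discussed is Picard's BAIT_DESIGN_EFFICIENCY, which is TARGET_TERRITORY / BAIT_TERRITORY. Each territory counts unique bases, so overlapping intervals are counted once. Here is a minimal Python sketch of that arithmetic, assuming 0-based half-open (chrom, start, end) intervals and toy values.

```python
from itertools import groupby
from typing import Iterable, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open


def territory(intervals: Iterable[Interval]) -> int:
    """Unique bases covered by the intervals (overlaps counted once)."""
    total = 0
    for _chrom, group in groupby(sorted(intervals), key=lambda iv: iv[0]):
        run_start, run_end = None, None
        for _, s, e in group:
            if run_end is None or s > run_end:
                if run_end is not None:
                    total += run_end - run_start  # close the previous run
                run_start, run_end = s, e
            else:
                run_end = max(run_end, e)  # extend the current run
        if run_end is not None:
            total += run_end - run_start
    return total


def bait_design_efficiency(targets: Iterable[Interval], baits: Iterable[Interval]) -> float:
    """TARGET_TERRITORY / BAIT_TERRITORY."""
    return territory(targets) / territory(baits)


# Toy example: 300 target bases covered by 150 bases of (overlapping) baits.
baits = [("1", 0, 100), ("1", 50, 150)]
targets = [("1", 0, 300)]
print(bait_design_efficiency(targets, baits))  # 2.0
```

With the figures in this thread (about 60 Mb of targets over about 30 Mb of probes), this comes out at roughly 2, which is why the value can exceed 1. A NimbleGen custom design whose probe region is 3 times the target, as described above, would give about 0.33.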

And yes, bait = probe.

7.4 years ago
mkdir -p annotations
# Targets: b37 contig names (chr1 -> 1, chrM -> MT), first column only
sed -e 's/^chrM/MT/' -e 's/^chr//' nexterarapidcapture_exome_targetedregions_v1.2.bed > annotations/nexterarapidcapture_exome_targetedregions_v1.2.no_chr.MT.bed
# Probes: keep lines matching CEX, take chrom/start/end (columns 2-4), rename contigs
# (check whether the manifest coordinates are 0- or 1-based before treating them as BED)
grep CEX NexteraRapidCapture_Exome_Probes_v1.2.txt | cut -f2,3,4 | sed -e 's/^chrM/MT/' -e 's/^chr//' > annotations/NexteraRapidCapture_Exome_Probes_v1.2.bed
picard BedToIntervalList I=annotations/NexteraRapidCapture_Exome_Probes_v1.2.bed O=annotations/NexteraRapidCapture_Exome_Probes_v1.2.interval_list SD=human_g1k_v37.dict
picard BedToIntervalList I=annotations/nexterarapidcapture_exome_targetedregions_v1.2.no_chr.MT.bed O=annotations/nexterarapidcapture_exome_targetedregions_v1.2.no_chr.MT.interval_list SD=human_g1k_v37.dict
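If you prefer to avoid sed, the same contig renaming can be done in Python. This sketch renames only the first column (chr1 -> 1, chrM -> MT, matching the human_g1k_v37 naming used above), so names elsewhere on the line are left untouched. One caveat, assuming the Illumina files are on hg19: hg19's chrM and b37's MT are different mitochondrial sequences, so MT coordinates are not reliable after a plain rename.

```python
from typing import Iterable, Iterator


def ucsc_to_b37(lines: Iterable[str]) -> Iterator[str]:
    """Rename the contig in column 1 only: chrM -> MT, chrN -> N."""
    for line in lines:
        # Pass header, track and blank lines through unchanged.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            yield line
            continue
        chrom, sep, rest = line.partition("\t")
        if chrom == "chrM":
            chrom = "MT"
        elif chrom.startswith("chr"):
            chrom = chrom[len("chr"):]
        yield chrom + sep + rest
```

Usage: `with open(src) as fin, open(dst, "w") as fout: fout.writelines(ucsc_to_b37(fin))`.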
# Snakemake rule: run Picard HsMetrics with the bait (probe) and target interval lists
rule hsMetrics:
    input:
        bam = config['process_dir'][freeze] + config['results']['recalibrated'] + "/{sample}.recal.la.bam",
        bam_probe_intervals = "annotations/NexteraRapidCapture_Exome_Probes_v1.2.interval_list",
        bam_target_intervals = "annotations/nexterarapidcapture_exome_targetedregions_v1.2.no_chr.MT.interval_list",
    output:
        hs = config['landing_dir'][freeze] + config['results']['hsmetrics'] + "/{sample}.hsmetrics",
    params:
        picard = config['jars']['picard']['path'],
        # Picard tool name; newer Picard releases replace it with CollectHsMetrics
        md = "CalculateHsMetrics",
        opts = config['tools']['opts']['med'] + ' ' + config['javatmpdir'],
        metrics = config['process_dir'][freeze] + config['results']['picard']
    log:
        config['datadirs']['log'] + "/{sample}.hsmetrics.log"
    shell:
        """
        {params.picard} {params.opts} \
        {params.md} \
        BAIT_INTERVALS={input.bam_probe_intervals} \
        TARGET_INTERVALS={input.bam_target_intervals} \
        INPUT={input.bam} \
        OUTPUT={output.hs} \
        METRIC_ACCUMULATION_LEVEL=ALL_READS \
        QUIET=true  \
        VALIDATION_STRINGENCY=SILENT 2> {log}
        """
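Once the rule has run, the values discussed in this thread can be read back from the .hsmetrics file. This Python sketch assumes the usual Picard metrics layout: a `## METRICS CLASS` line, then a tab-separated header row and data rows ending at a blank line. The sample text is synthetic, not real output.

```python
from typing import Dict, List


def read_picard_metrics(text: str) -> List[Dict[str, str]]:
    """Return the rows of the METRICS CLASS section as header->value dicts."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("## METRICS CLASS"):
            header = lines[i + 1].split("\t")
            rows = []
            for row in lines[i + 2:]:
                if not row.strip():  # a blank line ends the metrics table
                    break
                rows.append(dict(zip(header, row.split("\t"))))
            return rows
    raise ValueError("no '## METRICS CLASS' section found")


# Synthetic example with only a few of the HsMetrics columns.
example = (
    "## htsjdk.samtools.metrics.StringHeader\n"
    "# CalculateHsMetrics ...\n"
    "\n"
    "## METRICS CLASS\tpicard.analysis.directed.HsMetrics\n"
    "BAIT_SET\tBAIT_TERRITORY\tTARGET_TERRITORY\tBAIT_DESIGN_EFFICIENCY\n"
    "probes\t30000000\t60000000\t2\n"
    "\n"
)
rows = read_picard_metrics(example)
print(float(rows[0]["BAIT_DESIGN_EFFICIENCY"]))
```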