Question: Bait/Target files for Picard HsMetrics (Exome Sequencing)
rafamoura1987bio (Brazil) wrote, 5.2 years ago:

Hi,

I used the Nextera expanded exome kit for my samples. Now I want to use Picard HsMetrics, which asks for a baits file and a targets file.

On the Illumina website, I can find the targets BED file for the kit. I also find the targeted_regions file.

Are these the correct files for Picard HsMetrics? When I use them, I get strange values for efficiency (like >2).

Maybe the targets file is some sort of "expanded exome definition"? If so, how do I find this definition for "expanded exome / Illumina"? If the targets file is this "exome definition", what is the baits file? (Would it be the targets file provided by Illumina?)

Sorry for the naive question, thank you

sequencing picard exome • 5.3k views
modified 3.9 years ago by Jeremy Leipzig • written 5.2 years ago by rafamoura1987bio

Zaag (Amsterdam) wrote, 5.2 years ago:

From what I understand, you get a Targeted Regions Manifest and an Exome Probe Manifest. Use the regions from the Probe Manifest to create the bait (tiled region) file for Picard, and the regions from the Targeted Regions Manifest to create the target file.

http://support.illumina.com/downloads/nextera-rapid-capture-expanded-exome-product-files.html

You could also make your own target file with all coding exons (from UCSC), or from some other source.

modified 5.2 years ago • written 5.2 years ago by Zaag

The problem with creating your own target file is that none of the exome capture kits capture all known/annotated exons. So you will throw off your on/off target statistics really badly if you do this.

Reply written 5.2 years ago by Daniel Swan

Isn't that exactly what you want to know? I'm not interested in knowing how efficient is Illumina at targeting the regions they say they target. My interest is in knowing how efficient is Illumina in targeting a list of annotated exons I trust from source X.

For this reason, I personally prefer creating my own target file and using it with the probe manifest from Illumina.

Reply written 5.2 years ago by Carlos Borroto

No, you need to know how efficient your capture is, and how many exons in the targeted region you have failed to capture. Knowing that you haven't captured things that you weren't meant to capture isn't very useful in the grand scheme of things. You should have checked that your targets of interest were covered by the kit before you ran your experiment.

Reply modified 5.2 years ago • written 5.2 years ago by Daniel Swan

I look at the on/off bait statistic for that. That tells you how efficient your capture is.

Whether you are able to cover your target of interest also depends on fragment and read size.

Reply written 5.2 years ago by Zaag

Hence my comment "So you will throw off your on/off target statistics really badly if you do this" ;) With Agilent SureSelect, the on/off bait and on/off target are generally for all intents and purposes identical...

Reply modified 5.2 years ago • written 5.2 years ago by Daniel Swan

OK, I didn't know that. We use NimbleGen, and the probe region is sometimes 3 times the size of the target (for custom designs).

Reply written 5.2 years ago by Zaag
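
How far a custom target list throws off the on/off target statistics depends on how much of that list the kit actually baits. The following is a minimal Python sketch, not taken from the thread, that estimates the fraction of target bases lying inside bait regions. The function names and toy coordinates are invented for illustration, and intervals are assumed to be 0-based, half-open (start, end) pairs as in BED.

```python
def merge(spans):
    """Sort (start, end) spans and merge any that overlap or touch."""
    merged = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def overlap_bases(a, b):
    """Count bases shared by two merged, sorted span lists on one chromosome."""
    i = j = shared = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            shared += hi - lo
        # Advance whichever span ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return shared


def fraction_of_targets_baited(targets, baits):
    """targets and baits map chromosome name -> list of (start, end) spans."""
    total = shared = 0
    for chrom, spans in targets.items():
        merged_targets = merge(spans)
        total += sum(end - start for start, end in merged_targets)
        shared += overlap_bases(merged_targets, merge(baits.get(chrom, [])))
    return shared / total


# Toy data (hypothetical): 300 target bases, only 100 of them under a bait.
toy_targets = {"1": [(0, 100), (200, 300)], "2": [(0, 100)]}
toy_baits = {"1": [(50, 250)]}
baited_fraction = fraction_of_targets_baited(toy_targets, toy_baits)
print(baited_fraction)
```

A low fraction would mean that much of the custom target list was never baited, which is the situation Daniel Swan warns about; a fraction near 1 would suggest the custom list and the kit design largely agree.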
rafamoura1987bio (Brazil) wrote, 5.2 years ago:

Hi @Zaag, you're correct: I get the targeted regions and probe manifests. Are you saying that the intervals file I can create from the "targeted regions" file should be used as "TARGET_INTERVALS" for Picard? Similarly, the intervals file created from the probe manifest should be used as "BAIT_INTERVALS" for Picard?

If so, I'm afraid I'm missing something, because the "bait efficiency" field exceeds 100%: the probe manifest gives me something around 30 Mb, while the targeted regions file is around 60 Mb, so the efficiency would be the ratio 60/30.

I'd like to use the definition of exome as set by Illumina... It's not that I want to know "how efficient is Illumina at targeting the regions they say they target", but I want to know how far off my experiment is from such definition... (does this even make sense?)

I'm afraid my question may concern even simpler concepts: is "BAIT" (as defined by Picard) the same as "probe" (as defined by Illumina)?

written 5.2 years ago by rafamoura1987bio

Yes, that is what I'm saying. That efficiency is possible: if Illumina is able to target 60 Mb with 30 Mb of probes (remember the fragments are larger than the probes), they are very efficient.

And yes, bait = probe.

Reply written 5.2 years ago by Zaag
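
The 60/30 arithmetic in this exchange can be checked with a few lines of Python. This sketch assumes the field being discussed is Picard's BAIT_DESIGN_EFFICIENCY, which Picard's documentation describes as target territory divided by bait territory; the thread itself only says "bait efficiency". The helper names and the toy intervals are invented, and "territory" here means the number of distinct bases covered after overlapping intervals are merged.

```python
def territory(intervals):
    """Distinct bases covered by (chrom, start, end) half-open intervals."""
    by_chrom = {}
    for chrom, start, end in intervals:
        by_chrom.setdefault(chrom, []).append((start, end))

    total = 0
    for spans in by_chrom.values():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end:
                # Overlapping or adjacent: extend the current run.
                cur_end = max(cur_end, end)
            else:
                total += cur_end - cur_start
                cur_start, cur_end = start, end
        total += cur_end - cur_start
    return total


def design_efficiency(targets, baits):
    """Target territory divided by bait territory (assumed definition)."""
    return territory(targets) / territory(baits)


# Toy data (hypothetical): two overlapping baits cover 150 bases,
# while the single target interval spans 300 bases.
toy_baits = [("1", 100, 200), ("1", 150, 250)]
toy_targets = [("1", 0, 300)]
print(territory(toy_baits), territory(toy_targets))
print(design_efficiency(toy_targets, toy_baits))
```

Under that assumed definition, a targeted-regions file of roughly 60 Mb and a probe file of roughly 30 Mb would give a ratio near 2, which matches the value reported in the question.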
Jeremy Leipzig (Philadelphia, PA) wrote, 3.9 years ago:
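# Strip the leading "chr" and rename M to MT, touching only the chromosome column
# (an unanchored s/M/MT/g would also rewrite any "M" in the other columns)
sed -e 's/^chr//' -e 's/^M\([[:space:]]\)/MT\1/' nexterarapidcapture_exome_targetedregions_v1.2.bed > nexterarapidcapture_exome_targetedregions_v1.2.no_chr.MT.bed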

cat NexteraRapidCapture_Exome_Probes_v1.2.txt | grep CEX | sed -e 's/chrM/chrMT/g;s/chr//g;' | cut -f2,3,4 > NexteraRapidCapture_Exome_Probes_v1.2.bed

picard BedToIntervalList I=annotations/NexteraRapidCapture_Exome_Probes_v1.2.bed O=annotations/NexteraRapidCapture_Exome_Probes_v1.2.interval_list SD=human_g1k_v37.dict 

picard BedToIntervalList I=annotations/nexterarapidcapture_exome_targetedregions_v1.2.no_chr.MT.bed O=annotations/nexterarapidcapture_exome_targetedregions_v1.2.no_chr.MT.interval_list SD=human_g1k_v37.dict 

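# Snakemake rule; in newer Picard releases CalculateHsMetrics is deprecated in favor of CollectHsMetrics
rule hsMetrics: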
    input:
        bam = config['process_dir'][freeze] + config['results']['recalibrated'] + "/{sample}.recal.la.bam",
        bam_probe_intervals = "annotations/NexteraRapidCapture_Exome_Probes_v1.2.interval_list",
        bam_target_intervals = "annotations/nexterarapidcapture_exome_targetedregions_v1.2.no_chr.MT.interval_list",
    output:
        hs = config['landing_dir'][freeze] + config['results']['hsmetrics'] + "/{sample}.hsmetrics",
    params:
        picard = config['jars']['picard']['path'],
        md = "CalculateHsMetrics",
        opts = config['tools']['opts']['med'] + ' ' + config['javatmpdir'],
        metrics = config['process_dir'][freeze] + config['results']['picard']
    log:
        config['datadirs']['log'] + "/{sample}.hsmetrics.log"
    shell:
        """
        {params.picard} {params.opts} \
        CalculateHsMetrics \
        BAIT_INTERVALS={input.bam_probe_intervals} \
        TARGET_INTERVALS={input.bam_target_intervals} \
        INPUT={input.bam} \
        OUTPUT={output.hs} \
        METRIC_ACCUMULATION_LEVEL=ALL_READS \
        QUIET=true  \
        VALIDATION_STRINGENCY=SILENT 2> {log}
        """
modified 3.9 years ago • written 3.9 years ago by Jeremy Leipzig
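
For readers who prefer to do the probe manifest conversion outside the shell, the grep/sed/cut pipeline above can be approximated in Python. This is a sketch under assumptions: it takes the column layout implied by `cut -f2,3,4` (name in column 1, then chromosome, start, end), and the function name and sample rows are invented. Whether the manifest coordinates are already 0-based as BED expects is not stated in the thread, so the sketch passes them through unchanged.

```python
def manifest_to_bed(lines, tag="CEX"):
    """Return (chrom, start, end) tuples for manifest rows containing `tag`."""
    rows = []
    for line in lines:
        if tag not in line:
            continue  # mirrors `grep CEX`: drops header and non-probe lines
        fields = line.rstrip("\n").split("\t")
        chrom, start, end = fields[1], fields[2], fields[3]  # cut -f2,3,4
        if chrom.startswith("chr"):
            chrom = chrom[3:]  # b37-style naming has no "chr" prefix
        if chrom == "M":
            chrom = "MT"  # mitochondrial contig name in human_g1k_v37
        rows.append((chrom, start, end))
    return rows


# Hypothetical manifest rows, for illustration only.
sample = [
    "Name\tChromosome\tStart\tStop\n",
    "CEX-chr1-100-200\tchr1\t100\t200\n",
    "CEX-chrM-5-50\tchrM\t5\t50\n",
]
bed_rows = manifest_to_bed(sample)
for row in bed_rows:
    print("\t".join(row))
```

The resulting rows would still need to go through `picard BedToIntervalList` with the matching sequence dictionary, as in the commands above.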