Using .bed files to improve an NGS pipeline?
10.7 years ago
Oliver

Hello folks,

I followed this pipeline to process some single-read NGS data:

Since we used the Agilent SureSelect kit, we have a .bed file that contains the targeted regions of interest. I was wondering whether I could reduce the computational effort by using this file. Would it be useful to skip step 6 and supply the .bed file instead?

java -jar /bin/GTK/GenomeAnalysisTK.jar -T RealignerTargetCreator -R /seq/REFERENCE/human_18.fasta -I /output/FOO.sorted.bam -o /output/FOO.intervals

Are there any further improvements possible with this file?

Any help appreciated, Oliver

10.7 years ago

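Aligning to the whole genome, as it sounds like you've done, is the right approach. This related question discusses whether it is useful to subset the genome for alignment; you want to avoid subsetting, because reads that actually come from outside your targets (off-target capture, paralogs, repeats) would then be forced onto the most similar target sequence, producing spurious alignments and false variant calls.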
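To make the spurious-alignment point concrete, here is a toy sketch. The sequences are made up, and "alignment" is just the best ungapped placement by number of mismatches, not a real aligner; `best_hit` is a hypothetical helper name.

```python
# Toy illustration with made-up sequences: "alignment" here is just the best
# ungapped placement by Hamming distance, not a real aligner.

def best_hit(read, reference):
    """Return (contig, offset, mismatches) for the best ungapped placement."""
    best = None
    for name, seq in reference.items():
        for off in range(len(seq) - len(read) + 1):
            mm = sum(a != b for a, b in zip(read, seq[off:off + len(read)]))
            if best is None or mm < best[2]:
                best = (name, off, mm)
    return best


target = "ACGTTGCAAGGCTTACGATC"   # hypothetical captured region
paralog = "ACGTTGCATGGCTTACGTTC"  # hypothetical similar, uncaptured copy
read = paralog[2:18]              # a read that really comes from the paralog

whole_genome = {"target": target, "paralog": paralog}
targets_only = {"target": target}

print(best_hit(read, whole_genome))  # ('paralog', 2, 0): correct placement
print(best_hit(read, targets_only))  # ('target', 2, 2): forced onto the target
```

With the full reference the read lands on its true origin with no mismatches; with only the target present it is placed on the target with two mismatches, which a variant caller could report as two false SNPs.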

To your specific question: your capture targets won't help speed up the realignment step. Step 6 identifies candidate regions of misalignment from the full set of aligned reads, which is a separate problem from knowing where you captured.
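As a rough sketch of one ingredient of "finding potential regions of misalignment", the code below collects reference intervals around indels visible in read CIGAR strings and merges them into candidate regions. This is an assumption-laden simplification: RealignerTargetCreator also uses known indel sites and clusters of mismatches, and the function names and the 5-base padding here are hypothetical choices.

```python
import re

# Parses (length, op) pairs from a SAM CIGAR string such as "20M2D30M".
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def indel_intervals(pos, cigar, pad=5):
    """Yield padded 0-based half-open reference intervals around each indel.

    pos is the 0-based reference position of the read's first aligned base.
    """
    ref = pos
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "I":
            # An insertion sits between ref - 1 and ref and consumes no reference.
            yield (max(0, ref - pad), ref + pad)
        elif op == "D":
            yield (max(0, ref - pad), ref + n + pad)
        if op in "MDN=X":
            ref += n  # these operations consume reference bases


def merge(intervals):
    """Merge overlapping or touching intervals into candidate target regions."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def candidate_targets(reads, pad=5):
    """reads: iterable of (0-based position, CIGAR) on one chromosome."""
    return merge(iv for pos, cigar in reads
                 for iv in indel_intervals(pos, cigar, pad))


reads = [
    (100, "50M"),       # no indel
    (110, "20M2D30M"),  # deletes reference bases 130-131
    (115, "17M1I32M"),  # inserts one base before reference base 132
    (500, "10M3D40M"),  # deletes reference bases 510-512
]
print(candidate_targets(reads))  # [(125, 137), (505, 518)]
```

Two reads supporting the same nearby event collapse into one region, which is the kind of interval the realigner then revisits using all overlapping reads.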

My suggestion for using your BED file would be to evaluate the effectiveness of hybrid selection. Picard has a command-line program for this, which provides a number of useful metrics for assessing how well the hybrid selection worked:
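The most basic of these questions, what fraction of sequenced bases actually landed on target, can be sketched in a few lines. This is not Picard's implementation or its exact metric definitions; it assumes a single chromosome, 0-based half-open coordinates, and hypothetical intervals.

```python
# Simplified on-target metric on a single chromosome, 0-based half-open
# coordinates; intervals below are hypothetical.

def merge(intervals):
    """Merge overlapping intervals so no target base is counted twice."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def bases_on_target(start, end, targets):
    """Number of bases of [start, end) that fall inside the merged targets."""
    return sum(max(0, min(end, t_end) - max(start, t_start))
               for t_start, t_end in targets)


def on_target_fraction(aligned_reads, target_intervals):
    """Fraction of aligned read bases that land inside the target intervals."""
    targets = merge(target_intervals)
    total = sum(end - start for start, end in aligned_reads)
    on = sum(bases_on_target(s, e, targets) for s, e in aligned_reads)
    return on / total if total else 0.0


targets = [(100, 200), (150, 250), (400, 450)]           # from the BED file
reads = [(90, 140), (200, 250), (300, 350), (420, 470)]  # aligned read spans
print(on_target_fraction(reads, targets))  # 0.6 (120 of 200 bases on target)
```

Merging the targets first matters: the overlapping intervals (100, 200) and (150, 250) would otherwise count the shared bases twice.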

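After SNP calling, you can also restrict your calls to your regions of interest for downstream analyses with BEDtools:

intersectBed -header -u -a snp_calls.vcf -b target_regions.bed > filtered_snp_calls.vcf

Using -u instead of -wa reports each variant once even when it overlaps more than one target interval, and -header keeps the VCF header so the output is still a valid VCF.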
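For intuition about what that intersection does, and in particular the coordinate conversion between 0-based half-open BED intervals and 1-based VCF positions, here is a minimal pure-Python sketch. It checks only the POS base, which suits SNPs (bedtools also considers the full REF span of longer variants), and the helper names and example records are hypothetical.

```python
# Minimal sketch: keep VCF header lines plus records whose POS falls in a BED
# target. Only POS is checked, which suits SNPs. Example data are hypothetical.

def load_bed(lines):
    """Parse BED lines into {chrom: [(start, end), ...]}, 0-based half-open."""
    targets = {}
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        targets.setdefault(chrom, []).append((int(start), int(end)))
    return targets


def in_targets(chrom, pos, targets):
    """VCF POS is 1-based, so the same base is pos - 1 in BED coordinates."""
    p = pos - 1
    return any(start <= p < end for start, end in targets.get(chrom, []))


def filter_vcf(vcf_lines, targets):
    """Yield header lines and records that fall inside a target interval."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
        else:
            chrom, pos = line.split("\t")[:2]
            if in_targets(chrom, int(pos), targets):
                yield line


bed = ["chr1\t100\t200", "chr2\t5\t10"]
vcf = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.",  # BED base 99: just outside the target
    "chr1\t101\t.\tC\tT\t50\tPASS\t.",  # BED base 100: first target base
    "chr1\t200\t.\tG\tA\t50\tPASS\t.",  # BED base 199: last target base
    "chr2\t10\t.\tT\tC\t50\tPASS\t.",   # BED base 9: last base of chr2 target
]
for record in filter_vcf(vcf, load_bed(bed)):
    print(record)
```

The first record is the classic off-by-one trap: BED line "chr1 100 200" starts at the base VCF calls position 101, so a variant at VCF POS 100 is outside it.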


Thank you for your answer, Brad. Can you tell me more about step 6? This is no longer related to my initial question, but I want to understand exactly why step 6 is needed. How does the tool determine misalignments? (My thinking is: if it can find misalignments, why didn't the aligner get them right in the first step?)


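Oliver, glad it helped. The realignment step has the advantage of looking at all the reads aligned to the same position together, while the initial alignment considers only one read at a time. For a single read, especially one where the indel falls near its end, placing a few mismatches often scores better than opening a gap, so each read on its own can look reasonable. Looking at many reads at once makes the shared indel evident, so the realignment can identify regions where placing or shifting an indel removes mismatches across many reads. The GATK wiki has a good description of the approach: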
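The core idea, comparing the reference against an indel-containing consensus using all reads in the region at once, can be sketched as below. This is a deliberate simplification: GATK's IndelRealigner scores consensuses using base qualities and requires a minimum improvement before accepting a realignment, whereas this sketch just counts mismatches of each read's best ungapped placement. The sequences and function names are hypothetical.

```python
# Simplified consensus scoring: each read gets its best ungapped placement on
# each haplotype, and the haplotype with the fewest total mismatches wins.
# Sequences are hypothetical.

def mismatches(read, hap, offset):
    return sum(a != b for a, b in zip(read, hap[offset:offset + len(read)]))


def best_placement(read, hap):
    """(mismatches, offset) of the best ungapped placement of read on hap."""
    return min((mismatches(read, hap, off), off)
               for off in range(len(hap) - len(read) + 1))


def total_mismatches(reads, hap):
    return sum(best_placement(read, hap)[0] for read in reads)


def choose_haplotype(reads, reference, candidates):
    """Score the reference and each indel-containing consensus over all reads."""
    scores = {"reference": total_mismatches(reads, reference)}
    for name, hap in candidates.items():
        scores[name] = total_mismatches(reads, hap)
    # On a tie, min() keeps the first key, so the reference is preferred.
    return min(scores, key=scores.get), scores


reference = "TTCAGGATCCGAGGTCAAGCTTGA"
alt = reference[:10] + reference[12:]       # consensus with "GA" deleted
reads = [alt[2:14], alt[6:18], alt[0:10]]   # reads sampled from the sample

best, scores = choose_haplotype(reads, reference, {"del_GA": alt})
print(best)    # del_GA
print(scores)  # per-haplotype mismatch totals
```

The read that does not span the deletion fits both haplotypes equally well; it is the reads crossing the deletion, scored together, that tip the decision toward the consensus.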

