Question

Using .bed Files to Improve an NGS Pipeline?

2

14.2 years ago

Oliver ▴ 240

Hello folks,

I followed this pipeline to process some single-read NGS data:

http://biostar.stackexchange.com/questions/1269/what-is-the-best-pipeline-for-human-whole-exome-sequencing

Since we used the Agilent SureSelect kit, we have a .bed file that lists the regions of interest. I was thinking of using this file to reduce the processing effort. Would it make sense to skip the 6th step and supply the .bed file instead?

java -jar /bin/GTK/GenomeAnalysisTK.jar -T RealignerTargetCreator -R /seq/REFERENCE/human_18.fasta -I /output/FOO.sorted.bam -o /output/FOO.intervals

Are there any further improvements possible with this file?

Any help appreciated, Oliver

next-gen sequencing bed • 4.9k views

ADD COMMENT • link updated 14.2 years ago by Brad Chapman 9.7k • written 14.2 years ago by Oliver ▴ 240

Ram · Answer 1 · 2011-05-05

5

14.2 years ago

Brad Chapman 9.7k

Aligning to the whole genome, as it sounds like you've done, is the right approach to take. This related question discusses whether it is useful to subset the genome for alignment; you want to avoid subsetting to prevent spurious alignments.

Specific to your question, your capture targets are not useful for speeding up the realignment step. Step 6 is finding potential regions of mis-alignment based on the full set of reads, which is a separate problem.

My suggestion for using your BED file would be to evaluate the effectiveness of hybrid selection. Picard has a command-line program for this: http://picard.sourceforge.net/command-line-overview.shtml#CalculateHsMetrics

which provides a number of useful metrics to assess how well the hybrid selection worked: http://picard.sourceforge.net/picard-metric-definitions.shtml#HsMetrics
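
As far as I know, CalculateHsMetrics expects Picard interval_list files rather than plain BED, so the targets need converting first. Below is a minimal sketch of that, not a tested pipeline. The function names, file arguments, and the choice to use the same file for baits and targets are my own assumptions for illustration (Agilent may supply separate bait and target files), and newer Picard releases name the tool CollectHsMetrics. The one detail that is easy to get wrong is coordinates: BED starts are 0-based, while interval_list starts are 1-based.

```shell
# Sketch only: helper names and file arguments are hypothetical.

# Convert BED records into the body of a Picard interval_list.
# BED is 0-based, half-open; interval_list is 1-based, inclusive,
# so only the start coordinate shifts by one.
bed_to_interval_body() {
    # $1: tab-separated BED file (name in column 4 and strand in column 6 are optional)
    awk 'BEGIN { FS = OFS = "\t" }
         /^(track|browser|#)/ { next }        # skip BED header lines
         NF >= 3 {
             name   = (NF >= 4 && $4 != "") ? $4 : $1 ":" ($2 + 1) "-" $3
             strand = (NF >= 6 && ($6 == "+" || $6 == "-")) ? $6 : "+"
             print $1, $2 + 1, $3, strand, name
         }' "$1"
}

# Build a full interval_list: the SAM header from the BAM, then the intervals.
make_interval_list() {
    # $1: BAM file, $2: BED file, $3: output interval_list
    samtools view -H "$1" > "$3"
    bed_to_interval_body "$2" >> "$3"
}

# Run the hybrid selection metrics (assumes CalculateHsMetrics.jar is in the
# current directory and that baits and targets are the same regions).
run_hs_metrics() {
    # $1: interval_list, $2: BAM file, $3: output metrics file
    java -jar CalculateHsMetrics.jar \
        BAIT_INTERVALS="$1" TARGET_INTERVALS="$1" \
        INPUT="$2" OUTPUT="$3"
}
```

With these defined, something like `make_interval_list FOO.sorted.bam targets.bed targets.interval_list` followed by `run_hs_metrics targets.interval_list FOO.sorted.bam FOO.hs_metrics.txt` would produce the metrics table; those file names are placeholders.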

After finishing SNP calling, you can also subset your calls to your regions of interest with BEDtools:

intersectBed -a snp_calls.vcf -b target_regions.bed -wa > filtered_snp_calls.vcf

for downstream analyses.
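
Two things to watch with this subsetting step: the VCF header lines should survive in the output, and a call that falls in two overlapping targets should be reported only once. As far as I know, newer BEDTools releases have `-header` and `-u` options for exactly this, but check your version. To show the underlying logic, here is a small awk sketch that does the same filtering without BEDTools. The function name is hypothetical, it assumes a non-empty tab-separated BED file, and it scans targets linearly, so it is only meant for illustration or small files.

```shell
# Sketch only: an awk stand-in for the intersectBed step, not a replacement
# for BEDTools on large files (it checks every target on the chromosome).
filter_vcf_by_bed() {
    # $1: BED file of target regions (must be non-empty), $2: VCF file of calls
    awk 'BEGIN { FS = "\t" }
         NR == FNR {                          # first file: load the BED targets
             if ($0 !~ /^(track|browser|#)/ && NF >= 3) {
                 n[$1]++
                 lo[$1, n[$1]] = $2 + 0       # BED start, 0-based
                 hi[$1, n[$1]] = $3 + 0       # BED end, exclusive
             }
             next
         }
         /^#/ { print; next }                 # keep all VCF header lines
         {
             pos = $2 + 0                     # VCF POS is 1-based
             for (i = 1; i <= n[$1]; i++)
                 # 1-based pos lies in 0-based [lo, hi) when lo < pos <= hi
                 if (pos > lo[$1, i] && pos <= hi[$1, i]) { print; break }
         }' "$1" "$2"
}
```

Usage would mirror the command above: `filter_vcf_by_bed target_regions.bed snp_calls.vcf > filtered_snp_calls.vcf`. The `break` is what keeps a call from being printed twice when targets overlap.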

ADD COMMENT • link updated 5.6 years ago by Ram 45k • written 14.2 years ago by Brad Chapman 9.7k

0

Thank you for your answer, Brad. Can you tell me more about step 6? This is not related to my initial question anymore, but I want to understand exactly why step 6 is needed. How does it determine misalignments? (My thought is: if it has to find misalignments, why weren't the reads aligned correctly in the first step?)

ADD REPLY • link 14.2 years ago by Oliver ▴ 240

0

Oliver, glad that it helped. The realignment step has the advantage of looking at multiple reads aligned to the same position, while the initial alignment only considers one read at a time. As a result, the realignment can identify regions where indels can be adjusted to avoid mismatches. The GATK wiki has a good description of the approach: http://www.broadinstitute.org/gsa/wiki/index.php/Local_realignment_around_indels
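
To make the two stages concrete, here is a rough sketch of how they fit together in the GATK versions that use the `-T` walker syntax shown in the question. The first function is the step 6 command from the question; the second is the IndelRealigner pass that consumes its intervals file. The jar and reference paths are copied from the question, while the function names and the output BAM argument are hypothetical. My understanding is that this realignment walker was dropped in GATK 4, so treat this as specific to older releases.

```shell
# Sketch only: paths are taken from the question; function names are made up.
GATK_JAR=/bin/GTK/GenomeAnalysisTK.jar
REF=/seq/REFERENCE/human_18.fasta

# Stage 1 (step 6): scan all reads and list regions that look misaligned,
# typically around indels.
create_realign_targets() {
    # $1: sorted BAM, $2: output intervals file
    java -jar "$GATK_JAR" -T RealignerTargetCreator \
        -R "$REF" -I "$1" -o "$2"
}

# Stage 2: realign the reads in those regions jointly, so that indels are
# placed consistently across all reads covering the same position.
realign_indels() {
    # $1: sorted BAM, $2: intervals file from stage 1, $3: output BAM
    java -jar "$GATK_JAR" -T IndelRealigner \
        -R "$REF" -I "$1" -targetIntervals "$2" -o "$3"
}
```

For example, `create_realign_targets /output/FOO.sorted.bam /output/FOO.intervals` followed by `realign_indels /output/FOO.sorted.bam /output/FOO.intervals /output/FOO.realigned.bam`, where the last file name is a placeholder.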

ADD REPLY • link 14.2 years ago by Brad Chapman 9.7k