Question

Exome alignment and preprocessing: When I perform an exome alignment, should I use a Ref Genome or Ref Exome .fasta?

1

10 months ago

javiflaja ▴ 50

I'm learning the basics of preprocessing, and I can't find a source anywhere that explains the difference between preprocessing a genome and preprocessing an exome for variant calling (VC). Do I use a reference genome? If so, are there any extra steps to implement?

I was told somewhere that, if I use a reference genome, I might need a padded BED file (or some BED file) containing the genomic coordinates of the exonic regions.
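For readers unsure what "padded" means here: it usually refers to extending each target interval by a fixed number of bases on both sides. A minimal Python sketch of that idea follows. The function name `pad_and_merge`, the example coordinates, and the 100 bp padding are illustrative assumptions, not values from this thread.

```python
from collections import defaultdict


def pad_and_merge(intervals, padding):
    """Pad BED intervals on both sides and merge any that then overlap.

    intervals: iterable of (chrom, start, end) in BED coordinates
    (0-based start, end exclusive).
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        # Clip at 0. The upper end is NOT clipped to the contig length,
        # because contig lengths are not known in this sketch.
        by_chrom[chrom].append((max(0, start - padding), end + padding))

    merged = []
    for chrom in sorted(by_chrom):
        cur_start, cur_end = None, None
        for start, end in sorted(by_chrom[chrom]):
            if cur_end is not None and start <= cur_end:
                # Overlapping or touching: extend the current interval.
                cur_end = max(cur_end, end)
            else:
                if cur_end is not None:
                    merged.append((chrom, cur_start, cur_end))
                cur_start, cur_end = start, end
        merged.append((chrom, cur_start, cur_end))
    return merged


# Hypothetical exon coordinates, padded by 100 bp.
exons = [("chr1", 50, 200), ("chr1", 350, 500), ("chr2", 1000, 1100)]
padded = pad_and_merge(exons, 100)
```

In this example the two chr1 exons end up as a single interval, because their padded ranges overlap.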

Extrapolating from the genome preprocessing pipeline, I know I have to:

Obtain a ref
bwa index the ref
FastQC the samples
bwa mem alignment samples onto ref (maybe I add the mythical BED in this command?)
Obtain mapstats
Convert SAM to BAM
Sort exome BAM
MarkDup with Picard
Create .dict for ref and knownsites in order to recalibrate and apply BQSR
Recalibrate and apply BQSR
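As a rough orientation, the steps listed above could be written out as command lines. The Python sketch below only builds and prints the commands; it does not run anything. File names, the sample name, and the read-group fields are hypothetical, and the flags follow bwa, samtools, and GATK4 conventions as I understand them, so they should be checked against the installed versions.

```python
import shlex


def exome_preprocessing_commands(ref, fq1, fq2, sample, known_sites):
    """Return the preprocessing steps as argument lists (dry run only)."""
    # Read group for bwa mem; bwa expands the literal "\t" sequences itself.
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"
    sam = f"{sample}.sam"
    sorted_bam = f"{sample}.sorted.bam"
    dedup_bam = f"{sample}.dedup.bam"
    recal_table = f"{sample}.recal.table"
    return [
        ["bwa", "index", ref],
        ["samtools", "faidx", ref],
        ["gatk", "CreateSequenceDictionary", "-R", ref],
        ["fastqc", fq1, fq2],
        # No BED file appears here: reads are aligned to the whole reference.
        ["bwa", "mem", "-R", read_group, "-o", sam, ref, fq1, fq2],
        # samtools sort reads SAM and writes sorted BAM in one step.
        ["samtools", "sort", "-o", sorted_bam, sam],
        ["samtools", "flagstat", sorted_bam],
        ["gatk", "MarkDuplicates", "-I", sorted_bam, "-O", dedup_bam,
         "-M", f"{sample}.dup_metrics.txt"],
        ["gatk", "BaseRecalibrator", "-R", ref, "-I", dedup_bam,
         "--known-sites", known_sites, "-O", recal_table],
        ["gatk", "ApplyBQSR", "-R", ref, "-I", dedup_bam,
         "--bqsr-recal-file", recal_table, "-O", f"{sample}.recal.bam"],
    ]


commands = exome_preprocessing_commands(
    "ref.fa", "sample_R1.fastq.gz", "sample_R2.fastq.gz",
    "sample", "known_sites.vcf.gz",
)
for cmd in commands:
    print(shlex.join(cmd))
```

The known-sites VCF also needs an index before BQSR; that step is left out here because the exact command differs between GATK releases.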

Am I missing any step for exome VC (before HaplotypeCaller, of course)? Any feedback will be highly appreciated!

vcf pipeline exome-alignment bed • 1.7k views

ADD COMMENT • link updated 10 months ago by Jeremy Leipzig 22k • written 10 months ago by javiflaja ▴ 50

1

Align to the whole genome so that off-target reads are not forced to align to exonic sequence. Maybe consider using or studying the WARP or Sarek variant calling pipelines.

ADD REPLY • link 10 months ago by Jeremy Leipzig 22k

0

Isn't there a risk of reads aligning to pseudogenes in non-transcribed regions? So your advice is not to use the BED file during alignment?

ADD REPLY • link 10 months ago by javiflaja ▴ 50

1

So your advice is not to use the BED file during alignment?

You should use it, but the BED file has nothing to do with the alignment itself. It belongs to downstream steps, where you can use it for QC or for masking calls outside your regions of interest.
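To make "masking calls outside your regions of interest" concrete, here is a minimal Python sketch that keeps only VCF records falling inside BED targets. It assumes the BED intervals are already merged (non-overlapping), and the helper names and example records are hypothetical. In practice an existing tool would normally do this, for example the `-L` interval argument in GATK.

```python
from bisect import bisect_right


def build_index(bed_intervals):
    """Group BED intervals (0-based, end-exclusive) by chromosome.

    Assumes the intervals on each chromosome do not overlap.
    """
    index = {}
    for chrom, start, end in sorted(bed_intervals):
        starts, ends = index.setdefault(chrom, ([], []))
        starts.append(start)
        ends.append(end)
    return index


def on_target(index, chrom, pos):
    """Return True if a 1-based VCF position lies inside a target."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    zero_based = pos - 1
    # Last interval whose start is <= the position.
    i = bisect_right(starts, zero_based) - 1
    return i >= 0 and zero_based < ends[i]


def filter_vcf_lines(lines, index):
    """Yield header lines unchanged and records only if on target."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        chrom, pos = line.split("\t", 2)[:2]
        if on_target(index, chrom, int(pos)):
            yield line


# Hypothetical target and records.
targets = build_index([("chr1", 100, 200)])
vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t150\t.\tA\tG\t50\tPASS\t.",
    "chr1\t500\t.\tC\tT\t50\tPASS\t.",
]
kept = list(filter_vcf_lines(vcf, targets))
```

Note the coordinate shift: VCF positions are 1-based, while BED starts are 0-based and BED ends are exclusive.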

Isn't there a risk of reads aligning to pseudogenes in non-transcribed regions?

Maybe. But if a read maps to multiple locations equally well, a typical aligner will assign it to one of them at random, so you will still have some coverage even if pseudogenes manage to attract alignments.
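One way to see how much of this is going on: as far as I know, BWA-MEM reports a mapping quality (MAPQ) of 0 for reads that fit equally well in more than one place, so the share of MAPQ 0 reads gives a rough indication. The Python sketch below computes that share from SAM text lines; the function name and the example reads are made up for illustration.

```python
def mapq_zero_fraction(sam_lines):
    """Fraction of mapped primary alignments that have MAPQ 0."""
    total = 0
    zero = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])   # SAM column 2: FLAG
        mapq = int(fields[4])   # SAM column 5: MAPQ
        if flag & 0x4:
            continue  # unmapped read
        if flag & 0x900:
            continue  # secondary (0x100) or supplementary (0x800)
        total += 1
        if mapq == 0:
            zero += 1
    return zero / total if total else 0.0


# Hypothetical records: one unique hit, one ambiguous hit, one unmapped read.
sam = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t101\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t0\tchr1\t301\t0\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
fraction = mapq_zero_fraction(sam)
```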

ADD REPLY • link 10 months ago by Jeremy Leipzig 22k

0

Thank you! Could I ask you to specify where in my pipeline I apply this downstream step for QC or masking calls?

ADD REPLY • link 10 months ago by javiflaja ▴ 50

1

I would study the WARP GitHub repo, where they use calling_interval_list, evaluation_interval_list, target_interval_list, and bait_interval_list.
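Those inputs are in Picard's interval_list format rather than BED. As a sketch of how the two relate: an interval_list has a SAM-style header with the sequence dictionary, followed by tab-separated lines with 1-based inclusive coordinates, a strand, and a name. The Python function below is a hypothetical illustration of the conversion; in a real pipeline Picard's BedToIntervalList tool is the usual route.

```python
def bed_to_interval_list(bed_lines, contig_lengths):
    """Convert BED lines to interval_list lines.

    contig_lengths: dict of contig name -> length, normally taken from
    the reference .dict file (supplied by hand in this sketch).
    """
    out = ["@HD\tVN:1.6"]
    for name, length in contig_lengths.items():
        out.append(f"@SQ\tSN:{name}\tLN:{length}")
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        name = fields[3] if len(fields) > 3 else "."
        strand = fields[5] if len(fields) > 5 and fields[5] in ("+", "-") else "+"
        # BED start is 0-based, so add 1; the BED end is exclusive, which
        # equals the 1-based inclusive end, so it stays as it is.
        out.append(f"{chrom}\t{start + 1}\t{end}\t{strand}\t{name}")
    return out


# Hypothetical single-exon BED and contig length.
interval_list = bed_to_interval_list(["chr1\t99\t200\texon1\n"], {"chr1": 1000})
```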

ADD REPLY • link 10 months ago by Jeremy Leipzig 22k

1

Use the reference genome, for the reasons discussed in: Exome Sequencing: Masking The Non-Genic Sequences?; Why Shouldn't I Use Masking When Doing A Reference Alignment?

ADD REPLY • link 10 months ago by Pierre Lindenbaum 161k

1

Your answer got cut off, and I think your info is important. Do you mind completing the previous part of your response? Thanks a lot!

ADD REPLY • link 10 months ago by javiflaja ▴ 50

score 2 · Answer 1 · 2023-06-21

2

10 months ago

amy__ ▴ 160

If you have WES data, you will need the BED file that the sequencing company used to specify the regions to be sequenced. For WGS you won't have this BED file. The BED file comes in handy at the variant-calling stage and for determining coverage over the targeted regions. These BED files are usually available online, but you need to make sure you get the same one that was used for your sequencing run.
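As an illustration of the coverage use case, the Python sketch below summarises per-base depth restricted to the targets. It assumes the input is three-column text (chromosome, 1-based position, depth), such as the output of `samtools depth -a -b targets.bed sample.bam`. The function name, the 20x threshold, and the example numbers are arbitrary choices for the example.

```python
def coverage_summary(depth_lines, min_depth=20):
    """Summarise per-base depth lines of the form chrom<TAB>pos<TAB>depth."""
    depths = [int(line.split("\t")[2]) for line in depth_lines if line.strip()]
    if not depths:
        return {"bases": 0, "mean_depth": 0.0, "fraction_at_min_depth": 0.0}
    covered = sum(1 for d in depths if d >= min_depth)
    return {
        "bases": len(depths),
        "mean_depth": sum(depths) / len(depths),
        "fraction_at_min_depth": covered / len(depths),
    }


# Hypothetical depth output for four target bases.
depth_output = [
    "chr1\t101\t10",
    "chr1\t102\t30",
    "chr1\t103\t20",
    "chr1\t104\t0",
]
summary = coverage_summary(depth_output)
```

Including zero-depth positions (the `-a` option) matters here, otherwise uncovered target bases would silently drop out of the mean.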

ADD COMMENT • link 10 months ago by amy__ ▴ 160

0

So I'm not going to supply the BED file in the alignment command? Would you mind specifying which step requires it: pileup (coverage) or HaplotypeCaller? Thanks for the help!

ADD REPLY • link 10 months ago by javiflaja ▴ 50

2

Hey, I've not used the GATK Best Practices before, but in my experience the BED file came into play when I used Qualimap to work out coverage over those regions and when I ran my variant caller: I used DeepVariant, which takes the BED file as an input too.

ADD REPLY • link 10 months ago by amy__ ▴ 160