Question

What's the difference between mapping PE fastqs to the whole genome and to a specified target region?

8.1 years ago

winter_li ▴ 60

Hi,

I built a new reference from a single gene sequence.

Step 1: Map the PE fastqs to the complete genome (hg19).

Step 2: Map the PE fastqs to part of the genome sequence, e.g. a gene region.

What's the difference between Step 1 and Step 2? I want to use bwa mem.

Best

genome sequence alignment gene • 1.4k views

updated 8.1 years ago by Manu Prestat 4.1k • written 8.1 years ago by winter_li ▴ 60

score 1 · Answer 1 · 2016-03-30

It depends on whether your gene has a pseudogene. If it does, mapping to the whole human genome is more correct, because with a gene-only reference all of your reads (from both the gene and the pseudogene) will be mapped to your specified region. If your next step includes variant detection, Step 2 will then produce a lot of errors (false positives). It also depends on how much the pseudogene differs from the gene: if the difference is small, mapping to the whole human genome can be wrong too. Best, Agata
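To make the pseudogene point concrete, here is a toy sketch in Python. It is not how bwa mem works internally (bwa mem uses seeded local alignment); it just places a read at the ungapped position with the fewest mismatches. The sequences `GENE` and `PSEUDOGENE` and the helper names are made up for illustration, with the pseudogene differing from the gene at two positions.

```python
# Toy illustration only: real aligners such as bwa mem use seeded local
# alignment, not a plain mismatch-counting scan.

def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def best_hit(read, reference):
    """Return (name, offset, mismatches) of the best ungapped placement."""
    best = None
    for name, seq in reference.items():
        for offset in range(len(seq) - len(read) + 1):
            d = hamming(read, seq[offset:offset + len(read)])
            if best is None or d < best[2]:
                best = (name, offset, d)
    return best

# Hypothetical sequences: the pseudogene differs from the gene at two positions.
GENE       = "ACGTTGCAAGGCTTACCGATGCATTGACCA"
PSEUDOGENE = "ACGTTGCAAGGCTAACCGATGCTTTGACCA"

# A read that truly comes from the pseudogene and spans both differences.
read = PSEUDOGENE[8:26]

gene_only_ref = {"gene": GENE}                               # like Step 2
whole_ref = {"gene": GENE, "pseudogene": PSEUDOGENE}         # like Step 1

hit_gene_only = best_hit(read, gene_only_ref)
hit_whole = best_hit(read, whole_ref)

print(hit_gene_only)  # ('gene', 8, 2): forced onto the gene with 2 mismatches
print(hit_whole)      # ('pseudogene', 8, 0): perfect match at its true origin
```

With the gene-only reference, the two mismatches would look like variants in the gene to a variant caller, which is the false-positive scenario described above.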

score 0 · Answer 2 · 2016-03-30

Option 1: whole-genome mapping is slower, but it is more accurate because each read maps to the most similar region the aligner finds.

Option 2: much faster, but it may give wrong results (i.e. reads that originated from another part of the genome get placed in the target region).

It really depends on how the sample preparation step was designed, but if you can afford the running time of whole-genome mapping, I would strongly recommend that option to avoid false-positive findings.
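If whole-genome mapping is affordable, one common pattern is to map everything to hg19 and only afterwards pull out the reads in the region of interest. The sketch below only builds the command strings and runs nothing. The file names, the region coordinates, the thread count and the MAPQ cutoff of 20 are placeholders I picked for illustration, not values from this thread; the function name is also made up.

```python
import shlex

def whole_genome_then_subset(ref_fasta, fq1, fq2, region, prefix,
                             threads=4, min_mapq=20):
    """Build shell command strings for: index, map, sort, index, extract region.

    Nothing is executed here. min_mapq=20 is an assumed cutoff used to drop
    ambiguously placed reads; adjust it to your own needs.
    """
    q = shlex.quote
    sorted_bam = f"{prefix}.sorted.bam"
    region_bam = f"{prefix}.region.bam"
    return [
        # Build the BWA index for the whole-genome reference (done once).
        f"bwa index {q(ref_fasta)}",
        # Map the paired-end reads and coordinate-sort the alignments.
        f"bwa mem -t {threads} {q(ref_fasta)} {q(fq1)} {q(fq2)}"
        f" | samtools sort -o {q(sorted_bam)} -",
        # Index the sorted BAM so that region queries are possible.
        f"samtools index {q(sorted_bam)}",
        # Keep only reads in the target region with MAPQ >= min_mapq.
        f"samtools view -b -q {min_mapq} -o {q(region_bam)}"
        f" {q(sorted_bam)} {q(region)}",
    ]

# Placeholder inputs, purely hypothetical.
cmds = whole_genome_then_subset(
    "hg19.fa", "sample_R1.fastq.gz", "sample_R2.fastq.gz",
    "chr1:1000000-1100000", "sample",
)
for cmd in cmds:
    print(cmd)
```

The reasoning behind the MAPQ filter is that a read which fits two loci about equally well usually gets a low mapping quality, so filtering removes some of the gene/pseudogene ambiguity. It will not help when the two copies are nearly identical across the whole read.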