How To Simulate Whole-Exome Reads
11.1 years ago
mpallocc ▴ 10

Hello everybody,

I am trying to create a dataset of simulated exome reads (with simulated base qualities as well). I am currently using simNGS from EBI to build a fragment library and generate reads. The main issue is that with a workflow like:

fasta genome reference -> fragment library -> simulated reads.

This yields simulated genomic reads rather than exome reads. A workflow such as:

fasta exome reference -> fragment library -> simulated reads

doesn't sound good either, because it leaves out many exome-specific behaviours (e.g., off-target reads).
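One middle ground between the two workflows could be sketched as follows: draw fragments from the whole genome, but force a chosen fraction of them to overlap the capture targets, so that the remainder act as off-target fragments. This is only a rough sketch, not part of simNGS. The function names, the `on_target_frac` parameter, and the uniform off-target model are all assumptions made for illustration, and targets are picked uniformly rather than weighted by length or capture efficiency.

```python
import random

def sample_fragments(genome_len, targets, n_fragments, frag_len=200,
                     on_target_frac=0.7, seed=0):
    """Return (start, end) fragment intervals on a single sequence.

    targets: list of half-open (start, end) capture intervals.
    on_target_frac: hypothetical fraction of fragments forced to overlap a target.
    """
    rng = random.Random(seed)
    fragments = []
    for _ in range(n_fragments):
        if rng.random() < on_target_frac:
            # On-target: pick a target, then a start position that guarantees
            # at least one base of overlap with it.
            t_start, t_end = rng.choice(targets)
            lo = max(0, t_start - frag_len + 1)
            hi = min(genome_len - frag_len, t_end - 1)
            start = rng.randint(lo, hi)
        else:
            # Off-target: uniform over the genome (may still hit a target by chance).
            start = rng.randint(0, genome_len - frag_len)
        fragments.append((start, start + frag_len))
    return fragments

def overlaps_any(frag, targets):
    """True if the fragment shares at least one base with any target."""
    s, e = frag
    return any(s < t_end and e > t_start for t_start, t_end in targets)
```

The resulting intervals could then be cut out of the genome FASTA and passed to the read simulator in place of a uniformly sampled fragment library. Per-target coverage variability is not modelled here.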

Is there any other approach to follow in order to obtain a reliable whole-exome simulated read dataset?

simulation • 3.8k views

For what reason are you doing your simulation? Could you add to your question details about what behaviors you want to capture and why?


I'm interested in the behaviour of whole-exome sequencing variant calling, specifically how base quality distortion affects variant calls (at the SNP level). I'm focusing on Illumina platforms.
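To make "base quality distortion" concrete, here is a minimal sketch of one way it could be modelled: sequencing errors are drawn from the true Phred qualities, while the qualities reported to the variant caller are shifted by a fixed offset. The function names, the constant-shift model, and the 2 to 41 clamping range are assumptions for illustration, not something simNGS provides.

```python
import random

def phred_to_error_prob(q):
    """Convert a Phred score to an error probability: p = 10^(-Q/10)."""
    return 10 ** (-q / 10)

def distort_qualities(quals, shift, q_min=2, q_max=41):
    """Shift reported qualities by a constant, clamped to an assumed Illumina-like range."""
    return [min(q_max, max(q_min, q + shift)) for q in quals]

def simulate_read(seq, true_quals, reported_shift=0, seed=0):
    """Introduce substitutions according to the TRUE qualities,
    but return qualities distorted by `reported_shift`."""
    rng = random.Random(seed)
    out = []
    for base, q in zip(seq, true_quals):
        if rng.random() < phred_to_error_prob(q):
            # Substitute with one of the three other bases, chosen uniformly.
            base = rng.choice([b for b in "ACGT" if b != base])
        out.append(base)
    return "".join(out), distort_qualities(true_quals, reported_shift)
```

With a positive `reported_shift` the caller sees over-confident qualities, and with a negative one it sees under-confident qualities, so SNP calls can be compared across shifts on otherwise identical reads.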


For the purposes of variant calling behavior, do you need to model things like off-target reads and variable coverage since you are interested in base quality distortion? I know you want to, but do you need to?


The main point is that we use a complete whole-exome variant calling pipeline (including the mapping step). We'd like a more realistic simulated read dataset to avoid additional bias, but if there's no way, we'll stick with what we have.


In that case, you would probably want to simulate your (hybrid) selection process. I have no idea how effectively the hybridization characteristics of a genomic library versus a pool of several hundred thousand oligos can be captured, from either a computational or a biochemical point of view. The value of doing so is also unknown, at least to me.
