Customisable DNA Read Simulation for Deleted Reads
2.2 years ago
chris.fearn ▴ 10

Dear all,

I am currently testing a pipeline I am working on and would like to check its sensitivity. The pipeline revolves around detecting large-scale deletions. To validate it, I would like to use an artificial/simulated data set with factors that I can control.

I would like to generate read data (preferably in FASTQ format) from a reference sequence with a single large deletion and even coverage (except in the deleted region). However, I would like to be able to control the ratio of deleted reads to wild-type reads, e.g. 50:50, 25:75, 10:90, etc. I would also like to be able to customise the length and error profiles of these reads while keeping the same mutational profile.

I am unsure how to go about this. I have tried using DWGsim, but I have been unable to control the number of deleted reads for a deletion that I tried to simulate using an input VCF together with the reference sequence. Are there any tools people would recommend (and if so, how could I use them specifically for this?), or ways in which I could use a tool like DWGsim to achieve this goal?

Many thanks in advance!

SV Deletions Simulation DNA • 609 views

Naively, you could just run DWGsim twice, couldn't you? Run it once on the reference genome and once on a "new" reference genome containing the deletion (which you introduce prior to simulation). Then, if you want 10% of the reads to come from the deletion allele, subsample so that 90% of the final reads come from the WT genome and 10% from the deleted genome, and concatenate the results.
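
For what it's worth, the approach above could be sketched roughly as follows in plain Python. This is only a minimal sketch under some assumptions: the function names (`delete_region`, `read_fastq`, `mix_reads`) are made up for illustration and are not part of DWGsim or any other tool, the FASTQ files are assumed to have exactly four lines per record, and the reads are treated as single-end (for paired-end data the same selection would have to be applied to both mates).

```python
import io
import random


def delete_region(seq, start, end):
    """Return seq with the 0-based, half-open interval [start, end) removed."""
    if not 0 <= start <= end <= len(seq):
        raise ValueError("deletion coordinates outside the sequence")
    return seq[:start] + seq[end:]


def read_fastq(handle):
    """Yield FASTQ records as 4-tuples of lines (assumes 4 lines per record)."""
    while True:
        record = tuple(handle.readline().rstrip("\n") for _ in range(4))
        if not record[0]:  # empty header line: end of file reached
            return
        yield record


def mix_reads(wt_reads, del_reads, del_fraction, total, seed=0):
    """Draw `total` reads, with a `del_fraction` share taken from del_reads."""
    rng = random.Random(seed)  # fixed seed so the mixture is reproducible
    n_del = round(total * del_fraction)
    n_wt = total - n_del
    if n_del > len(del_reads) or n_wt > len(wt_reads):
        raise ValueError("not enough simulated reads to subsample from")
    mixed = rng.sample(wt_reads, n_wt) + rng.sample(del_reads, n_del)
    rng.shuffle(mixed)  # avoid having all deleted reads at the end of the file
    return mixed


# Toy usage: in-memory FASTQ text stands in for the two simulator outputs.
wt_text = "".join(f"@wt{i}\nACGT\n+\nIIII\n" for i in range(100))
del_text = "".join(f"@del{i}\nACGT\n+\nIIII\n" for i in range(100))
wt_reads = list(read_fastq(io.StringIO(wt_text)))
del_reads = list(read_fastq(io.StringIO(del_text)))
mixed_reads = mix_reads(wt_reads, del_reads, del_fraction=0.10, total=50, seed=1)
```

Since both simulations can be run with whatever read length and error settings you like, the mutational profile stays fixed (it lives in the edited reference) while the read properties and the mixing ratio vary independently.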

2.2 years ago
trausch ★ 1.9k

Visor should be able to simulate such data sets (doi: 10.1093/bioinformatics/btz719).
