Can You Make Some Recommendations For An Indel/SNP/CNV Simulator?
12.3 years ago
Pascal ★ 1.5k

Hi

I'm writing a piece of software to simulate structural variations in a genome. So far, I have written a first version with a simple set of features:

  • accept a FASTA file as input,
  • write the new genome into a new FASTA file and the variations into a VCF file,
  • simulate SNPs,
  • simulate indels,
  • the frequencies of SNPs, insertions and deletions are configurable.

Before going ahead with new features, I would like to hear your feedback and recommendations.

I thought that one possible improvement (to make it more realistic) would be to control the frequency of variants depending on the region of the chromosome: for instance, to generate more variants in non-coding regions.
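
A minimal sketch of region-dependent SNP rates, assuming the regions are supplied as 0-based, half-open `(start, end, rate)` tuples. The function name `simulate_snps` and the region format are hypothetical and not taken from the actual tool. A per-base rate list is used only for brevity; a whole-genome implementation would probably want an interval lookup instead.

```python
import random


def simulate_snps(seq, regions, default_rate, rng):
    """Return (mutated_sequence, variants) for one chromosome.

    seq          -- reference sequence (upper-case string)
    regions      -- hypothetical format: list of (start, end, rate),
                    0-based and half-open; later entries override earlier ones
    default_rate -- per-base SNP probability outside the listed regions
    rng          -- a random.Random instance, so runs are reproducible
    variants is a list of (position, ref, alt) with 0-based positions.
    """
    # Build the per-base substitution probability.
    rates = [default_rate] * len(seq)
    for start, end, rate in regions:
        for i in range(max(start, 0), min(end, len(seq))):
            rates[i] = rate

    out = list(seq)
    variants = []
    for i, ref in enumerate(seq):
        if ref not in "ACGT":
            continue  # leave N and other ambiguity codes untouched
        if rng.random() < rates[i]:
            alt = rng.choice([b for b in "ACGT" if b != ref])
            out[i] = alt
            variants.append((i, ref, alt))
    return "".join(out), variants
```

For example, `simulate_snps(seq, [(1000, 2000, 0.01)], 0.001, random.Random(42))` would mutate the window 1000..1999 ten times more often than the rest of the sequence. Remember that VCF positions are 1-based when writing `variants` out.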

Thanks in advance for any input!

variant structural genome simulation • 3.5k views
12.3 years ago

Some trivial suggestions:

  • more SNPs in introns and non-coding regions
  • more SNPs in the less conserved regions (see the GERP score)
  • more synonymous SNPs in the exons.

Thanks for your answer. To know whether my REF base is in an intron or an exon, I plan to use BED files generated with the UCSC Table Browser. Do you think that is a good idea?
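
One way such a lookup could work, sketched under the assumption that the BED file lists exon intervals (0-based, half-open, as in the BED format) and that only the first three columns are needed. The helper names `load_bed_intervals` and `in_intervals` are hypothetical.

```python
import bisect


def load_bed_intervals(lines, chrom):
    """Return (starts, ends) of the merged intervals found for `chrom`.

    `lines` is any iterable of BED lines, e.g. an open file handle.
    Overlapping or touching intervals are merged so that a binary
    search on the start coordinates is sufficient.
    """
    intervals = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines and header lines
        fields = line.split()
        if fields[0] == chrom:
            intervals.append((int(fields[1]), int(fields[2])))
    intervals.sort()

    starts, ends = [], []
    for start, end in intervals:
        if ends and start <= ends[-1]:
            ends[-1] = max(ends[-1], end)  # extend the previous interval
        else:
            starts.append(start)
            ends.append(end)
    return starts, ends


def in_intervals(starts, ends, pos):
    """True if the 0-based position `pos` falls inside a merged interval."""
    i = bisect.bisect_right(starts, pos) - 1
    return i >= 0 and pos < ends[i]
```

With the intervals loaded once per chromosome, each lookup costs a single binary search, so it can be called for every candidate position.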


BTW Pierre, in order to generate synonymous SNPs I have to read the genome in groups of 3 nucleotides (to see whether a base change is synonymous or not) and also detect frame-shifts. Is there anything else I should take care of if I decide to implement simulation of sSNPs?
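
The codon check described above can be sketched as follows, assuming the standard genetic code and a forward-strand coding sequence that starts in frame 0 with introns already removed. Reverse-strand genes and codons split across exon boundaries are not handled here, and the name `is_synonymous` is hypothetical.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering; '*' marks a stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def is_synonymous(cds, pos, alt):
    """True if replacing cds[pos] with `alt` keeps the same amino acid.

    cds -- coding sequence (upper-case DNA), assumed to start in frame 0
    pos -- 0-based offset within cds
    alt -- replacement base
    """
    codon_start = pos - pos % 3
    codon = cds[codon_start:codon_start + 3]
    if len(codon) < 3:
        raise ValueError("position falls in an incomplete trailing codon")
    offset = pos % 3
    mutated = codon[:offset] + alt + codon[offset + 1:]
    return CODON_TABLE[codon] == CODON_TABLE[mutated]
```

A simulator could draw a candidate substitution, call `is_synonymous`, and accept or reject it with different probabilities to bias exons towards synonymous changes.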


You could use the human codon frequencies?

12.3 years ago
lh3 33k

To the best of my knowledge, no one knows how to simulate CNV/SV/indel to an acceptable accuracy. If you really want to get a good simulation, you can simulate reads from a de novo assembly and then get the positions of simulated reads from an assembly-to-assembly alignment. This procedure is quite complicated, but is closest to the truth.


Thanks for your answer. It is interesting, but for the time being I just can't invest too much time :-S I'm looking for a trade-off between the complexity of the algorithm and getting closer to reality.
