Is there a way to simulate short reads with CNV-sized indels (1-50 kb)?
I've read the wgsim manual, for instance, but it appears to generate small indels only.
I second Michael. Produce an "altered" genome and run wgsim on that. However, to make it realistic, you will have to make two copies of each chromosome and introduce the CNV in only one of them(*). Do NOT insert random sequences: if you need to make an amplification/duplication, insert a sequence copied from somewhere else in the genome, so that you will be able to find which regions have been duplicated.
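A minimal sketch of that idea in Python, assuming the genome is already loaded as a dict of chromosome name to sequence. The function names and the `_a`/`_b` copy suffixes are made up for illustration; they are not part of wgsim or any library. It writes two copies of every chromosome and puts the duplication (a segment copied from elsewhere in the genome) into only one copy.

```python
def insert_copied_segment(genome, src_chrom, src_start, length, dest_chrom, dest_pos):
    """Copy genome[src_chrom][src_start:src_start+length] and insert it
    into dest_chrom at dest_pos (0-based). Returns the altered sequence."""
    segment = genome[src_chrom][src_start:src_start + length]
    if len(segment) != length:
        raise ValueError("source segment runs past the end of the chromosome")
    dest = genome[dest_chrom]
    return dest[:dest_pos] + segment + dest[dest_pos:]


def make_diploid(genome, src_chrom, src_start, length, dest_chrom, dest_pos):
    """Return two copies of every chromosome; only the '_b' copy of
    dest_chrom carries the duplicated segment (hypothetical naming)."""
    diploid = {}
    for name, seq in genome.items():
        diploid[name + "_a"] = seq  # unaltered copy
        if name == dest_chrom:
            diploid[name + "_b"] = insert_copied_segment(
                genome, src_chrom, src_start, length, dest_chrom, dest_pos
            )
        else:
            diploid[name + "_b"] = seq
    return diploid


def write_fasta(records, handle, width=60):
    """Write a dict of name -> sequence as FASTA with fixed line width."""
    for name, seq in records.items():
        handle.write(f">{name}\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")
```

You would then write the result with `write_fasta(diploid, open("altered.fa", "w"))` (the file name is just an example) and give that file to the read simulator. Because the inserted segment is a real copy, you know exactly which source region should show increased coverage.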
When I did some simulations, I found that version 2.6 is better than 3.0, as 3.0 seems to have some sort of "chromosome-specific" bias. I was getting uneven coverage...
(*) Be careful though. wgsim will produce mutations and small indels from each chromosome, so the number of them will be twice as high (because you have twice as many chromosomes), but each mutation will appear either as heterozygous or, apparently, in only 25% of your reads (as opposed to 100% or 50%).
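The arithmetic behind those percentages, as a small sketch. It assumes that wgsim treats every input sequence as a diploid chromosome (two haplotypes per FASTA record) and that reads are drawn evenly from all haplotypes; `read_fraction` is a made-up helper name.

```python
from fractions import Fraction


def read_fraction(haplotypes_with_variant, fasta_copies, ploidy_per_copy=2):
    """Expected fraction of reads carrying a simulated variant, assuming
    even coverage across all simulated haplotypes."""
    total_haplotypes = fasta_copies * ploidy_per_copy
    return Fraction(haplotypes_with_variant, total_haplotypes)


# One FASTA copy per chromosome (the usual case):
hom_single = read_fraction(2, fasta_copies=1)  # homozygous: both haplotypes
het_single = read_fraction(1, fasta_copies=1)  # heterozygous: one haplotype

# Two FASTA copies per chromosome; a variant is simulated on one copy only:
hom_double = read_fraction(2, fasta_copies=2)  # looks heterozygous
het_double = read_fraction(1, fasta_copies=2)  # one haplotype out of four

print(hom_single, het_single, hom_double, het_double)  # 1 1/2 1/2 1/4
```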
A read simulator with this feature is not required (maybe one exists anyway, but it does not matter). You can simply modify the input sequence, i.e. the reference genome. Draw N random chromosome locations (chromosome, position); not much more than N = 1 makes sense to me. Draw the desired indel length from, e.g., a Poisson distribution. For a deletion, delete that many bases from your FASTA sequence at that position; for an insertion, insert a random sequence of that length at that point. Give the resulting file as input to your read simulator. From your answers to my comments you can get the right parameters for a little script that will do it.
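A rough sketch of such a script, using only the Python standard library. Some assumptions of mine: the genome is a dict of chromosome name to sequence, the chromosome is picked with probability proportional to its length, and the Poisson draw is replaced by a normal approximation (mean and variance both equal to `mean_len`), since the standard library has no Poisson sampler and the approximation is close for kb-scale means. All function names are made up.

```python
import math
import random


def draw_length(rng, mean_len):
    """Normal approximation to Poisson(mean_len), clamped to at least 1."""
    return max(1, round(rng.gauss(mean_len, math.sqrt(mean_len))))


def apply_random_indel(genome, rng, mean_len, kind):
    """Apply one random deletion or insertion to a copy of `genome`.

    Returns (altered_genome, (chrom, pos, length, kind)) so the true
    event can be recorded and compared with what a caller detects.
    """
    names = list(genome)
    # Pick a chromosome with probability proportional to its length.
    chrom = rng.choices(names, weights=[len(genome[n]) for n in names])[0]
    seq = genome[chrom]
    length = draw_length(rng, mean_len)

    if kind == "deletion":
        length = min(length, len(seq) - 1)  # keep at least one base
        pos = rng.randrange(0, len(seq) - length + 1)
        new_seq = seq[:pos] + seq[pos + length:]
    elif kind == "insertion":
        pos = rng.randrange(0, len(seq) + 1)
        inserted = "".join(rng.choices("ACGT", k=length))  # random sequence
        new_seq = seq[:pos] + inserted + seq[pos:]
    else:
        raise ValueError("kind must be 'deletion' or 'insertion'")

    altered = dict(genome)  # shallow copy; the input dict is left untouched
    altered[chrom] = new_seq
    return altered, (chrom, pos, length, kind)
```

For example, `apply_random_indel(genome, random.Random(42), 10_000, "deletion")` removes roughly 10 kb from one chromosome and tells you where. Keep the returned event tuple: it is the ground truth for whatever you run on the simulated reads.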
What do you want to do with it, btw? These variations will be very easy to detect by lack of coverage in the region anyway, given your coverage is high enough.