Question: understanding wgsim flags
4.6 years ago by
kshitijtayal40
India
kshitijtayal40 wrote:

Below are the flags accepted by wgsim, a tool that simulates sequence reads from a reference genome.

-e FLOAT base error rate [0.020]

-d INT outer distance between the two ends [500]

-s INT standard deviation [50]

-N INT number of read pairs [1000000]

-1 INT length of the first read [70]

-2 INT length of the second read [70]

-r FLOAT rate of mutations [0.0010]

-R FLOAT fraction of indels [0.15]

-X FLOAT probability an indel is extended [0.30]

-S INT seed for random generator [-1]

-A FLOAT discard if the fraction of ambiguous bases higher than FLOAT [0.05]

-h haplotype mode

What does the standard deviation (-s) mean in the context of generating reads? Similarly, what is the difference between the rate of mutations (-r) and the base error rate (-e)? And how is the outer distance between the two ends (-d) calculated? These are the questions that are troubling me.

sequencing alignment • 2.1k views
4.6 years ago by
Devon Ryan95k
Freiburg, Germany
Devon Ryan95k wrote:

Given its position directly after the fragment length (-d), I would take that to be the standard deviation of the fragment length. After all, real data doesn't come from fragments of the same size, but from a distribution with some mean length (-d) and variation (-s).
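To make the idea of a fragment-length distribution concrete, here is a minimal sketch. It assumes the lengths are drawn from a normal distribution with mean -d and standard deviation -s; the thread does not confirm which distribution wgsim uses, and the function name `sample_fragment_lengths` is made up for illustration.

```python
import random

def sample_fragment_lengths(n, mean=500, sd=50, seed=0):
    """Draw n fragment lengths, assuming a normal distribution (an assumption,
    not taken from the wgsim source). Lengths are rounded and kept >= 1."""
    rng = random.Random(seed)
    return [max(1, round(rng.gauss(mean, sd))) for _ in range(n)]

lengths = sample_fragment_lengths(10000)
mean_len = sum(lengths) / len(lengths)

# Most fragments land within a few standard deviations of the mean.
print("shortest:", min(lengths))
print("longest:", max(lengths))
print("mean:", round(mean_len, 1))
```

With the defaults shown in the question (-d 500, -s 50), most simulated fragments would fall roughly between 400 and 600 bases under this assumption, rather than all being exactly 500.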


I don't know about real data, but synthetic reads do come with a fixed read length. Assuming read length and fragment length are the same, my question remains: what is the role of the standard deviation in generating synthetic reads?

written 4.6 years ago by kshitijtayal40

Read length and fragment length aren't related.

Edit: I guess you don't know what a fragment is. What you sequence are fragments (short stretches) of DNA. What you produce by sequencing these are reads. The fragments you sequence and/or generate have some size range, which the -d and -s parameters dictate. Yes, the read lengths are constant, since this simulates Illumina-style sequencing experiments.
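The fragment/read distinction can be sketched in a few lines. This is only an illustration, not wgsim's code: it cuts one fragment out of a random reference, then takes a fixed-length read from each end, with the second read reverse-complemented as in typical paired-end data. Mutations and sequencing errors are left out, and the names `simulate_pair` and `reverse_complement` are made up here.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]

def simulate_pair(reference, frag_len, read_len, rng):
    """Cut one fragment from the reference and read both of its ends.

    Assumes read_len <= frag_len <= len(reference).
    """
    start = rng.randrange(len(reference) - frag_len + 1)
    fragment = reference[start:start + frag_len]
    read1 = fragment[:read_len]                        # forward read, left end
    read2 = reverse_complement(fragment[-read_len:])   # reverse read, right end
    return start, read1, read2

rng = random.Random(1)
reference = "".join(rng.choice("ACGT") for _ in range(2000))
start, read1, read2 = simulate_pair(reference, frag_len=500, read_len=70, rng=rng)

print("fragment starts at", start)
print("read 1 length:", len(read1), "read 2 length:", len(read2))
```

Here both reads are 70 bases long regardless of the fragment size, while the outer distance (first base of read 1 to last base of read 2 on the reference) equals the fragment length of 500. Varying `frag_len` per pair changes the distance between the reads, not their length.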

written 4.6 years ago by Devon Ryan95k