Question

Tools to simulate Illumina short read sequences and ONT long reads with a reference genome

0

1 day ago

PolenP • 0

Hi, I would like to ask if you can recommend me tools that I can use to simulate whole genome sequences using a reference genome which will also give me list of the variants just like wgsim?

I was able to use wgsim, but when I tried aligning the paired-reads, it's not aligning together.

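# $wgsim (path to the wgsim binary) and $ref (reference FASTA) are assumed to be set earlier in the script
reads=100000000   # number of read pairs per simulation (-N)

for i in $(seq 1 10); do
  base="sim_${i}"
  seed=$((100 + i))   # different seed for each run (arbitrary choice)
  echo "Running $base  (seed=$seed)..."
  # Options go before the positional arguments; wgsim prints the variant list
  # to stdout, so here it ends up in the log together with the stderr messages.
  "$wgsim" -1 70 -2 70 -N "$reads" -S "$seed" -e 0.0001 "$ref" "${base}.R1.fq" "${base}.R2.fq" > "${base}.out.log" 2>&1
  echo "$base finished (log: ${base}.out.log)"
done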

Or maybe I am using wgsim wrong? I hope you can help me. Thank you!
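Editor's sketch, not part of the original post: wgsim writes the list of simulated variants to stdout and its progress messages to stderr, so redirecting both into one log mixes the two. A minimal way to keep them apart could look like the following. The helper name `simulate_one`, the `DRY_RUN` switch, and the `.variants.txt` / `.log` file names are hypothetical choices for illustration; the wgsim options are the ones used in the question.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: run one wgsim simulation and keep the variant list
# (stdout) separate from the progress messages (stderr).
simulate_one() {
  local ref=$1 base=$2 seed=$3 pairs=$4
  local cmd=(wgsim -1 70 -2 70 -N "$pairs" -S "$seed" -e 0.0001
             "$ref" "${base}.R1.fq" "${base}.R2.fq")
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    # Only print the command that would be run (no wgsim needed).
    printf '%s ' "${cmd[@]}"
    printf '> %s 2> %s\n' "${base}.variants.txt" "${base}.log"
  else
    "${cmd[@]}" > "${base}.variants.txt" 2> "${base}.log"
  fi
}

# Dry run with placeholder inputs: shows the command without executing it.
DRY_RUN=1 simulate_one ref.fa sim_1 101 1000
```

Dropping `DRY_RUN=1` runs wgsim for real, assuming it is on the `PATH` and `ref.fa` exists.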

simulate reads bioinformatics short short-read • 2.5k views

updated 2 hours ago by GenoMax 154k • written 1 day ago by PolenP • 0

0

It's a little unclear what you mean by "I was able to use wgsim, but when I tried aligning the paired-reads, it's not aligning together."

21 hours ago by cmdcolin ★ 4.3k

0

Sorry about that. What I mean is that when I align the paired reads together, i.e. the left and right reads, they should align with common sequence at some ends, making one longer consensus sequence. I was able to align pairs this way with actual short-read pairs.

12 hours ago by PolenP • 0

0

I believe the pairs will only have common sequence (i.e. the pairs will "overlap") if the insert size is small. The wgsim program has a flag called -d ("outer distance between the two ends", default: 500), which I think adjusts the insert size. Making it smaller might produce some overlap, but I haven't tested it myself.
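As a rough illustration of the point above, the expected overlap between two mates can be estimated from the read length and the outer distance. This is a back-of-the-envelope sketch that assumes both reads have the same length and that the fragment is at least one read long; the function name `expected_overlap` is made up for this example, and wgsim also draws the distance from a distribution (its -s option sets the standard deviation), so individual pairs will vary.

```bash
#!/usr/bin/env bash
# Expected overlap (bp) between the two mates of a pair:
#   overlap = max(0, 2 * read_len - outer_dist)
expected_overlap() {
  local read_len=$1 outer_dist=$2
  local overlap=$(( 2 * read_len - outer_dist ))
  if (( overlap < 0 )); then
    overlap=0
  fi
  echo "$overlap"
}

expected_overlap 70 500   # wgsim default -d 500 with 70 bp reads: prints 0
expected_overlap 70 100   # a shorter, hypothetical -d 100: prints 40

# An untested invocation along these lines would be:
#   wgsim -d 100 -s 10 -1 70 -2 70 ... ref.fa out.R1.fq out.R2.fq
```

With 70 bp reads, any outer distance of 140 bp or more gives no expected overlap, which is why the default of 500 produces pairs that do not merge.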

3 hours ago by cmdcolin ★ 4.3k

0

they should align with common sequence at some ends, making one longer consensus sequence.

You don't want to simulate reads like this. Good WGS libraries should not have reads that overlap in the middle, because overlapping mates indicate short inserts, which is not what one wants in real life.

To cover a particular mutation introduced by the simulation, you want more data, i.e. more read pairs covering it, rather than a single overlapping read pair covering that mutation.
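To put a number on "more data", the read-pair count for a target mean depth can be worked out from the genome size and read length. The sketch below is only arithmetic, with a made-up function name `pairs_for_depth` and a hypothetical 5 Mb genome; it assumes uniform coverage and ignores unmappable regions.

```bash
#!/usr/bin/env bash
# Read pairs needed for a target mean depth, rounded up:
#   pairs = ceil(depth * genome_size / (2 * read_len))
pairs_for_depth() {
  local genome_size=$1 read_len=$2 depth=$3
  local bases_per_pair=$(( 2 * read_len ))
  echo $(( (depth * genome_size + bases_per_pair - 1) / bases_per_pair ))
}

# Hypothetical 5 Mb genome, 70 bp paired reads, 30x mean depth
pairs_for_depth 5000000 70 30   # prints 1071429
```

The result is the value one would pass to wgsim as -N (the number of read pairs).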

2 hours ago by GenoMax 154k

score 3 · Answer 1 · 2025-09-30

3

1 day ago

colindaven 8.0k

The best one I've used for ONT reads so far is Badread - https://github.com/rrwick/Badread

For Illumina I've used and like InSilicoSeq - https://insilicoseq.readthedocs.io/en/latest/

1 day ago by colindaven 8.0k

0

which will also give me list of the variants

Does either of these packages satisfy the requirement of generating known mutations?

1 day ago by GenoMax 154k

score 2 · Answer 2 · 2025-09-30

2

1 day ago

Mensur Dlakic ★ 30k

You may want to consider one of these packages:

1 day ago by Mensur Dlakic ★ 30k

0

Do you know if ART is able to generate known mutations?

1 day ago by GenoMax 154k

0

I think only a fraction of sequencing errors can be specified. If a mix of a reference and a mutated genome is included, setting sequencing errors to 0 might give reads with fixed mutations.

1 day ago by Mensur Dlakic ★ 30k

score 0 · Answer 3 · 2025-09-29

0

1 day ago

GenoMax 154k

if you can recommend me tools

You can use randomreads from the BBMap suite to generate short Illumina reads with known mutations. A guide is available here: https://bbmap.org/tools/randomreads

1 day ago by GenoMax 154k