Question: What Is The Best Way To Simulate Reads From Reference Transcriptome With Certain Error Rate?
4
8.0 years ago by
Geparada1.4k
Cambridge
Geparada1.4k wrote:

Hi!

I need to test how well some mappers align reads with different error rates (mismatches and indels). That's why I want to simulate pools of reads with different error rates from a reference transcriptome. Do you know of a tool which can help me?

Thanks !!

ADD COMMENTlink modified 7.9 years ago by 2184687-1231-83-4.9k • written 8.0 years ago by Geparada1.4k
4
8.0 years ago by
brentp23k
Salt Lake City, UT
brentp23k wrote:

If, as you suggest, you have a reference transcriptome, then it's no different than doing a whole genome sequence simulation. Try DNAA's wgsim with your transcriptome as the input reference.
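
To make the idea concrete, here is a minimal Python sketch of what this kind of simulation does: pick a transcript, cut a fixed-length window from it, and corrupt it with mismatches and indels at chosen rates. This is not how wgsim is implemented. The names `add_errors` and `simulate_reads`, the dict-of-sequences input, and the way the indel rate is split evenly between insertions and deletions are all assumptions made for illustration.

```python
import random

BASES = "ACGT"

def add_errors(seq, mismatch_rate, indel_rate, rng):
    """Return a copy of seq with random substitutions, insertions and deletions."""
    out = []
    for base in seq:
        r = rng.random()
        if r < indel_rate / 2:
            continue  # deletion: drop this base
        if r < indel_rate:
            out.append(rng.choice(BASES))  # insertion: extra base before this one
        if rng.random() < mismatch_rate:
            # substitution: always pick a base different from the original
            base = rng.choice([b for b in BASES if b != base.upper()])
        out.append(base)
    return "".join(out)

def simulate_reads(transcripts, n_reads, read_len, mismatch_rate, indel_rate, seed=0):
    """Yield (name, sequence) pairs; transcripts is a dict of name -> sequence.

    Transcripts are chosen uniformly (hypothetical choice, no expression model),
    and transcripts shorter than read_len are skipped.
    """
    rng = random.Random(seed)
    usable = [(n, s) for n, s in transcripts.items() if len(s) >= read_len]
    if not usable:
        raise ValueError("no transcript is at least read_len long")
    for i in range(n_reads):
        name, seq = rng.choice(usable)
        start = rng.randint(0, len(seq) - read_len)
        read = add_errors(seq[start:start + read_len], mismatch_rate, indel_rate, rng)
        # Keep the true origin in the read name so mapper output can be checked.
        yield f"read{i}|{name}|{start}", read
```

Storing the true transcript and start position in each read name is what lets you score a mapper afterwards: compare the reported alignment against the name for each error rate you simulate.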

ADD COMMENTlink written 8.0 years ago by brentp23k
1

Actually, depending on why you are doing this, my answer may be less than helpful. If you want to simulate different transcript and gene frequencies, wgsim won't do much for you. It'll just sample (with your chosen error rate) from what's there in the transcriptome.
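
If you do need different transcript frequencies, one rough approach is to decide how many reads each transcript contributes before simulating them. The Python sketch below draws a read count per transcript with probability proportional to abundance times length. The function name `read_counts`, the input dicts, and the abundance-times-length weighting are assumptions for illustration, not something wgsim provides.

```python
import random
from collections import Counter

def read_counts(transcript_lengths, abundances, n_reads, seed=0):
    """Return a Counter of reads per transcript.

    transcript_lengths: dict of name -> length in bases
    abundances: dict of name -> relative abundance (missing names count as 0)
    """
    rng = random.Random(seed)
    names = sorted(transcript_lengths)
    # Longer and more abundant transcripts yield proportionally more fragments.
    weights = [abundances.get(n, 0.0) * transcript_lengths[n] for n in names]
    return Counter(rng.choices(names, weights=weights, k=n_reads))
```

You could then simulate each transcript separately with the requested number of reads and concatenate the output files.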

ADD REPLYlink written 8.0 years ago by brentp23k

Thanks brentp, think I'll try it

ADD REPLYlink written 7.9 years ago by Geparada1.4k
4
7.9 years ago by
2184687-1231-83-4.9k wrote:

For Illumina RNA-seq data, I've used simLibrary and simNGS (http://www.ebi.ac.uk/goldman-srv/simNGS/) in the past to simulate transcriptome reads. The following commands will simulate a FRTseq (--bias 1) library construction of average insert size 300 (--readlen 300) of a transcript CDS sequence, producing the input to be sequenced. The gel_cut option makes sure you only sequence above 200bp and below 1000bp. In the transcript CDS sequence you can add UTRs or introns if you want:

[?]

After that, I rename the sequences for tracking (scripts available here http://github.com/avilella/hashbrown) and run them through simNGS for 125 cycles, paired-end, producing the output fastq files.

[?]

The runfiles contain information about the intensity values given by a real machine for a real run. There are different example runfiles available in simNGS, both from real Illumina GA2 machines and Illumina HiSeq2000 machines. The runfiles have comments about when and where the sequencing was done, how well it went, etc. If you want to simulate data as close as possible to sequencing runs that you've already done in your facility, you can build your own runfiles using AYB against example .cif files from your own sequencer.

Hope it helps.

ADD COMMENTlink modified 7.9 years ago • written 7.9 years ago by 2184687-1231-83-4.9k

thanks for your guide avilella!

ADD REPLYlink written 7.9 years ago by Geparada1.4k
2
7.9 years ago by
Benm710
Benm710 wrote:

I wrote a script for it. It can simulate mismatches, indels, and also SVs, and it is available on SourceForge: http://sourceforge.net/projects/simulateseq/files/0.2.2

ADD COMMENTlink written 7.9 years ago by Benm710
1

Hi BENM. Nice to post a link to your script :) I have two suggestions. Take'em or leave'em, they are really just suggestions. All your code, including the comments and documentation, uses long lines and won't display well on terminals and even on the sourceforge page, in fact. You can think of line-wrapping it in order for it to display in a more readable way. I suggest 80 characters per line or less (78 displays well everywhere). Second suggestion, maybe you could update your website in the user area? Thanks again for the link. Cheers

ADD REPLYlink written 7.9 years ago by Eric Normandeau10k

the link says: "We are unable to display the page you requested".

ADD REPLYlink written 7.9 years ago by Geparada1.4k

nice script, thanks !

ADD REPLYlink written 7.9 years ago by Geparada1.4k

Thank you for your kind advice. I am too busy these days, but I will update it soon according to your suggestions.

ADD REPLYlink written 7.9 years ago by Benm710