Question: quality value with art fastq simulator
2.8 years ago by
Germany, Mannheim, UMM
marongiu.luigi wrote:

Dear all,

I am trying to generate simulated FASTQ files from a FASTA reference using ART. Following the manual, I entered the following:

art_illumina -ss HS25 -i ./input.fa -p -l 50 -f 20 -m 200 -s 10 -o ./output

In this case, the simulated instrument is the Illumina HiSeq 2500, paired-end reads are created, with a length of 50 bp, a mean of 200 (not sure what the difference is here), and a coverage of 20. I then checked the quality of the output with FastQC. The reads are 50 bp long, but the quality is all skewed towards the maximum of 38: [image] I therefore provided the values for the maximum and minimum quality score:

art_illumina -ss HS25 -i ./input -p -l 36 -f 30 -m 50 -s 10 -qU 30 -qL 25 -o ./output

but in this case the quality score was not simply skewed; rather, it was uniform, with a single value of 30: [image]

How can I obtain something more like the following plot? [image] Thank you.
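To compare simulated output against a target profile without opening FastQC, the per-position mean quality can be tabulated directly from the FASTQ file. Below is a minimal sketch, assuming four-line records and Phred+33 encoding; the function name `mean_quality_per_position` and the two example reads are made up for illustration.

```python
import io

def mean_quality_per_position(handle, offset=33):
    """Return the mean Phred score at each read position of a FASTQ stream.

    Assumes four-line records and Phred+33 encoding (check this for your data).
    """
    totals = []  # sum of scores per position
    counts = []  # number of reads covering each position
    for line_number, line in enumerate(handle):
        if line_number % 4 != 3:  # the quality string is the 4th line of a record
            continue
        for pos, char in enumerate(line.rstrip("\n")):
            if pos == len(totals):
                totals.append(0)
                counts.append(0)
            totals[pos] += ord(char) - offset
            counts[pos] += 1
    return [t / c for t, c in zip(totals, counts)]

# Two made-up 4-base reads: 'I' encodes Q40, '?' encodes Q30, '5' encodes Q20.
example = io.StringIO("@r1\nACGT\n+\nII?5\n@r2\nACGT\n+\nI??5\n")
print(mean_quality_per_position(example))  # [40.0, 35.0, 30.0, 20.0]
```

For a real file, pass an open file handle instead of the `io.StringIO` object. A list that stays at 30 for every position would correspond to the uniform plot described above.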


Since the read length for HiSeq2500 is 36

No, it is not. You can run sequencing lengths as long or as short as you want. The maximum length for a HiSeq 2500 rapid run can be 2 x 250 bp. To get specific enough mapping, you probably don't want to go much below 36 bp (for a human-sized genome).

modified 2.8 years ago • written 2.8 years ago by GenoMax

OK, I took the lower end of the scale. But how would I set a good range of quality scores? And what is the relation between -l and -m? Thanks.

written 2.8 years ago by marongiu.luigi

You may want to look at the in-line help/manual for the specific options of ART.

Realistically if your libraries are good then you are rarely going to see Q scores below 30 across the board. So things between 30-40 would be fine. If you are artificially trying to achieve a different range then you can choose those numbers.
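For reference, a Phred score Q corresponds to an error probability of 10^(-Q/10), so Q30 means roughly one miscalled base in 1,000 and Q40 one in 10,000. A small sketch of the conversion (the function name is only illustrative):

```python
def phred_to_error_probability(q):
    """Convert a Phred quality score to the probability that the base call is wrong."""
    return 10 ** (-q / 10)

# Print the error probability for a few common thresholds.
for q in (20, 30, 40):
    print(q, phred_to_error_probability(q))
```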

written 2.8 years ago by GenoMax

I am trying to artificially create some libraries that look like this [link]. Based on the readme file included with ART, I therefore provided the options -qL (--minQ, the minimum base quality score) and -qU (--maxQ, the maximum base quality score), but the values were not randomly sampled between these boundaries; they were fixed at 30.

written 2.8 years ago by marongiu.luigi

I think I got the difference between -l and -m: the former is the length of the read in the FASTQ file, the latter is the mean length of the DNA/RNA fragment that is being sequenced; therefore -m needs to be larger than -l.
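The relation can be made concrete with a little arithmetic. The sketch below only illustrates the geometry and is not ART's own sampling code: it assumes fragment lengths follow a normal distribution with the mean and standard deviation given by -m and -s, and the function name `fragment_summary` is hypothetical.

```python
import random

def fragment_summary(read_len, mean_frag, sd_frag, n=10000, seed=0):
    """Draw n fragment lengths and relate them to the read length.

    Assumption: fragment lengths are normally distributed (mean -m, sd -s).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    fragments = [rng.gauss(mean_frag, sd_frag) for _ in range(n)]
    return {
        # expected gap between the two mates; negative means they overlap on average
        "mean_inner_distance": mean_frag - 2 * read_len,
        "fraction_shorter_than_read": sum(f < read_len for f in fragments) / n,
        "fraction_with_overlapping_mates": sum(f < 2 * read_len for f in fragments) / n,
    }

# The two parameter sets used in the commands of the original question.
print(fragment_summary(read_len=50, mean_frag=200, sd_frag=10))
print(fragment_summary(read_len=36, mean_frag=50, sd_frag=10))
```

With -l 50 -m 200 the mates sit about 100 bp apart on average. With -l 36 -m 50 the mean fragment is shorter than two read lengths, so under this assumption most mates would overlap.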

written 2.8 years ago by marongiu.luigi

For better focus, I removed the part of the post dealing with the read length.

written 2.7 years ago by marongiu.luigi
2.7 years ago by
h.mon wrote:

Grab a set of reads with the intended quality profile, create a profile with art_illumina_profiler, then simulate reads with the same profile using art_illumina and the parameters:

    -1   --qprof1   the first-read quality profile
    -2   --qprof2   the second-read quality profile
written 2.7 years ago by h.mon