Hi guys,
I am working with raw Illumina HiSeq 100 bp paired-end reads. I have been experimenting with DynamicTrim and LengthSort on this data for some time.
I want to use Q=25 for DynamicTrim (-h 25) and a length cutoff of 50 for LengthSort (-l 50). The first step looks fine: after DynamicTrim, the trimmed data looks good in FastQC. However, when I run the second step (LengthSort) and put the resulting .trimmed.paired1 and .trimmed.paired2 files through FastQC, it reports very low quality (around 12-13), and in the histogram-like quality plot all of the reads are down in the red. I have used the latest version (2.2) and also tried older versions to rule out a bug.
When I use the default settings (p=0.05 for DynamicTrim and -l 25 for LengthSort), my results are actually quite good.
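For comparison, the two DynamicTrim settings can be put on the same scale using the standard Phred relation, Q = -10 log10(P). Assuming the -p cutoff is read as a per-base error probability, this minimal Python sketch (the helper names are my own, not part of SolexaQA) suggests the default p=0.05 corresponds to roughly Q13, which is much looser than Q20 or Q25:

```python
import math


def phred_to_prob(q: float) -> float:
    """Per-base error probability for a Phred score: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def prob_to_phred(p: float) -> float:
    """Phred score for a per-base error probability: Q = -10 * log10(P)."""
    return -10 * math.log10(p)


if __name__ == "__main__":
    for q in (20, 25, 30):
        print(f"Q={q}: error probability {phred_to_prob(q):.4f}")
    # Q=20 -> 0.0100, Q=25 -> 0.0032, Q=30 -> 0.0010
    print(f"p=0.05 corresponds to Q={prob_to_phred(0.05):.1f}")  # Q=13.0
```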
My problem is this: although the data I get with the default settings is probably usable, I would like the filtering to be a little more stringent. I was advised to use a Phred score of 25 because, for my dataset, Q=20 is not stringent enough and Q=30 is too stringent. I was also advised that, since my raw reads are 100 bp, I should use a length cutoff of 50 bp (anything between 50 and 100 bp is good input). This advice comes from a bioinformatician, but I am still quite unclear on the reasoning. Is there a good paper that explains why I should be doing this?
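To make the length cutoff concrete, here is a minimal Python sketch of a pair-aware length filter. It is a hypothetical illustration of how such a filter might behave, not LengthSort's actual code: a pair is kept only if both mates are at least min_len bases long, a mate whose partner fails is kept as a single, and everything else is discarded.

```python
import io
from itertools import islice
from typing import Iterator, List, TextIO, Tuple

# A FASTQ record: (header, sequence, separator line, quality string)
Record = Tuple[str, str, str, str]


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield 4-line FASTQ records from an open text handle."""
    while True:
        lines = [line.rstrip("\n") for line in islice(handle, 4)]
        if not lines:
            return  # end of file
        if len(lines) < 4:
            raise ValueError("truncated FASTQ record")
        yield (lines[0], lines[1], lines[2], lines[3])


def filter_pairs(r1: TextIO, r2: TextIO, min_len: int = 50):
    """Hypothetical pair-aware length filter (not LengthSort's implementation).

    Returns (pairs, singles, discarded_count):
    - both mates >= min_len -> kept as a pair
    - only one mate passes  -> that mate is kept as a single
    - neither passes        -> both mates are discarded
    Assumes both files list mates in the same order.
    """
    pairs: List[Tuple[Record, Record]] = []
    singles: List[Record] = []
    discarded = 0
    for rec1, rec2 in zip(read_fastq(r1), read_fastq(r2)):
        ok1 = len(rec1[1]) >= min_len
        ok2 = len(rec2[1]) >= min_len
        if ok1 and ok2:
            pairs.append((rec1, rec2))
        elif ok1 or ok2:
            singles.append(rec1 if ok1 else rec2)
            discarded += 1
        else:
            discarded += 2
    return pairs, singles, discarded


if __name__ == "__main__":
    # Toy data: pair "a" has mates of 60 and 55 bp, pair "b" has 30 and 70 bp.
    r1 = io.StringIO(f"@a/1\n{'A' * 60}\n+\n{'I' * 60}\n@b/1\n{'C' * 30}\n+\n{'I' * 30}\n")
    r2 = io.StringIO(f"@a/2\n{'G' * 55}\n+\n{'I' * 55}\n@b/2\n{'T' * 70}\n+\n{'I' * 70}\n")
    pairs, singles, discarded = filter_pairs(r1, r2, min_len=50)
    print(len(pairs), len(singles), discarded)  # 1 1 1
```

Note that zip() stops at the shorter file, so mismatched input files would be silently truncated; a real tool would also check that mate IDs match.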
I would also like to ask: are the default settings fine, or should I go with the custom settings? I am using the data for de novo transcriptome assembly.
Many thanks.