Hi All,
I'm a bit of a scripting novice and would greatly appreciate some advice. I have fastq read files from numerous sequenced ddRAD libraries, on which I'd like to perform in silico digests (using MseI: A^ATT) and retain all fragments greater than a certain length. The output needs to be in fastq format for downstream processing. I'm able to do something like this using the Bio.Restriction package of Biopython, but the output is fasta (i.e., there's no read quality data). Does anyone have an idea of what a suitable approach would be?
Many thanks in advance.
I think OP wants to digest fastq reads and then retain fragments longer than a certain length, still in fastq format and with the original Q scores. This may need to be done with a custom script.
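For what it's worth, here is a minimal sketch of what such a custom script might look like, in plain Python with no Biopython dependency. It assumes simple 4-line-per-record fastq (no wrapped sequences) and treats the enzyme as a pair of parameters: `site="TTAA"` with `cut_offset=1`, i.e. T^TAA, which is how MseI is usually listed (note this differs from the A^ATT written in the question, so adjust if needed). The function names and the `_fragN` suffix on read names are just illustrative choices, not anything from an existing tool. The key point is that the quality string is sliced with the same coordinates as the sequence, so each fragment keeps its original Q scores.

```python
import io


def read_fastq(handle):
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ handle."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            break
        seq = handle.readline().rstrip("\r\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\r\n")
        yield header[1:], seq, qual  # drop the leading '@'


def cut_positions(seq, site="TTAA", cut_offset=1):
    """Return 0-based indices at which the read is cut (assumed site: T^TAA)."""
    upper = seq.upper()
    positions = []
    start = upper.find(site)
    while start != -1:
        positions.append(start + cut_offset)
        start = upper.find(site, start + 1)
    return positions


def digest_record(header, seq, qual, min_len, site="TTAA", cut_offset=1):
    """Yield fragments of at least min_len bases, with matching quality slices."""
    bounds = [0] + cut_positions(seq, site, cut_offset) + [len(seq)]
    name, _, rest = header.partition(" ")
    for i, (a, b) in enumerate(zip(bounds, bounds[1:]), start=1):
        if b - a >= min_len:
            new_header = f"{name}_frag{i}" + (f" {rest}" if rest else "")
            # Sequence and quality are sliced with the same coordinates
            yield new_header, seq[a:b], qual[a:b]


def digest_fastq(in_handle, out_handle, min_len, site="TTAA", cut_offset=1):
    """Digest every read and write the retained fragments as FASTQ.

    Returns the number of fragments written.
    """
    written = 0
    for header, seq, qual in read_fastq(in_handle):
        for h, s, q in digest_record(header, seq, qual, min_len, site, cut_offset):
            out_handle.write(f"@{h}\n{s}\n+\n{q}\n")
            written += 1
    return written


if __name__ == "__main__":
    # Tiny in-memory demo; replace with open("reads.fastq") etc. for real files
    demo = io.StringIO("@read1\nACGTTAAGGCCTTAAC\n+\nABCDEFGHIJKLMNOP\n")
    out = io.StringIO()
    digest_fastq(demo, out, min_len=5)
    print(out.getvalue(), end="")
```

For real data you would pass ordinary file handles, e.g. `with open("in.fastq") as fin, open("out.fastq", "w") as fout: digest_fastq(fin, fout, min_len=50)`; the file names and the length cutoff here are placeholders. Note that `min_len` is inclusive, so use your threshold plus one if you want strictly "greater than".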
That was my consideration too before committing my answer. But the OP wrote:
So I thought it was OK to show a way from fasta to fastq. If the original quality values are really needed, then of course my answer isn't valid.
fin swimmer
Fair enough. Depending on OP's response we can decide.
Thanks very much for your thoughts on this. I really do need the Q scores as these are needed for downstream filtering in ipyrad. I'll give Pierre's script a go and see if this does the job.