How to output fastq files from in silico digests?
2
0
Entering edit mode
5.9 years ago

Hi All,

I'm a bit of a scripting novice and would greatly appreciate some advice. I have fastq read files from numerous sequenced ddRAD libraries and on which I'd like to perform in silico digests (using MseI: A^ATT) and retain all fragments greater than a certain length. The output needs to be in fastq format for downstream processing. I'm able to do something like this using the Bio.Restriction package of Biopython, but the output is fasta (i.e., there's no read quality data). Does anyone have an idea on what a suitable approach would be?

Many thanks in advance.

fastq in silico digest ddRAD • 1.4k views
ADD COMMENT
2
Entering edit mode
5.9 years ago

using awk:

BEGIN   {
    FS="[\t]";
    MIN_LEN=6;
    }
    {
     D=$2;
     Q=$4;
     for(;;)
        {
        n=index(D,"AATT");
        if( n ==0 ) {
            if(length(D)>=MIN_LEN) printf("%s\n%s\n%s\n%s\n",$1,D,$3,Q);
            break;
            }
        if(n>=MIN_LEN)
            {
            printf("%s\n%s\n%s\n%s\n",$1,substr(D,1,n),$3,substr(Q,1,n));
            }
        D=substr(D,n+1);
        Q=substr(Q,n+1);
        }
    }

usage:

gunzip -c in.fq.gz | paste - - - -  | awk -f script.awk
ADD COMMENT
0
Entering edit mode
5.9 years ago

Hello,

you could add a dummy value for the phred scores. Look here.

fin swimmer

ADD COMMENT
0
Entering edit mode

I think OP wants to digest fastq reads and then retain fragments longer than a certain length still in fastq format with original Q scores. This may need to be done with a custom script.

ADD REPLY
0
Entering edit mode

That was my consideration too before commit my answer. But the OP wrote:

I'm able to do something like this using the Bio.Restriction package of Biopython, but the output is fasta (i.e., there's no read quality data).

So I thought it's ok to show a way from fasta to fastq. If the original quality values are realy needed, than of course my answer isn't valid.

fin swimmer

ADD REPLY
0
Entering edit mode

Fair enough. Depending on OP's response we can decide.

ADD REPLY
0
Entering edit mode

Thanks very much for your thoughts on this. I really do need the Q scores as these are needed for downstream filtering in ipyrad. I'll give Pierre's script a go and see if this does the job.

ADD REPLY

Login before adding your answer.

Traffic: 1940 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6