Question: How to output fastq files from in silico digests?
0
gravatar for gdholmes007
5 months ago by
gdholmes0070 wrote:

Hi All,

I'm a bit of a scripting novice and would greatly appreciate some advice. I have fastq read files from numerous sequenced ddRAD libraries and on which I'd like to perform in silico digests (using MseI: A^ATT) and retain all fragments greater than a certain length. The output needs to be in fastq format for downstream processing. I'm able to do something like this using the Bio.Restriction package of Biopython, but the output is fasta (i.e., there's no read quality data). Does anyone have an idea on what a suitable approach would be?

Many thanks in advance.

ddrad fastq in silico digest • 196 views
ADD COMMENTlink modified 5 months ago by Pierre Lindenbaum113k • written 5 months ago by gdholmes0070
2
gravatar for Pierre Lindenbaum
5 months ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum113k wrote:

using awk:

BEGIN   {
    FS="[\t]";
    MIN_LEN=6;
    }
    {
     D=$2;
     Q=$4;
     for(;;)
        {
        n=index(D,"AATT");
        if( n ==0 ) {
            if(length(D)>=MIN_LEN) printf("%s\n%s\n%s\n%s\n",$1,D,$3,Q);
            break;
            }
        if(n>=MIN_LEN)
            {
            printf("%s\n%s\n%s\n%s\n",$1,substr(D,1,n),$3,substr(Q,1,n));
            }
        D=substr(D,n+1);
        Q=substr(Q,n+1);
        }
    }

usage:

gunzip -c in.fq.gz | paste - - - -  | awk -f script.awk
ADD COMMENTlink written 5 months ago by Pierre Lindenbaum113k
0
gravatar for finswimmer
5 months ago by
finswimmer6.2k
Germany
finswimmer6.2k wrote:

Hello,

you could add a dummy value for the phred scores. Look here.

fin swimmer

ADD COMMENTlink modified 5 months ago • written 5 months ago by finswimmer6.2k

I think OP wants to digest fastq reads and then retain fragments longer than a certain length still in fastq format with original Q scores. This may need to be done with a custom script.

ADD REPLYlink modified 5 months ago • written 5 months ago by genomax57k

That was my consideration too before commit my answer. But the OP wrote:

I'm able to do something like this using the Bio.Restriction package of Biopython, but the output is fasta (i.e., there's no read quality data).

So I thought it's ok to show a way from fasta to fastq. If the original quality values are realy needed, than of course my answer isn't valid.

fin swimmer

ADD REPLYlink written 5 months ago by finswimmer6.2k

Fair enough. Depending on OP's response we can decide.

ADD REPLYlink written 5 months ago by genomax57k

Thanks very much for your thoughts on this. I really do need the Q scores as these are needed for downstream filtering in ipyrad. I'll give Pierre's script a go and see if this does the job.

ADD REPLYlink written 5 months ago by gdholmes0070
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1110 users visited in the last hour