Question: Trim Sequences And Quality In Fastq File
1
gravatar for abn4
5.6 years ago by
abn410
abn410 wrote:

I have bunch of fastq files in the directory and i want to trim the sequence by 2 nucleotides and quality(if the read has 51 base pairs and ends-with CTG or TTG).here is what i wrote as shell script but i am getting some errors,need help as i am new to shell scripting

Input:

@HWI-ST1072:187:C35YUACXX:7:1101:1609:1983 1:N:0:ACAGTG
NGGAGAAAGAGAGTGTGTTTTTAGGGGGAGATTTTTAAAATGGTTGTTTTG
+
#0<BFFFFFFFFF&lt;BFFFIIFFFFFIIIBFFFFFIIFIIIIIFFBFFFFFF
@HWI-ST1072:187:C35YUACXX:7:1101:1747:1995 1:N:0:ACAGTG
NGGTTGTGGTGGTGGGTATTTGTAGTTTTATTTATTCGGGAGGTTGAGCTG
+
#0<BFFFFFFFFFFIIBFFIIIIIIFIIIFFIIFIIIFIIFIIFFFFIIFF

output:

@HWI-ST1072:187:C35YUACXX:7:1101:1609:1983 1:N:0:ACAGTG
NGGAGAAAGAGAGTGTGTTTTTAGGGGGAGATTTTTAAAATGGTTGTTT
+
#0<BFFFFFFFFF<BFFFIIFFFFFIIIBFFFFFIIFIIIIIFFBFFFF
@HWI-ST1072:187:C35YUACXX:7:1101:1747:1995 1:N:0:ACAGTG
NGGTTGTGGTGGTGGGTATTTGTAGTTTTATTTATTCGGGAGGTTGAGC
+
#0<BFFFFFFFFFFIIBFFIIIIIIFIIIFFIIFIIIFIIFIIFFFFII

CODE:

for sample in *.fastq;
do
    name=$(echo ${sample} | sed 's/.fastq//')
    while read line;do
            if [ ${line:0:1} == "@" ] ; then
                    head="${line}"
                    $echo $head
            elif [ "${head}" ] &amp;&amp; [ "${line}" ] ; then
                    length=${#line}
                    if [ "${length}" = 51 -a "${line}" =~ *CTG|*TTG ] ; then
                            sequence= substr($line,0,49)
                            #echo $sequence
                    fi
            elif [ ${line:0:1} == "+" ] ; then
                    plus="${line}"
                    #echo $plus
            elif [ "${plus}" ] && [ "${line}" ] ; then
                    quality= substr($line,0,49)
                    #echo $quality
            fi
            print "${head}\n${sequence}\n${plus}\n${quality}" > ${name}_new.fq
    done < $sample
done
unix • 3.2k views
ADD COMMENTlink modified 19 months ago by RamRS24k • written 5.6 years ago by abn410

seqtk is the best tool for this very fast and accurate

ADD REPLYlink modified 5.6 years ago • written 5.6 years ago by ancient_learner610
1
gravatar for Frédéric Mahé
5.6 years ago by
France, Montpellier, CIRAD
Frédéric Mahé2.9k wrote:

Indeed, there are tools to do that. But reinventing the wheel is necessary if one wants to learn how to code. Here is one way to translate your idea into shell bash code:

for sample in *.fastq ; do
    while read line ; do
        if [[ "${line:0:1}" == "@" ]] ; then
            head=${line}
        elif [[ "${line:0:1}" == "+" ]] ; then
            plus="${line}"
        elif [[ -n "${head}" && -z "${plus}" ]] ; then
            if [[ "${#line}" == 51 && "${line}" =~ TTG$|CTG$ ]] ; then
                sequence=${line:0:49}
            fi
        elif [[ -n "${head}" && -n "${plus}" ]] ; then
            if [[ "${#line}" == 51 ]] ; then
                quality=${line:0:49}
                echo -e "${head}\n${sequence}\n${plus}\n${quality}"
            fi
            head=""
            plus=""
        fi
    done < "${sample}" > ${sample/.fastq/_new.fq}
done
ADD COMMENTlink modified 19 months ago by RamRS24k • written 5.6 years ago by Frédéric Mahé2.9k
0
gravatar for Charles Warden
5.6 years ago by
Charles Warden7.2k
Duarte, CA
Charles Warden7.2k wrote:

Are you familiar with FASTX-Toolkit?

http://hannonlab.cshl.edu/fastx_toolkit/

It can handle the 2 nt trim. It can trim based upon quality scores, but this is different than your "quality" definition. That said, you can use it to remove adapters, so you can try using defining your problematic sequences as short adapters.

The main advantage is that you won't have to worry about troubleshooting because lots of people use that program. Of course, I'm sure you can write the code in our language of preference (I personally would have done it in Perl), but you may just not be able to get external help with fixing your detailed code.

ADD COMMENTlink modified 5.6 years ago • written 5.6 years ago by Charles Warden7.2k

Thanks! I will check it

ADD REPLYlink written 5.6 years ago by abn410
0
gravatar for Chris Cole
5.6 years ago by
Chris Cole710
Scotland
Chris Cole710 wrote:

There are many pre-existing trimming tools. I'd use one of them first to see if they can do what you want. From the top of my head I can think of: trimmomatic, trim galore and fastx toolkit.

Also, you don't say why you want to trim those nucleotides from your reads? Care to elaborate?

ADD COMMENTlink written 5.6 years ago by Chris Cole710
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1621 users visited in the last hour