Question: Read trimming with BBDuk
seta (Sweden) wrote, 2.5 years ago:

Hi everybody,

I have several sequencing files of Illumina paired-end reads from libraries prepared with the NEBNext kit (Prep Master Mix Set for Illumina, E6040, BioLabs) and sequenced on a HiSeq 2000. According to FastQC, in every sample one read of each pair is 100 bp long and the other is 80 bp. I would be glad if you could tell me why the two reads of a pair differ in length. Is this normal, or is something wrong?

Anyway, for filtering and adapter trimming, I used bbduk from bbmap package (version 37.17) with the following command:

./bbduk.sh in=file_1.fastq in2=file_2.fastq out=out1.fastq out2=out2.fastq ref=adapters.fa ktrim=r k=23 mink=11 hdist=1 tpe tbo qtrim=rl trimq=20 ftl=20 ftr=90 minlen=40

When I re-checked the output with FastQC, everything looked fine except for "per base sequence content" and "sequence length distribution". Please see the attached images. Even after removing the first and last bases, "per base sequence content" still fails (Image 1). The sequence length changed from 100 bp to a range of 41-70 bp (Image 2). Could you please tell me what is wrong with my command and how to fix it?

Also, 40% of the bases were removed by trimming and the reads became shorter, which is not what I want. Could you please advise me how to keep as many reads as possible for a successful downstream analysis?

Thanks in advance

bbmap bbduk read trimming • 2.6k views
modified 2.5 years ago by genomax • written 2.5 years ago by seta

Is there an inline barcode of some sort here that you are trying to remove by the aggressive front end trimming?

written 2.5 years ago by genomax

h.mon (Brazil) wrote, 2.5 years ago:

None of your images got linked.

You are trimming too aggressively here. Why trim 30 bp off every 100 bp read? Remove the ftl=20 ftr=90 parameters.
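
A rough sketch of the arithmetic behind that remark, assuming ftl and ftr are 0-based positions as the bbduk.sh help text describes them (worth checking against the help for your version). The helper name kept_after_force_trim is made up for illustration and is not part of BBTools.

```shell
#!/usr/bin/env bash
# Hypothetical helper: maximum bases left after force-trimming alone.
# Assumption: ftl=N removes bases to the left of 0-based position N,
# and ftr=M removes bases to the right of 0-based position M.
kept_after_force_trim() {
  local read_len=$1 ftl=$2 ftr=$3
  local last=$(( read_len - 1 ))   # 0-based index of the last base
  # ftr only has an effect when it points inside the read
  if (( ftr > 0 && ftr < last )); then
    last=$ftr
  fi
  local kept=$(( last - ftl + 1 ))
  (( kept < 0 )) && kept=0
  echo "$kept"
}

kept_after_force_trim 100 20 90   # the 100 bp read with ftl=20 ftr=90
kept_after_force_trim 80 20 90    # the 80 bp read with the same flags
```

Under that assumption, a 100 bp read keeps at most 71 bases and an 80 bp read at most 60, before adapter and quality trimming shorten them further.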

It is not normal for read 1 to be 100 bp and read 2 to be 80 bp. Didn't you ask this already? This is publicly available data, so just live with it, or contact the authors if it bothers you that much. What is the original paper?

modified 2.5 years ago • written 2.5 years ago by h.mon

Thanks for your response, and sorry about the images; they display for me. Could you please try looking at them again? You could try right-clicking and selecting "open image in new tab". As you suggested, I removed the two flags (ftl=20 ftr=90), but when I re-checked the trimmed reads with FastQC, the GC graph looked odd, unlike before trimming (image not preserved). Also, the sequence length distribution changed from 100 bp before trimming to 40-100 bp after trimming (image not preserved). Could you please help me with these issues?

Yes, I asked it before, and you kindly advised me to use BBDuk for read trimming. However, I still don't know whether the different read lengths will be a problem for downstream analysis. The original paper can be found here (link not preserved).

Thanks

modified 2.4 years ago by genomax • written 2.4 years ago by seta

Do not get hung up on the FastQC results. If you feel that you have gotten rid of the extraneous sequences (those that do not belong to your sample), go on to the next set of analysis steps. If something there does not make sense, then come back to diagnose further.

As for your second image, since some of the reads were trimmed, they are no longer full length (i.e. 100 bp). As a result, the sequence length distribution now includes reads of various lengths. Reads of different lengths should be OK for downstream steps unless you want to filter out really small ones (e.g. < 10 bp).
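
If you want to see the length distribution without FastQC, something along these lines should work. It is only a sketch: read_length_histogram is a made-up helper name, and it assumes a plain, uncompressed FASTQ file with exactly four lines per record.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "length<TAB>count" for every read length
# found in an uncompressed, 4-line-per-record FASTQ file.
read_length_histogram() {
  # The sequence is the second line of each 4-line record.
  awk 'NR % 4 == 2 { count[length($0)]++ }
       END { for (len in count) print len "\t" count[len] }' "$1" | sort -n
}

# Example call, using the output file name from the command in the question:
# read_length_histogram out1.fastq
```

Running it on the input and on the trimmed output shows how many reads were shortened and by how much.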

modified 2.4 years ago • written 2.4 years ago by genomax

Thank you, genomax2. OK, I'll go ahead with the same command for all samples. What I meant was the different lengths of the two reads in each pair: as I posted, one read is 100 bp and the other is 80 bp.

written 2.4 years ago by seta