Question: FastQC: per tile sequence quality after using filterbytile.sh
Nagesh0 wrote:

Hello, I have tried filterbytile.sh to improve the quality of my library. Although the per base sequence quality is above a Phred score of 30, the per tile sequence quality goes bad toward the right side of the reads. Is there any other way to trim the bad quality bases at that end to get better per tile sequence quality? Thanks in advance.

Here is the per tile sequence quality plot after filtering with filterbytile.sh: (image: f1)

The output of Trimmomatic was used as the input to filterbytile.sh.

sequencing next-gen • 927 views
modified 9 months ago • written 9 months ago by Nagesh0

Unless you have a really good reason, you should not need to use filterbytile.sh with recent data. What was the logic behind using it in this instance? We seem to be missing the complete picture here, as hinted by @h.mon.

modified 9 months ago • written 9 months ago by genomax65k

The data I have is for a microbial genome, and I am trying to improve the data quality as much as possible. I would like to see whether the genome assembly gets better or not.

written 9 months ago by Nagesh0

The figure I gave above is from after filterbytile.sh filtering with the following command:

    filterbytile.sh in1=R1.fastq.gz in2=R2.fastq.gz out1=f1.fq out2=f2.fq trimq=1 qtrim=rl lowqualityonly=f ud=0.75 qd=1 ed=1 ua=.5 qa=.5 ea=.5

This is the per tile quality figure for the raw data: (image: f1)

There is not much change from Trimmomatic to filterbytile.sh.

I ran Trimmomatic with the parameters SLIDINGWINDOW:2:30 MINLEN:20 LEADING:30 TRAILING:30.
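To make the effect of a window of 2 at Q30 concrete, here is a minimal Python sketch of sliding-window trimming. It is a simplified illustration under stated assumptions (Phred+33 qualities, cut at the start of the first failing window), not Trimmomatic's actual implementation, and the function names are hypothetical.

```python
def phred33(quality_string):
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(ch) - 33 for ch in quality_string]


def sliding_window_cut(scores, window=2, min_mean=30):
    """Return how many bases to keep from the 5' end.

    Scan left to right; at the first window whose mean quality is
    below min_mean, cut the read at the start of that window.
    """
    for start in range(0, len(scores) - window + 1):
        chunk = scores[start:start + window]
        if sum(chunk) / window < min_mean:
            return start
    return len(scores)


def trim_read(seq, qual, window=2, min_mean=30, min_len=20):
    """Trim one read; return (seq, qual), or None if shorter than min_len."""
    keep = sliding_window_cut(phred33(qual), window, min_mean)
    if keep < min_len:
        return None
    return seq[:keep], qual[:keep]
```

With a window this small, any two adjacent bases averaging under Q30 end the read at that point, which is why these settings count as strict.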

modified 9 months ago • written 9 months ago by Nagesh0

I think there was no need to use filterbytile.sh in this case. If you are staying with the BBMap suite, I suggest using bbduk.sh to scan and trim your data to remove any extraneous sequence. Since you want to do de novo assembly, you should also quality filter at Q20. But that should be all you need before going into SPAdes or a similar assembler. If you have really deep coverage, then normalization of the reads may be needed.

modified 9 months ago • written 9 months ago by genomax65k

You are using very strict settings, meant for cases where you know there is a serious problem (ref: Introducing FilterByTile: Remove Low-Quality Reads Without Adding Bias). However, the picture you linked does not indicate any serious tile problems.

written 9 months ago by h.mon24k
h.mon24k wrote:

Is the figure you posted from before or after filterbytile.sh? And before or after Trimmomatic? Did you run FastQC on the raw reads, and then again after each pre-processing step?

You don't have systematically bad tile quality; you have the often-seen decrease in quality over sequencing cycles. If the figure was generated after Trimmomatic filtering, you may have to change your settings. What was the command line you used? If the figure is from before Trimmomatic filtering, run FastQC again and compare.
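The difference between a tile problem and a cycle-related decline can be checked numerically. FastQC's documentation describes the per tile plot as the deviation of each tile from the average quality at each position; the sketch below computes that kind of deviation from FASTQ headers and quality strings. It is an illustration, not FastQC's code, and it assumes Phred+33 qualities and the Casava 1.8+ header layout (instrument:run:flowcell:lane:tile:x:y). The function names are hypothetical.

```python
from collections import defaultdict


def tile_of(header):
    """Return the tile field of an Illumina read header.

    Assumed layout: @instrument:run:flowcell:lane:tile:x:y [comment]
    """
    return header.split()[0].split(":")[4]


def per_tile_deviation(records):
    """Mean quality per (tile, cycle) minus the mean over tiles for that cycle.

    records: iterable of (header, quality_string) pairs, Phred+33 encoded.
    Returns {tile: {cycle: deviation}} with 0-based cycle indices.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for header, qual in records:
        tile = tile_of(header)
        for cycle, ch in enumerate(qual):
            sums[tile][cycle] += ord(ch) - 33
            counts[tile][cycle] += 1

    # Mean quality of each tile at each cycle.
    tile_means = {
        tile: {c: sums[tile][c] / counts[tile][c] for c in sums[tile]}
        for tile in sums
    }
    cycles = sorted({c for means in tile_means.values() for c in means})

    deviation = {tile: {} for tile in tile_means}
    for c in cycles:
        present = [t for t in tile_means if c in tile_means[t]]
        overall = sum(tile_means[t][c] for t in present) / len(present)
        for t in present:
            deviation[t][c] = tile_means[t][c] - overall
    return deviation
```

If quality drops at late cycles in every tile together, the deviations stay near zero; a genuine tile problem shows up as clearly negative values confined to specific tiles.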

written 9 months ago by h.mon24k

OK, I'll jump in on this. I have the following per_tile_quality plot:

(image: per_tile_quality)

This is for R2 from paired-end data, after quality trimming with BBDuk. I was (to be honest) not even aware of this per tile issue, so my question is: should I be running filterbytile.sh here?

modified 9 months ago • written 9 months ago by lieven.sterck4.5k

I am not sure what is going on in that block of tiles at the top right. You could examine some of those reads. It could just be an artefact of how FastQC is plotting that data. FastQC samples a fraction of the data for most of the parameters it checks and I am not sure how much data it uses for these plots.

That said, most data I have seen of late rarely needs quality filtering (unless you are doing de novo work and want to be strict about quality).

modified 9 months ago • written 9 months ago by genomax65k

Thanks for the insight, genomax. By the way, I plotted those values at per base resolution, so not with the usual binning that FastQC applies, if that makes any difference.

Yes, it is for de novo assembly purposes, so some strictness on quality is desirable, though I usually don't take it to extremes.

How would you 'select' reads that fall in that top right corner?

modified 9 months ago • written 9 months ago by lieven.sterck4.5k

I was not referring to the binning of cycles for plotting but to the down-sampling of data during analysis. FastQC does not look at the entire dataset since that would take too much time/memory. Based on info from Dr. Simon Andrews, the per tile plot only tracks 10% of the data while the k-mer module uses only 2%.

You should be able to see the tile numbers at the top of the Y axis and then grep for those reads in your FASTQ files.
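As an alternative to grep, the same selection can be sketched in Python. This is a minimal example that assumes four-line FASTQ records and the Casava 1.8+ header layout (instrument:run:flowcell:lane:tile:x:y), where the tile is the fifth colon-separated field. The function names, the file name, and the tile number in the usage note are hypothetical.

```python
import gzip


def read_fastq(path):
    """Yield (header, seq, plus, qual) from a plain or gzipped FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        while True:
            header = handle.readline().rstrip("\n")
            if not header:
                break
            seq = handle.readline().rstrip("\n")
            plus = handle.readline().rstrip("\n")
            qual = handle.readline().rstrip("\n")
            yield header, seq, plus, qual


def reads_from_tiles(records, tiles):
    """Keep records whose header tile field is in tiles (a set of strings)."""
    for record in records:
        fields = record[0].split()[0].split(":")
        # Skip headers that do not follow the assumed seven-field layout.
        if len(fields) >= 7 and fields[4] in tiles:
            yield record
```

For example, `reads_from_tiles(read_fastq("R2.fastq.gz"), {"2119"})` would yield only the reads from tile 2119, which can then be inspected or written out for a closer look.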

written 9 months ago by genomax65k

OK, yes, I just added that as additional, potentially useful info.

I see, ok, thanks, will have a look at that.

written 9 months ago by lieven.sterck4.5k