Question: General question about batch effect, read trimming and what to do when the adapter trimming step is not working appropriately.
Mozart wrote, 5 months ago:

Hello everyone, I have a huge dataset with a bunch of human samples to analyse. Of course, I ran into trouble because the samples come from different donors and when I PCA those samples, well...it's a bit dodgy. They cluster according to their condition, but I am not sure how I am supposed to deal with this batch effect. A while ago I used the SVA package, but I wasn't happy with the result.

A problem related to this is probably due to the fact that my samples are not trimmed appropriately. I have a lot of problems with the facility that generated these fastq files because sometimes they provide me trimmed samples, sometimes they don't (this whole dataset comes from different batches/years). Thus, my questions:

  1. Don't you think that, to generate useful data, all of my samples must be processed in exactly the same way (e.g. same SLIDINGWINDOW, LEADING, TRAILING, MINLEN)? I am quite confused about this.
  2. What happens if, by any chance, I trim an already-trimmed file?
  3. When I try to trim my samples, I can't remove the adapter contamination. According to my beloved MultiQC report there is a huge Nextera transposase sequence contamination that Trimmomatic can't remove, even when I select specific adapters.

Yours, M

Modified 4 months ago by swbarnes2 (6.5k) • written 5 months ago by Mozart (140)

> when I PCA those samples, well...it's a bit dodgy

How was PCA done and how was data normalization/regularization performed?

> They cluster according to their condition

Isn't that expected, as this is the biological difference?

> problem with the facility that generated these fastq files because sometimes they provide me trimmed samples

It is very uncommon for facilities to provide adapter-trimmed samples. Do you really mean trimmed, or demultiplexed?

As for the questions 1-3:

  1. Yes, data should be processed uniformly, but re-trimming a dataset is probably not harmful: there should be little effect if the adapter sequence is indeed no longer present.
  2. See 1.
  3. Did you provide the correct adapter sequence? See, for example, code on the web. If the sequence persists, something is wrong with your command. Can you share some command lines?
Modified 5 months ago • written 5 months ago by ATpoint (23k)
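
Before arguing about which trimmer to use, it can help to check directly whether the adapter is in the reads at all. Below is a minimal sketch, assuming standard four-line FASTQ records and the commonly published Nextera transposase sequence CTGTCTCTTATACACATCT (confirm the sequence with your facility or kit documentation). The function name `count_adapter_reads` is made up for this example, and it only finds exact, full-length matches on the forward strand, so treat the result as a rough lower bound rather than a QC report.

```shell
# Nextera transposase adapter as commonly published
# (assumption: verify against your library prep kit).
NEXTERA_ADAPTER="CTGTCTCTTATACACATCT"

# count_adapter_reads FASTQ [ADAPTER]
# Prints "<reads containing the adapter> <total reads>" for a plain or
# gzipped FASTQ file. Only exact, full-length matches are counted.
count_adapter_reads() (
    fastq=$1
    adapter=${2:-$NEXTERA_ADAPTER}
    case $fastq in
        *.gz) gzip -dc "$fastq" ;;
        *)    cat "$fastq" ;;
    esac |
    awk -v a="$adapter" '
        NR % 4 == 2 { total++; if (index($0, a)) hits++ }   # sequence lines only
        END { printf "%d %d\n", hits, total }'
)
```

Usage would look like `count_adapter_reads sample1_R1_001.fastq.gz`, run once on the raw file and once on the trimmed one. If the first number is already close to zero on the raw file, there may simply be nothing to trim.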

As a small addition, generate a FastQC report for each sample before and after trimming. Afterwards, run MultiQC on the reports.

Then you'll see the differences in adapter content, read length, etc.

Written 5 months ago by michael.ante (3.4k)
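
The before/after comparison suggested here could be scripted roughly as follows. This is only a sketch: the function name `qc_before_after`, the directory layout, and the `*_paired.fastq.gz` naming (borrowed from the Trimmomatic output names used elsewhere in this thread) are assumptions, and it expects `fastqc` and `multiqc` to be on the `PATH`.

```shell
# qc_before_after RAW_DIR TRIMMED_DIR QC_DIR
# Runs FastQC on raw and trimmed reads into separate folders, then
# summarises both sets of reports in one MultiQC report.
qc_before_after() {
    raw_dir=$1; trimmed_dir=$2; qc_dir=$3
    # FastQC does not create its output directory, so make it first.
    mkdir -p "$qc_dir/raw" "$qc_dir/trimmed" "$qc_dir/multiqc"
    fastqc --outdir "$qc_dir/raw" "$raw_dir"/*.fastq.gz &&
    fastqc --outdir "$qc_dir/trimmed" "$trimmed_dir"/*_paired.fastq.gz &&
    multiqc --outdir "$qc_dir/multiqc" "$qc_dir/raw" "$qc_dir/trimmed"
}
```

For example, `qc_before_after FASTQ trimmed qc` would leave a combined report under `qc/multiqc`, where adapter content and read length distributions can be compared side by side.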

Thanks ATpoint for your questions. I am judging the PCA according to someone else's analysis; I haven't had the chance to get to that point myself yet. By the way, I guess there is very little variation amongst the different samples.

Anyway, I solved the issue but, as you can see below, I am not sure whether I should use the paired or the unpaired reads after trimming.

Written 5 months ago by Mozart (140)

I have recently used Trimmomatic to remove Nextera transposase sequences, so it is probably just a matter of providing the correct sequence to use.

Written 5 months ago by kristoffer.vittingseerup (2.3k)

Agreed. The standard tools (I use cutadapt) all perform more or less equally well, and if trimming does not work it is, 99.9% of the time, a user-induced problem (wrong commands, wrong adapter sequences provided, etc.).

Modified 5 months ago • written 5 months ago by ATpoint (23k)
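
For reference, a paired-end cutadapt call for Nextera libraries could look like the sketch below. The wrapper name `trim_nextera_cutadapt` and the default minimum length of 20 are illustrative assumptions, not settings anyone in this thread reported using; the adapter is the commonly published Nextera transposase sequence and should be checked against your kit.

```shell
# Nextera transposase adapter (assumption: check against your library prep kit).
NEXTERA_ADAPTER="CTGTCTCTTATACACATCT"

# trim_nextera_cutadapt IN_R1 IN_R2 OUT_R1 OUT_R2 [MIN_LEN]
# -a / -A give the 3' adapter for read 1 / read 2,
# -m drops pairs in which a read becomes shorter than MIN_LEN,
# -o / -p name the trimmed read 1 / read 2 output files.
trim_nextera_cutadapt() {
    cutadapt \
        -a "$NEXTERA_ADAPTER" -A "$NEXTERA_ADAPTER" \
        -m "${5:-20}" \
        -o "$3" -p "$4" \
        "$1" "$2"
}
```

Because cutadapt processes both mates together here, the two output files stay in sync and there are no separate unpaired files to worry about.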

So, for quality's sake, the paired reads may be more reliable for the further steps.

Can anyone confirm this?

Written 4 months ago by Mozart (140)
swbarnes2 (United States) wrote, 4 months ago:

> A problem related to this is probably due to the fact that my samples are not trimmed appropriately.

I wouldn't be so sure. For instance, I know that the STAR aligner is pretty robust to having the wrong sequence on the ends of reads.

If your trimmer isn't trimming anything, maybe nothing needs trimming. If you have a big batch effect, that's likely real, and not an artifact you can fix by trimming.

Written 4 months ago by swbarnes2 (6.5k)

Thanks swbarnes2. I am now uncertain about the following step: I have re-read the Trimmomatic manual and, as you know, for paired-end analysis it generates 4 output files. Of these, should I use the paired output for the downstream analysis?

Written 4 months ago by Mozart (140)

Mozart, the paired files are what you use. I just want to confirm with you, though: are they the correct adaptor sequences, i.e. the ones your sequencing was performed with? It may be that the adaptor sequences were already removed by the facility, which is common. In that case, only a very low percentage of reads will be trimmed.

Link to Nextera adaptor information

You can go to the link above, or contact the sequencing facility, and check whether the adaptors used are the same as those in the Trimmomatic NexteraPE-PE.fa file. Best to do this now, before you proceed with downstream analyses.

Written 4 months ago by Biogeek (350)

Thanks Biogeek. I have to double check this again. As far as I know, the BCL data coming out of the sequencer (and then converted into FASTQ files) are subjected to an adapter trimming step, so I may have untrimmed samples with no contamination. In that case, what if I don't have any adapter contamination in my untrimmed samples and the sequence quality is OK for downstream analysis (i.e. alignment)? Should I perform the trimming step anyway?

Written 3 months ago by Mozart (140)
colindaven (Hannover Medical School) wrote, 5 months ago:

Try alternative trimmers too. Besides the standard Trimmomatic, I use fastp and ea-utils fastq-mcf for tricky samples.

I also use multiple rounds of trimming, e.g. to remove adapters from some tricky short sequences such as miRNAs or amplicons.

Multiple rounds of FastQC and MultiQC are also necessary.

Written 5 months ago by colindaven (1.7k)
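
As an illustration of the fastp option mentioned above, here is a hedged sketch of a paired-end run that lets fastp detect the adapters from read overlap instead of relying on a supplied FASTA file. The wrapper name `trim_fastp` and the report prefix argument are assumptions made for this example.

```shell
# trim_fastp IN_R1 IN_R2 OUT_R1 OUT_R2 REPORT_PREFIX
# --detect_adapter_for_pe turns on adapter auto-detection for paired-end data;
# --html / --json write fastp's own QC report, which MultiQC can also read.
trim_fastp() {
    fastp \
        --in1 "$1" --in2 "$2" \
        --out1 "$3" --out2 "$4" \
        --detect_adapter_for_pe \
        --html "$5.html" --json "$5.json"
}
```

If auto-detection is not trusted for a given library, fastp also accepts explicit sequences via `--adapter_sequence` and `--adapter_sequence_r2`.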
Biogeek wrote, 5 months ago:

I'd recommend using BBDuk from the BBTools suite by Brian Bushnell. It ships with an adapters.fa file containing a large collection of publicly available adaptor sequences - just an idea? It is surprising how often people use Trimmomatic without the correct adaptor sequence .fa file. Admittedly, I made that mistake once myself before realising it. The performance of BBDuk is supposedly superior to that of Trimmomatic.

Once you've tried BBDuk, report back the QC results. The log output will also tell you the percentage of adaptor sequence detected and removed.

Best.

Written 5 months ago by Biogeek (350)
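
A BBDuk adapter-trimming run along these lines might look like the sketch below. The parameter set (`ktrim=r k=23 mink=11 hdist=1 tpe tbo`) follows the adapter-trimming example in the BBDuk guide; the wrapper name `trim_bbduk` and the `BBDUK` variable are assumptions for this example, and the path to `adapters.fa` depends on where BBTools is installed (it lives in the `resources` folder of the distribution).

```shell
# Path or name of the BBDuk launcher (assumption: bbduk.sh is on the PATH).
BBDUK=${BBDUK:-bbduk.sh}

# trim_bbduk IN_R1 IN_R2 OUT_R1 OUT_R2 ADAPTERS_FA
# ktrim=r  trim adapters from the right (3') end of reads
# k/mink   k-mer length used for matching, and the shorter length
#          allowed at read ends
# hdist=1  allow one mismatch per k-mer
# tpe/tbo  trim both mates to equal length / trim by pair overlap
trim_bbduk() {
    "$BBDUK" \
        in1="$1" in2="$2" \
        out1="$3" out2="$4" \
        ref="$5" \
        ktrim=r k=23 mink=11 hdist=1 tpe tbo
}
```

The summary BBDuk prints at the end of a run reports how many reads and bases were trimmed, which is the number worth comparing against the MultiQC adapter content plot.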
Mozart wrote, 5 months ago:

Thanks to all of you for the useful replies. Here is the code I am using:

java -jar /Users/Trimmomatic-0.39/trimmomatic-0.39.jar PE -phred33 -threads 4 \
    /Users/FASTQ/sample1_R1_001.fastq.gz /Users/FASTQ/sample1_R2_001.fastq.gz \
    /Users/FASTQ/sample1_R1_paired.fastq.gz /Users/FASTQ/sample1_R1_unpaired.fastq.gz \
    /Users/FASTQ/sample1_R2_paired.fastq.gz /Users/FASTQ/sample1_R2_unpaired.fastq.gz \
    ILLUMINACLIP:/Users/Trimmomatic-0.39/adapters/NexteraPE-PE.fa:value:value:value \
    SLIDINGWINDOW:value:value LEADING:value TRAILING:value MINLEN:value

To be honest, it seems to work now because I slightly changed the code. Looking at the QC report again, it seems I managed to remove the adapter contamination.

At the end of this process, I should use the paired files for the downstream analysis, right?

Thanks,

M

Written 5 months ago by Mozart (140)
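
Since the thread agrees that every sample should go through identical processing, one way to guarantee that is to keep the trimming steps in a single variable and loop over all samples. The sketch below assumes the `_R1_001.fastq.gz` / `_R2_001.fastq.gz` naming from the command above. The numeric settings are the example values from the Trimmomatic documentation, used here purely for illustration; they are not the values the poster used (those were not shared). The function name `trim_all` and the default paths are also assumptions.

```shell
# Illustrative defaults only (assumptions, not the poster's settings).
TRIMMOMATIC_JAR=${TRIMMOMATIC_JAR:-trimmomatic-0.39.jar}
ADAPTERS=${ADAPTERS:-adapters/NexteraPE-PE.fa}
# One definition of the trimming steps, shared by every sample.
# ILLUMINACLIP takes <fasta>:<seed mismatches>:<palindrome threshold>:<simple threshold>.
STEPS="ILLUMINACLIP:$ADAPTERS:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36"

# trim_all IN_DIR OUT_DIR
trim_all() {
    in_dir=$1; out_dir=$2
    mkdir -p "$out_dir"
    for r1 in "$in_dir"/*_R1_001.fastq.gz; do
        r2=${r1%_R1_001.fastq.gz}_R2_001.fastq.gz
        # Skip an unmatched glob and samples without a mate file.
        [ -e "$r1" ] && [ -e "$r2" ] || continue
        sample=$(basename "$r1" _R1_001.fastq.gz)
        # $STEPS is deliberately unquoted so each step becomes its own
        # argument; this breaks if the adapter path contains spaces.
        java -jar "$TRIMMOMATIC_JAR" PE -phred33 -threads 4 \
            "$r1" "$r2" \
            "$out_dir/${sample}_R1_paired.fastq.gz" "$out_dir/${sample}_R1_unpaired.fastq.gz" \
            "$out_dir/${sample}_R2_paired.fastq.gz" "$out_dir/${sample}_R2_unpaired.fastq.gz" \
            $STEPS
    done
}
```

Calling `trim_all /Users/FASTQ /Users/FASTQ/trimmed` would then apply the same steps to every pair in the folder, including files the facility may already have trimmed, where the adapter step should simply find little to remove.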

I have re-read the manual again and again. The paired output files are the trimmed FASTQ files containing only those read pairs in which both reads survived the processing.

Modified 4 months ago • written 4 months ago by Mozart (140)
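
A quick sanity check on the paired outputs before alignment is to confirm that both files contain the same number of reads. The sketch below assumes standard four-line FASTQ records; the function names `count_reads` and `check_paired` are made up for this example. It compares counts only and does not verify that read names match line by line.

```shell
# count_reads FASTQ
# Prints the number of records in a plain or gzipped four-line FASTQ file.
count_reads() (
    case $1 in
        *.gz) gzip -dc "$1" ;;
        *)    cat "$1" ;;
    esac | awk 'END { print NR / 4 }'
)

# check_paired R1 R2
# Prints both counts and succeeds only if they are equal.
check_paired() {
    n1=$(count_reads "$1")
    n2=$(count_reads "$2")
    echo "R1=$n1 R2=$n2"
    [ "$n1" = "$n2" ]
}
```

For example, `check_paired sample1_R1_paired.fastq.gz sample1_R2_paired.fastq.gz || echo "pair files out of sync"` would flag a problem before it shows up as an aligner error.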