2.3 years ago
arctic ▴ 40

Dear all,

I am new to the field. I am trying to analyze single-end 100 bp FASTQ files with ~70 million reads per sample. I want to determine whether adapter sequences are present and, if so, how to deal with them. I ran FastQC on the files, and the reports show that each contains an "overrepresented sequence" identified as an "Illumina index adapter".

I have the following questions:

1. Does sample1 look like it has already been trimmed, or does it require adapter trimming?

2. If further trimming is recommended, what would be the best sequence/adapter option to use with cutadapt/Trim Galore? [See below for my thoughts so far]

3. Based on the FastQC report, do I need to worry about the presence of any adapter sequences besides the index adapter?

My thoughts on question 2: The Illumina index adapter format appears to be:

GATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG


These are the adapter sequences found in my FastQC report for sample 1:

GATCGGAAGAGCACACGTCTGAACTCCAGTCACCATGGCATCTCGTATGC
AGATCGGAAGAGCACACGTCTGAACTCCAGTCACCATGGCATCTCGTATG


I am thinking of using the options below with cutadapt/Trim Galore to remove the adapter(s):

trim_galore sample1.fastq.gz -a GATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG -q 20 --length 20 --fastqc


However, it seems that Trimmomatic, for instance, only uses the initial part of the index adapter (only up to the Ns and nothing after them): https://github.com/timflutre/trimmomatic/blob/master/adapters/TruSeq3-SE.fa


For your reference, most trimming programs trim everything to the right once they find the core sequence that is common to the adapters. Finding the core sequence indicates that the read ran out of insert and continued into the adapter at the 3' end (i.e., the insert is shorter than the read length).
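The behaviour described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: it uses exact matching, whereas real trimmers such as cutadapt or Trimmomatic tolerate mismatches, and the function name `trim_3prime_adapter`, the `min_overlap` parameter, and the example insert are hypothetical.

```python
# Sketch of 3'-adapter trimming: cut the read at the adapter core and drop
# everything to its right. Exact matching only; real trimmers tolerate errors.

ADAPTER_CORE = "AGATCGGAAGAGC"  # core sequence quoted elsewhere in this thread


def trim_3prime_adapter(read, adapter=ADAPTER_CORE, min_overlap=3):
    """Return the read with the adapter and all downstream bases removed."""
    # Case 1: the whole core occurs somewhere in the read.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Case 2: the read ends part-way into the adapter, so only a prefix of
    # the core is visible at the 3' end. Try the longest prefix first.
    for k in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    # No adapter evidence: leave the read untouched.
    return read


# Hypothetical 20 bp insert followed by the index adapter reported by FastQC.
insert = "ACGTTGCAACGGTCATTGCA"
read = insert + "AGATCGGAAGAGCACACGTCTGAACTCCAGTCACCATGGCATCTCGTATG"
trimmed = trim_3prime_adapter(read)
```

Because the cut happens at the start of the core, the six index bases further downstream never need to be specified: a read carrying a different index is trimmed at exactly the same position.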


Thanks a lot for pointing me in the right direction. So, since the Illumina universal adapter sequence ("AGATCGGAAGAGC") is already included at the "left end" (5' end) of the index adapter sequence, trimming on that sequence alone will also remove the rest of the index adapter to its right. That would also explain why Trimmomatic uses only the "left end" of the index adapter sequence for trimming. I tried trimming with the universal sequence, and it appears to have removed the index adapter sequences accordingly.

Best

2.3 years ago

0.5%? You really don't have to worry about that if you don't want to.

The Ns stand for the variable index region. You know what the index is; you can see it in the FastQC report. Why would you put Ns in?


About the Ns: I have several files whose index adapters are identical except for those 6 bases, so I was planning to include Ns so that I could trim all the samples in parallel with the same command. However, after genomax's comment, I understand that trimming the "universal adapter sequence" would also remove the rest of the index adapter sequence. Considering both your and ATpoint's feedback, I think I will skip the trimming step altogether. Best.
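To make the role of the Ns concrete, here is a small Python sketch that compares the adapter template with the first overrepresented sequence from the FastQC report, treating N as "any base", and reads off the sample's index. The helper names `matches_template` and `extract_index` are hypothetical, and the code assumes the Ns form one contiguous block.

```python
# Adapter template and FastQC hit, both copied from the question above.
TEMPLATE = "GATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG"
OBSERVED = "GATCGGAAGAGCACACGTCTGAACTCCAGTCACCATGGCATCTCGTATGC"


def matches_template(observed, template):
    """True if the sequences agree wherever they overlap (N matches any base)."""
    return all(t == "N" or t == o for o, t in zip(observed, template))


def extract_index(observed, template):
    """Return the bases of `observed` that sit under the template's N block."""
    start = template.index("N")   # first wildcard position
    length = template.count("N")  # assumes the Ns are contiguous
    return observed[start:start + length]


is_index_adapter = matches_template(OBSERVED, TEMPLATE)
sample_index = extract_index(OBSERVED, TEMPLATE)
```

For the sequence above, `sample_index` comes out as `CATGGC`, which is the six-base index visible in the FastQC report.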

2.3 years ago
ATpoint 62k

The sequence to trim if you have Universal Adapter contamination is AGATCGGAAGAGC. In your case (0.5%) I would not even bother and would align the files directly without any manipulation.
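For scale, 0.5% of ~70 million reads is roughly 350,000 reads. A rough way to check the adapter fraction directly is to count reads that contain the core sequence. The sketch below is a minimal example assuming a standard four-line-per-record FASTQ layout and exact matching; the function name `adapter_fraction` and the toy records are made up for illustration.

```python
def adapter_fraction(fastq_lines, adapter="AGATCGGAAGAGC"):
    """Fraction of reads whose sequence contains `adapter` exactly."""
    total = 0
    with_adapter = 0
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # the second line of each 4-line record is the sequence
            total += 1
            if adapter in line:
                with_adapter += 1
    return with_adapter / total if total else 0.0


# Toy input: four hypothetical reads, one of which runs into the adapter.
toy_fastq = [
    "@read1", "ACGTACGTACGTACGTACGT", "+", "IIIIIIIIIIIIIIIIIIII",
    "@read2", "ACGTAGATCGGAAGAGCACA", "+", "IIIIIIIIIIIIIIIIIIII",
    "@read3", "TTGCATTGCATTGCATTGCA", "+", "IIIIIIIIIIIIIIIIIIII",
    "@read4", "GGCCGGCCGGCCGGCCGGCC", "+", "IIIIIIIIIIIIIIIIIIII",
]
fraction = adapter_fraction(toy_fastq)
```

For a real file, the same function could be given an open text handle, for example one from `gzip.open("sample1.fastq.gz", "rt")`, since it only needs an iterable of lines.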


Thank you for your comment ATpoint. I think I will skip trimming and proceed directly with alignment as you and swbarnes2 recommended. Best

8 months ago
Jiacheng ▴ 30

In your case, you do not need to worry about 0.5% adapter content. However, if you want optimal results, that 0.5% can be removed while leaving non-adapter sequence intact, as trimming algorithms improve. Also, trimming is not only about adapters: removing Ns and low-quality tails is also important.

I'd recommend Atria to determine and trim the adapter sequences. It is a newly published, cutting-edge trimmer with exceptional precision and speed. If you do not know which adapter sequence should be used, Atria can detect adapters when the adapter content is higher than 0.04%. (Below 0.04%, there is no need to do adapter trimming.)

atria --detect-adapter -r reads.fastq [...]


Do N trimming and low-quality trimming:

atria --no-adapter-trim -r read1.fastq [-R read2.fastq]
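To illustrate what N and low-quality tail trimming means in principle, here is a deliberately simple Python sketch. It is not Atria's algorithm (nor cutadapt's); it just walks back from the 3' end while the last base is an N or falls below a quality cutoff. The function name `trim_tail`, the cutoff of 20, and the Phred+33 encoding are assumptions.

```python
def trim_tail(seq, qual, min_quality=20, phred_offset=33):
    """Drop trailing bases that are N or have Phred quality below `min_quality`."""
    end = len(seq)
    while end > 0:
        base_quality = ord(qual[end - 1]) - phred_offset  # Phred+33 assumed
        if seq[end - 1] == "N" or base_quality < min_quality:
            end -= 1  # discard this base and keep scanning leftwards
        else:
            break     # first good base from the right: stop trimming
    return seq[:end], qual[:end]


# Hypothetical read whose last two bases are low-quality Ns ('#' is Q2, 'I' is Q40).
trimmed_seq, trimmed_qual = trim_tail("ACGTACGTNN", "IIIIIIII##")
```

A scan like this stops at the first good base from the right, so an isolated low-quality base inside the read is kept; production trimmers generally use more robust criteria.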


Hi Jiacheng, I checked out Atria. It seems the --detect-adapter feature is not yet available on macOS, right?