Does adapter removal (with trimmomatic) also remove barcodes?
1
0
3.6 years ago
salamandra ▴ 420

1- When we remove adapters with Trimmomatic, for example, are we also removing barcodes? Or is there another command to remove barcodes?

2- I heard that some sequencing providers already remove barcodes from their samples before delivering the sequences to the client. Is that the case for Illumina?

3- Does Illumina remove adapters from reads before providing them to the client?

barcodes adapters RNA-Seq • 3.3k views
3

1) Trimmomatic removes adapter sequences based on the sequences you provide. Removal of "barcodes" (you probably mean sequencing indices) is called demultiplexing and is not supported by Trimmomatic.

2) Illumina is only the company behind the sequencing technology. Whether you receive demultiplexed files depends on the sequencing center you work with. Typically you do. If you download from NCBI or ENA, the data is (as far as I know) always already demultiplexed.

3) Again, depends on the facility. If you book this service, they might do it. Typically they only demultiplex. Use fastqc to check for adapter content (which I always recommend, not because I do not trust the bioinformaticians at the facilities, but in the end it is you as the analyst who must confirm that the data quality is good, no matter what the facility said).

0

2) My reads are separated into different files, which might indicate they were demultiplexed. Does this mean the barcodes were also removed from the reads, or are the barcodes still in the sequences even though the reads were split into different files by sample? In the latter case, which tool allows removal of barcode sequences?

3) Is it enough to look at 'adapter content' in FastQC? I ask because some samples had no warnings in the 'adapter content' module, but the 'overrepresented sequences' module listed some sequences called Illumina index 'something'.

3

See my comment below. Index sequences (barcodes) are moved to the headers of the FASTQ records as part of the demultiplexing process.

It is not enough to just look at FastQC report. You should always scan (and trim) your data with a proper program like bbduk.sh or trimmomatic. There can be low level contamination of adapters in your sequence that FastQC can miss. FastQC does not look at every read in the dataset as it does QC (only parts of data are used for various tests and that is generally ok).
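
To illustrate the "look at every read" point, here is a minimal Python sketch that counts how many reads contain an adapter sequence anywhere. It is not a replacement for bbduk.sh or Trimmomatic: it only finds exact, full-length matches, while real trimmers also tolerate mismatches and partial adapters at the read end. The function names are made up for this example, the adapter string is an assumption (the first 13 bases of the common Illumina TruSeq adapter), and the input is assumed to be unwrapped 4-line FASTQ.

```python
import gzip
from itertools import islice

# Assumption: start of the common Illumina TruSeq adapter; replace with
# the adapter that matches your library prep kit.
ADAPTER_PREFIX = "AGATCGGAAGAGC"


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file as text."""
    return gzip.open(path, "rt") if str(path).endswith(".gz") else open(path, "rt")


def count_adapter_reads(lines, adapter=ADAPTER_PREFIX):
    """Return (reads_with_adapter, total_reads) for an iterable of FASTQ lines."""
    hits = total = 0
    # The sequence is the 2nd line of every 4-line FASTQ record.
    for seq in islice(lines, 1, None, 4):
        total += 1
        if adapter in seq:  # exact substring match only
            hits += 1
    return hits, total
```

With a real file you would call something like `with open_fastq("sample_R1.fastq.gz") as fh: print(count_adapter_reads(fh))`, where the file name is a placeholder. A non-zero count is a hint to run a proper trimmer, not a substitute for one.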

0

If the samples are not demultiplexed, which tool can be used to demultiplex them?

1

Since you likely don't have access to the original flowcell data folder, you may need to use deML or demuxbyname.sh from the BBMap suite. For the BBMap option you will need to know which index sequences belong to which samples.
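
To show what "knowing the index-to-sample association" means in practice, here is a rough Python sketch of name-based demultiplexing. It is not how deML or demuxbyname.sh are implemented; it assumes the index is the last colon-separated field of each header line, uses exact matching only, and holds everything in memory. The function name and the mapping are hypothetical.

```python
from collections import defaultdict


def demultiplex(lines, index_to_sample):
    """Group 4-line FASTQ records by the index at the end of the header.

    `index_to_sample` maps an index sequence to a sample name; reads with
    an unknown index go into the "undetermined" bucket.
    """
    buckets = defaultdict(list)
    lines = [line.rstrip("\n") for line in lines]
    # Walk the input one 4-line record at a time.
    for i in range(0, len(lines) - 3, 4):
        record = lines[i:i + 4]
        # Assumption: the header ends with ":<index sequence>".
        index = record[0].rsplit(":", 1)[-1]
        sample = index_to_sample.get(index, "undetermined")
        buckets[sample].append(record)
    return dict(buckets)
```

Each bucket can then be written to its own file. For real data the dedicated tools are the better choice, since they handle large files and sequencing errors in the index.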

0

What do you mean by barcode, the primer? Most of the time the adapters are already trimmed. This is done during the basecalling/demultiplexing of raw data. Terminology is always confusing, also not sure if I use the right words now.

0

I mean the sequences that identify the different samples.

1

Ah okay, clear. If you got them back as separate files, you can open a file and check whether all the sequences start with the same bases.
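
That check can be done by eye, but a small Python sketch makes it quicker. It tallies the first `k` bases of the first reads in a file; the function name and the defaults (`k=8`, 10,000 reads) are arbitrary choices for this example, and the input is assumed to be unwrapped 4-line FASTQ.

```python
from collections import Counter
from itertools import islice


def prefix_counts(lines, k=8, max_reads=10000):
    """Count the first k bases of up to max_reads reads from FASTQ lines."""
    # Sequence lines are the 2nd line of every 4-line record.
    seqs = islice(lines, 1, None, 4)
    return Counter(seq.strip()[:k] for seq in islice(seqs, max_reads))
```

If a single prefix accounts for nearly all reads, an inline barcode is probably still present. A broad spread of prefixes suggests the reads start with biological sequence.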

0

I would say yes. In the manual it says "ILLUMINACLIP: Cut adapter and other illumina-specific sequences from the read" so I assume also the nextera labels etc. Manual: http://www.usadellab.org/cms/uploads/supplementary/Trimmomatic/TrimmomaticManual_V0.32.pdf

But it is easy to check for yourself. Just run Trimmomatic on a subsample and see whether everything you wanted trimmed off is actually gone.

2

Index sequences (or barcodes) are not the same thing as adapters. Index sequences are always read independently in Illumina tech and are never part of the main reads. ILLUMINACLIP is cutting adapter sequences.

0
3.6 years ago

No one can answer this without knowing if you did anything custom.

In general, Illumina sample indices are a totally different read. They do not need to be trimmed from the main reads. If anywhere, you will see the index in the read name.
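
To check whether the index is present in the read name, a short Python sketch like the following can pull it out of a header line. It assumes the common Illumina header layout, in which the part after the space is `<read>:<is filtered>:<control number>:<index>`. The function name and the example header are made up for illustration.

```python
def index_from_header(header):
    """Return the index field of an Illumina-style FASTQ header, or None."""
    parts = header.strip().split(" ", 1)
    if len(parts) < 2:
        return None  # no comment part after the read ID
    fields = parts[1].split(":")
    # Expected layout: <read>:<is filtered>:<control number>:<index>
    if len(fields) == 4 and fields[3]:
        return fields[3]
    return None


# Made-up header for illustration.
example = "@M00123:45:000000000-ABCDE:1:1101:15589:1332 1:N:0:ATCACG"
print(index_from_header(example))  # ATCACG
```

Headers that were rewritten on the way, for example by a public archive, may not carry this field, in which case the function returns `None`.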