Question

How to determine which adapter to use while trimming?

6.0 years ago

MAPK ★ 2.1k

Hi All, I have several smRNAseq datasets (single-end fastq files) and I would like to trim the adapters using Trimmomatic, but I am not sure which adapter they contain. Is there a good way to figure out whether they contain the NEB-SE, Nextera or TruSeq2 adapter?

Here is the command I am using, but I am not sure whether the adapter is NEB, Nextera or TruSeq.

java -jar trimmomatic-0.36.jar SE -phred33 /media/owner/SeqL008_001.fastq Trimmed_SeqL008_001.fq.gz ILLUMINACLIP:NEB-SE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:18

It looks like there is no adapter (already trimmed) in my sequences below. Can someone please confirm? I also want to know whether CATGGC is a barcode sequence and whether it needs to be trimmed as well.

A<AAFAJF<JJFFJJJFFJJAAFFJFJJJJJFJFJF7J7FF---FFA7FF
@K00363:128:HV3CJBBXX:3:1101:2240:1859 1:N:0:CATGGC
GGAAGAGCACACGTCTGAACTCCAGTCACCATGGGATCTCGTATGCCGTC
+
AAFFFJJJJJJJFJJJJJFJJFJJJJJJJJJJJF-J7FFJFJFJJJ-7FJ
@K00363:128:HV3CJBBXX:3:1101:2706:1859 1:N:0:CATGGC
AAACTTTCAACAACGGATCTCTTGGTTCTGAGATCGGAAGAGCACACGTC
+
AAFFFJJJJJJJJJJFJJJ-<7-F-FJJFJJJJF<FJJFJJJFFJ-7-AA
@K00363:128:HV3CJBBXX:3:1101:2909:1859 1:N:0:CATGGC
TCTTGTATTTGGAGAACTCACTCAGATCGGAAGAGCACACGTCTGAACTC
+
AAFFFJJJJJJJJJJJJJJFJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJ
@K00363:128:HV3CJBBXX:3:1101:2717:1877 1:N:0:CATGGC
TGACCTTTCTGTTCCTTTGATAAGATCGGAAGAGCACACGTCTGAACTCC

trimmomatic NGS fastq adapter • 3.5k views

updated 6.0 years ago by sschmeier ▴ 120 • written 6.0 years ago by MAPK ★ 2.1k

score 4 · Answer 1 · 2018-07-13

You can use a tool like FastQC to see which sequences are over-represented in your data. That should give you an idea of what sequences you are looking at. You could then compare the over-represented sequences with those in the Illumina adapter file.
For trimming, you could also extract all adapter sequences from that Illumina adapter file and run your data through fastq-mcf (from ea-utils), which accepts an adapter file and removes all sequences in that file from your data.
Re-run FastQC afterwards to check that the over-represented sequences are gone from your data.
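
As a quick complement to FastQC, the comparison step can be sketched in a few lines of Python: count how many reads contain the start of each candidate adapter. The adapter prefixes and their labels below are assumptions for illustration (check them against the adapter FASTA files shipped with your trimmer or your kit manual), and `count_adapter_hits` is a hypothetical helper, not part of any tool mentioned here. The reads are the four complete records quoted in the question.

```python
# Candidate adapter prefixes. These are ASSUMPTIONS for illustration only;
# verify them against your trimmer's adapter FASTA files or the kit manual.
CANDIDATE_ADAPTERS = {
    "TruSeq / NEB-style 3' adapter": "AGATCGGAAGAGC",
    "Nextera transposase": "CTGTCTCTTATA",
    "Illumina small RNA 3' adapter": "TGGAATTCTCGG",
}

# The four complete reads quoted in the question.
READS = [
    "GGAAGAGCACACGTCTGAACTCCAGTCACCATGGGATCTCGTATGCCGTC",
    "AAACTTTCAACAACGGATCTCTTGGTTCTGAGATCGGAAGAGCACACGTC",
    "TCTTGTATTTGGAGAACTCACTCAGATCGGAAGAGCACACGTCTGAACTC",
    "TGACCTTTCTGTTCCTTTGATAAGATCGGAAGAGCACACGTCTGAACTCC",
]


def count_adapter_hits(reads, adapters):
    """Return {adapter name: number of reads containing that adapter prefix}."""
    return {
        name: sum(prefix in read for read in reads)
        for name, prefix in adapters.items()
    }


if __name__ == "__main__":
    for name, hits in count_adapter_hits(READS, CANDIDATE_ADAPTERS).items():
        print(f"{name}: {hits}/{len(READS)} reads")
```

A candidate that shows up in a large fraction of reads is the one worth passing to the trimmer. On real data you would run this over a few thousand reads from the fastq file, not four.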

score 3 · Answer 2 · 2018-07-14

6.0 years ago

GenoMax 144k

If smRNAseq stands for small RNA-seq data, then there could be kit-specific adapters that you would need to look for (e.g. some kits ligate an adapter directly onto the 3' end of the RNA). For this to work you would need to know the name of the kit, along with the data-processing instructions included in its manual. CATGGC is the Illumina index sequence, which was transferred to the fastq header during demultiplexing. You don't need to do anything to it.
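
To make the point about the header concrete, here is a minimal sketch that splits one of the headers from the question into its fields, assuming the usual Illumina layout (`instrument:run:flowcell:lane:tile:x:y read:filtered:control:index`). `parse_illumina_header` is a hypothetical helper written for this example, not a function from any library.

```python
def parse_illumina_header(header):
    """Split an Illumina-style FASTQ header line into named fields.

    Assumes the layout
    '@instrument:run:flowcell:lane:tile:x:y read:filtered:control:index'.
    """
    read_id, _, description = header.lstrip("@").partition(" ")
    instrument, run, flowcell, lane, tile, x, y = read_id.split(":")
    read_number, is_filtered, control, index = description.split(":")
    return {
        "instrument": instrument,
        "run": int(run),
        "flowcell": flowcell,
        "lane": int(lane),
        "tile": int(tile),
        "x": int(x),
        "y": int(y),
        "read_number": int(read_number),
        "is_filtered": is_filtered == "Y",
        "control": int(control),
        "index": index,  # sample index written by the demultiplexer
    }


if __name__ == "__main__":
    fields = parse_illumina_header(
        "@K00363:128:HV3CJBBXX:3:1101:2240:1859 1:N:0:CATGGC"
    )
    print(fields["index"])
```

The index is metadata on the header line, separate from the sequence line, which is why no trimming step is needed for it.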