How to determine which adapter to use while trimming?
Asked by MAPK, 5.8 years ago

Hi all, I have several small RNA-seq (smRNAseq) datasets (single-end FASTQ files) and would like to trim the adapters with Trimmomatic, but I am not sure which adapters they contain. Is there a good way to figure out whether they contain NEB-SE, Nextera, or TruSeq2 adapters?

Here is the command I am using, but I am not sure whether the adapter file should be the NEB, Nextera, or TruSeq one:

java -jar trimmomatic-0.36.jar SE -phred33 /media/owner/SeqL008_001.fastq Trimmed_SeqL008_001.fq.gz ILLUMINACLIP:NEB-SE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:18

It looks like there are no adapters in my sequences below (i.e. they have already been trimmed). Can someone please confirm? I would also like to know whether CATGGC is a barcode sequence and whether it needs to be trimmed as well.

A<AAFAJF<JJFFJJJFFJJAAFFJFJJJJJFJFJF7J7FF---FFA7FF
@K00363:128:HV3CJBBXX:3:1101:2240:1859 1:N:0:CATGGC
GGAAGAGCACACGTCTGAACTCCAGTCACCATGGGATCTCGTATGCCGTC
+
AAFFFJJJJJJJFJJJJJFJJFJJJJJJJJJJJF-J7FFJFJFJJJ-7FJ
@K00363:128:HV3CJBBXX:3:1101:2706:1859 1:N:0:CATGGC
AAACTTTCAACAACGGATCTCTTGGTTCTGAGATCGGAAGAGCACACGTC
+
AAFFFJJJJJJJJJJFJJJ-<7-F-FJJFJJJJF<FJJFJJJFFJ-7-AA
@K00363:128:HV3CJBBXX:3:1101:2909:1859 1:N:0:CATGGC
TCTTGTATTTGGAGAACTCACTCAGATCGGAAGAGCACACGTCTGAACTC
+
AAFFFJJJJJJJJJJJJJJFJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJ
@K00363:128:HV3CJBBXX:3:1101:2717:1877 1:N:0:CATGGC
TGACCTTTCTGTTCCTTTGATAAGATCGGAAGAGCACACGTCTGAACTCC
Tags: trimmomatic, NGS, fastq, adapter
Answer by sschmeier, 5.8 years ago
  1. You can use a tool like FastQC to see which sequences are over-represented in your data. That should give you an idea of what you are looking at. You can then compare the over-represented sequences with those in the Illumina adapter file.

  2. For trimming, you could also extract all adapter sequences from that Illumina adapter file and run your data through fastq-mcf (from ea-utils), which accepts an adapter file and removes every sequence in it from your reads.

  3. Re-run FastQC afterwards to check that the over-represented sequences are gone.
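Steps 1 and 2 can be sketched in a few lines of Python, similar in spirit to FastQC's Adapter Content module. This is a minimal sketch, not a replacement for FastQC or fastq-mcf: the 12-mer prefixes below are assumed to match FastQC's default adapter list, matching is exact (no mismatches), records are assumed to be unwrapped 4-line FASTQ, and `adapter_content`, `trim_at_adapter` and `scan_fastq` are hypothetical helper names.

```python
import gzip
from collections import Counter
from itertools import islice
from typing import Iterable, Iterator, Tuple

# 12-mer adapter prefixes, assumed to match FastQC's default adapter list.
# Edit or extend this dict for kit-specific adapters.
ADAPTERS = {
    "Illumina Universal (TruSeq)": "AGATCGGAAGAG",
    "Illumina Small RNA 3'": "TGGAATTCTCGG",
    "Illumina Small RNA 5'": "GATCGTCGGACT",
    "Nextera Transposase": "CTGTCTCTTATA",
}


def fastq_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence (second) line of every 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def adapter_content(seqs: Iterable[str]) -> Tuple[int, Counter]:
    """Count the reads that contain each adapter 12-mer (exact match only)."""
    total = 0
    hits: Counter = Counter()
    for seq in seqs:
        total += 1
        for name, kmer in ADAPTERS.items():
            if kmer in seq:
                hits[name] += 1
    return total, hits


def trim_at_adapter(seq: str, kmer: str) -> str:
    """Cut the read at the first exact occurrence of the adapter prefix."""
    pos = seq.find(kmer)
    return seq if pos == -1 else seq[:pos]


def scan_fastq(path: str, max_reads: int = 100_000) -> Tuple[int, Counter]:
    """Check the first max_reads records of a plain or gzipped FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return adapter_content(islice(fastq_sequences(handle), max_reads))


if __name__ == "__main__":
    # The four complete reads pasted in the question
    reads = [
        "GGAAGAGCACACGTCTGAACTCCAGTCACCATGGGATCTCGTATGCCGTC",
        "AAACTTTCAACAACGGATCTCTTGGTTCTGAGATCGGAAGAGCACACGTC",
        "TCTTGTATTTGGAGAACTCACTCAGATCGGAAGAGCACACGTCTGAACTC",
        "TGACCTTTCTGTTCCTTTGATAAGATCGGAAGAGCACACGTCTGAACTCC",
    ]
    total, hits = adapter_content(reads)
    for name in ADAPTERS:
        print(f"{name}: {hits[name]}/{total} reads")
    for seq in reads:
        print(trim_at_adapter(seq, ADAPTERS["Illumina Universal (TruSeq)"]))
```

Run on the four complete reads from the question, it finds the Illumina Universal/TruSeq prefix `AGATCGGAAGAG` in 3 of the 4 reads and none of the other prefixes, which suggests those reads still carry adapter sequence. The first of them starts partway into the adapter, so an exact prefix search misses it; dedicated trimmers (Trimmomatic, cutadapt, fastq-mcf) use mismatch-tolerant matching rather than a plain substring search. On a real file you would call something like `scan_fastq("/media/owner/SeqL008_001.fastq")`.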

Answer by GenoMax, 5.8 years ago

If smRNAseq stands for small RNA-seq, there could be kit-specific adapters that you would need to look for specifically (e.g. some kits ligate an adapter directly to the 3' end of the RNA). For this you would need to know the name of the kit, along with the instructions for processing the data, which are included in the kit manual. CATGGC is the Illumina index sequence, which was transferred to the FASTQ header during demultiplexing. You don't need to do anything with it.
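To illustrate where the index lives, here is a minimal Python sketch that splits an Illumina (CASAVA 1.8+ style) read header into its fields. The field order follows the standard Illumina layout; `parse_header` and `IlluminaReadId` are hypothetical names, and the sketch assumes a single space separates the read ID from the comment field. Depending on the demultiplexing software, the last field may hold a sample number rather than a sequence, so it is kept as a string.

```python
from typing import NamedTuple


class IlluminaReadId(NamedTuple):
    instrument: str
    run: int
    flowcell: str
    lane: int
    tile: int
    x: int
    y: int
    read: int        # mate number (1 or 2)
    filtered: bool   # True if the read failed the filter ("Y")
    control: int
    index: str       # sample index (barcode) written during demultiplexing


def parse_header(line: str) -> IlluminaReadId:
    """Parse a CASAVA 1.8+ style Illumina FASTQ header line."""
    if not line.startswith("@"):
        raise ValueError("FASTQ header must start with '@'")
    read_id, comment = line[1:].strip().split(" ", 1)
    instrument, run, flowcell, lane, tile, x, y = read_id.split(":")
    read, filtered, control, index = comment.split(":", 3)
    return IlluminaReadId(
        instrument, int(run), flowcell, int(lane), int(tile), int(x), int(y),
        int(read), filtered == "Y", int(control), index,
    )


if __name__ == "__main__":
    hdr = parse_header("@K00363:128:HV3CJBBXX:3:1101:2240:1859 1:N:0:CATGGC")
    print(hdr.index)  # CATGGC
    print(hdr.lane)   # 3
```

Because the index is stored in the header's comment field rather than at the start of the read sequence, there is nothing to trim for it.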

Thanks for the helpful answer.
