Question: How to determine which adapter to use while trimming?
MAPK (United States) wrote 14 months ago:

Hi all, I have several smRNAseq datasets (single-end FASTQ files) and would like to trim the adapters with Trimmomatic, but I am not sure which adapter they contain. Is there a good way to figure out whether they contain the NEB-SE, Nextera, or TruSeq2 adapter?

Here is the command I am using, but I am not sure whether the adapter is NEB, Nextera, or TruSeq.

java -jar trimmomatic-0.36.jar SE -phred33 /media/owner/SeqL008_001.fastq Trimmed_SeqL008_001.fq.gz ILLUMINACLIP:NEB-SE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:18

It looks like there is no adapter in my sequences below (i.e., they are already trimmed). Can someone please confirm? I would also like to know whether CATGGC is a barcode sequence and whether it needs to be trimmed as well.

A<AAFAJF<JJFFJJJFFJJAAFFJFJJJJJFJFJF7J7FF---FFA7FF
@K00363:128:HV3CJBBXX:3:1101:2240:1859 1:N:0:CATGGC
GGAAGAGCACACGTCTGAACTCCAGTCACCATGGGATCTCGTATGCCGTC
+
AAFFFJJJJJJJFJJJJJFJJFJJJJJJJJJJJF-J7FFJFJFJJJ-7FJ
@K00363:128:HV3CJBBXX:3:1101:2706:1859 1:N:0:CATGGC
AAACTTTCAACAACGGATCTCTTGGTTCTGAGATCGGAAGAGCACACGTC
+
AAFFFJJJJJJJJJJFJJJ-<7-F-FJJFJJJJF<FJJFJJJFFJ-7-AA
@K00363:128:HV3CJBBXX:3:1101:2909:1859 1:N:0:CATGGC
TCTTGTATTTGGAGAACTCACTCAGATCGGAAGAGCACACGTCTGAACTC
+
AAFFFJJJJJJJJJJJJJJFJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJ
@K00363:128:HV3CJBBXX:3:1101:2717:1877 1:N:0:CATGGC
TGACCTTTCTGTTCCTTTGATAAGATCGGAAGAGCACACGTCTGAACTCC
Tags: adapter, fastq, ngs, trimmomatic
genomax (United States) wrote 14 months ago:

If smRNAseq stands for small RNA-seq, there may be kit-specific adapters that you need to look for (e.g., some kits ligate an adapter directly onto the 3' end of the RNA). You would need to know the name of the kit; its manual should include instructions for processing the data. CATGGC is the Illumina index sequence, which was copied into the FASTQ header during demultiplexing. You don't need to do anything to it.
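To illustrate where the index lives, here is a minimal Python sketch that splits a FASTQ header like the ones in the question into its fields. The function name `parse_illumina_header` and the field labels are hypothetical, chosen for this example, and the layout assumed is the common Illumina one (`instrument:run:flowcell:lane:tile:x:y read:filtered:control:index`); other pipelines may write headers differently.

```python
def parse_illumina_header(header):
    """Split an Illumina-style FASTQ header into named fields.

    Assumes the layout
    @instrument:run:flowcell:lane:tile:x:y read:filtered:control:index
    (hypothetical helper, not part of any library).
    """
    # The part before the space identifies the cluster, the part after it
    # describes the read and carries the index sequence.
    name, _, comment = header.strip().lstrip("@").partition(" ")
    instrument, run, flowcell, lane, tile, x, y = name.split(":")
    read, is_filtered, control, index = comment.split(":")
    return {
        "instrument": instrument,
        "run": int(run),
        "flowcell": flowcell,
        "lane": int(lane),
        "tile": int(tile),
        "x": int(x),
        "y": int(y),
        "read": int(read),
        "is_filtered": is_filtered == "Y",
        "control": int(control),
        "index": index,
    }


# Header copied from the first complete record in the question.
fields = parse_illumina_header(
    "@K00363:128:HV3CJBBXX:3:1101:2240:1859 1:N:0:CATGGC"
)
print(fields["index"])
```

Because the index is part of the header line rather than the sequence line, trimming tools that operate on the read sequence leave it alone.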


Thanks for the helpful answer.

Reply written 14 months ago by MAPK
sschmeier (New Zealand) wrote 14 months ago:
  1. You can use a tool like FastQC to see which sequences are over-represented in your data. That should give you an idea of what sequences you are dealing with. You can then compare the over-represented sequences with the sequences in the Illumina adapter file.

  2. For trimming, you could also extract all adapter sequences from that Illumina adapter file and process your file with fastq-mcf (from ea-utils), which accepts an adapter file and removes all sequences in that file from your data.

  3. Re-run FastQC afterwards to check that the over-represented sequences are gone from your data.
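
As a quick complement to the steps above, the comparison against candidate adapters can be sketched in a few lines of Python. This is only a rough check, not a substitute for FastQC: it counts reads that contain the first 12 bases of each candidate anywhere in the read. The adapter sequences below are assumptions taken from commonly published adapter lists, and the names `CANDIDATE_ADAPTERS`, `read_sequences` and `count_adapter_hits` are made up for this example; check the sequences against your kit manual and your trimmer's adapter files.

```python
# Candidate adapters (assumed for illustration; verify against your kit manual).
CANDIDATE_ADAPTERS = {
    "TruSeq-style": "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC",
    "Nextera": "CTGTCTCTTATACACATCT",
    "Illumina small RNA 3'": "TGGAATTCTCGGGTGCCAAGG",
}


def read_sequences(fastq_lines):
    """Yield the sequence line (2nd of every 4 lines) of each FASTQ record."""
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:
            yield line.strip()


def count_adapter_hits(sequences, adapters, seed_len=12):
    """Count reads containing the first `seed_len` bases of each adapter."""
    sequences = list(sequences)
    return {
        name: sum(adapter[:seed_len] in seq for seq in sequences)
        for name, adapter in adapters.items()
    }


# The four read sequences quoted in the question.
reads = [
    "GGAAGAGCACACGTCTGAACTCCAGTCACCATGGGATCTCGTATGCCGTC",
    "AAACTTTCAACAACGGATCTCTTGGTTCTGAGATCGGAAGAGCACACGTC",
    "TCTTGTATTTGGAGAACTCACTCAGATCGGAAGAGCACACGTCTGAACTC",
    "TGACCTTTCTGTTCCTTTGATAAGATCGGAAGAGCACACGTCTGAACTCC",
]
hits = count_adapter_hits(reads, CANDIDATE_ADAPTERS)
for name, count in hits.items():
    print(f"{name}: {count} of {len(reads)} reads")
```

On the four reads quoted in the question, the TruSeq-style seed (AGATCGGAAGAG) is found in three reads and the other two candidates in none. For a real file you could pass `read_sequences(open("sample.fastq"))` (file name hypothetical) instead of the hard-coded list, ideally on a subsample of the reads.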
