Hey all,
I'm just getting into RNA-seq analysis and have run into a problem that is very confusing. I ran multiplexed RNA-seq samples on a HiSeq (paired-end, 100 bp reads). The data were demultiplexed and, I thought, adapter-trimmed (I used Illumina TruSeq index primers).
So I ran my R1 and R2 reads through FastQC and found ~20% Illumina Universal Adapter sequence at the 3' end of my reads. I ran BBDuk and it removed about 10%, so I'm left with ~10% still contaminating the reads. I'm wondering what I should do next?
The initial adapter content was a linear slope rising from position 55 to 85 bp; after BBDuk the value dropped to 10%, sloping up from 72 to 85 bp.
Links to adapters image graph
I'm not sure how to get rid of the rest of the adapters. The FastQC adapter content flag has also changed from an X (fail) to an ! (warning), so maybe that's OK?
$ ./bbduk.sh in1=S42ND_L002_R1_001.fastq.gz in2=S42ND_L002_R2_001.fastq.gz out1=S42ND_L002_R1_001_TRIMMED.fastq.gz out2=S42ND_L002_R2_001_TRIMMED.fastq.gz ref=adapters.fa ktrim=r k=23 mink=11 hdist=1 tpe tbo
java -ea -Xmx1205m -Xms1205m -cp /media/sf_ubuntushared/bbmap/current/ jgi.BBDuk in1=S42ND_L002_R1_001.fastq.gz in2=S42ND_L002_R2_001.fastq.gz out1=S42ND_L002_R1_001_TRIMMED.fastq.gz out2=S42ND_L002_R2_001_TRIMMED.fastq.gz ref=adapters.fa ktrim=r k=23 mink=11 hdist=1 tpe tbo
Executing jgi.BBDuk [in1=S42ND_L002_R1_001.fastq.gz, in2=S42ND_L002_R2_001.fastq.gz, out1=S42ND_L002_R1_001_TRIMMED.fastq.gz, out2=S42ND_L002_R2_001_TRIMMED.fastq.gz, ref=adapters.fa, ktrim=r, k=23, mink=11, hdist=1, tpe, tbo]
Version 38.68
Thanks for any help
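On the X versus ! question: as far as I recall from the FastQC documentation, the Adapter Content module warns when an adapter is seen in more than 5% of reads and fails above 10%, so a warning still means some adapter is left. A minimal sketch for reading that verdict from the command line, assuming FastQC was run with --extract so that each report leaves a <name>_fastqc/summary.txt file. The adapter_status helper and the fastqc_trimmed directory name are made up for illustration:

```shell
# Sketch: print FastQC's "Adapter Content" verdict (PASS, WARN or FAIL).
# Assumes summary.txt holds tab-separated lines: STATUS<TAB>Module<TAB>Filename.

adapter_status() {
    # $1: path to a FastQC summary.txt
    awk -F'\t' '$2 == "Adapter Content" { print $1 }' "$1"
}

# Hypothetical output directory; point this at wherever FastQC wrote its results.
outdir="fastqc_trimmed"
for summary in "$outdir"/*_fastqc/summary.txt; do
    [ -e "$summary" ] || continue   # nothing to report if the glob matched no files
    printf '%s\t%s\n' "$(adapter_status "$summary")" "$summary"
done
```

Running this before and after trimming gives a quick text check, but the per-position plot in the HTML report is still the better guide to how much adapter remains.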
Generally, when trimming programs find the core sequence that makes up the beginning of an adapter, they should remove everything 3' of the location where that core sequence is found. Your command is a common one and should have worked.
Can you try doing the trimming using the ktrim=rl (that is an R and L, lower case) option? Let us know what that does.

Also try bbmerge.sh on the original reads to see how many of your reads merge (a guide is available here). I have a feeling that you have short inserts compared to the length of sequencing.

So I was running the code above in parallel with writing the post. My previous code (not posted here) did not have hdist, tpe or tbo, and I was getting the leftover adapters. When I added hdist, tpe and tbo to the command, it removed all the adapter sequences. Adapters are gone.
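For anyone who wants to follow up on the bbmerge.sh suggestion, here is a rough sketch of the insert-size check on the original, untrimmed pairs. The in1, in2, out, outu1, outu2 and ihist parameters are documented BBMerge options. The output file names and the run_bbmerge wrapper are placeholders, and the sketch assumes bbmerge.sh is on your PATH (otherwise call it as ./bbmerge.sh from the bbmap directory):

```shell
# Sketch: merge overlapping pairs and write an insert-size histogram.
# If bbmerge.sh or the input files are missing, only print the command.

run_bbmerge() {
    r1=$1
    r2=$2
    # Output file names below are placeholders; rename as needed.
    set -- bbmerge.sh "in1=$r1" "in2=$r2" \
        out=merged.fastq.gz \
        outu1=unmerged_R1.fastq.gz outu2=unmerged_R2.fastq.gz \
        ihist=ihist.txt
    if command -v bbmerge.sh >/dev/null 2>&1 && [ -e "$r1" ] && [ -e "$r2" ]; then
        "$@"
    else
        echo "Would run: $*"
    fi
}

run_bbmerge S42ND_L002_R1_001.fastq.gz S42ND_L002_R2_001.fastq.gz
```

If a large share of pairs merge and the histogram in ihist.txt shows many inserts shorter than the 100 bp read length, those reads run through the insert into the adapter, which would fit the adapter signal rising toward the 3' end.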
My comments were based on the command you had posted. tpe tbo are indeed important parameters for paired-end reads. Good to hear trimming worked.

Appreciate the support and clarity!