Question

Incorrect cycle number entered on sample sheet

0

5.8 years ago

harks • 0

I have paired-end libraries (prepared with the SureSelect target enrichment HS protocol) with insert sizes of approximately 150-200 bp. A 600-cycle v3 kit was used for sequencing on a MiSeq. The intention was to set the cycle number to 101 x 101, but it was mistakenly set to 301 x 301. When I try to correct the read length by adapter trimming to remove the read-through, there is still a sharp peak of reads close to 300 bp in length. The FastQC adapter content module reports that no adapter is left. Is it poor practice to force-trim reads down to a fixed length?

sequencing next-gen • 1.9k views

ADD COMMENT • link 5.8 years ago by harks • 0

0

Hi h.mon, Thanks for your reply. There is a reference genome available. I'll try what you suggested and follow up.

ADD REPLY • link 5.8 years ago by harks • 0

0

Hello,

I carried out trimming with bbduk.sh using the tbo and tpe flags, as suggested. I also quality-trimmed the right end of the reads to Q20 (qtrim=r trimq=20) and set a minimum length of 50. A peak remained close to 300 bp (now at 293 bp after trimming).

I then mapped the reads to my reference sequence using BBMap and plotted the mhist output, which is below. I did not include Match1 and Match2 in the plot, but they stay at 0.99 across the positions until position 293, where they fall to 0.95325 and 0.96665, respectively.

When I remove position 293, I get the following:

As I'm quite new to this type of analysis, do you have any recommendations on how to proceed?

ADD REPLY • link 5.8 years ago by harks • 0

0

You should trim using ktrim=r, which is what actually removes the adapter contamination (qtrim is not going to do that). Please post the results you get with that option. You can leave qtrim off; there is no need to worry about quality trimming up front. Also post the entire bbduk.sh command line you used.

ADD REPLY • link 5.8 years ago by GenoMax 141k

0

Hi genomax,

Sorry, I was not clear. I had also trimmed with ktrim=r before making the plots above.

This is the command I used:

bbduk.sh in1=in_R1.fq in2=in_R2.fq out1=out_R1.fq out2=out_R2.fq ref=adapters.fa ktrim=r k=23 tbo tpe qtrim=r trimq=20 minlen=50

The adapters are TruSeq:

Adapter read 1 = AGATCGGAAGAGCACACGTCTGAACTCCAGTCA

Adapter read 2 = AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT

ADD REPLY • link 5.8 years ago by harks • 0

1

Can you provide stats of trimming? How many reads were trimmed and how many were completely removed?

If you are sure that the inserts are definitely under 200 bp, then almost every read should have adapter read-through and will get trimmed. Based on the mhist plot, it looks like some inserts may be much longer than you are expecting.
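The read-through reasoning above is simple arithmetic, sketched here in Python. The function name `read_through_bases` is made up for illustration, and the sketch assumes an idealized read with no indels that runs straight from the insert into the adapter.

```python
def read_through_bases(insert_size: int, read_length: int) -> int:
    """Number of bases at the 3' end of a read that lie beyond the insert.

    If the insert is shorter than the read, the sequencer keeps going into
    the adapter; if it is longer, the read never reaches the adapter.
    """
    return max(0, read_length - insert_size)


READ_LENGTH = 301  # cycles per read that were actually run

for insert_size in (150, 200, 303):
    extra = read_through_bases(insert_size, READ_LENGTH)
    print(f"insert {insert_size} bp -> {extra} bases of read-through")
```

With 301 cycles, any insert of 150-200 bp leaves roughly 100-150 bases of read-through per read, so nearly every read should be trimmed if the inserts really are that short. Reads that stay at full length point to inserts at least as long as the read.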

Can you also try bbmerge.sh to see how many reads are able to merge? That should give us an idea of what the insert size actually is.

ADD REPLY • link 5.8 years ago by GenoMax 141k

0

The above command without quality trimming:

Input:                  4096286 reads           1200211798 bases.
KTrimmed:               3240196 reads (79.10%)  371829974 bases (30.98%)
Trimmed by overlap:     3459954 reads (84.47%)  31117420 bases (2.59%)
Total Removed:          1528 reads (0.04%)      402947394 bases (33.57%)
Result:                 4094758 reads (99.96%)  797264404 bases (66.43%)

With quality trimming:

Input:                  4096286 reads           1200211798 bases.
QTrimmed:               1062063 reads (25.93%)  50487324 bases (4.21%)
KTrimmed:               3240196 reads (79.10%)  371829974 bases (30.98%)
Trimmed by overlap:     3459954 reads (84.47%)  31117420 bases (2.59%)
Total Removed:          102694 reads (2.51%)    453434718 bases (37.78%)
Result:                 3993592 reads (97.49%)  746777080 bases (62.22%)
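As a quick sanity check on the two reports above, the following Python sketch confirms that the per-step base counts add up to the totals and derives mean read lengths before and after trimming. All numbers are copied from the reports; the dictionary layout and the helper name `check_report` are only illustrative.

```python
# Figures copied from the two BBDuk reports in this post.
INPUT_READS = 4_096_286
INPUT_BASES = 1_200_211_798

REPORTS = {
    "without qtrim": {
        "step_bases": [371_829_974, 31_117_420],  # KTrimmed, Trimmed by overlap
        "removed_bases": 402_947_394,
        "result_reads": 4_094_758,
        "result_bases": 797_264_404,
    },
    "with qtrim": {
        "step_bases": [50_487_324, 371_829_974, 31_117_420],  # QTrimmed first
        "removed_bases": 453_434_718,
        "result_reads": 3_993_592,
        "result_bases": 746_777_080,
    },
}


def check_report(report: dict) -> dict:
    """Check internal consistency of one report and compute mean read length."""
    return {
        "steps_sum_to_total": sum(report["step_bases"]) == report["removed_bases"],
        "bases_balance": INPUT_BASES - report["removed_bases"] == report["result_bases"],
        "mean_length_after": report["result_bases"] / report["result_reads"],
    }


mean_input_length = INPUT_BASES / INPUT_READS
print(f"mean input read length: {mean_input_length:.1f} bp")
for name, report in REPORTS.items():
    summary = check_report(report)
    print(name, summary)
```

The mean input read length works out to exactly 293 bp, which matches the position of the remaining peak, and the mean length after adapter trimming alone is a little under 195 bp.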

ADD REPLY • link updated 5.8 years ago by GenoMax 141k • written 5.8 years ago by harks • 0

2

Your inserts definitely appear to be longer than 200 bp, and you have few adapter dimers (so most fragments are real inserts). I am not sure how you estimated the insert sizes, but the reality seems to be different. If you absolutely expect the inserts to be 150-200 bp, then you may want to investigate what is going on. Otherwise, now that the trimming is done, you should just go ahead with your analysis (without any hard trimming).

ADD REPLY • link 5.8 years ago by GenoMax 141k

0

BBMerge results:

Pairs:                  2048143
Joined:                 1943104     94.872%
Ambiguous:              39577       1.932%
No Solution:            65462       3.196%
Too Short:              0           0.000%
Avg Insert:             198.4
Standard Deviation:     77.6
Mode:                   162
Insert range:           35 - 576
90th percentile:        303
75th percentile:        243
50th percentile:        187
25th percentile:        141
10th percentile:        109

ADD REPLY • link updated 5.8 years ago by GenoMax 141k • written 5.8 years ago by harks • 0
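One rough way to read the percentile table above is to interpolate between the reported percentiles and ask where a given insert size falls. The Python sketch below does this with straight-line interpolation. This is a crude approximation of the real distribution, and the helper name `percentile_rank` is invented for illustration.

```python
# (percentile, insert size in bp) pairs copied from the BBMerge output above.
PERCENTILES = [(10, 109), (25, 141), (50, 187), (75, 243), (90, 303)]


def percentile_rank(insert_size, table):
    """Approximate percentile rank of insert_size by linear interpolation.

    Returns None when insert_size lies outside the range covered by the table.
    """
    for (p_lo, s_lo), (p_hi, s_hi) in zip(table, table[1:]):
        if s_lo <= insert_size <= s_hi:
            fraction = (insert_size - s_lo) / (s_hi - s_lo)
            return p_lo + fraction * (p_hi - p_lo)
    return None


rank_200 = percentile_rank(200, PERCENTILES)  # upper end of the expected range
rank_293 = percentile_rank(293, PERCENTILES)  # position of the remaining peak

print(f"about {100 - rank_200:.0f}% of joined pairs have inserts above 200 bp")
print(f"about {100 - rank_293:.1f}% of joined pairs have inserts of 293 bp or more")
```

Under this approximation, a bit over 40% of the joined pairs have inserts above 200 bp, and roughly one in eight reaches 293 bp or more. Reads from those long inserts contain no adapter and stay at full length.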

0

This is a lot clearer now. I was getting confused by the sequence length distribution of the individual R1 and R2 reads in FastQC.

Thanks for your help genomax and h.mon.

ADD REPLY • link 5.8 years ago by harks • 0

0

Great. You should be able to move forward with the analysis.

ADD REPLY • link 5.8 years ago by GenoMax 141k

score 0 · Answer 1 · 2018-07-17

Try trimming with bbduk.sh (from the BBTools / BBMap package), using the flags tbo and tpe. These mean, respectively, trim by overlap (trim adapters based on where paired reads overlap) and trim pairs evenly (when kmer right-trimming, trim both reads to the minimum length of either). In particular, tbo is useful because it can remove adapter sequence even when the program does not recognize it.
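To illustrate why overlap-based trimming needs no adapter sequence, here is a minimal Python sketch of the idea. It is not BBDuk's actual implementation: it assumes error-free reads and exact matching, and the function names, the toy insert, and the 50 bp read length are all made up for the example. Only the two adapter sequences come from this thread.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def trim_by_overlap(r1: str, r2: str, min_insert: int = 20):
    """Trim both reads to the insert length implied by their overlap.

    When the insert is shorter than the read, the first `insert_len` bases
    of R1 are the reverse complement of the first `insert_len` bases of R2,
    and everything after that point is adapter. Returns the reads unchanged
    if no such overlap is found.
    """
    max_len = min(len(r1), len(r2))
    # Try the longest candidate first so that trimming stays conservative.
    for insert_len in range(max_len - 1, min_insert - 1, -1):
        if r1[:insert_len] == reverse_complement(r2[:insert_len]):
            return r1[:insert_len], r2[:insert_len]
    return r1, r2


# Toy example: a 30 bp insert sequenced with 50 bp reads.
ADAPTER_R1 = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"
ADAPTER_R2 = "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT"
insert = "ACGTTGCAAGGCTTACCGATAGGCTAACGT"
read_len = 50

r1 = (insert + ADAPTER_R1)[:read_len]
r2 = (reverse_complement(insert) + ADAPTER_R2)[:read_len]

trimmed_r1, trimmed_r2 = trim_by_overlap(r1, r2)
print(len(r1), "->", len(trimmed_r1))
```

The function never looks at `ADAPTER_R1` or `ADAPTER_R2`; they are used only to build the toy reads. A real tool also has to tolerate sequencing errors, which this sketch ignores.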

If your reads still show a sharp peak at 300 bp, I would not trim them. Are you working with an organism that has an assembled genome? If a reference genome is available, you can check the quality of the reads as a function of base position within the reads, using bbmap.sh with the mhist parameter:

mhist=<file>            Histogram of match, sub, del, and ins rates by 
                        read location.
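Once the histogram file exists, a small script can flag the positions where the match rate drops. The Python sketch below assumes the file is tab-separated with a header line starting with `#` and columns named `BaseNum`, `Match1`, and `Match2`; check these names against your own file, since a real mhist file has more columns. The sample rows are illustrative, built only from the values quoted in this thread, and `positions_below` is a hypothetical helper.

```python
def positions_below(mhist_text: str, column: str, threshold: float):
    """Return the read positions where `column` falls below `threshold`.

    Assumes a tab-separated table whose header line starts with '#'
    and whose first column holds the position within the read.
    """
    header = None
    hits = []
    for line in mhist_text.splitlines():
        if not line.strip():
            continue
        if line.startswith("#"):
            header = line.lstrip("#").split("\t")
            continue
        if header is None:
            raise ValueError("no header line found before the data rows")
        row = dict(zip(header, line.split("\t")))
        if float(row[column]) < threshold:
            hits.append(int(row[header[0]]))
    return hits


# Illustrative rows only: 0.99 up to position 292, then the drop at 293.
SAMPLE = "\n".join(
    "\t".join(fields)
    for fields in [
        ("#BaseNum", "Match1", "Match2"),
        ("291", "0.99", "0.99"),
        ("292", "0.99", "0.99"),
        ("293", "0.95325", "0.96665"),
    ]
)

print(positions_below(SAMPLE, "Match1", 0.98))
print(positions_below(SAMPLE, "Match2", 0.98))
```

To run it on real output, read the file written by `mhist=<file>` into a string and pass that in place of `SAMPLE`.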