Question: NGS trimming procedure - help required
3.5 years ago by
mxs530 wrote:


I need some help with my NGS data.

Description of the problem:

When I received some reads from Illumina, the first thing I did was a quality check, and the results were terrible. The reads are 150-mers, but quality was above 20 only in the first 80 positions and dropped significantly after that. I then trimmed the reads using bbduk (a great tool that has never failed me before), and only 10% of the reads passed trimming. I repeated the process with trimmomatic and trimgalore, and the results were roughly the same. After googling for a while and reading the Illumina manuals, I concluded that this is due either to high adapter contamination associated with adapter-dimer sequencing or to a very short insert size.

In all of the above cases I used the default adapter lists provided with the tools. I then checked the lib prep manual to see whether the stated adapters were in those default lists. The first one, TruSeq_Universal_Adapter, was; the second one was not. I have never modified adapter lists before, since there was never any need to (plus I haven't done a lot of mapping). However, the second one is very similar to TruSeq_Adapter_Index_23 (GATCGGAAGAGCACACGTCTGAACTCCAGTCACGAGTGGATATCTCGTATGCCGTCTTCTGCTTG), but with one insertion, one deletion and one mutation: GATCGGAAGAGCACACGTCTGAACTCCAGTCACGaGTA-GtATCTCGTATGCCGTCTTCTGCTTG.
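To put a number on how far apart the two adapter sequences quoted above are, a plain Levenshtein (edit) distance can be computed. This is only an illustrative sketch: the function name `edit_distance` is made up here, and the second sequence is the one quoted above with the gap character removed and the lowercase bases uppercased.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest substitutions, insertions and deletions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, base_a in enumerate(a, start=1):
        curr = [i]
        for j, base_b in enumerate(b, start=1):
            cost = 0 if base_a == base_b else 1
            curr.append(min(prev[j] + 1,          # deletion from a
                            curr[j - 1] + 1,      # insertion into a
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


# TruSeq_Adapter_Index_23 as quoted in the post
index_23 = "GATCGGAAGAGCACACGTCTGAACTCCAGTCACGAGTGGATATCTCGTATGCCGTCTTCTGCTTG"
# Lib prep adapter as quoted in the post (gap and lowercase marks kept)
as_quoted = "GATCGGAAGAGCACACGTCTGAACTCCAGTCACGaGTA-GtATCTCGTATGCCGTCTTCTGCTTG"
lib_prep = as_quoted.replace("-", "").upper()

print(edit_distance(index_23, lib_prep))
```

A small distance means the two sequences share long identical stretches on either side of the index region, so a default list would likely still match much of this adapter even though the index region differs.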

When I looked into my sequences a bit further, I noticed that many of the reads in my FASTQ files start with PhiX_read2_adapter, and others start with another sequence that is not on the list. Totally confused at this point, I decided to turn to you for help.
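A check like the one described, looking at what the reads start with, could be sketched roughly as follows. The helper names (`fastq_sequences`, `count_adapter_starts`), the prefix length `k=12` and the two toy reads are assumptions for illustration; only the Index 23 sequence comes from this post.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator


def fastq_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line (2nd of every 4 lines) of each FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def count_adapter_starts(seqs: Iterable[str], adapters: Dict[str, str],
                         k: int = 12) -> Counter:
    """Count reads whose first k bases equal the first k bases of an adapter."""
    counts: Counter = Counter()
    for seq in seqs:
        for name, adapter in adapters.items():
            if seq.upper().startswith(adapter[:k].upper()):
                counts[name] += 1
                break
        else:
            counts["no_match"] += 1  # read starts with none of the adapters
    return counts


adapters = {
    "TruSeq_Adapter_Index_23":
        "GATCGGAAGAGCACACGTCTGAACTCCAGTCACGAGTGGATATCTCGTATGCCGTCTTCTGCTTG",
}

# Two made-up FASTQ records, for illustration only
toy_fastq = [
    "@read1", "GATCGGAAGAGCACACGTCTGAACTCC", "+", "I" * 27,
    "@read2", "ACGTACGTACGTACGT", "+", "I" * 16,
]

counts = count_adapter_starts(fastq_sequences(toy_fastq), adapters)
print(counts)
```

For a real file, the lines of the opened FASTQ file would be passed in place of `toy_fastq`.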

Additional information:

I have two index oligonucleotides.


Can anyone shed some light on this case? I received this data with very little information about it and was assigned the task of trying to map it. I suspect there was some mix-up with the adapters, but this is purely my inexperienced hunch. If that is the likely case, how do I go about creating a set of adapters myself? Also, would you recommend trimming an extra 15-20 nt from both ends so that better quality reads are obtained?
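On the question of building an adapter set by hand: adapter lists are usually plain FASTA files, so one could be generated along these lines. This is a sketch under assumptions: the function name `adapters_to_fasta`, the record name `lib_prep_adapter_2` and the output file name are hypothetical, and the sequence is the one quoted in the question with the gap character removed.

```python
def adapters_to_fasta(adapters: dict) -> str:
    """Render {name: sequence} as FASTA text, one record per adapter."""
    records = []
    for name, seq in adapters.items():
        cleaned = seq.replace("-", "").upper()  # drop alignment gaps, uppercase
        unexpected = set(cleaned) - set("ACGTN")
        if unexpected:
            raise ValueError(f"{name}: unexpected characters {sorted(unexpected)}")
        records.append(f">{name}\n{cleaned}\n")
    return "".join(records)


custom = {
    # hypothetical record name; sequence as quoted in the question
    "lib_prep_adapter_2":
        "GATCGGAAGAGCACACGTCTGAACTCCAGTCACGaGTA-GtATCTCGTATGCCGTCTTCTGCTTG",
}

fasta_text = adapters_to_fasta(custom)
print(fasta_text, end="")

# To save it (hypothetical file name):
# with open("custom_adapters.fa", "w") as handle:
#     handle.write(fasta_text)
```

The resulting file could then be supplied to the trimmer in place of, or in addition to, its default adapter list.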

Thank you so much.


sequencing trimming ngs • 1.9k views
written 3.5 years ago by mxs530

Do yourself a favor and use bbduk.sh from the BBMap suite with your original dataset. There is an adapters.fa file in the resources directory that includes all commonly used adapters, so you do not need to worry about rolling your own. This thread has simple instructions on how to use the program. Come back here and post if you run into trouble.

written 3.5 years ago by genomax89k

Thanks for your quick reply. I did this and got ~9% of reads surviving in all samples, which is less than 1.5 million out of 15 million.

written 3.5 years ago by mxs530

Do the test @h.mon suggests below, but if only 9% of the reads are surviving, this must be a really bad library, with most of your data being adapter dimers or very short inserts. Hopefully the data that remains is what you expect from your sample (take a few reads and BLAST them at NCBI to check) and not the phiX spike-in.

written 3.5 years ago by genomax89k

After quality and adapter trimming, use bbduk to check your data for phiX contamination, using the whole phiX genome as the reference: bbduk.sh -Xmx1g in1=r1.fq in2=r2.fq ref=phix.fa k=31 hdist=1 stats=stats.txt

written 3.5 years ago by h.mon31k

I ran the entire pipeline (adapter, quality and phiX filtering) and now I have 2.54% of the initial 15 million reads left... OK, so as I expected, this is a complete no-go. (I agree with all of you, messing with adapters is not the way to go. I do apologize for wasting your time, but I simply needed to see whether someone had come across something this bad and absolutely needed to produce a result in order to win the Nobel Prize for medicine.) This is simply a bad dataset, and no one was there to stop it during the initial stages of the experiment, so now I am stuck with figuring something out.

Thank you all! all your input was more than helpful
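For reference, the survival percentages quoted in this thread translate into read counts as sketched below. The function name `surviving_reads` is made up, and the inputs are the rounded figures from the posts (about 15 million starting reads, ~9% after adapter trimming, 2.54% after the full pipeline), so the results are approximate.

```python
def surviving_reads(total_reads: int, percent_surviving: float) -> int:
    """Convert a survival percentage into an approximate read count."""
    return round(total_reads * percent_surviving / 100)


total = 15_000_000  # rounded figure from the thread

after_adapter_trimming = surviving_reads(total, 9)   # ~9% reported
after_full_pipeline = surviving_reads(total, 2.54)   # 2.54% reported

print(after_adapter_trimming)
print(after_full_pipeline)
```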

written 3.5 years ago by mxs530

Don't be disheartened. Stuff happens. Especially when someone is newly learning how to make libraries. At least you have an explanation that you can provide to them as to what they need to fix in future.

While you are working on this data, make sure the surviving reads are what was expected. You don't want them to turn out to be something other than what you need.

modified 3.5 years ago • written 3.5 years ago by genomax89k

