Question: Bwa Mem Method For Mapping, With Or Without Trimming Reads?
2
3.6 years ago by
J.F.Jiang640
China
J.F.Jiang640 wrote:

Hi all,

Recently I have been working with exome-seq data, calling variants with a bwa + GATK + VarScan pipeline, which is commonly accepted by researchers.

As pointed out on the GATK forum, we can now use the bwa mem command in place of the former bwa aln + sampe steps to map the reads directly to the reference genome.
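
For reference, the two workflows mentioned above can be sketched as command lines. This is only a minimal sketch: it builds the command strings and does not run them, the file names (ref.fa, r1.fq, r2.fq, out.sam) and the helper function names are hypothetical, and it assumes paired-end reads and a reference already indexed with `bwa index`.

```python
import shlex


def backtrack_commands(ref, fq1, fq2, out_sam):
    """Older workflow: one `bwa aln` per mate, then `bwa sampe` to pair them."""
    q = shlex.quote
    sai1, sai2 = fq1 + ".sai", fq2 + ".sai"  # hypothetical intermediate names
    return [
        f"bwa aln {q(ref)} {q(fq1)} > {q(sai1)}",
        f"bwa aln {q(ref)} {q(fq2)} > {q(sai2)}",
        f"bwa sampe {q(ref)} {q(sai1)} {q(sai2)} {q(fq1)} {q(fq2)} > {q(out_sam)}",
    ]


def mem_commands(ref, fq1, fq2, out_sam):
    """Newer workflow: a single `bwa mem` call takes both mates directly."""
    q = shlex.quote
    return [f"bwa mem {q(ref)} {q(fq1)} {q(fq2)} > {q(out_sam)}"]


if __name__ == "__main__":
    for cmd in backtrack_commands("ref.fa", "r1.fq", "r2.fq", "out.sam"):
        print(cmd)
    for cmd in mem_commands("ref.fa", "r1.fq", "r2.fq", "out.sam"):
        print(cmd)
```

The point is just that three commands and two intermediate .sai files collapse into one command; any extra options (threads, read groups for GATK) are left out here.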

I have been given two conflicting recommendations: 1) run mem directly without any trimming, since the mapping quality is reportedly good enough; 2) first run FastQC to check whether the read quality is good, then use fastx or cutadapt to trim the reads for better alignment.

So, has anybody come across this issue who can share some tips?

Thanks!

bwa • 11k views
ADD COMMENTlink modified 3.5 years ago by Birdman20 • written 3.6 years ago by J.F.Jiang640
1

Trimming is generally a good idea. It makes the alignment/variation calling steps both faster and slightly more reliable.

ADD REPLYlink written 3.6 years ago by Devon Ryan70k
2

Reference? Saw a few papers go by on this, with a few recommending only light trimming. See, e.g.: http://biorxiv.org/content/biorxiv/early/2013/11/14/000422.full.pdf which is looking at transcript assembly, but recommends:

researchers interested in assembling transcriptomes de novo should elect for a much more gentle quality trimming, or no trimming at all

The author of that paper goes over another study on trimming here: http://genomebio.org/is-trimming-is-beneficial-in-rna-seq/

ADD REPLYlink written 3.6 years ago by brentp22k
4

That over-aggressive trimming proves deleterious isn't exactly surprising to me. The importance of trimming, and how stringently one should do it, depends on (1) the length of the reads, (2) the type of experiment and (3) the aligner used and its options. For example, aligners using end-to-end alignment (e.g. bowtie2 by default) will be more susceptible to mapping errors (or to producing alignments with unduly low alignment scores) if low-quality read ends or read ends containing adapter contamination are not trimmed. For many applications (RNAseq or SNP calling are good examples) this really won't make a big difference. I don't disagree with Matthew MacManes there, and I find people suggesting trimming at Phred=30 or always lopping the last 25bp off reads to just be silly. For other applications (I'm doing a lot of bisulfite-sequencing these days), this can really screw things up (though much less so for Bison, the aligner I wrote, than for Bismark). As reads get longer and we start transitioning to local alignment and downstream analyses that account for base-call quality, I expect the importance of quality trimming to wane significantly.

Anyway, good to see that someone's trying to put some objective numbers to the process!

Edit: I should add that I include adapter trimming in the general step of trimming. It seems that Matthew MacManes is in favor of continued adapter trimming, though I'm sure many people take that over the top too.

ADD REPLYlink modified 3.6 years ago • written 3.6 years ago by Devon Ryan70k

These are interesting links. The first paper appears to only use Trimmomatic, and the second link suggests you'll get very different results depending on the tool, so I wouldn't call this conclusive. I think the quality of the data and the research question should dictate how to treat the reads and coming up with a universal trimming rule shouldn't be the goal.

Also, if you look closely (following the second link), you'll see that mappability does decrease with moderate trimming for all except FASTX and PRINSEQ. I think the others must use the Phred algorithm, which leads to a decrease in mapping percent with increasing threshold. I would expect seqtk to display the same pattern since it uses this algorithm also.
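
To make the "Phred algorithm" remark concrete: as far as I understand it, this refers to Mott-style trimming, where each base scores (cutoff minus its error probability) and the read is trimmed to the contiguous stretch with the highest total score. Below is a minimal sketch of that idea. The function name is hypothetical, the 0.05 cutoff is just an illustrative value, and this is not the actual code of seqtk or any other trimmer.

```python
def mott_trim(quals, cutoff=0.05):
    """Return (start, end) of the best-scoring segment, end exclusive.

    quals  : list of integer Phred scores, one per base
    cutoff : error-probability cutoff (illustrative assumption)
    """
    best_sum, best = 0.0, (0, 0)
    run_sum, run_start = 0.0, 0
    for i, q in enumerate(quals):
        # A base contributes positively only if its error probability
        # (10^(-Q/10)) is below the cutoff.
        run_sum += cutoff - 10 ** (-q / 10)
        if run_sum <= 0:
            # The running segment is no longer worth extending; restart.
            run_sum, run_start = 0.0, i + 1
        elif run_sum > best_sum:
            best_sum, best = run_sum, (run_start, i + 1)
    return best


if __name__ == "__main__":
    start, end = mott_trim([2, 2, 30, 30, 30, 30, 2])
    print(start, end)
```

With a scheme like this, a stricter cutoff shortens the retained segments, which fits the pattern of mapping percentage dropping as the threshold increases.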

ADD REPLYlink modified 3.5 years ago • written 3.5 years ago by SES7.9k
10
3.5 years ago by
Austria
hannes.svardal120 wrote:

I just asked a similar question about quality trimming and BWA mem on the bwa mailing list and this was Heng Li's reply:

You don’t need to do quality trimming with bwa-mem. BWA-backtrack requires reads to be mapped in full length. Low-quality tail may greatly affect its sensitivity. Bwa-mem largely does local alignment. If a tail cannot be mapped well, it will be soft clipped.
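
If you want to see how much soft clipping bwa-mem actually did on your own data, the clipped lengths can be read off the CIGAR string of each alignment (the `S` operation in SAM). A minimal sketch, assuming you already have the CIGAR strings in hand; the function name is hypothetical:

```python
import re

# One CIGAR element: a length followed by a SAM operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clipped(cigar):
    """Return (left, right) soft-clipped base counts for one CIGAR string."""
    # Hard clips (H) may sit outside the soft clips, so ignore them first.
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    left = ops[0][0] if ops and ops[0][1] == "S" else 0
    right = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return left, right


if __name__ == "__main__":
    for cigar in ["100M", "85M15S", "5S90M5S"]:
        print(cigar, soft_clipped(cigar))
```

A read whose low-quality tail could not be aligned would show up as something like `85M15S` rather than being dropped.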

ADD COMMENTlink written 3.5 years ago by hannes.svardal120
1
3.6 years ago by
Pavel Senin1.8k
Los Alamos, NM
Pavel Senin1.8k wrote:

I use bwa-mem, though not for SNPs. I think that adapters must be cut off: it would be difficult to align reads that contain adapter artifacts because the reference doesn't have those. Moreover, the shorter alignment will get a lower score and the read may be discarded due to mismatches (while it would be kept after proper QC). Another concern, however, is that bwa-mem is designed for 70bp-Mbp reads (as they say), and after QC (adapter cutting and trimming) a number of reads may get too short...

ADD COMMENTlink modified 3.6 years ago • written 3.6 years ago by Pavel Senin1.8k

No doubt the primer sequences should be cut off. What I am concerned about is whether, after removing the primers, we should also quality-trim the reads to get better QC scores and thus better mapping with alignment tools such as bwa-mem.

For example, FastQC may show that bases after position 70 have poor quality when judged against a quality-score threshold (>=20). Should we then cut the reads after position 70 using fastx?
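
To illustrate what I mean, here is a rough sketch of that decision: compute the mean quality at each read position, find the first position where it falls below the threshold, and cut every read there. It assumes Phred+33 encoded quality strings; the function names are hypothetical and this is not what FastQC or fastx do internally.

```python
def mean_quality_by_position(quality_strings, offset=33):
    """Mean Phred score at each read position (Phred+33 is an assumption)."""
    totals, counts = [], []
    for qs in quality_strings:
        for i, ch in enumerate(qs):
            if i == len(totals):
                totals.append(0)
                counts.append(0)
            totals[i] += ord(ch) - offset
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]


def first_poor_position(means, threshold=20):
    """0-based index of the first position whose mean is below threshold.

    This equals the number of leading bases to keep; returns len(means)
    if no position is poor.
    """
    for i, m in enumerate(means):
        if m < threshold:
            return i
    return len(means)


def hard_trim(seq, qual, length):
    """Cut a read and its quality string to a fixed length."""
    return seq[:length], qual[:length]


if __name__ == "__main__":
    quals = ["IIII#", "IIII#"]
    keep = first_poor_position(mean_quality_by_position(quals))
    print(hard_trim("ACGTA", quals[0], keep))
```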

ADD REPLYlink written 3.6 years ago by J.F.Jiang640
1

In general, there is a rule in computer science called garbage in, garbage out, which suggests that there is no silver bullet: computers (and algorithms) are only capable of producing results as good as your data. So, in general, I would clean any ambiguous base pairs. But in any particular case (for any particular task) the threshold of ambiguity may vary. In my opinion, if you are looking for variations and your coverage is low, having ambiguous bases may directly lead to ambiguous, low-quality results.

ADD REPLYlink written 3.6 years ago by Pavel Senin1.8k
0
3.5 years ago by
Philadelphia
Ashutosh Pandey11k wrote:

Not directly relevant to the question, but here is a recent paper: "An Extensive Evaluation of Read Trimming Effects on Illumina NGS Data Analysis"

http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0085024

ADD COMMENTlink written 3.5 years ago by Ashutosh Pandey11k
0
3.5 years ago by
Birdman20
Montreal
Birdman20 wrote:

IMHO, trimming is a good idea as long as it's not too 'aggressive'. I used Trimmomatic and BWA-mem and I obtained good results from it. If you're using paired-end data, just make sure you use software that keeps track of the pairs, or BWA-mem will be unable to map your reads correctly (e.g. Trimmomatic does it, FastX-toolkit does NOT).
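
To show what "keeping track of the pairs" means: if a trimmer drops a read from one file but not its mate from the other, the two files fall out of step. A minimal sketch of re-syncing by read name follows. It assumes both files kept their original order and that mates share a name apart from an optional /1 or /2 suffix; the function names are hypothetical and this is not how Trimmomatic does it.

```python
def read_id(name):
    """Reduce a FASTQ header to a pair-independent read ID."""
    # Drop the leading '@' and anything after the first whitespace.
    name = name.lstrip("@").split()[0]
    # Drop a trailing /1 or /2 mate suffix if present.
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def resync(names1, names2):
    """Keep only reads whose mate survived, preserving file order."""
    shared = {read_id(n) for n in names1} & {read_id(n) for n in names2}
    keep1 = [n for n in names1 if read_id(n) in shared]
    keep2 = [n for n in names2 if read_id(n) in shared]
    return keep1, keep2


if __name__ == "__main__":
    r1 = ["@r1/1", "@r2/1", "@r3/1"]
    r2 = ["@r1/2", "@r3/2"]  # mate of r2 was discarded by the trimmer
    print(resync(r1, r2))
```

Reads whose mate was lost could be written to a separate unpaired file instead of being thrown away.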

ADD COMMENTlink written 3.5 years ago by Birdman20