Question: Aligner for 50bp paired end short reads
3 months ago by
bipin0 wrote:

I am aligning 50 bp paired-end data for ATAC-seq.

To align it to the reference genome I have tried bowtie with a larger maximum insert size, i.e. -X 1000, but the results are not optimal.

Is there any other aligner which is expected to perform better for this input dataset?

chip-seq alignment assembly • 238 views
modified 3 months ago by colindaven690 • written 3 months ago by bipin0

Any short-read aligner will do; the insert size range is defined by your data! You have to estimate it, for example by mapping a subset of 10,000 or 100,000 read pairs to your reference and plotting the TLEN field of the output BAM file. That gives you the estimated insert size distribution, which you then use in bowtie with -I and -X.
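As a rough illustration of that TLEN-based estimate, here is a minimal Python sketch. It assumes the alignments of the test subset are available as SAM text lines (for example from `samtools view`), and the function names `collect_insert_sizes` and `suggest_bounds` are made up for this example. The 1st and 99th percentile cut-offs are an arbitrary choice, not a recommendation from any manual.

```python
import statistics

def collect_insert_sizes(sam_lines):
    """Collect one TLEN value per properly paired fragment from SAM text lines."""
    sizes = []
    for line in sam_lines:
        if line.startswith("@"):          # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete alignment record
            continue
        flag = int(fields[1])
        tlen = int(fields[8])             # column 9 of a SAM record is TLEN
        if not flag & 0x2:                # keep properly paired reads only
            continue
        if flag & 0x900:                  # drop secondary/supplementary records
            continue
        if tlen > 0:                      # the leftmost mate carries the positive TLEN
            sizes.append(tlen)
    return sizes

def suggest_bounds(sizes, low_pct=1, high_pct=99):
    """Return (low, high) percentiles as candidate values for -I and -X."""
    if len(sizes) < 2:
        raise ValueError("need at least two fragments to estimate a distribution")
    cuts = statistics.quantiles(sizes, n=100, method="inclusive")
    return round(cuts[low_pct - 1]), round(cuts[high_pct - 1])

# Tiny hand-written example: one proper pair and one improper pair.
example = [
    "@HD\tVN:1.6",
    "\t".join(["r1", "99", "chr1", "100", "42", "50M", "=", "230", "180", "*", "*"]),
    "\t".join(["r1", "147", "chr1", "230", "42", "50M", "=", "100", "-180", "*", "*"]),
    "\t".join(["r2", "65", "chr1", "500", "42", "50M", "=", "5450", "5000", "*", "*"]),
]
print(collect_insert_sizes(example))
```

Note that the subset has to be mapped with a generous -X in the first place, otherwise long fragments are never reported as proper pairs and the upper tail of the distribution is cut off.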

With 50 nt reads, you have to set the --score-min, --mp, --rdg and --rfg parameters carefully, because the reads are very short and you might lose many of them to too many mismatches.
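To make the effect of these options on a 50 nt read concrete, here is a small arithmetic sketch. It assumes Bowtie 2 in end-to-end mode with what I understand to be its documented defaults, --score-min L,-0.6,-0.6 (minimum score = -0.6 + -0.6 * read length) and a maximum mismatch penalty of 6 from --mp 6,2; please check the manual for your version. These options do not exist in bowtie1. The helper names are invented for this example, and the count ignores gaps, N positions and the lower penalty applied at low-quality bases.

```python
def min_score_linear(read_len, const=-0.6, coeff=-0.6):
    """Minimum allowed alignment score for a linear --score-min function L,const,coeff."""
    return const + coeff * read_len

def max_mismatches(read_len, mp_max=6, const=-0.6, coeff=-0.6):
    """Worst-case number of full-penalty mismatches that still passes the threshold."""
    budget = -min_score_linear(read_len, const, coeff)  # total penalty allowed
    if budget <= 0:
        return 0
    return int(budget // mp_max)

# Assumed defaults for a 50 nt read, then a more permissive L,-0.6,-1.0.
print(min_score_linear(50), max_mismatches(50))
print(min_score_linear(50, coeff=-1.0), max_mismatches(50, coeff=-1.0))
```

Under these assumptions a 50 nt read tolerates only about five high-quality mismatches, which is why relaxing the coefficient of --score-min or lowering --mp can recover reads that would otherwise be dropped.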

Also, are you accepting 0 or 1 mismatch in the seed?

Are the reads from the same species as the reference?

written 3 months ago by Macspider2.4k

Thanks for your reply.

Apologies, I forgot to specify that I am using bowtie and not bowtie2, since I found it is recommended when the reads are shorter than 50 bp.

The options I am giving to bowtie are -k 2 -m 2 --best --strata.

The maximum insert size in my case is 600, so I was giving a slightly higher number for -X, i.e. 1000.

The reads are from the same species as the reference, i.e. the mouse genome.

modified 3 months ago • written 3 months ago by bipin0

That is not how you should be doing it. Even if the people who made the library told you that the library size was 600 bp, that may not actually be the case. In reality, fragments tend to be smaller than you estimate them to be.

You can get actual insert sizes by using this method: C: Target fragment size versus final insert size

Edit: While there you could try the mapper from BBMap suite for your mapping needs.

modified 3 months ago • written 3 months ago by genomax49k

So you should try to give a -I and -X interval that ranges around 600. If the result is still not satisfactory, try changing the scoring function and the gap and mismatch penalties to allow more gaps / mismatches.

You'll find how to do it in the manuals of bowtie / bowtie2 / hisat2 / tophat2 (works the same way).

written 3 months ago by Macspider2.4k
3 months ago by
Hannover Medical School
colindaven690 wrote:

I would use bwa aln, not bwa mem, and certainly not bowtie1 for this. Bowtie1 will not align reads with indels (i.e. it cannot align split reads), which is going to cause major problems.

Other good short-read aligners in my view include subread, but I am not certain whether it is good for very short reads.

written 3 months ago by colindaven690