Differences in read mapping percentages when using Bowtie2 and GSNAP
11.4 years ago
Varun Gupta ★ 1.3k

Hi

I am using ChIP-seq data from ENCODE. I built my own reference genome from the genes I am interested in (around 100 genes), taking the 500 bp immediately upstream of each gene, so every sequence in the reference is 500 bases long.

The read length of the ChIP-seq data is 36 bp. To get read counts I aligned first with bowtie2 and then with GSNAP, using the default parameters for both.

Yet bowtie2 reports a higher number of alignments/read counts than GSNAP.

What could be the reason?

Here is the bowtie2 command I used:

bowtie2 -x /genomes/chip_bowtie2_index/index -U chip1.fastq -S chip1.fastq.sam

Here is the GSNAP command:

gsnap -D /GENOMES/CHIP/ -d gsnap_chip_genome -t 8 -B 5 -A sam chip1.fastq > chip1.fastq.sam
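
For completeness, the indexes referenced above would have been built with something along these lines (the FASTA filename genes_500bp_upstream.fa is a hypothetical stand-in for my sequence file):

# build the bowtie2 index from the custom FASTA of upstream regions
bowtie2-build genes_500bp_upstream.fa /genomes/chip_bowtie2_index/index

# build the GSNAP database referenced by -D/-d above
gmap_build -D /GENOMES/CHIP/ -d gsnap_chip_genome genes_500bp_upstream.fa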

Any help would be appreciated.

Hope to hear from you soon.

Regards

V


You are using a rather unusual approach to your analysis. I doubt that many folks have experience mapping to the 500 bp upstream regions of 100 genes with bowtie2 or GSNAP. Is there a reason you are not using the more conventional approach of mapping to the whole genome, followed by a typical ChIP-seq workflow?


How are you counting alignments? What difference do you see? Do you see the difference if you keep only the best mapping?
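
As a quick check, you could compare counts of confident, primary alignments only. A minimal sketch, assuming a recent samtools that auto-detects SAM input (the MAPQ cutoff of 30 and the filename are illustrative):

# count primary alignments with MAPQ >= 30; -F 0x104 drops unmapped and secondary records
samtools view -c -q 30 -F 0x104 chip1.fastq.sam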


Hi Brent

In both cases SAM files are produced. Using samtools view with the -F 4 option, I remove the unmapped reads, and then I wrote a Perl script that counts how many times each gene occurs in the SAM file. My genome is a short one with 100 genes, and the 3rd column holds the gene names, so the script can count how often each gene occurs, and that is the read count for that gene. bowtie2 reports more read counts than GSNAP. Can you tell me what parameters to use with GSNAP and bowtie2 to keep only the best mapping? I have only used the default parameters so far.
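
For reference, the per-gene counting could also be done with a shell pipeline instead of a Perl script. A minimal sketch (output filename hypothetical), where -F 0x904 drops unmapped, secondary, and supplementary records so each read is counted at most once:

# column 3 of a SAM record is the reference (gene) name
samtools view -F 0x904 chip1.fastq.sam | cut -f3 | sort | uniq -c | sort -rn > gene_counts.txt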

Regards

Varun

11.4 years ago

It is essential to use tools the way they were intended to work.

It is easy to forget that each of the mapping tools employs a fairly large number of heuristics and optimizations, many based on empirical observations. When you operate on artificial data, such as your one hundred sequences of exactly 500 bp, you run the risk of using the tool in ways the authors did not anticipate.

I will also mention that mapping your data to a restricted genome is very likely not the optimal approach. A much better approach would be to align against the whole genome and then filter for the 500 bp regions.
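
As a sketch of that approach, assuming a whole-genome bowtie2 index and a BED file listing the 100 upstream regions (both paths hypothetical):

bowtie2 -x /genomes/hg19/hg19 -U chip1.fastq -S chip1.genome.sam
samtools sort -o chip1.genome.bam chip1.genome.sam
samtools index chip1.genome.bam

# count primary mapped reads overlapping the upstream regions
samtools view -c -F 0x904 -L upstream_500bp.bed chip1.genome.bam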

