Differences in read mapping percentages when using Bowtie2 and GSNAP
11.4 years ago
Varun Gupta ★ 1.3k

Hi

I am using ChIP-seq data from ENCODE. I built my own reference genome from the genes I am interested in (around 100 genes), taking the 500 bp immediately upstream of each gene, so every sequence in the reference is 500 bases long.

The read length of the ChIP-seq data is 36 bp. To get read counts I aligned first with bowtie2 and then with GSNAP, using the default parameters for both.

Yet bowtie2 reports a higher number of alignments/read counts than GSNAP.

What could be the reason?

Here is the bowtie2 command I used:

bowtie2 -x /genomes/chip_bowtie2_index/index -U chip1.fastq -S chip1.fastq.sam

Here is the GSNAP command:

gsnap -D /GENOMES/CHIP/ -d gsnap_chip_genome -t 8 -B 5 -A sam chip1.fastq > chip1.fastq.sam
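
For completeness, the indexes referenced above would have been built with something along these lines (the FASTA filename genes_500bp_upstream.fa is a hypothetical stand-in for my sequence file):

# build the bowtie2 index from the custom FASTA of upstream regions
bowtie2-build genes_500bp_upstream.fa /genomes/chip_bowtie2_index/index

# build the GSNAP database referenced by -D/-d above
gmap_build -D /GENOMES/CHIP/ -d gsnap_chip_genome genes_500bp_upstream.fa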

Any help would be appreciated.

Hope to hear from you soon.

Regards

V


You are using a rather unusual approach to your analysis. I doubt that many folks have experience mapping to the 500 bp upstream regions of 100 genes with bowtie2 or GSNAP. Is there a reason you are not using the more conventional approach of mapping to the whole genome, followed by a typical ChIP-seq workflow?


How are you counting alignments? What difference do you see? Do you see the difference if you keep only the best mapping?
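
As a quick check, you could compare counts of confident, primary alignments only. A minimal sketch, assuming a recent samtools that auto-detects SAM input (the MAPQ cutoff of 30 and the filename are illustrative):

# count primary alignments with MAPQ >= 30; -F 0x104 drops unmapped and secondary records
samtools view -c -q 30 -F 0x104 chip1.fastq.sam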


Hi Brent

In both cases SAM files are produced. Using samtools view with the -F 4 option, I remove the unmapped reads, and then I wrote a Perl script that counts how many times each gene occurs in the SAM file. My genome is a short one with 100 genes, and the 3rd column holds the gene names, so the script can count how often each gene occurs, and that is the read count for that gene. bowtie2 reports more read counts than GSNAP. Can you tell me what parameters to use with GSNAP and bowtie2 to keep only the best mapping? I have only used the default parameters so far.
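
For reference, the per-gene counting could also be done with a shell pipeline instead of a Perl script. A minimal sketch (output filename hypothetical), where -F 0x904 drops unmapped, secondary, and supplementary records so each read is counted at most once:

# column 3 of a SAM record is the reference (gene) name
samtools view -F 0x904 chip1.fastq.sam | cut -f3 | sort | uniq -c | sort -rn > gene_counts.txt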

Regards

Varun

11.4 years ago

It is essential to use tools the way they were intended to work.

It is easy to forget that each of the mapping tools employs a fairly large number of heuristics and optimizations, many based on empirical observations. When you operate on artificial data, such as your one hundred sequences of exactly 500 bp, you run the risk of using the tool in ways the authors did not anticipate.

I will also mention that mapping your data to a restricted genome is very likely not the optimal approach. A much better approach would be to align against the whole genome and then filter for the 500 bp regions.
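
As a sketch of that approach, assuming a whole-genome bowtie2 index and a BED file listing the 100 upstream regions (both paths hypothetical):

bowtie2 -x /genomes/hg19/hg19 -U chip1.fastq -S chip1.genome.sam
samtools sort -o chip1.genome.bam chip1.genome.sam
samtools index chip1.genome.bam

# count primary mapped reads overlapping the upstream regions
samtools view -c -F 0x904 -L upstream_500bp.bed chip1.genome.bam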

