Hi
I am using CHIP-seq data from ENCODE. I made my own ref genome of some of the genes i am interested. I created the genome with 500bp upstream of every gene i am interested in(around 100 genes). So every gene has 500 bases.
The read length of chip-seq data is 36 bp. I first used bowtie2 as aligner so as to get the read counts and then gsnap. I used the default parameters for both of them.
Yet bowtie2 results in higher number of alignments/read counts then gsnap.
What could be the reason
here's the code for bowtie i used
bowtie2 -x /genomes/chip_bowtie2_index/index chip1.fastq -S chip1.fastq.sam
here's the code for gsnap
gsnap -D /GENOMES/CHIP/ -d gsnap_chip_genome -t 8 -B 5 -A sam chip1.fastq > chip1.fastq.sam
Any help would be appreciated
Hope to hear soon
Regards
V
You are using a rather unusual approach to your analysis. I doubt that many folks have the experience with mapping to 500bp upstream of 100 genes using bowtie2 or gsnap. Is there a reason that you are not using a more conventional approach of mapping to the genome and following that with a typical chip-seq workflow?
how are you counting alignments? what is the difference that you see? do you see the difference if you keep only the best mapping?
Hi Brent
So in both the cases sam files are produced. Using samtools view and -F 4 option, i remove the unmapped reads and then i just wrote a perl script to count number of times a particular gene occurs in the sam file. My genome is a short one with 100 genes and in 3rd column i have gene names. So by script i can count how many times a gene is occuring in my sam file and then that would be the read count for that gene. Bowtie2 reports more number of read counts then gsnap. Can you tell me what parameters can be used for both gsnap and bowtie for best mapping. I only used the default parameters for them as of now.
Regards
Varun