Dealing with spike-in reads that overlap between two genomes
2.3 years ago

I recently did a histone ChIP-seq experiment. I study Drosophila and used Arabidopsis chromatin as a spike-in. After trimming the raw reads and aligning them with bwa (150 bp paired-end reads), for some of my input samples about 60% of the reads aligned to Drosophila and 50% aligned to Arabidopsis. Since that adds up to more than 100%, some reads must be aligning to both genomes. Does anyone know a way to extract the reads that aligned uniquely to each genome, so I can get rid of the reads that overlap?


ChIP-Seq spike-in drosophila arabidopsis
2.3 years ago
colin.kern ▴ 1000

You could make a combined genome: prefix or suffix every chromosome name with the species and put both genomes into a single FASTA file. When you align against it, reads that map equally well to both species become multi-mapped reads, which get a low mapping quality and can be removed with a quality filter.
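As a rough sketch of the combined-genome idea, the snippet below prefixes FASTA headers with a species tag and then counts alignments per species after a mapping-quality cutoff. The function names, the `dm_` and `at_` prefixes, the toy sequences, and the MAPQ threshold of 30 are all assumptions made for illustration; the alignments are represented as plain `(reference_name, mapq)` tuples rather than read from a real BAM file.

```python
from collections import Counter


def prefix_fasta_headers(lines, prefix):
    """Yield FASTA lines with `prefix` prepended to every sequence name."""
    for line in lines:
        if line.startswith(">"):
            yield ">" + prefix + line[1:]
        else:
            yield line


def combine_genomes(genomes):
    """Concatenate several FASTA line streams, tagging each with its prefix.

    `genomes` maps a species prefix (hypothetical, e.g. "dm_") to an
    iterable of FASTA lines.
    """
    for prefix, lines in genomes.items():
        yield from prefix_fasta_headers(lines, prefix)


def count_by_species(alignments, prefixes, min_mapq=30):
    """Count alignments per species prefix, skipping low-MAPQ alignments.

    `alignments` is an iterable of (reference_name, mapq) tuples.
    The default threshold of 30 is an assumption, not a recommendation.
    """
    counts = Counter()
    for ref_name, mapq in alignments:
        if mapq < min_mapq:
            continue  # likely multi-mapped or otherwise ambiguous
        for prefix in prefixes:
            if ref_name.startswith(prefix):
                counts[prefix] += 1
                break
    return counts


# Toy data standing in for real genome files and alignments.
drosophila = [">chr2L\n", "ACGT\n"]
arabidopsis = [">Chr1\n", "TTGA\n"]
combined = list(combine_genomes({"dm_": drosophila, "at_": arabidopsis}))

toy_alignments = [("dm_chr2L", 60), ("at_Chr1", 60), ("dm_chr2L", 0)]
counts = count_by_species(toy_alignments, ["dm_", "at_"])
```

With real data, the combined FASTA would be indexed and aligned against as usual, and the per-species counts after filtering would give the read numbers for each genome.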

Another method would be to extract the read names from the BAM file of alignments for each species, then use tools such as the sort and uniq bash commands to identify read names that appear in only one of the BAM files.
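The read-name comparison could look something like this in Python, using sets instead of sort and uniq. This is only a sketch: `split_read_names` and the toy read names are made up for illustration, and it assumes the name lists were already extracted from mapped alignments only (aligner output often includes unmapped reads too, which would make every read look shared).

```python
def split_read_names(names_a, names_b):
    """Split read names into those seen only in A, only in B, and in both.

    Duplicate names (e.g. the two mates of a pair) collapse into one entry.
    """
    set_a, set_b = set(names_a), set(names_b)
    shared = set_a & set_b
    return set_a - shared, set_b - shared, shared


# Toy read names standing in for names pulled from the two BAM files.
dm_names = ["read1", "read2", "read2", "read3"]
at_names = ["read3", "read4"]

dm_only, at_only, both = split_read_names(dm_names, at_names)
```

The `dm_only` and `at_only` sets could then be written out, one name per line, and used to subset the original BAM files.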

I'm curious why you want to do this, though. You don't want to analyze regions of the genome that are conserved between the species in your downstream analysis?


No, the plant chromatin is used for signal normalization; it is supposed to account for technical variation in the IP between samples. I add pretty much the same amount of plant chromatin to each of my "real" Drosophila samples. I'm currently trying the genome combination and filtering, we'll see what happens. Thanks!!

