How to improve alignment quality of ChIP-seq for single-end
3.3 years ago
jule • 0

Hi,

I am struggling with the alignment quality of my mapped ChIP-seq data. This topic has been discussed before in general, but only for paired-end data, and my data is single-end.

I used public ChIP-seq data from GEO (GSE55062) for H3K27ac, IgG and some others (single-end). In the corresponding paper, they do not mention any trimming, but start their ChIP-seq analysis directly with “reads were aligned to the HG19 reference genome using Bowtie2 with all default settings.” As FastQC on the downloaded raw FASTQ files showed only moderate sequence quality scores, I decided to trim the reads before mapping (if I skip this, the alignment rates are around 1%):

java -jar trimmomatic-0.36.jar SE -threads 4 -phred33 IgG.fastq IgG-trimmed.fastq TRAILING:25 SLIDINGWINDOW:4:25

./bowtie2 -U IgG.fastq -x index/hg19 -p 6 -S IgG-mapped.sam


29853240 (100.00%) were unpaired; of these:
  17350252 (58.12%) aligned 0 times
  7188392 (24.08%) aligned exactly 1 time
  5314596 (17.80%) aligned >1 times
41.88% overall alignment rate

However, I only get a poor overall alignment rate of 42% for IgG and 21% for H3K27ac. The rates for my other files are all in this range. Is there a way to increase the alignment rate? Did I miss anything important in my steps that would explain these low rates? Do I need an additional quality-improving step before aligning? The paper does not mention any quality control before mapping, but quotes alignment rates above 50%.

Kind regards


As far as I can see from this GSE number, all the data are paired-end. E.g. the IgG you refer to is backed up at the ENA as PE, and also the entry at the NCBI indicates paired-end. How did you download the data (in SRA format I suppose), and which command did you then use for conversion to fastq?


Thank you! Before, I downloaded the SRA files from the GEO database and converted them with fastq-dump from the sra-toolkit. But when I tried to split them into two paired FASTQ files it didn't work, and the other tools I used to check whether the data is paired reported it as single-end. So I am very grateful for your link to the ENA, where I have now downloaded the proper paired FASTQ files. I didn't know that GSE datasets are also available in other archives.
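
For anyone who prefers to stay with the sra-toolkit, here is a minimal sketch of the paired-end route, assuming fastq-dump and bowtie2 are on the PATH. The function names are hypothetical and the run accession is an argument you have to supply yourself. `--split-files` is the fastq-dump option that writes each mate to its own file (`_1.fastq`, `_2.fastq`); without it, both mates of a paired run end up concatenated in a single read, which a quick look at the read lengths can reveal.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; names and defaults are illustrative only.

# Print "length<TAB>count" for the reads in a FASTQ file
# (assumes plain 4-line records). Reads about twice as long as
# expected hint at concatenated mates.
read_lengths() {
    awk 'NR % 4 == 2 { n[length($0)]++ }
         END { for (l in n) print l "\t" n[l] }' "$1" | sort -n
}

# Download a run as two mate files. $1 = run accession (you supply it).
fetch_paired() {
    fastq-dump --split-files "$1"
}

# Align the two mate files as pairs.
# $1 = run accession, $2 = bowtie2 index prefix, $3 = output SAM
align_paired() {
    bowtie2 -x "$2" -1 "${1}_1.fastq" -2 "${1}_2.fastq" -p 6 -S "$3"
}
```

Running `read_lengths IgG.fastq` on the file from the original conversion would be a cheap first check before re-downloading anything.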


Check the unmapped reads (e.g. with a BLAST search), and you will find the reason why they did not align to the reference genome.
* The bowtie2 "--un" option writes the reads that fail to align to a separate file.
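
To make this concrete, here is a minimal sketch, assuming bowtie2 is on the PATH. The function names and the argument layout are hypothetical. `--un` applies to unpaired reads (`--un-conc` is the counterpart for pairs that fail to align concordantly), and the awk helper converts the first few records to FASTA so they can be pasted into a BLAST search.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; names and defaults are illustrative only.

# Convert the first N records of a FASTQ file to FASTA
# (assumes plain 4-line records).
# $1 = FASTQ file, $2 = number of reads (default 10)
fastq_head_to_fasta() {
    awk -v n="${2:-10}" '
        NR > 4 * n  { exit }
        NR % 4 == 1 { print ">" substr($0, 2) }  # header: swap leading "@" for ">"
        NR % 4 == 2 { print }                     # sequence line
    ' "$1"
}

# Align single-end reads and keep the ones that fail to align.
# $1 = input FASTQ, $2 = index prefix, $3 = output SAM, $4 = file for unaligned reads
align_keep_unmapped() {
    bowtie2 -x "$2" -U "$1" -p 6 -S "$3" --un "$4"
}
```

A handful of reads is usually enough for a first BLAST search; if most hits point to another organism, adapters, or rRNA, that already explains a large part of the unaligned fraction.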