Question: How to improve alignment quality of single-end ChIP-seq data
jule wrote, 6 months ago:

Hi,

I am struggling with the alignment quality of my mapped ChIP-seq data. This topic has been discussed before, but only for paired-end data, and mine is single-end.

I used public ChIP-seq data from GEO (GSE55062) for H3K27ac, IgG and some others (single-end). The corresponding paper does not mention any trimming; its ChIP-seq analysis starts directly with “reads were aligned to the HG19 reference genome using Bowtie2 with all default settings.” Because FastQC on the downloaded raw FASTQ files showed only moderate sequence quality scores, I decided to trim the reads before mapping (if I skip this, the alignment rates are around 1%):

java -jar trimmomatic-0.36.jar SE -threads 4 -phred33 IgG.fastq IgG-trimmed.fastq TRAILING:25 SLIDINGWINDOW:4:25

./bowtie2 -U IgG.fastq -x index/hg19 -p 6 -S IgG-mapped.sam

29853240 reads; of these:
  29853240 (100.00%) were unpaired; of these:
    17350252 (58.12%) aligned 0 times
    7188392 (24.08%) aligned exactly 1 time
    5314596 (17.80%) aligned >1 times
41.88% overall alignment rate
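
For reference, the overall rate in this summary is the share of reads that aligned at least once, i.e. (aligned exactly once + aligned more than once) / total reads. A minimal sketch that recomputes it from the counts above; the function name `alignment_rate` is only illustrative and not part of bowtie2:

```shell
# Recompute the overall alignment rate from the counts in a bowtie2 summary.
# Arguments: total reads, reads aligned exactly once, reads aligned more than once.
# "alignment_rate" is a hypothetical helper name, not a bowtie2 command.
alignment_rate() {
    awk -v total="$1" -v once="$2" -v multi="$3" \
        'BEGIN { printf "%.2f\n", 100 * (once + multi) / total }'
}

# Counts taken from the IgG summary above.
alignment_rate 29853240 7188392 5314596
```

This comes out at 41.88, matching the last line of the summary, so the low rate is driven by the 58.12% of reads that did not align at all.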

However, I only get a poor overall alignment rate of 42% for IgG and 21% for H3K27ac, and the rates for my other files are in the same range. Is there a way to increase the alignment rate? Did I miss an important step that leads to these low rates? Do I need an additional quality-improvement step before aligning? The paper does not mention any quality control before mapping, but reports alignment rates above 50%.

Thanks for your help,

Kind regards

modified 6 months ago • written 6 months ago by jule

As far as I can see from this GSE number, all the data are paired-end. E.g. the IgG sample you refer to is archived at the ENA as paired-end, and the entry at the NCBI also indicates paired-end. How did you download the data (in SRA format, I suppose), and which command did you then use for conversion to FASTQ?

modified 6 months ago • written 6 months ago by ATpoint
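
A quick way to check whether a FASTQ file might hold unsplit paired-end data is to look at its read lengths. As far as I know, fastq-dump without a split option writes both mates of a pair into a single record, so the reads come out at about twice the sequenced length and mostly fail to align. The sketch below tabulates read lengths; `read_length_table` is a made-up helper name and the file is assumed to be uncompressed:

```shell
# Tabulate read lengths in an uncompressed FASTQ file.
# Prints one "length<TAB>count" line per distinct read length, sorted by length.
# "read_length_table" is a hypothetical helper name.
read_length_table() {
    awk 'NR % 4 == 2 { n[length($0)]++ }
         END { for (l in n) print l "\t" n[l] }' "$1" | sort -n
}

# Usage (file name is a placeholder):
#   read_length_table IgG.fastq
```

If the dominant length is double what the sequencing run should have produced, unsplit mates are a likely explanation, though this is only a hint and not proof.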

Thank you! I had downloaded the SRA files from the GEO database and converted them with fastq-dump from the sra-toolkit. But when I tried to split the output into two paired FASTQ files it didn't work, and the other tools I used to check whether the data are paired reported them as single-end. So I am very grateful for your link to the ENA, from which I have now downloaded the proper paired FASTQ files. I didn't know that GSE datasets are also available in other archives.

written 6 months ago by jule
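
For anyone who wants to stay with the SRA route: a rough sketch of converting a paired-end run with fastq-dump and sanity-checking the result. `--split-files` is a real fastq-dump option that writes one file per mate; the run accession is a placeholder you have to look up yourself, and the helper names (`dump_paired`, `count_reads`, `mates_match`) are made up for illustration:

```shell
# Convert an SRA run into one FASTQ file per mate (requires the sra-toolkit).
# "$1" is a run accession that you supply; none is assumed here.
dump_paired() {
    fastq-dump --split-files "$1"   # writes <run>_1.fastq and <run>_2.fastq
}

# Count the records in an uncompressed FASTQ file (4 lines per record).
count_reads() {
    awk 'END { print NR / 4 }' "$1"
}

# Sanity check: both mate files of a paired-end run should hold the same
# number of reads. Returns success (0) only if the counts are equal.
mates_match() {
    [ "$(count_reads "$1")" -eq "$(count_reads "$2")" ]
}

# Usage (file names are placeholders):
#   mates_match run_1.fastq run_2.fastq && echo "mate files are consistent"
```

The two mate files are then passed to bowtie2 with `-1` and `-2` instead of `-U`.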

Check the unmapped reads (e.g. with a BLAST search), and you will find the reason why they did not align to the reference genome.
* The bowtie2 "--un" option writes the reads that fail to align to a separate file.

written 6 months ago by ori
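
A minimal sketch of that suggestion, assuming single-end input and the index path used in the question. `--un` is a real bowtie2 option; the output file names and the helper names (`align_keep_unmapped`, `fastq_head_to_fasta`) are placeholders for illustration:

```shell
# Align single-end reads and keep the reads that fail to align in a separate
# FASTQ file (requires bowtie2 and a built index; file names are placeholders).
align_keep_unmapped() {
    bowtie2 -x index/hg19 -U "$1" -p 6 --un unmapped.fastq -S mapped.sam
}

# Convert the first N records (default 10) of an uncompressed FASTQ file to
# FASTA, so a handful of unmapped reads can be pasted into a BLAST search.
fastq_head_to_fasta() {
    head -n "$(( ${2:-10} * 4 ))" "$1" |
        awk 'NR % 4 == 1 { print ">" substr($0, 2) }
             NR % 4 == 2 { print }'
}

# Usage:
#   align_keep_unmapped IgG-trimmed.fastq
#   fastq_head_to_fasta unmapped.fastq 20 > unmapped_sample.fasta
```

A small sample is usually enough to see whether the unmapped reads are contamination, adapter sequence, or something structurally wrong with the input.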