Question

ChIP-seq: poor mapping rate and strange FastQC result

Asked 5.9 years ago by e0223167

Hi,

I performed H3K27ac ChIP-seq on cell line samples and aligned the reads with BWA-MEM. The mapping rate is poor (only ~10% of reads map), and the per-base sequence content in FastQC also looks very strange. However, for the input samples (i.e. no antibody pulldown), the mapping rate and per-base sequence content look normal.

Could anyone offer any advice, from experience or otherwise?

Thank you so much in advance!

[Image: H3K27ac FastQC report]

[Image: Input FastQC report]

Tags: ChIP-Seq, alignment, fastqc

I cannot see the FastQC reports. From what you describe, it could be adapter contamination: adapter sequence does not map to the genome, and because it is always the same sequence, it skews the per-base sequence content.
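
One quick way to test the adapter hypothesis is to count how many reads contain the adapter sequence. The Python sketch below assumes uncompressed FASTQ input and uses AGATCGGAAGAGC, the 13-bp Illumina adapter prefix used by Trim Galore; the function names are hypothetical. Adapter read-through usually appears only when fragments are shorter than the read length, so a low fraction does not fully rule it out.

```python
import sys

# Assumed adapter: the 13-bp Illumina adapter prefix used by Trim Galore.
ILLUMINA_ADAPTER = "AGATCGGAAGAGC"


def read_fastq_sequences(lines):
    """Yield the sequence line (2nd line) of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def adapter_fraction(sequences, adapter=ILLUMINA_ADAPTER):
    """Return the fraction of reads containing the adapter as a substring."""
    total = 0
    hits = 0
    for seq in sequences:
        total += 1
        if adapter in seq:
            hits += 1
    return hits / total if total else 0.0


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python adapter_check.py sample.fastq
    with open(sys.argv[1]) as handle:
        frac = adapter_fraction(read_fastq_sequences(handle))
    print(f"reads containing adapter: {frac:.2%}")
```

Exact substring matching is deliberately simple; dedicated trimmers also allow mismatches and partial adapters at the 3' end.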

Reply by Carlo Yague, 5.9 years ago

Hi, you can right-click the image icon and choose "Open image in new tab". This worked for me.

Thank you for your suggestion. From the per-base sequence content plot, I don't think it looks like adapter contamination. I had also tried trimming the read ends (suspecting adapter contamination), but it did not improve the mapping.

Reply by e0223167, 5.9 years ago

Yes, it works now, thank you. Even in the input, the first 9 nucleotides are only G or T, which is weird. Did you use some kind of custom adapter for your library preparation? Did you use the standard Illumina ChIP-seq protocol? For the input alone, is the mapping rate also poor after trimming these first 9 nucleotides?
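
FastQC's "Per base sequence content" plot is built from per-position base frequencies. A minimal Python sketch of the same idea, handy for checking how far a G/T-only stretch extends into the reads; it assumes reads are already parsed into plain strings, and the 0.9 threshold for calling a position biased is an arbitrary assumption.

```python
from collections import Counter


def per_base_composition(sequences, n_positions):
    """Return, for each of the first n_positions, the fraction of A/C/G/T/N."""
    counts = [Counter() for _ in range(n_positions)]
    for seq in sequences:
        for pos, base in enumerate(seq[:n_positions]):
            counts[pos][base.upper()] += 1
    composition = []
    for c in counts:
        total = sum(c.values())
        composition.append({b: (c[b] / total if total else 0.0) for b in "ACGTN"})
    return composition


def biased_positions(composition, bases="GT", threshold=0.9):
    """Return 1-based positions where the combined fraction of `bases`
    exceeds `threshold` (0.9 is an arbitrary cut-off)."""
    return [
        i + 1
        for i, frac in enumerate(composition)
        if sum(frac[b] for b in bases) > threshold
    ]
```

Running `biased_positions` on the IP and input reads separately would show whether the G/T run stops at position 9 (input) or continues further (IP).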

Reply by Carlo Yague, 5.9 years ago

I used the standard Illumina universal adaptor for library preparation.

Yes, for the input, the mapping rate is good (>90%) after trimming of these 9 first nucleotides.
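
For reference, the mapping rate is the fraction of reads the aligner placed on the genome. A minimal Python sketch computing it from SAM text (for example the output of samtools view on a BAM) by testing the unmapped bit (0x4) of the FLAG field; it skips secondary (0x100) and supplementary (0x800) records so each read is counted once. This counting rule is an illustrative assumption, close to but not necessarily identical with what samtools flagstat reports.

```python
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def mapping_rate(sam_lines):
    """Fraction of primary SAM records that are mapped."""
    total = mapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t")[1])  # FLAG is the 2nd SAM column
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # count each read only once
        total += 1
        if not flag & UNMAPPED:
            mapped += 1
    return mapped / total if total else 0.0
```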

Reply by e0223167, 5.9 years ago

Ok, that makes sense.

For the IP data, this G/T bias seems to extend further into the reads. I don't know why, though, so you may need to contact the people responsible for the sequencing.

Meanwhile, you can try this:
1- Trim the first 9 nucleotides of your IP reads, as you did for your input. Is the mapping rate still ~10%?
2- Remap the reads with bwa-mem using a lower clipping penalty (-L option). The default is 5, but you can try 0 or 1. Does that improve the mapping rate?
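
Step 1 is usually done with an existing trimmer (for example, cutadapt -u 9 removes the first 9 bases of each read). As a minimal Python sketch of what that step does, assuming uncompressed FASTQ with unwrapped 4-line records (the script and function names are hypothetical):

```python
import sys


def trim_fastq_records(lines, n_bases=9):
    """Yield FASTQ lines with the first n_bases removed from the sequence
    (line 2) and quality (line 4) of every 4-line record.

    Assumes uncompressed FASTQ with no wrapped lines. Reads shorter than
    n_bases end up empty, which some aligners reject.
    """
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        if i % 4 in (1, 3):  # sequence and quality lines
            line = line[n_bases:]
        yield line + "\n"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python trim_head.py ip.fastq > ip.trimmed.fastq
    with open(sys.argv[1]) as handle:
        sys.stdout.writelines(trim_fastq_records(handle))
```

Sequence and quality are trimmed together so the record stays valid. Step 2 is then the same bwa mem run on the trimmed file, with -L 1 or -L 0 added.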

Reply by Carlo Yague, 5.9 years ago

Thank you for your suggestions, Carlo.

I tried both, but neither improved the mapping rate. Unfortunately I do not know what those "unmapped" reads are. Is there any way I can investigate this further?

I am new to bioinformatics, so I really appreciate all the advice.

Reply by e0223167, 5.9 years ago

It won't really solve your problem, but you can use samtools view -F 4 and samtools view -f 4 to extract the mapped and unmapped reads, respectively, from your BAM file. Then run FastQC on the resulting mapped.bam and unmapped.bam files. This might help you characterize the unmapped reads better.
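
The flag logic behind those two samtools commands is a single bit test: FLAG bit 0x4 marks a read as unmapped, so -F 4 drops such records and -f 4 keeps only them. A minimal Python sketch of the same split on SAM text (e.g. from samtools view file.bam), plus a hypothetical helper listing the most frequent unmapped sequences, loosely similar to FastQC's overrepresented-sequences check:

```python
from collections import Counter

UNMAPPED = 0x4


def split_by_mapping(sam_lines):
    """Split SAM alignment lines into (mapped, unmapped) lists of fields,
    mirroring samtools view -F 4 (mapped) and -f 4 (unmapped)."""
    mapped, unmapped = [], []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if int(fields[1]) & UNMAPPED:
            unmapped.append(fields)
        else:
            mapped.append(fields)
    return mapped, unmapped


def top_sequences(records, k=5):
    """Return the k most common read sequences (SAM column 10).
    Note: mapped reverse-strand reads (flag 0x10) are stored
    reverse-complemented, so compare unmapped reads among themselves."""
    return Counter(fields[9] for fields in records).most_common(k)
```

If a handful of sequences dominate the unmapped set, that points towards a technical artifact rather than genomic DNA.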

Reply by Carlo Yague, 5.9 years ago