Question: Low mapping rate
hlsz.laszlo (Hungary) wrote 4.0 years ago:

Dear all,

Recently, I obtained several ChIP-seq datasets from Saccharomyces cerevisiae.

After Illumina sequencing, each FASTQ file contains about 20 million 50 bp reads. I aligned the reads to the sacCer3 genome with either BWA-MEM or Bowtie2, and in both cases the mapping rate was very low (20% mapped, 80% unmapped).
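For anyone who wants to double-check a mapping rate independently of the aligner's own summary, here is a minimal sketch that counts mapped and unmapped reads from SAM records. It assumes single-end reads and plain-text SAM lines (for example, decoded from a BAM file beforehand); the function name `mapping_rate` is hypothetical and not part of any tool mentioned in this thread.

```python
def mapping_rate(sam_lines):
    """Return (mapped, unmapped, fraction_mapped) over primary SAM records.

    Assumes single-end reads; with paired-end data each mate is counted
    as its own record.
    """
    mapped = unmapped = 0
    for line in sam_lines:
        # Skip blank lines and header lines (headers start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])
        # Skip secondary (0x100) and supplementary (0x800) alignments so
        # that each read is counted only once.
        if flag & 0x900:
            continue
        if flag & 0x4:  # 0x4 = read unmapped
            unmapped += 1
        else:
            mapped += 1
    total = mapped + unmapped
    return mapped, unmapped, (mapped / total if total else 0.0)


if __name__ == "__main__":
    # Toy records for illustration only.
    example = [
        "@SQ\tSN:chrI\tLN:1000",
        "r1\t0\tchrI\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
        "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    ]
    print(mapping_rate(example))
```

Skipping secondary and supplementary records matters mostly for BWA-MEM output, where a single read can produce more than one alignment line.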

I can't figure out what is making the reads unmappable. Even the input DNA does not align well to the genome (50%). I tried switching genomes, but I always got the same overall mapping rate.

What could possibly have happened?

Kind Regards,

Laszlo

mapping chip-seq • 4.2k views

Hi Laszlo, did you try taking the unmapped reads and BLASTing them? Check whether there is a high level of contamination. If the reads do match S. cerevisiae, then you might have to tweak the alignment parameters.

written 4.0 years ago by marina.v.yurieva

Maybe you need to clean your data?

written 4.0 years ago by geek_y

Try blasting a few of the unmapped reads. Perhaps you got the wrong samples back or your samples had a high level of contamination by another species.
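As a rough illustration of how to pull a handful of unmapped reads out for BLAST, the sketch below converts unmapped SAM records into FASTA text. It assumes plain-text SAM input; the function name `unmapped_to_fasta` and the `limit` parameter are hypothetical, chosen only for this example.

```python
def unmapped_to_fasta(sam_lines, limit=100):
    """Return FASTA text for up to `limit` unmapped reads in SAM records."""
    records = []
    for line in sam_lines:
        if len(records) >= limit:
            break
        # Skip blank lines and header lines (headers start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag, seq = fields[0], int(fields[1]), fields[9]
        # Keep only reads flagged as unmapped (0x4) that carry a sequence.
        if not flag & 0x4 or seq == "*":
            continue
        records.append(f">{name}\n{seq}")
    return "\n".join(records) + ("\n" if records else "")


if __name__ == "__main__":
    # Toy records for illustration only.
    example = [
        "r1\t0\tchrI\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
        "r2\t4\t*\t0\t0\t*\t*\t0\t0\tGGCC\tIIII",
    ]
    print(unmapped_to_fasta(example, limit=10), end="")
```

A few dozen reads are usually enough to see whether the hits point to one foreign organism or to something else entirely.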

written 4.0 years ago by Devon Ryan

Thank you for the answers.

The data is free of TruSeq adapters. First, I ran FastQC to check the quality, and everything was OK.
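A quick way to cross-check the adapter claim is to count how many reads still contain the start of the adapter sequence. The sketch below assumes four-line FASTQ records and uses `AGATCGGAAGAGC`, the common leading bases of Illumina TruSeq adapters, as the search string; treat that string and the function names as assumptions to adapt to the actual library prep.

```python
# Assumed adapter prefix (common start of Illumina TruSeq adapters).
ADAPTER_PREFIX = "AGATCGGAAGAGC"


def fastq_sequences(lines):
    """Yield the sequence line of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def adapter_fraction(fastq_lines, adapter=ADAPTER_PREFIX):
    """Return the fraction of reads containing `adapter` anywhere."""
    total = hits = 0
    for seq in fastq_sequences(fastq_lines):
        total += 1
        if adapter in seq.upper():
            hits += 1
    return hits / total if total else 0.0


if __name__ == "__main__":
    # Toy reads for illustration only.
    example = [
        "@r1", "ACGTAGATCGGAAGAGCACAC", "+", "IIIIIIIIIIIIIIIIIIIII",
        "@r2", "ACGTACGT", "+", "IIIIIIII",
    ]
    print(adapter_fraction(example))
```

An exact substring search misses adapters with sequencing errors or partial adapters at the read end, so a low value here is suggestive rather than conclusive.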

I used the default parameters of the aligners.

I tried aligning the reads to the human, mouse, and E. coli genomes, but the alignment rate was under 1%.

I will try to blast the unmapped reads to find the source.
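Once BLAST results are available, tallying the best hit per read makes the likely source easier to see. The sketch below assumes the standard 12-column tabular output (BLAST+ `-outfmt 6`: query ID, subject ID, ..., bit score in the last column); the function name `tally_best_hits` and the subject IDs in the example are hypothetical.

```python
from collections import Counter


def tally_best_hits(blast_tab_lines):
    """Count, per subject ID, how many queries have it as their best hit.

    Expects 12-column tabular BLAST output, where column 1 is the query ID,
    column 2 the subject ID, and column 12 the bit score.
    """
    best = {}  # query ID -> (subject ID, bit score)
    for line in blast_tab_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        query, subject, bitscore = cols[0], cols[1], float(cols[11])
        # Keep only the highest-scoring hit for each query.
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return Counter(subject for subject, _ in best.values())


if __name__ == "__main__":
    # Toy hits for illustration only; subject IDs are made up.
    rows = [
        ["r1", "subjA", "100.0", "50", "0", "0", "1", "50", "1", "50", "1e-20", "93.5"],
        ["r1", "subjB", "90.0", "50", "5", "0", "1", "50", "1", "50", "1e-10", "60.0"],
        ["r2", "subjB", "98.0", "50", "1", "0", "1", "50", "1", "50", "1e-18", "90.0"],
    ]
    lines = ["\t".join(row) for row in rows]
    for subject, count in tally_best_hits(lines).most_common():
        print(subject, count)
```

Subject IDs still need to be mapped to organism names afterwards, for example from the hit descriptions.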

Thanks again for the answers. I'll update this thread with the BLAST results.

written 4.0 years ago by hlsz.laszlo
mxs wrote 4.0 years ago:

Hi,

To me this looks like a classic mappability problem caused by reads mapping to repetitive regions. For example, if you map 30-mers to the human genome, then approx. 25% of the genome will be unmappable if only unique positions are reported (check the Bowtie parameters). One of the first things I usually do is create a mappability track (GEM-mappability tool) for the reference species. Then I map the reads, create a track of the mapped reads, and load it, alongside the mappability track, into one of the genome browsers (UCSC or Ensembl). Together the two tracks show which regions are mappable, which are not, and where the mapped reads align.

Unfortunately, UCSC does not provide a mappability track for S. cerevisiae, so you will need to make one yourself.
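To make the idea of mappability concrete, here is a toy sketch that computes the fraction of k-mer positions in a reference sequence that are unique across both strands. It is not the GEM-mappability algorithm, which also tolerates mismatches; it uses exact matches only, holds everything in memory, and the function names are hypothetical.

```python
from collections import Counter

# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def unique_kmer_fraction(reference, k):
    """Return the fraction of k-mer start positions whose k-mer occurs
    exactly once in the reference, counting both strands (exact matches)."""
    reference = reference.upper()
    kmers = [reference[i:i + k] for i in range(len(reference) - k + 1)]
    if not kmers:
        return 0.0
    counts = Counter()
    for kmer in kmers:
        counts[kmer] += 1
        rc = reverse_complement(kmer)
        # A palindromic k-mer is the same locus on both strands,
        # so it is counted only once.
        if rc != kmer:
            counts[rc] += 1
    unique = sum(1 for kmer in kmers if counts[kmer] == 1)
    return unique / len(kmers)


if __name__ == "__main__":
    # Toy sequences for illustration only.
    print(unique_kmer_fraction("AAAAAAAAAA", 3))
    print(unique_kmer_fraction("ACGGTCAT", 4))
```

Running this per chromosome with k set to the read length (50 here) gives a first estimate of how much of the genome could ever receive uniquely mapped reads.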

Cheers

mxs

S. cerevisiae doesn't have that many repetitive regions. Even if you ignored them when mapping, you would get 80% mapped and 20% unmapped, not the other way around.

written 4.0 years ago by marina.v.yurieva