How do I interpret a 200bp reduction of coverage in a gene exon?
2
0
Entering edit mode
5 weeks ago

Good morning everyone,

I hope someone can help me with this issue. In my lab, we performed a targeted sequencing with Illumina MiSeq. We sequenced one single gene (whole gene, both introns and exons) in 96 samples, belonging to two different experimental groups. After the alignment and the duplicates removal, I decided to visualize my bam files using the Golden Helix Genome Browser. I noticed a reduction of the coverage for a 200bp region in the only coding exon of my sequenced gene. This reduction occurs in all my samples but it is more evident in one group than the other (statistically determined using read depth values).

How should I interpret these data?

1
Entering edit mode

structural variance between alleles of the same gene?

1
Entering edit mode

If you have a reference set available then you can try ExomeDepth (LINK) package to see if you have a CNV. There are other packages available as well.

0
Entering edit mode

Thanks GenoMax for your answer. To my knowledge, CNV calling tools usually work on whole genome/whole exome sequencing data. I have sequenced only one gene. Therefore, I believe that CNV calling would not work properly.

1
Entering edit mode

How did you normalize the data and assess significance? Could this be a general issue due to GC bias and "sequencability" of that stretch of DNA, did you check public sequencing data, e.g. GTEx or any other human samples whether this reduction is a general phenomenon for that region?

Edit: If this is not human then any other public (e.g. WGS or exome) samples from the same species.

0
Entering edit mode

Thanks ATpoint for your response. I extracted read depth values with "samtools depth" samtools depth manual. I calculated the read depth mean for the background coverage as well as for the 200bp region coverage for each sample. First I checked if the background coverage differed between samples of the two experimental groups (t test) and they were not significantly different. Then, I subtracted the average read depth of the 200bp region to the average background read depth, for each sample (normalization), to check how the depth of loss changed between our conditions. Finally, I performed the t test between the two experimental groups, using these differences, and I got a Pval = 0.03

I though this was the best way to evaluate the loss of coverage that I noticed with the visual inspection on Genome Browse

0
Entering edit mode

using PCR to validate the 200bp deletion if the quality of these reads is not very low.

2
Entering edit mode
5 weeks ago

It is a deletion!

(seriously, it is not a reduction of coverage, but a massive drop of 50% at exactly two bases, instead of gradual reduction)

I recommend to load it to IGV and inspect soft-clipped parts of reads and paired-end distances

0
Entering edit mode

Thanks for your reply! Do you have any suggestions about the biological meaning of having a different read depth in the same deletetion between the two conditions? Have you ever handled this kind of data?

0
Entering edit mode

Yup. It can be a polymorphism. You need to check the frequency of this deletion in some large cohort (such as 1000GP). This is how it started: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4405206/ , modern resources have much more details and study larger cohorts.

1
Entering edit mode
5 weeks ago

There are several reasons why this could be happening. One of the easiest and fastest ones to check is that maybe that coding exon has an homolog region in the genome, therefore some reads may be mapping with equal probability on both sites (this assigns those reads a mapping quality of "0"). Check if reads with mapping quality = 0 are being filtered (maybe only reads with mapping quality > 1 are being displayed by default) and, if so, just remove that threshold and check if the coverage on that region changes. If it does you'll know the reason why, and if doesn't you may continue thinking about other alternatives.

0
Entering edit mode

Thanks Jorge Amigo ! I used BLAST to check whether that region has homologous along the genome and I did not find alignments other than the one on my gene.