Question

Upstream (5′) Coverage Depletion Observed in IGV

0

Entering edit mode

3 hours ago

safeassli ▴ 10

Hello, I’m hoping for some guidance. I’m working with ONT reads and want to randomly select 7,000 reads from a fastq.gz, keeping only those with Phred > 15. I use Chopper for quality filtering, a fixed random.seed for reproducibility, and reservoir sampling to draw the 7000 reads.

After mapping this subset to a reference with minimap2 and viewing the alignments in IGV, I observe what looks like a “funnel” pattern in the pileup (pic 1). To address this, I added a minimum-coverage filter for read acceptance and set it to 70; that produced pic 2, but some reads still don’t align end-to-end. Increasing mincov to 90 gave pic 3.

Because many reads pass Phred > 15, I sometimes choose 7000–12000 reads to keep runtime reasonable. However, when the resulting alignments don’t fully cover the reference, I worry I’m losing information. Is this expected?

Basecalling was performed for all Nanopore runs with Guppy version 3.2.9 or 4.3.4

In my scriprt I dont trim reads.

I’m puzzled because, using EPI2ME with only Phred > 15 and reads = 7000, the BAM it produces does not show the same profile in IGV; specifically, I do not see the left-flank (5′) coverage drop-off there.

PS: My ultimate goal is variant calling.

This might be a basic question, but I’m a bit lost. Any insights would be appreciated. I’m hoping for some guidance.

pic1: enter image description here

pic2 enter image description here

pic3 enter image description here

igv ont • 117 views

ADD COMMENT • link updated 27 minutes ago by GenoMax 154k • written 3 hours ago by safeassli ▴ 10

0

Entering edit mode

Minimum-coverage logic:

For each read, I iterate over minimap2 hits and accept the read if at least one hit has coverage_read >= min_coverage, where:

aligner = mp.Aligner(reference_genome, preset="map-ont")

coverage_read_percent = (hit.q_en - hit.q_st) / len(read) * 100

hit.q_st, hit.q_en: start/end of the aligned segment on the read (query coordinates) len(read): total read length from the FASTQ

ADD REPLY • link 3 hours ago by safeassli ▴ 10

0

Entering edit mode

What locus are we even looking at?

ADD REPLY • link 3 hours ago by ATpoint 89k

0

Entering edit mode

thank you for replying it's cytochrome b gene

ADD REPLY • link 2 hours ago by safeassli ▴ 10

0

Entering edit mode

I’m puzzled because, using EPI2ME with only Phred > 15 and reads = 7000, the BAM it produces does not show the same profile in IGV; specifically, I do not see the left-flank (5′) coverage drop-off there.

With the same 7000 reads you select that you also manually align with minimap2? Are you using default parameters for alignment? This is a multi-copy gene?

ADD REPLY • link 2 hours ago by GenoMax 154k

0

Entering edit mode

yes , I’m using minimap2 with the default -x map-ont preset (no extra flags beyond read-group), then filtering with samtools view -F 2304 to drop secondary and supplementary alignments. Then I sort and index

ADD REPLY • link 2 hours ago by safeassli ▴ 10

0

Entering edit mode

It looks like ONT is using sligntly different pre-sets for their EPI2ME protocols. See the post explaining the change here --> https://epi2me.nanoporetech.com/how-to-align/ See if you are able to get similar results if you switch.

Is this data from whole genome sequencing or are you using selective sequencing to just get cyt b gene. Are you using the complete genome as reference?

ADD REPLY • link 27 minutes ago by GenoMax 154k