Downsampling long-read BAM files
eebloom, 23 days ago

I'm looking for the best way to downsample nanopore long reads. Ideally, I would like to downsample to a specific average read depth, e.g. 5x.

I have found a few methods, e.g. using samtools, but these seem to downsample to a specific number or fraction of reads. Rasusa downsamples to a specific coverage/depth but requires FASTQ input, and I would prefer to downsample my BAM files so I don't have to re-run the alignment on many samples.
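A fraction can be turned into a target depth: fraction = target mean depth / current mean depth. Below is a rough bash sketch of that idea, assuming samtools (a version with the coverage subcommand) is on PATH and the BAM is coordinate-sorted. The function names are hypothetical helpers, not part of any tool, and the genome-wide mean is taken as a length-weighted average of the per-contig meandepth column that samtools coverage reports.

```shell
#!/usr/bin/env bash
# Sketch (assumption-laden): proportional downsampling of a BAM to an
# approximate mean depth using samtools. All function names are hypothetical.

# fraction_for_depth TARGET CURRENT -> prints TARGET / CURRENT (may exceed 1)
fraction_for_depth() {
  awk -v t="$1" -v c="$2" 'BEGIN { if (c <= 0) exit 1; printf "%.6f\n", t / c }'
}

# subsample_arg SEED FRACTION -> "SEED.FRACTION" for samtools view -s
# (integer part = random seed, fractional part = fraction of reads kept)
subsample_arg() {
  printf '%s%s\n' "$1" "${2#0}"
}

# mean_depth BAM -> length-weighted mean of the per-contig meandepth (column 7)
mean_depth() {
  samtools coverage "$1" |
    awk -F'\t' '!/^#/ { len = $3 - $2 + 1; sum += $7 * len; tot += len }
                END { if (tot == 0) exit 1; printf "%.4f\n", sum / tot }'
}

# downsample_to_depth IN.bam OUT.bam TARGET_DEPTH [SEED]
downsample_to_depth() {
  local in=$1 out=$2 target=$3 seed=${4:-42}
  local cur frac
  cur=$(mean_depth "$in") || return 1
  frac=$(fraction_for_depth "$target" "$cur") || return 1
  # A fraction >= 1 means the BAM is already at or below the target depth.
  if awk -v f="$frac" 'BEGIN { exit !(f >= 1) }'; then
    echo "current depth ${cur}x is already <= ${target}x; not downsampling" >&2
    return 1
  fi
  # samtools decides per read name, so all records of one read
  # (primary, supplementary, secondary) are kept or dropped together.
  samtools view -b -s "$(subsample_arg "$seed" "$frac")" -o "$out" "$in" &&
    samtools index "$out"
}
```

Usage would be something like downsample_to_depth sample.bam sample.5x.bam 5. Every read is kept with the same probability, so the expected depth scales by the fraction regardless of the read-length distribution, and relative coverage differences between regions are preserved on average. The result is only approximately 5x, and contigs such as chrM or unplaced scaffolds can pull the genome-wide mean away from the depth over the regions you care about.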

Tags: nanopore, BAM, QC, ONT, downsampling

You can use reformat.sh from the BBMap suite, but this may not satisfy the "specific average depth" requirement.

reformat.sh -Xmx10g in=your.bam out=sampled.bam parameter_from_below

Sampling parameters:

reads=-1                Set to a positive number to only process this many INPUT reads (or pairs), then quit.
skipreads=-1            Skip (discard) this many INPUT reads before processing the rest.
samplerate=1            Randomly output only this fraction of reads; 1 means sampling is disabled.
sampleseed=-1           Set to a positive number to use that prng seed for sampling (allowing deterministic sampling).
samplereadstarget=0     (srt) Exact number of OUTPUT reads (or pairs) desired.
samplebasestarget=0     (sbt) Exact number of OUTPUT bases desired.
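With reformat.sh, samplebasestarget can approximate a target depth if it is set to depth × genome size, e.g. 5 × 3,100,000,000 = 15,500,000,000 bases for a roughly human-sized genome (the genome size is an illustrative assumption). This only holds approximately: it assumes nearly all sampled bases align and that each read appears once in the BAM, since extra secondary or supplementary records would be counted as additional bases. A small sketch with hypothetical helper names, using only the parameters listed above:

```shell
# target_bases DEPTH GENOME_SIZE -> DEPTH * GENOME_SIZE, rounded to an integer
target_bases() {
  awk -v d="$1" -v g="$2" 'BEGIN { printf "%.0f\n", d * g }'
}

# downsample_bases IN.bam OUT.bam DEPTH GENOME_SIZE [SEED]
# Hypothetical wrapper: passes the computed base count to samplebasestarget
# and a fixed sampleseed so the subsample is reproducible.
downsample_bases() {
  local sbt
  sbt=$(target_bases "$3" "$4")
  reformat.sh -Xmx10g in="$1" out="$2" samplebasestarget="$sbt" sampleseed="${5:-42}"
}
```

For example, downsample_bases sample.bam sample.5x.bam 5 3100000000.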

Okay, thanks. Just double-checking: will this work with long reads?


Yes, it should work with long-read BAMs.

eebloom, 1 day ago

Random downsampling of BAM files is now possible with rasusa v1.0.0, using the aln subcommand.


Is this what you ultimately used? Did this tool satisfy the "specific average depth" requirement?

You could accept this answer (green check mark) to provide closure to this thread. If @Pierre's answer was also useful, it can be accepted as well.


This is not what I needed for this particular use case: capping the coverage would lose information on regions of copy-number amplification and, more generally, on the variation in coverage across the genome.
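One way to check whether a downsampled BAM keeps that coverage variation is to compare per-contig mean depth before and after with samtools coverage. With proportional read sampling, the after/before ratio should be close to the sampling fraction on every contig, whereas a depth cap would pull the ratio down mainly in high-coverage (e.g. amplified) regions. A sketch with a hypothetical function name, assuming the samtools coverage table layout (header line starting with #, contig name in column 1, meandepth in column 7):

```shell
# depth_ratios BEFORE.tsv AFTER.tsv
# Inputs are samtools coverage tables; prints contig<TAB>after/before meandepth.
depth_ratios() {
  awk -F'\t' '
    /^#/      { next }                   # skip the header line of each table
    NR == FNR { before[$1] = $7; next }  # first file: remember meandepth per contig
    ($1 in before) && before[$1] > 0 {   # second file: ratio against the original
      printf "%s\t%.3f\n", $1, $7 / before[$1]
    }' "$1" "$2"
}
```

Usage: run samtools coverage on the original and the downsampled BAM, redirecting the output to before.tsv and after.tsv, then call depth_ratios before.tsv after.tsv and look for contigs whose ratio departs from the rest.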
