Question: randomly subsampling a bam file three times
GK1610 (United States) wrote, 2.6 years ago:

I have a sample.bam file and I want to randomly sample 15 million reads from it.

I want to do this three times, producing three output files:

random_sample_15_million_1.bam, random_sample_15_million_2.bam, random_sample_15_million_3.bam

I do NOT want these three files to be identical.

chip-seq • 1.9k views
modified 2.6 years ago by karl.stamm • written 2.6 years ago by GK1610

Subsample BAM to fixed number of alignments

written 2.6 years ago by genomax
karl.stamm (United States) wrote, 2.6 years ago:

Count the total reads and work out what fraction of that total 15M is. Then use Picard DownsampleSam to keep just that fraction. https://broadinstitute.github.io/picard/command-line-overview.html

Set the random seed (RANDOM_SEED) to 1, 2 and 3 for the three runs and you'll get three different files.
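To make the arithmetic concrete, here is a minimal Python sketch that computes the keep fraction and assembles one DownsampleSam command per seed. The total of 60 million reads, the `picard.jar` path and the helper names are assumptions for illustration only; substitute the real read count of your file. Because each read is kept with a given probability, expect roughly 15M reads per output rather than exactly 15M.

```python
def keep_fraction(total_reads: int, target_reads: int) -> float:
    """Fraction of reads to keep so that about target_reads remain."""
    if total_reads <= 0 or target_reads <= 0:
        raise ValueError("read counts must be positive")
    if target_reads > total_reads:
        raise ValueError("cannot sample more reads than the file contains")
    return target_reads / total_reads


def picard_commands(bam, total_reads, target_reads, seeds=(1, 2, 3)):
    """Build one Picard DownsampleSam command (as an argument list) per seed."""
    p = keep_fraction(total_reads, target_reads)
    return [
        [
            "java", "-jar", "picard.jar",  # jar location is an assumption
            "DownsampleSam",
            f"I={bam}",
            f"O=random_sample_15_million_{seed}.bam",
            f"P={p:.6f}",   # PROBABILITY: chance of keeping each read
            f"R={seed}",    # RANDOM_SEED: a different seed per output file
        ]
        for seed in seeds
    ]


# Hypothetical total of 60M reads; replace with the count from your BAM.
for cmd in picard_commands("sample.bam", 60_000_000, 15_000_000):
    print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once Picard is installed.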

written 2.6 years ago by karl.stamm

If Picard gives you trouble, samtools view also has a subsampling option (-s).
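For the samtools route, `-s` takes a single number whose integer part is the seed and whose fractional part is the fraction of reads to keep, so `-s 2.25` means seed 2, keep about 25%. A small Python sketch, assuming a keep fraction of 0.25 (hypothetical; compute it from your own read count) and using illustrative helper names:

```python
def samtools_subsample_arg(seed: int, fraction: float) -> str:
    """Encode seed and keep fraction in the SEED.FRACTION form used by -s."""
    rounded = round(fraction, 6)
    if seed < 0:
        raise ValueError("seed must be non-negative")
    if not 0 < rounded < 1:
        raise ValueError("fraction must be strictly between 0 and 1")
    return f"{seed + rounded:.6f}"


def samtools_commands(bam, fraction, seeds=(1, 2, 3)):
    """Build one 'samtools view -s' command (as an argument list) per seed."""
    return [
        [
            "samtools", "view", "-b",
            "-s", samtools_subsample_arg(seed, fraction),
            "-o", f"random_sample_15_million_{seed}.bam",
            bam,
        ]
        for seed in seeds
    ]


# Assumed fraction of 0.25, i.e. 15M out of a hypothetical 60M reads.
for cmd in samtools_commands("sample.bam", 0.25):
    print(" ".join(cmd))
```

As with Picard, the output size is approximate, and different seeds give different subsets.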

written 2.6 years ago by karl.stamm

In case it's not apparent, this is the relevant Picard option:

PROBABILITY=Double (short form P=Double): The probability of keeping any individual read, between 0 and 1. Default value: 1.0. This option can be set to 'null' to clear the default value.

Use 0.25 to keep ~25% of the reads.

modified 18 months ago • written 18 months ago by Kevin