Question: randomly subsampling a bam file three times
2.0 years ago, GK1610 (United States) wrote:

I have a sample.bam file and I want to randomly sample 15 million reads from it.

I want to do this three times, producing three files:

random_sample_15_million_1.bam, random_sample_15_million_2.bam, random_sample_15_million_3.bam

I do NOT want these three files to be identical.

chip-seq • 1.4k views
modified 2.0 years ago by karl.stamm3.3k • written 2.0 years ago by GK161030

Subsample BAM to fixed number of alignments

written 2.0 years ago by genomax51k
2.0 years ago, karl.stamm (United States) wrote:

Count the total reads and work out what fraction of that total 15M is. Then use Picard DownsampleSam to keep just that fraction. https://broadinstitute.github.io/picard/command-line-overview.html

Set the random seed to 1, 2, and 3 for the three runs and you'll get three different files.
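
A minimal sketch of that recipe, in case it helps. It assumes you have already counted the records yourself (for example with `samtools view -c sample.bam`, which counts every record in the file), and the 60 million total used below is a made-up number, not one from this thread. The helper names and the `picard.jar` path are hypothetical; `P` and `R` are the short forms of Picard's `PROBABILITY` and `RANDOM_SEED` options. The script only prints the commands, it does not run them.

```python
import shlex


def keep_probability(total_reads: int, target_reads: int) -> float:
    """Fraction of reads to keep so that about target_reads remain."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 < target_reads <= total_reads:
        raise ValueError("target_reads must be between 1 and total_reads")
    return target_reads / total_reads


def picard_commands(in_bam, total_reads, target_reads,
                    seeds=(1, 2, 3),
                    out_prefix="random_sample_15_million",
                    picard_jar="picard.jar"):  # hypothetical jar location
    """Build one DownsampleSam command line per random seed."""
    p = keep_probability(total_reads, target_reads)
    commands = []
    for i, seed in enumerate(seeds, start=1):
        args = [
            "java", "-jar", picard_jar, "DownsampleSam",
            f"I={in_bam}",
            f"O={out_prefix}_{i}.bam",
            f"P={p:.6f}",   # probability of keeping each read
            f"R={seed}",    # a different seed gives a different subset
        ]
        commands.append(shlex.join(args))
    return commands


if __name__ == "__main__":
    # 60_000_000 is an illustrative total; substitute your own count.
    for cmd in picard_commands("sample.bam", 60_000_000, 15_000_000):
        print(cmd)
```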

written 2.0 years ago by karl.stamm3.3k

If Picard gives you trouble, samtools view also has a subsampling option (-s).
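
A rough sketch of the samtools route, under the same assumptions as the Picard approach (you know the total count, and the 0.25 fraction here is only illustrative). With `samtools view -s`, the integer part of the value is the seed and the part after the decimal point is the fraction to keep, so `-s 1.25` means seed 1, keep about 25%. The function names are hypothetical, and the script only prints the commands.

```python
import shlex


def samtools_subsample_arg(seed: int, fraction: float) -> str:
    """Encode seed and fraction as the SEED.FRACTION value taken by -s."""
    if seed < 0:
        raise ValueError("seed must be a non-negative integer")
    rounded = round(fraction, 6)
    if not 0 < rounded < 1:
        raise ValueError("fraction must be strictly between 0 and 1")
    digits = f"{rounded:.6f}".split(".")[1]  # digits after the decimal point
    return f"{seed}.{digits}"


def samtools_commands(in_bam, fraction, seeds=(1, 2, 3),
                      out_prefix="random_sample_15_million"):
    """Build one 'samtools view -s' command line per seed."""
    commands = []
    for i, seed in enumerate(seeds, start=1):
        args = [
            "samtools", "view", "-b",
            "-s", samtools_subsample_arg(seed, fraction),
            "-o", f"{out_prefix}_{i}.bam",
            in_bam,
        ]
        commands.append(shlex.join(args))
    return commands


if __name__ == "__main__":
    # 0.25 is an illustrative fraction (15M out of a made-up 60M total).
    for cmd in samtools_commands("sample.bam", 0.25):
        print(cmd)
```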

written 2.0 years ago by karl.stamm3.3k

In case it's not apparent, this is the relevant option from the Picard documentation:

PROBABILITY=Double, P=Double: The probability of keeping any individual read, between 0 and 1. Default value: 1.0. This option can be set to 'null' to clear the default value.

Use 0.25 to keep ~25% of the reads.
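
To see why that is "~25%" rather than exactly 25%, here is a toy illustration of per-read probability sampling. This is not Picard's actual implementation, just an independent keep-or-drop draw per read name with a seeded random generator; the read names are made up. The same seed reproduces the same subset, a different seed gives a different one, and the kept count lands near, but not exactly on, the target.

```python
import random


def downsample(read_names, probability, seed):
    """Keep each read independently with the given probability."""
    rng = random.Random(seed)  # seeded, so the result is reproducible
    return [name for name in read_names if rng.random() < probability]


if __name__ == "__main__":
    reads = [f"read{i}" for i in range(10_000)]  # made-up read names
    for seed in (1, 2, 3):
        kept = downsample(reads, 0.25, seed)
        print(f"seed={seed}: kept {len(kept)} of {len(reads)} reads")
```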

modified 11 months ago • written 11 months ago by Kevin590