Question: Randomly subsampling a BAM file three times
15 months ago by
GK161020
United States
GK161020 wrote:

I have a sample.bam file and I want to randomly sample 15 million reads from it.

I want to do this three times, so that I end up with three files:

random_sample_15_million_1.bam, random_sample_15_million_2.bam, and random_sample_15_million_3.bam

I do NOT want these three files to be identical.

chip-seq • 769 views

Subsample BAM to fixed number of alignments

written 15 months ago by genomax34k
15 months ago by
karl.stamm3.2k
United States
karl.stamm3.2k wrote:

Count the total reads and work out what fraction of that total 15M is. Then use Picard DownsampleSam to keep just that fraction of the reads. https://broadinstitute.github.io/picard/command-line-overview.html

Set the random seed to 1, 2, and 3 for the three runs and you'll get three different files.
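A minimal sketch of the steps above, assuming a total of 60 million reads (a made-up figure; substitute your own count). It only builds the three Picard command lines and does not run them. The helper names `keep_fraction` and `picard_commands` and the `picard.jar` path are hypothetical; `I=`, `O=`, `P=` and `R=` are the short forms of DownsampleSam's INPUT, OUTPUT, PROBABILITY and RANDOM_SEED options.

```python
def keep_fraction(total_reads: int, target_reads: int) -> float:
    """Return the fraction of reads to keep so that roughly target_reads remain."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if target_reads > total_reads:
        raise ValueError("cannot sample more reads than the file contains")
    return target_reads / total_reads


def picard_commands(input_bam, total_reads, target_reads=15_000_000,
                    seeds=(1, 2, 3), picard_jar="picard.jar"):
    """Build one DownsampleSam command per seed (picard_jar is an assumed path)."""
    p = keep_fraction(total_reads, target_reads)
    commands = []
    for i, seed in enumerate(seeds, start=1):
        output_bam = f"random_sample_15_million_{i}.bam"
        commands.append([
            "java", "-jar", picard_jar, "DownsampleSam",
            f"I={input_bam}",
            f"O={output_bam}",
            f"P={p:.6f}",   # probability of keeping each read
            f"R={seed}",    # a different seed gives a different subset
        ])
    return commands


if __name__ == "__main__":
    # 60 million total reads is an illustrative value, not from the thread.
    for cmd in picard_commands("sample.bam", total_reads=60_000_000):
        print(" ".join(cmd))
```

Because each read is kept with probability P, each output holds approximately 15 million reads rather than exactly that number. One way to get the total count is `samtools view -c sample.bam`.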


If Picard gives you trouble, samtools view also has a subsampling option (-s).

written 15 months ago by karl.stamm3.2k
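For the samtools route, here is a rough sketch of how the `-s` argument could be assembled. `samtools view -s` takes a single number whose integer part seeds the random number generator and whose fractional part is the fraction of reads to keep, so seed 2 with a fraction of 0.25 becomes `2.25`. The helper names below are hypothetical, the fraction of 0.25 is only an illustrative value, and the code builds the command lines without running them.

```python
def subsample_arg(seed: int, fraction: float) -> str:
    """Combine seed and fraction into the SEED.FRACTION form used by samtools view -s."""
    if seed < 0:
        raise ValueError("seed must be non-negative")
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be strictly between 0 and 1")
    frac_str = f"{fraction:.6f}"          # e.g. "0.250000"
    if not frac_str.startswith("0."):
        raise ValueError("fraction rounds to 1; keep all reads instead")
    return f"{seed}{frac_str[1:]}"        # e.g. "2" + ".250000"


def samtools_commands(input_bam, fraction, seeds=(1, 2, 3)):
    """Build one 'samtools view -s' command per seed, writing BAM output."""
    commands = []
    for i, seed in enumerate(seeds, start=1):
        output_bam = f"random_sample_15_million_{i}.bam"
        commands.append([
            "samtools", "view", "-b",
            "-s", subsample_arg(seed, fraction),
            "-o", output_bam,
            input_bam,
        ])
    return commands


if __name__ == "__main__":
    # 0.25 is an illustrative fraction; compute yours as 15M / total reads.
    for cmd in samtools_commands("sample.bam", fraction=0.25):
        print(" ".join(cmd))
```

Running the same command with the same seed reproduces the same subset, so the three seeds are what make the three files differ.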

In case it's not apparent, this is the relevant Picard option:

PROBABILITY=Double P=Double The probability of keeping any individual read, between 0 and 1. Default value: 1.0. This option can be set to 'null' to clear the default value.

Use 0.25 to keep ~25% of the reads.

modified 9 weeks ago • written 9 weeks ago by Kevin540