Question: Randomly filtering reads from two sam files
3.2 years ago by
United States
jyu429120 wrote:


I have these two sam files hg19.sam and hg38.sam. I would like to randomly sample entries from hg19.sam (for each entry, include it with probability 0.5) and randomly sample entries from hg38.sam and merge these entries into one sam file: merged.sam. How should I best go about this? Thanks!

sam samtools • 942 views
modified 3.2 years ago by Alternative220 • written 3.2 years ago by jyu429120

What's the point of this? The chromosome lengths, names and sequences are not the same between hg19 and hg38.

written 3.2 years ago by Pierre Lindenbaum116k

Later that week:
"After running 'samtools flagstat hg28.5.sam' I'm told I have 323499083 properly mapped pairs, but when I run 'wc -l hg28.5.sam' I only have 154524561 lines. Is this safe to ignore?"

Later that year:
"In conclusion, our findings show a dramatic dysregulation of the DERP1 locus in all cohorts analysed, consistent with our hypothesis that this locus is the master regulator of fragile XY-Error Syndrome. In keeping with our commitment to reproducible science, the processed BigWig files can be found in GEO; however, access to the raw sequencing data requires approval from our Data Access Committee, three kinds of photo ID, a small blood sample, and a signed copy of David Hasselhoff's 'Looking For Freedom' single circa 1989."

modified 3.2 years ago • written 3.2 years ago by John12k
3.2 years ago by
Alternative220 wrote:

1) "samtools view -s" lets you subsample SAM files. The argument is SEED.FRACTION, so "-s 42.5" keeps roughly 50% of the reads using seed 42. Follow it with "samtools merge" to merge the results.

2) Also, sambamba is very fast. Same thing: "sambamba view -s" for subsampling and "sambamba merge" for merging.

3) Picard tools can do the same too (DownsampleSam for subsampling, MergeSamFiles for merging).

You can also do the same with a simple Python/Perl/shell script (e.g. using shuf) if you want.
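For the scripting route, here is a minimal Python sketch, not a replacement for the tools above. It assumes plain-text (uncompressed) SAM input, and the names `keep_read` and `subsample_sam` are made up for illustration. The keep/drop decision is taken from a hash of the read name rather than per line, so both mates of a pair are kept or dropped together. It does not reconcile the two headers, which matters here because hg19 and hg38 define different @SQ lines.

```python
import hashlib


def keep_read(qname: str, fraction: float, seed: int = 0) -> bool:
    """Decide from the read name alone, so records sharing a name share a fate."""
    digest = hashlib.md5(f"{seed}:{qname}".encode()).digest()
    # Map the first 8 bytes of the digest to a number in [0, 1).
    u = int.from_bytes(digest[:8], "big") / 2**64
    return u < fraction


def subsample_sam(lines, fraction=0.5, seed=0):
    """Yield header lines unchanged plus roughly `fraction` of the reads.

    `lines` is any iterable of SAM text lines, e.g. an open file object.
    """
    for line in lines:
        if line.startswith("@"):
            # Header line (@HD, @SQ, @RG, @PG, @CO): always pass through.
            yield line
        elif keep_read(line.split("\t", 1)[0], fraction, seed):
            # Alignment record whose read name hashed below the threshold.
            yield line
```

To use it, open each SAM file, pass the file object to `subsample_sam`, and write the yielded lines to a new file. Combining the two sampled outputs into merged.sam still needs a header that is valid for both references, which this sketch leaves to you (or to a merge tool).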


modified 3.2 years ago • written 3.2 years ago by Alternative220