Randomly filtering reads from two sam files
8.4 years ago
jyu429 ▴ 120

Hi,

I have two SAM files, hg19.sam and hg38.sam. I would like to randomly sample entries from hg19.sam (including each entry with probability 0.5), randomly sample entries from hg38.sam, and merge the sampled entries into one SAM file, merged.sam. What is the best way to go about this? Thanks!

sam samtools • 1.9k views

What's the point of this? The chromosome lengths, names, and sequences are not the same...


Later that week:

After running samtools flagstat hg28.5.sam, I'm told I have 323499083 properly mapped pairs, but when I run wc -l hg28.5.sam I only have 154524561 lines. Is this safe to ignore?

Later that year:

In conclusion, our findings show a dramatic dysregulation of the DERP1 locus in all cohorts analysed, consistent with our hypothesis that this locus is the master regulator of fragile XY-Error Syndrome. In keeping with our commitment to reproducible science, the processed BigWig files can be found in GEO; however, access to the raw sequencing data requires approval from our Data Access Committee, three kinds of photo ID, a small blood sample, and a signed copy of David Hasselhoff's 'Looking For Freedom' single circa 1989.

8.4 years ago
Alternative ▴ 270
  1. samtools view -s subsamples a SAM/BAM file (the fractional part of the argument is the fraction of reads to keep, e.g. 0.5); samtools merge then merges the results.
  2. sambamba is also very fast and works the same way: sambamba view -s for subsampling and sambamba merge for merging.
  3. Picard can do the same (DownsampleSam and MergeSamFiles).

You can also do the same with a simple Python, Perl, or shell script (for example, using shuf).
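As a rough illustration of the scripting route, here is a minimal Python sketch that keeps each alignment line with a given probability and writes the survivors to one file. The function names (sample_sam, merge_sampled) and the seed parameter are made up for this example. It assumes plain-text SAM input, and it copies the header of the first file only, so records from the second file may refer to reference sequences that the header does not describe (which is the concern raised in the comment above). Mates of a pair are sampled independently and the output is not sorted.

```python
import random


def sample_sam(path, prob, rng):
    """Yield (is_header, line) pairs from a plain-text SAM file.

    Header lines (starting with '@') are always yielded; each alignment
    line is yielded with probability `prob`.
    """
    with open(path) as handle:
        for line in handle:
            if line.startswith("@"):
                yield True, line
            elif rng.random() < prob:
                yield False, line


def merge_sampled(paths, out_path, prob=0.5, seed=None):
    """Sample every file in `paths` and write the kept records to `out_path`.

    Assumption for this sketch: only the header of the first file is kept.
    Returns the number of alignment lines written.
    """
    rng = random.Random(seed)  # a fixed seed makes the sampling repeatable
    kept = 0
    with open(out_path, "w") as out:
        for index, path in enumerate(paths):
            for is_header, line in sample_sam(path, prob, rng):
                if is_header:
                    if index == 0:
                        out.write(line)
                else:
                    out.write(line)
                    kept += 1
    return kept
```

For the files in the question, this would be called as merge_sampled(["hg19.sam", "hg38.sam"], "merged.sam", prob=0.5). For BAM input, or when mates must stay together, the dedicated tools listed above are the safer choice.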
