Question: Downsampling BAM files for CNV detection
Nandini (London) wrote, 5 months ago:

Hello,

I have a CNV pipeline using DECoN (a variant of ExomeDepth) for clinical analysis. We now need to validate it, and one of the validation steps is downsampling.

To make sure I get this right: what is the best approach or tool for downsampling BAM files (2 NextSeq runs of 48 samples each) and then rerunning the CNV tool, to check at what downsampling percentage (relative to 20x coverage) the CNVs confirmed by MLPA are still picked up or start to drop out?

Many thanks,

Tags: cnv, downsampling, bam, ngs
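One way to turn "what percentage at 20x" into concrete subsampling settings: if you first measure each sample's mean on-target depth (for example by averaging samtools depth output over the target regions; the 150x below is a made-up value), the fraction of reads to keep for a given target depth is roughly target / observed. This is a sketch under the assumption that depth scales linearly with the number of reads kept, which holds on average for random subsampling but ignores duplicates and capture effects; fractions_for_targets is a hypothetical helper, not part of any tool.

```python
def fractions_for_targets(observed_depth: float, target_depths: list[float]) -> dict[float, float]:
    """Map each target mean depth to the fraction of reads to keep.

    Assumption: mean depth scales linearly with the number of reads kept,
    which holds on average for uniform random subsampling.
    """
    if observed_depth <= 0:
        raise ValueError("observed_depth must be positive")
    result = {}
    for target in target_depths:
        if target >= observed_depth:
            raise ValueError(f"nothing to downsample: target {target}x >= observed {observed_depth}x")
        # Round to 4 decimal places, enough precision for a subsampling fraction.
        result[target] = round(target / observed_depth, 4)
    return result


# Hypothetical sample with 150x mean on-target depth, stepped down towards and below 20x.
grid = fractions_for_targets(150.0, [100, 50, 20, 10])
for target, frac in grid.items():
    print(f"{target}x -> keep {frac:.4f} of reads")
```

Running the CNV caller once per fraction then shows at which depth each MLPA-confirmed call is lost.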
WouterDeCoster (Belgium) wrote, 5 months ago:

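I would suggest samtools view -b -s 0.1 yourfile.bam -o newfile.bam, which keeps roughly 10% of the reads (read pairs), i.e. about a tenth of the original depth rather than 0.1x coverage. Adapt the fraction accordingly.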

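If you want a whole series of downsampled BAMs from the same input, a small script can build the samtools view -s argument in the SEED.FRAC form described in the manual. This is a hedged sketch: the helper names and file names are hypothetical, and it only builds the command lines; passing each list to subprocess.run(cmd, check=True) would actually execute samtools.

```python
def subsample_arg(seed: int, fraction: float) -> str:
    """Format the INT.FRAC value for samtools view -s (seed, then fraction to keep)."""
    if seed < 0:
        raise ValueError("seed must be a non-negative integer")
    frac_str = f"{fraction:.4f}"  # e.g. "0.1333"
    if not 0.0 < fraction < 1.0 or frac_str in ("0.0000", "1.0000"):
        raise ValueError("fraction must be strictly between 0 and 1")
    # Drop the leading "0" so seed 42 and fraction 0.2 give "42.2000".
    return f"{seed}{frac_str[1:]}"


def samtools_subsample_cmd(in_bam: str, out_bam: str, seed: int, fraction: float) -> list[str]:
    """Build (but do not run) a samtools view command that writes a subsampled BAM."""
    return ["samtools", "view", "-b", "-s", subsample_arg(seed, fraction), "-o", out_bam, in_bam]


# Hypothetical input file, three downsampling levels, fixed seed 42.
for frac in (0.5, 0.25, 0.1):
    cmd = samtools_subsample_cmd("sample01.bam", f"sample01.sub{frac}.bam", seed=42, fraction=frac)
    print(" ".join(cmd))
```

Keeping the seed fixed across samples makes the whole series reproducible.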

From the samtools view manual:

-s FLOAT Output only a proportion of the input alignments. This subsampling acts in the same way on all of the alignment records in the same template or read pair, so it never keeps a read but not its mate. The integer and fractional parts of the -s INT.FRAC option are used separately: the part after the decimal point sets the fraction of templates/pairs to be kept, while the integer part is used as a seed that influences which subset of reads is kept. When subsampling data that has previously been subsampled, be sure to use a different seed value from those used previously; otherwise more reads will be retained than expected.

Reply by Garan, 5 months ago.
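To see why the manual warns against reusing a seed, here is a minimal model of hash-based, template-consistent subsampling. It is an illustrative assumption, not samtools' actual code (samtools uses its own hash function, not SHA-256): the keep decision depends only on the read name (QNAME) and the seed, so both mates of a pair, which share a QNAME, are always kept or dropped together. Because each read gets the same pseudo-random value every time for a given seed, subsampling a 20% subset again at 50% with the same seed keeps all of it instead of halving it. The helpers keep_template and subsample are hypothetical.

```python
import hashlib


def keep_template(qname: str, seed: int, fraction: float) -> bool:
    """Decide whether to keep a template (a read or read pair).

    Hypothetical model: the decision depends only on the read name and the
    seed, so both mates (same QNAME) always get the same answer.
    """
    digest = hashlib.sha256(f"{seed}:{qname}".encode()).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    value = int.from_bytes(digest[:8], "big") / 2**64
    return value < fraction


def subsample(qnames: list[str], seed: int, fraction: float) -> list[str]:
    """Return the read names kept at the given seed and fraction."""
    return [q for q in qnames if keep_template(q, seed, fraction)]


reads = [f"read{i}" for i in range(20000)]

# First pass: keep about 20% with seed 0.
first = subsample(reads, seed=0, fraction=0.2)

# Second pass at 50% with the SAME seed: every survivor already scored < 0.2,
# so it also scores < 0.5 and nothing is removed (still ~20%, not ~10%).
same_seed = subsample(first, seed=0, fraction=0.5)

# Second pass with a DIFFERENT seed: survivors get fresh values (~10% overall).
new_seed = subsample(first, seed=1, fraction=0.5)

print(len(first), len(same_seed), len(new_seed))
```

For a downsampling series it is usually simpler to subsample the original BAM at each target fraction than to subsample already-subsampled files.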

Keeping mate pairs intact is not really an issue for ExomeDepth, since it is a coverage-based (read-depth) CNV caller, but it might matter if you are comparing against other CNV callers.

Reply by Garan, 5 months ago.

So does this mean that -s 0.1 sets a seed of 0 and keeps only 10% of the reads in samtools? I was going to use sambamba for downsampling, but I guess the two are similar. I think paired-end mates are lost on downsampling, though?

Reply by Nandini, 5 months ago.

From what the samtools documentation says, it does keep mate pairs together (I just didn't read it properly the first time!): "This subsampling acts in the same way on all of the alignment records in the same template or read pair, so it never keeps a read but not its mate".

For samtools view -s 42.2, the subsampling seed is 42 and about 20% of the templates (read pairs) are kept.

For sambamba view, the fraction and the seed are separate options:

  -s, --subsample=FRACTION    subsample reads (read pairs)
  --subsampling-seed=SEED     set seed for subsampling

Reply by Garan, 5 months ago.
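A small parser makes the samtools INT.FRAC convention explicit: everything before the decimal point is the seed, and everything after it, read as 0.FRAC, is the fraction of templates to keep. This sketches the documented convention only, not samtools' own parsing code; parse_subsample_arg is a hypothetical name.

```python
def parse_subsample_arg(value: str) -> tuple[int, float]:
    """Split a samtools-style INT.FRAC string into (seed, fraction)."""
    seed_part, _, frac_part = value.partition(".")
    seed = int(seed_part) if seed_part else 0
    # The digits after the point are read as a fraction: "2" -> 0.2, "05" -> 0.05.
    fraction = float("0." + frac_part) if frac_part else 0.0
    return seed, fraction


print(parse_subsample_arg("42.2"))  # (42, 0.2): seed 42, keep 20%
print(parse_subsample_arg("8.1"))   # (8, 0.1): seed 8, keep 10%
print(parse_subsample_arg("0.1"))   # (0, 0.1): seed 0, keep 10%
```

With sambamba the same settings are simply given as two separate options instead of one combined number.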

Thanks, both. Another quick question: is there a way to choose the seed, or is it totally random?

Reply by Nandini, 5 months ago.

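The integer part (before the decimal point) is the seed, so you choose it yourself.

samtools view -s 8.1 downsamples to 10% with seed 8; rerunning it with the same seed on the same file gives the same subset.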

Reply by WouterDeCoster, 5 months ago.
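The seed makes subsampling reproducible rather than random. In hash-based subsampling of the kind modelled below (an illustrative assumption, not samtools' own hash function), rerunning with the same seed and fraction selects exactly the same reads, while changing the seed gives a different subset of roughly the same size. kept_names is a hypothetical helper.

```python
import hashlib


def kept_names(qnames, seed, fraction):
    """Hypothetical hash-based subsampler: keep a name if hash(seed, name) < fraction."""
    kept = set()
    for q in qnames:
        digest = hashlib.sha256(f"{seed}:{q}".encode()).digest()
        if int.from_bytes(digest[:8], "big") / 2**64 < fraction:
            kept.add(q)
    return kept


reads = [f"read{i}" for i in range(10000)]

run_a = kept_names(reads, seed=8, fraction=0.1)  # like "-s 8.1"
run_b = kept_names(reads, seed=8, fraction=0.1)  # same seed, same fraction
run_c = kept_names(reads, seed=9, fraction=0.1)  # like "-s 9.1"

print(run_a == run_b)          # True: same seed, identical subset
print(len(run_a), len(run_c))  # both roughly 10% of 10000
print(len(run_a & run_c))      # small overlap, roughly 1% of 10000 expected
```

For validation, one option is to repeat each downsampling level with a few different seeds to see whether a borderline CNV call is stable.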