Collapsing BAM based on seq and positions
22 months ago
manekineko ▴ 150

Is there a tool that takes a BAM file as input, collapses reads that share the same sequence and mapping position, and retains the copy number of each unique sequence? The output should again be a BAM file.

bam collapse • 612 views

Not clear. input/output example required.


Is this not what MarkDuplicates from Picard does?

Samtools + Picard Markduplicates

Picard Mark Duplicates

There are many more posts about this, but these should show you what to look for and help you decide whether it is an option.


I need something like FASTA collapsing, where you retain only one copy of each unique sequence together with its copy number, but at the level of a BAM file. I have a BAM file of mapped, uncollapsed sequences and need to collapse it so that each unique sequence at a given mapping position appears once, with its copy number retained in the read name or in a similar way.


input/output example required.


For example, if the file contains identical sequences mapping to the same position:

HISEQ:47:C6FUWANXX:1:1101:1212:2073     4       contig1222  1       25      22M     *       0       0       TATTGCACTTGTCCCGGCCTGT  BBBBBFFFFFFFFFFFFFFFFF  XT:A:U  NM:i:1  X0:i:1  X1:i:0  XM:i:1  XO:i:0  XG:i:0  MD:Z:21C0
HISEQ:47:C6FUWANXX:1:1101:1139:2100     4       contig1222  1       25      22M     *       0       0       TATTGCACTTGTCCCGGCCTGT  BBBBBFFFFFFFFFFFF/FFFF  XT:A:U  NM:i:1  X0:i:1  X1:i:0  XM:i:1  XO:i:0  XG:i:0  MD:Z:21C0


I want to retain only one record, with the copy number indicated in the read name (or in a similar way), here as the _2 suffix:

HISEQ:47:C6FUWANXX:1:1101:1212:2073_2     4       contig1222  1       25      22M     *       0       0       TATTGCACTTGTCCCGGCCTGT  BBBBBFFFFFFFFFFFFFFFFF  XT:A:U  NM:i:1  X0:i:1  X1:i:0  XM:i:1  XO:i:0  XG:i:0  MD:Z:21C0
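
The collapsing shown above could be sketched in plain Python working on SAM text, for instance on the output of `samtools view -h`. This is only a minimal sketch under stated assumptions, not an existing tool: the function name `collapse_sam_lines` is hypothetical, records are assumed to be tab-separated, and two reads are treated as identical when they share reference name, position, strand, CIGAR and sequence. The first record of each group is kept, so base qualities and optional tags of the other copies are discarded, and the unmapped flag is not checked.

```python
def collapse_sam_lines(lines):
    """Collapse SAM records that share reference, position, strand, CIGAR and sequence.

    Hypothetical helper: keeps the first record of each group and appends
    "_<copy number>" to its read name. Header lines (starting with "@")
    are passed through unchanged.
    """
    headers = []
    groups = {}  # key -> [fields of the first record seen, copy number]
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            headers.append(line)
            continue
        fields = line.split("\t")
        flag = int(fields[1])
        # Assumed identity key: RNAME, POS, reverse-strand bit, CIGAR, SEQ.
        key = (fields[2], int(fields[3]), bool(flag & 0x10), fields[5], fields[9])
        if key in groups:
            groups[key][1] += 1
        else:
            groups[key] = [fields, 1]

    out = list(headers)
    # dicts keep insertion order, so a coordinate-sorted input stays sorted.
    for fields, count in groups.values():
        renamed = list(fields)
        renamed[0] = f"{renamed[0]}_{count}"
        out.append("\t".join(renamed))
    return out
```

To get back to BAM, one option would be a small script that prints the result of `collapse_sam_lines(sys.stdin)` line by line, fed by `samtools view -h in.bam` and piped into `samtools view -b -o collapsed.bam -` (file names here are placeholders). Storing the count in a custom optional tag instead of the read name would be an alternative design if downstream tools depend on the original names.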


I just made up an example, so some flags may be wrong; treat the reads as mapped. I hope it is clear what I mean and what I want to do.


This sounds much like the ReducedReads format from early GATK versions. It was ultimately retired because it wasn't sufficient to capture all the important information, but it may still be available if you can find an old enough GATK (2.8?).