Distinguishing between ambiguously mapped reads in a split bamToBed file
0
0
Entering edit mode
2.3 years ago
rubic ▴ 270

Hi,

I have an RNA-seq bam file with ambiguously mapped reads, which I'm converting to a bed file using bedTools's bamToBed with the -splitD option in order to show each mapping segment separately.

There's no argument in bamToBed that allows adding a field that would map the record in the bed file to its record in the bam file so that in the bed file I can distinguish between split segments of the same bam record and different bam records.

Here's an example read with 3 records in the bam file:

chr        start         end         id         strand     cigar
chr1       9010230  9011300 read_1 +             379S21M1D270M1I59M1D22M1I308M1D388M401S
chr1       9010230  9010522 read_1 +             1557H293M
chr1       7823080  7823376 read_1 +             87H31M3D3M2I2M1I41M2D8M3D3M1D200M1472H

So bamToBed without the -splitD argument will give the 3 records above (without the cigar field), whereas bamToBed with the -splitD argument will give:

chr        start         end         id         strand
chr1       9010230  9010250 read_1 +
chr1       9010252  9010580 read_1 +
chr1       9010582  9010911 read_1 +
chr1       9010913  9011300 read_1 +
chr1       9010230  9010522 read_1 +
chr1       7823080  7823110 read_1 +
chr1       7823114  7823159 read_1 +
chr1       7823162  7823169 read_1 +
chr1       7823173  7823175 read_1 +
chr1       7823177  7823376 read_1 +

It's not straight forward to map the split bed records to their corresponding unspilt bed records without a bam record field.

Any idea if there's a way to get such a field out of bamToBed or any other way to distinguish between the ambiguously mapped read records in the split bamToBed file?

I initially though that adding the -cigar argument to bamToBed will help because that can serve as a unique read mapping ID but bamToBed with the -splitD argument ignores the -cigar argument.

reads RNA-Seq bam bamToBed split • 1.0k views
ADD COMMENT
0
Entering edit mode

Thanks! I ran bam2bed with the --split option and it's not splitting my bam records. Perhaps the format of my bam is not what bam2bed expects?

I ran:

bam2bed --split < example.bam > example.bed

Here's what I'm getting for the example read_1 above:

chr1    7823079 7823376 read_1  5   +   2048    87H31M3D3M2I2M1I41M2D8M3D3M1D200M1472H  *   0   0   GGAGGAAGCAACGGAGACTCAGTGACCCAGACAGAAGGCCAAGTGACCCTCTCTGAAAGAACTTCCCTGACTGTGAACTGCTCCTACAAAACCACACAGTACCCTGCCCTATTCTGGTATGTCCAGTATCCCGGAGAAGCTCCACAGCTCCTCCTGAAAGCCTCAAAGGACAAGGATAAGGGAAGCAAAAAAGGTTTTGAAGCCATATATGATAAAGAAACCAGCTCCTTCCACTTGCAGAAAGTTTCAGTGCAAGAGTCAGACTCGGCCACGTACTACTGTGCTCTGAGT *   NM:i:59 ms:i:126    AS:i:123    nn:i:0  ts:A:+  tp:A:P  cm:i:6  s1:i:44 s2:i:0  de:f:0.1803 SA:Z:JH602138.1,9010230,+,379S1070M1D401S,60,9;chr1,9010230,+,1557S293M,60,0;   rl:i:64
chr1    9010229 9010522 read_1  60  +   2048    1557H293M   *   0   0   AATGCTGAGAAGCTCATTTTTGGGGCTGGGACCAGATTACAAGTCTTGTCAAGTAAGTGCTAGAAGCCAAGAAGTAATTTTTAAAAGGCATTTTCTGTAAAACCCGATTTGGTACATTCTCACAGCGCCACTACCCTTCGCAAACCTGCTCTGTAGGGGCCTCCATGCCTCCAGCTGCTGGGTCTGTGGCCACTCAGGACGCCAGTTCTCACAGTTACAACCCTCTATATAACAAATCAGTTTAATCTATAGATGAACACCATGCTGCTGACCAAAGGCTTAACCAGCAATGG   *   NM:i:0  ms:i:293    AS:i:293    nn:i:0  ts:A:+  tp:A:P  cm:i:92 s1:i:289    s2:i:0  de:f:0  SA:Z:chr1,9010230,+,379S1070M1D401S,60,9;chr1,7823080,+,87S291M6D1472S,5,59;    rl:i:64
chr1    9010229 9011300 read_1  60  +   0   379S21M1D270M1I59M1D22M1I308M1D388M401S *   0   0   GACTTACATGAAGAGGTCAAGATGCAAGGACTTGGACTGAAAATGAACTGTTCCTCAGGAATTATGTTTGTATTCTTCTTAACATTTGGAGGAAGCAACGGAGACTCAGTGACCCAGACAGAAGGCCAAGTGACCCTCTCTGAAAGAACTTCCCTGACTGTGAACTGCTCCTACAAAACCACACAGTACCCTGCCCTATTCTGGTATGTCCAGTATCCCGGAGAAGCTCCACAGCTCCTCCTGAAAGCCTCAAAGGACAAGGATAAGGGAAGCAAAAAAGGTTTTGAAGCCATATATGATAAAGAAACCAGCTCCTTCCACTTGCAGAAAGTTTCAGTGCAAGAGTCAGACTCGGCCACGTACTACTGTGCTCTGAGTTAATGCTGAGAAGCTCATTTTTGGGCTGGGACCAGATTACAAGTCTTGTCAAGTAAGTGCTAGAAGCCAAGAAGTAATTTTTAAAAGGCATTTTCTGTAAAACCCGATTTGGTACATTCTCACAGCGCCACTACCCTTCGCAAACCTGCTCTGTAGGGGCCTCCATGCCTCCAGCTGCTGGGTCTGTGGCCACTCAGGACGCCAGTTCTCACAGTTACAACCCTCTATATAACAAATCAGTTTAATCTATAGATGAACACCATGCTGCTGACCAAAGGCTTAACCAGCAATGAAAAAAAAAAAAAAGGACATGAGGGAGTCACATATGCTATGACCCAGGCAAATGATCTGGTTTCTCTTTTACATTTTTCTTTAAAAATACTGGGAAAGGTAAAGCAAGTTCAATTTCTCCAATTTGAAAACTGTTTATGAATTATGCTTCCCTCATATGCAGGGAGACCTAGATTGACTTTGAATGTTTAGTCATTTGATTTCACTGTCTCTGCCTGGAATCTATAATGGGCAAGGGAATCAAATGTAATTAAATGAGCAAATAATCTTCAATGGTCAATATTCCTACATTGTTTTATGTGATTTGCTCTGGAGTTTTTAATCTGGGCTTTATCTCTAATAGGCTCCCCTGAAGGGCAGTGAAGGTTTTTGTTAAGGTTTTTGTGTCTGAGGGATAGCAACTATCAAATGATCTGGGGCTCTGGGACCAAACTAATTATAAAGCCAGGTAAGTCTCAGATATGTGACAGTCAGGGAGAGGAGACTCTAGCTGAATAATACACAAAGTGTAGCATGTAAATTATATTTTTAAGAATAATCCAGCCCGCTAGAGACAATGCCCTCAACCCAAATGGGGTCTTCGCTCCCAGTGAGAGACAATGTCAGCCACCAATTCTTGTTTGGTCTAGTTGCTTCCAGAAAAGATAATGCTTAACTAGACCCCTTAAAATAGAGCTGAATTGTTAGGGAGGTTTTTTATTCCCTCTAATATGGTTAGAATTCTTTCCCAAGGATGAGCCAGCCTATTCCATTGGGAGAATACGTTTAAGGATAAGGGAAGCAAAAAAGGTTTTGAAGCCATATATGATAAAGAAACCAGCTCCTTCCACTTGCAGAAAGTTTCAATGCAAGAGTCAGACTCGGCCACGTACTACTGTGCTCTGAGTTAATGCTGAGAAGCTCATTTTTGGGGCTGGGACCAGATTACAAGTCTTGTCAAGTAAGTGCTAGAAGCCAAGAAGTAATTTTTAAAAGGCATTTTCTGTAAAACCCGATTTGGTACATTCTCACAGCGCCACTACCCTTCGCAAACCTGCTCTGTAGGGGCCTCCATGCCTCCAGCTGCTGGGTCTGTGGCCACTCAGGACGCCAGTTCTCACAGTTACAACCCTCTATATAACAAATCAGTTTAATCTATAGATGAACACCATGCTGCTGACCAAAGGCTTAACCAGCAATGG  *   NM:i:9  ms:i:1041   AS:i:1041   nn:i:0  ts:A:+  tp:A:P  cm:i:339    s1:i:1040   s2:i:386    de:f:0.0084 SA:Z:chr1,9010230,+,1557S293M,60,0;chr1,7823080,+,87S291M6D1472S,5,59;  rl:i:64
ADD REPLY

Login before adding your answer.

Traffic: 3211 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6