Modifying bam file sequence names according to 2nd column
7.9 years ago
jyu429 ▴ 120

Hi,

I want to modify the sequence names in my BAM file. The reads are paired-end, but the names don't end in /1 and /2, so I can't use software like bedtools bamtofastq. I'd like to add /1 to the end of the name if the flag in the second column is 99 or 83, and /2 if it is 163 or 147. For instance,

HSQ1008:141:D0CC8ACXX:3:2202:1520:59984 163     chr14   105899906       60      101M    =       105900110       305     CCTTTCCAGGAAAGGGAGTAGCGAGGCTGCTCACTTAGAGCCACGCACCTGGGGCTGACAGTGTGCCTGGCAGTACCTGTGTGGAAAGACAGTTACAGAGG   @C@DDFDDHHHHAHGBHG1AFHIGIJGIIGEIJIGIFE?BBGIIIIBHIEGHHHFFFEEEEEEDCCCCCDDCD?CACCDDDCBB@ACBC?CCDCCCACAC8   RG:Z:NA12877    XT:A:U  NM:i:0  SM:i:37 AM:i:37 X0:i:1  X1:i:0  XM:i:0  XO:i:0  XG:i:0  MD:Z:101


should become

HSQ1008:141:D0CC8ACXX:3:2202:1520:59984/2 163     chr14   105899906       60      101M    =       105900110       305     CCTTTCCAGGAAAGGGAGTAGCGAGGCTGCTCACTTAGAGCCACGCACCTGGGGCTGACAGTGTGCCTGGCAGTACCTGTGTGGAAAGACAGTTACAGAGG   @C@DDFDDHHHHAHGBHG1AFHIGIJGIIGEIJIGIFE?BBGIIIIBHIEGHHHFFFEEEEEEDCCCCCDDCD?CACCDDDCBB@ACBC?CCDCCCACAC8   RG:Z:NA12877    XT:A:U  NM:i:0  SM:i:37 AM:i:37 X0:i:1  X1:i:0  XM:i:0  XO:i:0  XG:i:0  MD:Z:101



I think you should be able to run Picard SamToFastq on this BAM file and still get a pair of FASTQ files as output, so you don't need to add /1 or /2 to the names in the BAM file. The /1 and /2 suffixes are trimmed from paired-end read names before they are written to the BAM file, but the same information can be recovered from the bitwise flag (second column). As you mentioned, 99 and 83 mark /1 reads, and 163 and 147 mark /2 reads. SamToFastq uses the same logic to assign the _1 and _2 tags to the reads. Make sure the BAM file has a proper header, and create one if it is missing. The SAM format is described here: https://samtools.github.io/hts-specs/SAMv1.pdf
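
If you did want to do the renaming yourself, here is a minimal Python sketch of how the mate number can be read from the flag. It assumes uncompressed, tab-separated SAM text (for example, BAM converted to SAM first), and the function names `mate_suffix` and `rename_sam_line` are made up for illustration. It checks the 0x40 (first in pair) and 0x80 (second in pair) bits defined in the SAM spec rather than hard-coding 99/83/163/147, so it also covers other flag combinations.

```python
FIRST_IN_PAIR = 0x40   # SAM flag bit: first segment in the template
SECOND_IN_PAIR = 0x80  # SAM flag bit: last segment in the template


def mate_suffix(flag):
    """Return '/1', '/2', or '' for a SAM FLAG value (hypothetical helper)."""
    first = bool(flag & FIRST_IN_PAIR)
    second = bool(flag & SECOND_IN_PAIR)
    if first and not second:
        return "/1"
    if second and not first:
        return "/2"
    return ""  # unpaired or ambiguous: leave the name untouched


def rename_sam_line(line):
    """Append the mate suffix to the QNAME of one SAM record.

    Header lines (starting with '@') are returned unchanged.
    The trailing newline of an alignment record is stripped.
    """
    if line.startswith("@"):
        return line
    fields = line.rstrip("\n").split("\t")
    fields[0] += mate_suffix(int(fields[1]))
    return "\t".join(fields)
```

For the record in the question (flag 163), the 0x80 bit is set and the 0x40 bit is not, so the name would get /2 appended.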


You already asked a very similar question: Substitute first column based on second column

Why would you need to do this? What's your final aim?


I'm just trying to convert a BAM file to FASTQ, but the paired reads have duplicate names rather than /1 and /2 suffixes. I tried Hydra bamtofastq and it seems to work, so my question is resolved. Thanks!


Hello jyu429!

This is typically not recommended as it runs the risk of annoying people in both communities.


Ah, sorry. I wasn't sure how to delete it, but it won't happen again.