Modifying bam file sequence names according to 2nd column
9.0 years ago
jyu429 ▴ 120

Hi,

I want to modify the sequence names in my bam file. The reads are paired-end, but the names don't end in /1 and /2, so I can't use software like bedtools bamtofastq. I'd like to append /1 to the name if the flag in the second column is 99 or 83, and /2 if it's 163 or 147. For instance,

HSQ1008:141:D0CC8ACXX:3:2202:1520:59984 163     chr14   105899906       60      101M    =       105900110       305     CCTTTCCAGGAAAGGGAGTAGCGAGGCTGCTCACTTAGAGCCACGCACCTGGGGCTGACAGTGTGCCTGGCAGTACCTGTGTGGAAAGACAGTTACAGAGG   @C@DDFDDHHHHAHGBHG1AFHIGIJGIIGEIJIGIFE?BBGIIIIBHIEGHHHFFFEEEEEEDCCCCCDDCD?CACCDDDCBB@ACBC?CCDCCCACAC8   RG:Z:NA12877    XT:A:U  NM:i:0  SM:i:37 AM:i:37 X0:i:1  X1:i:0  XM:i:0  XO:i:0  XG:i:0  MD:Z:101

should become

HSQ1008:141:D0CC8ACXX:3:2202:1520:59984/2 163     chr14   105899906       60      101M    =       105900110       305     CCTTTCCAGGAAAGGGAGTAGCGAGGCTGCTCACTTAGAGCCACGCACCTGGGGCTGACAGTGTGCCTGGCAGTACCTGTGTGGAAAGACAGTTACAGAGG   @C@DDFDDHHHHAHGBHG1AFHIGIJGIIGEIJIGIFE?BBGIIIIBHIEGHHHFFFEEEEEEDCCCCCDDCD?CACCDDDCBB@ACBC?CCDCCCACAC8   RG:Z:NA12877    XT:A:U  NM:i:0  SM:i:37 AM:i:37 X0:i:1  X1:i:0  XM:i:0  XO:i:0  XG:i:0  MD:Z:101

How should I go about this? The bam file is also missing header information. Thanks!


I think you should be able to use Picard SamToFastq on this bam file and still get a pair of fastq files as output. You don't need to add /1 or /2 in the bam file. The /1 and /2 suffixes are trimmed off the paired-end read names before they are written to the bam file, but the same information can be recovered from the bitwise flag (second column): as you mentioned, 99 and 83 mark /1 reads, and 163 and 147 mark /2 reads. SamToFastq uses the same logic to assign the _1 and _2 tags to the reads. If the bam file is missing its header, you will have to create a proper one. The SAM format is described here: https://samtools.github.io/hts-specs/SAMv1.pdf
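
If you still want the renamed reads, the flag logic above can be sketched in a few lines of Python. This is only a rough illustration, not how Picard does it internally: it assumes tab-separated SAM text as input (for example from samtools view), and the function names mate_suffix, add_mate_suffix and rename_reads are made up for this example. It tests the first-in-pair (0x40) and second-in-pair (0x80) bits from the SAM specification rather than hard-coding 99/83/163/147, so other flag combinations are handled too.

```python
FIRST_IN_PAIR = 0x40   # SAM FLAG bit: first segment in the template
SECOND_IN_PAIR = 0x80  # SAM FLAG bit: last segment in the template


def mate_suffix(flag):
    """Return "/1", "/2", or "" (mate unknown) for a SAM FLAG value."""
    first = bool(flag & FIRST_IN_PAIR)
    second = bool(flag & SECOND_IN_PAIR)
    if first and not second:
        return "/1"
    if second and not first:
        return "/2"
    return ""


def add_mate_suffix(sam_line):
    """Append the mate suffix to the read name of one SAM alignment line."""
    # Header lines start with "@" and are passed through unchanged.
    if sam_line.startswith("@"):
        return sam_line
    fields = sam_line.rstrip("\n").split("\t")
    if len(fields) < 2:
        return sam_line  # not an alignment record; leave it alone
    suffix = mate_suffix(int(fields[1]))
    # Skip names that already carry the suffix so reruns are harmless.
    if suffix and not fields[0].endswith(suffix):
        fields[0] += suffix
    return "\t".join(fields) + "\n"


def rename_reads(lines):
    """Yield SAM lines with /1 or /2 appended to each read name."""
    for line in lines:
        yield add_mate_suffix(line)
```

To use it as a filter, something like sys.stdout.writelines(rename_reads(sys.stdin)) at the bottom of the script should do, fed with the output of samtools view. Note that 99 and 83 both have bit 0x40 set, and 163 and 147 both have bit 0x80 set, which is why those four values map to /1 and /2 the way you described.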


You already asked a very similar question: Substitute first column based on second column

Why would you need to do this? What's your final aim?


I'm just trying to convert a bam to a fastq file, but the paired reads have duplicate names rather than /1 and /2. I tried using Hydra bamtofastq and this seems to work, so my question is resolved. Thanks!


Hello jyu429!

It appears that your post has been cross-posted to another site: http://seqanswers.com/forums/showthread.php?t=58344

This is typically not recommended as it runs the risk of annoying people in both communities.


Ah, sorry. I wasn't sure how to delete it, but it won't happen again.
