Question: Modifying bam file sequence names according to 2nd column
5.5 years ago, jyu429 (United States) wrote:

Hi, 

I want to modify the sequence names in my bam file. The reads are supposed to be paired-end, but the names don't have /1 and /2, so I can't use software like bedtools bamtofastq. Anyway, I'd like to add /1 to the end of the name if the flag in the second column is 99 or 83, and /2 if it's 163 or 147. For instance,

HSQ1008:141:D0CC8ACXX:3:2202:1520:59984 163     chr14   105899906       60      101M    =       105900110       305     CCTTTCCAGGAAAGGGAGTAGCGAGGCTGCTCACTTAGAGCCACGCACCTGGGGCTGACAGTGTGCCTGGCAGTACCTGTGTGGAAAGACAGTTACAGAGG   @C@DDFDDHHHHAHGBHG1AFHIGIJGIIGEIJIGIFE?BBGIIIIBHIEGHHHFFFEEEEEEDCCCCCDDCD?CACCDDDCBB@ACBC?CCDCCCACAC8   RG:Z:NA12877    XT:A:U  NM:i:0  SM:i:37 AM:i:37 X0:i:1  X1:i:0  XM:i:0  XO:i:0  XG:i:0  MD:Z:101

should become 

HSQ1008:141:D0CC8ACXX:3:2202:1520:59984/2 163     chr14   105899906       60      101M    =       105900110       305     CCTTTCCAGGAAAGGGAGTAGCGAGGCTGCTCACTTAGAGCCACGCACCTGGGGCTGACAGTGTGCCTGGCAGTACCTGTGTGGAAAGACAGTTACAGAGG   @C@DDFDDHHHHAHGBHG1AFHIGIJGIIGEIJIGIFE?BBGIIIIBHIEGHHHFFFEEEEEEDCCCCCDDCD?CACCDDDCBB@ACBC?CCDCCCACAC8   RG:Z:NA12877    XT:A:U  NM:i:0  SM:i:37 AM:i:37 X0:i:1  X1:i:0  XM:i:0  XO:i:0  XG:i:0  MD:Z:101

How should I go about this? The bam file is also missing header information. Thanks!
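A minimal sketch of the renaming described above, in Python. It assumes the records are available as tab-separated SAM text (for example the output of `samtools view input.bam`, which omits the header by default); the script name add_mate_suffix.py is hypothetical. Only the four flag values listed in the question are handled: mates of improper pairs (e.g. flags 97 or 145) pass through unchanged, and covering them would need a check on the 0x40/0x80 bits instead.

```python
import sys

# Flag values listed in the question: 99 and 83 are the first read of a
# proper pair, 163 and 147 are the second read.
SUFFIX_BY_FLAG = {99: "/1", 83: "/1", 163: "/2", 147: "/2"}


def add_mate_suffix(sam_line: str) -> str:
    """Append /1 or /2 to QNAME (column 1) according to FLAG (column 2).

    Header lines, blank lines and records with any other flag value are
    returned unchanged.
    """
    if sam_line.startswith("@") or not sam_line.strip():
        return sam_line
    fields = sam_line.rstrip("\n").split("\t")
    fields[0] += SUFFIX_BY_FLAG.get(int(fields[1]), "")
    return "\t".join(fields) + "\n"


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python add_mate_suffix.py input.sam output.sam
    with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
        for line in src:
            dst.write(add_mate_suffix(line))
```

Converting the renamed SAM back to BAM would still require a header, which the question notes is missing.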

written 5.5 years ago by jyu429

I think you should be able to run Picard SamToFastq on this BAM file and still get a pair of FASTQ files as output. You don't need to add "/1" or "/2" in the BAM file. The "/1" and "/2" suffixes are trimmed off paired-end read names before the reads are written to the BAM file. However, this information can be deduced from the bitwise FLAG in the second column. As you mentioned, flags 99 and 83 represent "/1" reads: they have the 0x40 (first segment in the template) bit set, while 163 and 147 have the 0x80 (last segment) bit set and represent "/2" reads. SamToFastq uses the same logic to assign the "_1" and "_2" tags to the reads. You will have to make sure that you create a proper header for the BAM file in case it is missing one. Read about the SAM format here: https://samtools.github.io/hts-specs/SAMv1.pdf
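The FLAG logic described above can be sketched as a conversion of one SAM record into a FASTQ entry. This is a minimal illustration in Python, not Picard's implementation: it assumes tab-separated SAM text, takes the mate number from bit 0x40 or 0x80, and, because aligners store reverse-strand reads (bit 0x10) reverse-complemented, restores the original orientation before writing. Secondary and supplementary records (0x100, 0x800) are skipped so each read is written once; the function name sam_to_fastq is hypothetical.

```python
from typing import Optional, Tuple

FIRST_IN_PAIR = 0x40   # SAM FLAG bit: first segment in the template
SECOND_IN_PAIR = 0x80  # SAM FLAG bit: last segment in the template
REVERSE = 0x10         # SAM FLAG bit: SEQ is stored reverse complemented
NOT_PRIMARY = 0x100 | 0x800  # secondary or supplementary alignment

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def sam_to_fastq(sam_line: str) -> Optional[Tuple[int, str]]:
    """Return (mate_number, fastq_entry) for a primary paired record, else None."""
    if sam_line.startswith("@") or not sam_line.strip():
        return None
    fields = sam_line.rstrip("\n").split("\t")
    qname, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
    if flag & NOT_PRIMARY:
        return None
    if flag & FIRST_IN_PAIR:
        mate = 1
    elif flag & SECOND_IN_PAIR:
        mate = 2
    else:
        return None  # not part of a pair
    if flag & REVERSE:
        # Undo the stored reverse complement so the FASTQ holds the read as sequenced.
        seq = seq.translate(COMPLEMENT)[::-1]
        qual = qual[::-1]
    return mate, f"@{qname}/{mate}\n{seq}\n+\n{qual}\n"
```

Entries with mate 1 would go to one FASTQ file and mate 2 to the other; for the two files to stay in sync, the input would need to be sorted by read name.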

modified 5.5 years ago • written 5.5 years ago by Ashutosh Pandey

You already asked a very similar question: "Substitute first column based on second column".

Why would you need to do this? What's your final aim?

written 5.5 years ago by Pierre Lindenbaum

I'm just trying to convert a BAM file to FASTQ, but the paired reads have identical names rather than names ending in /1 and /2. I tried using Hydra bamtofastq and it seems to work, so my question is resolved. Thanks!

modified 5.5 years ago • written 5.5 years ago by jyu429

Hello jyu429!

It appears that your post has been cross-posted to another site: http://seqanswers.com/forums/showthread.php?t=58344

This is typically not recommended as it runs the risk of annoying people in both communities.

written 5.5 years ago by RamRS

Ah, sorry. I wasn't sure how to delete it, but it won't happen again.

written 5.5 years ago by jyu429
Powered by Biostar version 2.3.0