Extracting the full read ID when converting from BAM -> FASTQ
5.4 years ago
multimeric ▴ 30

I want to ensure I can convert my BAMs back to FASTQ without any loss of data. However, I have noticed that when running samtools fastq, the reads that come out look different from the reads I originally aligned. In particular, they seem to have lost the second segment of the read ID, which contains the index sequence. For example, let's say the original reads looked like this:

@EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
+
!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65

Once converted back, they look more like:

@EAS139:136:FC706VJ:2:2104:15343:197393
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
+
!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65

In addition, when I look at the BAM file after alignment, I see that this part of the read ID isn't there either, so I have to assume BWA is stripping it out. Why would it do this? Is there any way to make it preserve all data?

bam fastq samtools bwa • 1.6k views

Aligners generally use the header only up to the first space as the read name. If you want to keep the full header, you need to replace the space with something else.

But make sure it doesn't get too long for the SAM specification's limit on read name length.
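
The SAM specification caps QNAME at 254 characters, and replacing the space with a single character leaves the header length unchanged, so the check can be run on the original FASTQ. A minimal sketch, assuming strict four-line FASTQ records; the function name count_long_names is hypothetical:

```shell
# Hypothetical helper: count records whose joined read name would exceed
# the 254-character QNAME limit. Assumes 4 lines per record.
count_long_names() {
    awk 'NR % 4 == 1 {
             name = substr($0, 2)          # drop the leading "@"
             if (length(name) > 254) n++
         }
         END { print n + 0 }' "$@"
}

# Demo on a single short record: prints 0
printf '%s\n' '@short 1:N:0:ACGT' 'ACGT' '+' 'IIII' | count_long_names
```

Anything other than 0 means some headers would need shortening before alignment.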

reformat.sh in=your.bam out1=R1.fq.gz out2=R2.fq.gz from the BBMap suite should preserve the header as is, provided your alignments retain the information about R1/R2 reads.

5.4 years ago
Tm ★ 1.1k

You can replace the space in the header with "_" in both the R1 and R2 read files. Assuming the second segment is the same throughout each reads file, you can use a simple sed command for this purpose:

sed 's/ 1:Y:18:ATCACG/_1:Y:18:ATCACG/g' input_R1.fastq >output_R1.fastq
sed 's/ 2:Y:18:ATCACG/_2:Y:18:ATCACG/g' input_R2.fastq >output_R2.fastq
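
If the second segment varies between reads (for example, when a file contains several index sequences), hard-coding it in the pattern will miss some headers. A minimal sketch of a more general approach, assuming strict four-line FASTQ records with no wrapped sequence lines: replace only the first space on each header line. The function name join_fastq_header is hypothetical.

```shell
# Hypothetical helper: replace the first space on every FASTQ header line
# with "_" so the aligner keeps the whole header as the read name.
# Assumes 4 lines per record; sequence, "+" and quality lines are untouched.
join_fastq_header() {
    awk 'NR % 4 == 1 { sub(/ /, "_") } { print }' "$@"
}

# Demo on the record from the question; the header line becomes
# @EAS139:136:FC706VJ:2:2104:15343:197393_1:Y:18:ATCACG
printf '%s\n' \
    '@EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG' \
    'GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT' \
    '+' \
    "!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65" \
    | join_fastq_header
```

On files this would be run as join_fastq_header input_R1.fastq > output_R1.fastq, and likewise for R2. To recover the original header after samtools fastq, the "_" has to be turned back into a space, so it is safer to pick a separator that does not otherwise occur in your read names.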