Question: Extracting the full read ID when converting from BAM -> FASTQ
0
gravatar for ttmigueltt
5 weeks ago by
ttmigueltt0
ttmigueltt0 wrote:

I want to ensure I can convert my BAMs back to FASTQ without any loss of data. However, I have noticed that, when running samtools fastq, the reads that come out look different from the reads I originally aligned. In particular, they seem to have lost the second segment of the read ID that contains the index sequence. For example, lets say the original reads looked like this:

@EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
+
!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65

Once converted back, they look more like:

@EAS139:136:FC706VJ:2:2104:15343:197393
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
+
!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65

In addition, when I look at the BAM file, post alignment, I see that this part of the read ID isn't there either. So I have to assume BWA is stripping them out. Why would it do this? Is there any way to make it preserve all data?

bwa samtools bam fastq • 167 views
ADD COMMENTlink modified 5 weeks ago by toralmanvar730 • written 5 weeks ago by ttmigueltt0

all the aligner uses header till first space. if you want the full header you need to replace the space with something else.

ADD REPLYlink written 5 weeks ago by popayekid5550

But make sure if doesn't get too long for the specifications.

ADD REPLYlink written 5 weeks ago by WouterDeCoster35k

reformat.sh in=your.bam out1=R1.fq.gz out2=R2.fq.gz from BBMap suite, should preserve the header as is provided your alignments retain the information about R1/R2 reads.

ADD REPLYlink modified 5 weeks ago • written 5 weeks ago by genomax60k
0
gravatar for toralmanvar
5 weeks ago by
toralmanvar730
toralmanvar730 wrote:

You can replace space in header with "_" in both R1 and R2 reads file. Assuming second segment is same through out the reads file. You can use simple sed command for this purpose:

sed 's/ 1:Y:18:ATCACG/_1:Y:18:ATCACG/g' input_R1.fastq >output_R1.fastq
sed 's/ 2:Y:18:ATCACG/_2:Y:18:ATCACG/g' input_R2.fastq >output_R2.fastq
ADD COMMENTlink modified 5 weeks ago • written 5 weeks ago by toralmanvar730
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1176 users visited in the last hour