Question: Extracting the full read ID when converting from BAM -> FASTQ
0
gravatar for ttmigueltt
3 months ago by
ttmigueltt0
ttmigueltt0 wrote:

I want to ensure I can convert my BAMs back to FASTQ without any loss of data. However, I have noticed that, when running samtools fastq, the reads that come out look different from the reads I originally aligned. In particular, they seem to have lost the second segment of the read ID that contains the index sequence. For example, lets say the original reads looked like this:

@EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
+
!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65

Once converted back, they look more like:

@EAS139:136:FC706VJ:2:2104:15343:197393
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
+
!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65

In addition, when I look at the BAM file, post alignment, I see that this part of the read ID isn't there either. So I have to assume BWA is stripping them out. Why would it do this? Is there any way to make it preserve all data?

bwa samtools bam fastq • 219 views
ADD COMMENTlink modified 3 months ago by toralmanvar750 • written 3 months ago by ttmigueltt0

all the aligner uses header till first space. if you want the full header you need to replace the space with something else.

ADD REPLYlink written 3 months ago by popayekid5550

But make sure if doesn't get too long for the specifications.

ADD REPLYlink written 3 months ago by WouterDeCoster37k

reformat.sh in=your.bam out1=R1.fq.gz out2=R2.fq.gz from BBMap suite, should preserve the header as is provided your alignments retain the information about R1/R2 reads.

ADD REPLYlink modified 3 months ago • written 3 months ago by genomax64k
0
gravatar for toralmanvar
3 months ago by
toralmanvar750
toralmanvar750 wrote:

You can replace space in header with "_" in both R1 and R2 reads file. Assuming second segment is same through out the reads file. You can use simple sed command for this purpose:

sed 's/ 1:Y:18:ATCACG/_1:Y:18:ATCACG/g' input_R1.fastq >output_R1.fastq
sed 's/ 2:Y:18:ATCACG/_2:Y:18:ATCACG/g' input_R2.fastq >output_R2.fastq
ADD COMMENTlink modified 3 months ago • written 3 months ago by toralmanvar750
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1733 users visited in the last hour