Question

bwa mapping after fastx-trimmer

0

Entering edit mode

9.1 years ago

User6891 ▴ 330

Hi everyone,

I have some Illumina MiSeq data (sequenced beginning 2015). The reads are paired-end 2x250 bp. I need to cut 100 bp of the end of each read. Therefore I used fastx-toolkit trimmer using the following code

fastx_trimmer -t 100 -Q33 -i file.fastq -o file_trimmed.fastq

Although the files were sequenced quite recent, I needed to use the Q33 parameter, otherwise fastx_trimmer was not working.

Now I want to map these data using bwa-mem. I get however the following error message:

[mem_sam_pe] paired reads have different names: "M01441:126:000000000-ADL98:1:1104:10392:3758", "M01441:126:000000000-ADL98:1:1104:23060:3763"

So bwa-mem ended with an error. Does anyone know why this is happening and how to solve it?

fastx-toolkit bwa • 3.6k views

ADD COMMENT • link updated 22 months ago by Ram 43k • written 9.1 years ago by User6891 ▴ 330

0

Entering edit mode

9.1 years ago

Pierre Lindenbaum 161k

I'm not sure BWA likes the fastqs when length(sequence|quality)==0. Try to put the following awk script after fastx:

awk '{if(NR%4==2 && length($0)==0) { printf("N\n");} else if(NR%4==0 && length($0)==0) { printf("#\n");} else {print;}}'

ADD COMMENT • link updated 22 months ago by Ram 43k • written 9.1 years ago by Pierre Lindenbaum 161k

0

Entering edit mode

9.1 years ago

dariober 14k

If you just want to trim off the last 100, i.e. keep the first 150 bp you could do something like this:

zcat reads.fq.gz \
| paste  - - - - \
| awk -v OFS='\n' -v FS='\t' '{print $1, substr($2, 1, 150), $3, substr($4, 1, 150)}' \
| gzip > fq1.trim.fq.gz

(NB: There is no adapter or quality trimming here)

ADD COMMENT • link updated 22 months ago by Ram 43k • written 9.1 years ago by dariober 14k

Ram · Accepted Answer · 2015-04-07

2

Entering edit mode

9.1 years ago

Brian Bushnell 20k

Fastx cannot handle paired reads; if you use it for trimming or filtering of paired reads, your data will get corrupted. It's possible to fix this, but the best solution is to use a different program such as BBDuk or seqtk, and process both reads simultaneously.

ADD COMMENT • link updated 22 months ago by Ram 43k • written 9.1 years ago by Brian Bushnell 20k

0

Entering edit mode

I have seqtk installed. Would this work to trim the last 100 bases?

./seqtk trimfq -e 100 file.fastq.gz > file.trimmed.fastq

You're saying to process both reads simultaneously ... but how can I do that in seqtk?

I have used the above command on both R1 and R2 .fastq files seperately. BWA-mem now finishes successfully. However GATK is now ending with an error:

ERROR MESSAGE: SAM/BAM file SAMFileReader{file_sorted.bam} is malformed: the BAM file has a read with no stored bases (i.e. it uses '*') which is not supported in the GATK; see the --filter_bases_not_stored argument. Offender: M01441:126:000000000-ADL98:1:2104:23734:22483

ADD REPLY • link updated 22 months ago by Ram 43k • written 9.1 years ago by User6891 ▴ 330

0

Entering edit mode

Ahh! Sorry, my mistake. I could have sworn that seqtk processed files as pairs, but according to the manual, I don't see any option for that. You can do it with BBDuk, though:

bbduk.sh in1=r1.fq.gz in2=r2.fq.gz out1=trimmed1.fq.gz out2=trimmed2.fq.gz ftr2=100

By default, BBDuk will always leave a minimum of 1bp in a read to prevent the above problem. However, you can use the flag minlen=30 to, for example, throw away all read pairs that end up shorter than 30bp after trimming.

ADD REPLY • link updated 22 months ago by Ram 43k • written 9.1 years ago by Brian Bushnell 20k

0

Entering edit mode

And can you also use it in the following situation:

I want to remove the first 4 bases & then keep the next 120?

ADD REPLY • link updated 22 months ago by Ram 43k • written 9.1 years ago by User6891 ▴ 330

0

Entering edit mode

Yes, in that case:

bbduk.sh in1=r1.fq.gz in2=r2.fq.gz out1=trimmed1.fq.gz out2=trimmed2.fq.gz ftl=4 ftr=123

ftl=X means trim the leftmost X bases; ftr2=X means trim the rightmost X bases; and ftr=X means trim all the rightmost bases after position X, 0-based.

ADD REPLY • link updated 22 months ago by Ram 43k • written 9.1 years ago by Brian Bushnell 20k