Question: bwa mapping after fastx-trimmer
0
gravatar for User6891
2.7 years ago by
User6891190
Europe
User6891190 wrote:

Hi everyone,

I have some Illumina MiSeq data (sequenced beginning 2015). The reads are paired-end 2x250 bp. I need to cut 100 bp of the end of each read. Therefore I used fastx-toolkit trimmer using the following code

fastx_trimmer -t 100 -Q33 -i file.fastq -o file_trimmed.fastq

although the files were sequenced quite recent, I needed to use the Q33 parameter, otherwise fastx_trimmer was not working.

Now I want to map these data using bwa-mem. I get however the following error message:

[mem_sam_pe] paired reads have different names: "M01441:126:000000000-ADL98:1:1104:10392:3758", "M01441:126:000000000-ADL98:1:1104:23060:3763"

So bwa-mem ended with an error. Does anyone know why this is happening and how to solve it?

bwa fastx-toolkit • 1.1k views
ADD COMMENTlink modified 2.7 years ago by dariober8.4k • written 2.7 years ago by User6891190
2
gravatar for Brian Bushnell
2.7 years ago by
Walnut Creek, USA
Brian Bushnell15k wrote:

Fastx cannot handle paired reads; if you use it for trimming or filtering of paired reads, your data will get corrupted.  It's possible to fix this, but the best solution is to use a different program such as BBDuk or seqtk, and process both reads simultaneously.

ADD COMMENTlink modified 2.7 years ago • written 2.7 years ago by Brian Bushnell15k

I have seqtk installed. Would this work to trim the last 100 bases?

./seqtk trimfq -e 100 file.fastq.gz > file.trimmed.fastq

You're saying to process both reads simultaneously ... but how can I do that in seqtk?

I have used the above command on both R1 and R2 .fastq files seperately. BWA-mem now finishes successfully. However GATK is now ending with an error:

ERROR MESSAGE: SAM/BAM file SAMFileReader{file_sorted.bam} is malformed: the BAM file has a read with no stored bases (i.e. it uses '*') which is not supported in the GATK; see the --filter_bases_not_stored argument. Offender: M01441:126:000000000-ADL98:1:2104:23734:22483

 

ADD REPLYlink modified 2.7 years ago • written 2.7 years ago by User6891190

Ahh!  Sorry, my mistake.  I could have sworn that seqtk processed files as pairs, but according to the manual, I don't see any option for that.  You can do it with BBDuk, though:

bbduk.sh in1=r1.fq.gz in2=r2.fq.gz out1=trimmed1.fq.gz out2=trimmed2.fq.gz ftr2=100

By default, BBDuk will always leave a minimum of 1bp in a read to prevent the above problem.  However, you can use the flag "minlen=30" to, for example, throw away all read pairs that end up shorter than 30bp after trimming.

 

ADD REPLYlink written 2.7 years ago by Brian Bushnell15k

And can you also use it in the following situation:

I want to remove the first 4 bases & then keep the next 120?

 

ADD REPLYlink written 2.7 years ago by User6891190

Yes, in that case:

bbduk.sh in1=r1.fq.gz in2=r2.fq.gz out1=trimmed1.fq.gz out2=trimmed2.fq.gz ftl=4 ftr=123

"ftl=X" means trim the leftmost X bases; "ftr2=X" means trim the rightmost X bases; and "ftr=X" means trim all the rightmost bases after position X, 0-based.

ADD REPLYlink modified 2.7 years ago • written 2.7 years ago by Brian Bushnell15k
0
gravatar for Pierre Lindenbaum
2.7 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum102k wrote:

I'm not sure BWA likes  the fastqs when length(sequence|quality)==0 . Try to put the following awk script afetr fastx:

 

awk '{if(NR%4==2 && length($0)==0) { printf("N\n");} else if(NR%4==0 && length($0)==0) { printf("#\n");} else {print;}}'

ADD COMMENTlink written 2.7 years ago by Pierre Lindenbaum102k
0
gravatar for dariober
2.7 years ago by
dariober8.4k
Glasgow - UK
dariober8.4k wrote:

If you just want to trim off the last 100, i.e. keep the first 150 bp you could do something like this:

zcat reads.fq.gz \
| paste  - - - - \
| awk -v OFS='\n' -v FS='\t' '{print $1, substr($2, 1, 150), $3, substr($4, 1, 150)}' \
| gzip > fq1.trim.fq.gz

(NB: There is no adapter or quality trimming here)

ADD COMMENTlink written 2.7 years ago by dariober8.4k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 640 users visited in the last hour