bbduk.sh trimming to BAM output file
5 months ago
bge • 0

Hi,

I want to use bbduk.sh to trim reads in a uBAM file and write the trimmed reads to a new uBAM file. It appears that the reads in the output file are not trimmed.

One set of bbduk.sh parameters that I tried is

literal=polyA k=12 mink=11 ktrim=r trimclip=t

I see trimmed reads when the output file format is fastq but not when BAM or SAM.

Perhaps I am missing something?

Thank you!

Brent

bbduk.sh • 1.0k views
5 months ago
GenoMax 153k

Using uBAM input and writing uBAM output indeed does not work directly, probably because it is an edge case that @Brian likely did not test or code for.

The following seems to work for me for a single-end read file: convert uBAM to fastq, trim using bbduk, then write the result out as uBAM.

$ reformat.sh -Xmx3g in=test.bam out=stdout.fq int=f | bbduk.sh -Xmx3g in=stdin.fq out=stdout.fq int=f forcetrimleft=10 | reformat.sh -Xmx3g in=stdin.fq out=trimmed.bam int=f

Hi,

I appreciate the confirmation; I was worried that I had messed up.

Anyway, the input uBAM file has a tag with the barcode+umi sequences required by STARsolo. I don't see a straightforward way to preserve this information with a conversion to fastq.

Do you know whether this program is maintained?

Thank you!

Brent


BBMap is actively maintained, but what you have is an edge case. You can try writing to Brian Bushnell (his email can be found in the software's in-line help).

Why do you need to trim the data? STARsolo may be able to handle it as is.

5 months ago

Hi Brent, at one time, and to some extent, BBDuk did trimming of sam/bam files. However, it's a very fiddly process because the CIGAR strings and MD tags have to be regenerated, and some other tags may end up becoming incorrect as a result of the trimming operation, so I don't really recommend it (although for ubam it wouldn't matter). Generically, one can put metadata in a fastq header and then move it back to a bam field later.

What's happening here is that BBDuk is trimming the read successfully, but then outputting the original untrimmed SamLine anyway instead of regenerating it (if you output as fastq you'll see that the reads actually got trimmed in that case), which is a bug. This is easy to fix in the case of ubam, and I will fix it in my next release. The change is less likely to work universally for mapped bam; I'll probably prevent that.
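
The idea of putting metadata in a fastq header and moving it back to a bam field later could be sketched roughly as below. This is only an illustration, not something BBTools provides: the function names `pack_header` and `unpack_header`, the tag names `CR` and `UR`, and the tab-separated `TAG:Z:VALUE` layout are all assumptions, and it only handles string (`Z`) tags. It also assumes the trimming step passes the full header line through unchanged, which is worth checking on a few reads first.

```python
# Sketch: carry per-read tags through a FASTQ round trip by packing
# them into the read header. All names here are hypothetical.

def pack_header(read_name, tags):
    """Build a FASTQ header line: '@name<TAB>TAG:Z:VALUE<TAB>...'."""
    fields = [read_name] + [f"{tag}:Z:{value}" for tag, value in tags.items()]
    return "@" + "\t".join(fields)


def unpack_header(header):
    """Split a packed FASTQ header back into (read_name, tags)."""
    line = header.rstrip("\n")
    body = line[1:] if line.startswith("@") else line
    name, *packed = body.split("\t")
    tags = {}
    for item in packed:
        # Split on the first two colons only, so values may contain ':'.
        tag, _type, value = item.split(":", 2)
        tags[tag] = value
    return name, tags


# Before trimming: write the barcode/UMI tags into the header.
header = pack_header("read1", {"CR": "ACGTACGTACGTACGT", "UR": "TTGCAATGCA"})

# After trimming: recover the tags so they can be written back as bam fields.
name, tags = unpack_header(header)
```

After the round trip, `name` and `tags` hold the original read name and tag values, which a BAM writer of your choice can attach to the trimmed read.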


Hi Brian, this is a bit off-topic, but since you are the expert!

I have some human blood samples that contain the Naegleria fowleri Karachi NF001 strain, which I'm having some difficulty extracting from my reads. This is a new strain, sequenced only a few years ago, that is significantly different from the other published strains. It was difficult just figuring out what it was, as commercial platforms were identifying it as completely different species (Plasmodium ovale, Mycobacterium leprae, Mycobacterium tuberculosis Oman strain, and probably a few more); obviously, one issue is that nobody has this strain in their database. The main issue is that Naegleria fowleri has been reported to be about 30% similar to human, though mostly not an exact match, so standard human host removal wipes out most of the reads. I'm getting closer by keeping only perfect matches to the reference, for which I was using Bowtie2, but then I noticed that Read 1 may be a 100% match while Read 2 is sometimes unknown (by NCBI blastn), so it doesn't necessarily match the reference. When mapping with BBMap, does it require both reads to align to the reference?

So if I can map perfect matches to the reference, then I should be able to do the same with host removal; in my test, the human T2T reference matches about 1% to my species. So my thought is to use Tadpole error correction, perhaps with extend, to correct my reads. However, since there is high similarity, is Tadpole going to try to convert Naegleria fowleri reads to human, or the human reads to Naegleria fowleri?

Maybe you have suggestions?
