Trim bam file reads to remove all nucleotide over a certain length
2
0
Entering edit mode
2.9 years ago
Juke34 8.5k

Hello communauty, the question is in the title: How to Trim bam file reads to remove all nucleotide over a certain length.

For fastq it sounds possible with cutadapt, but I need to do that on a bam file.
On this post Trim bam file reads to the same length they talk about reformat.sh but it just remove read over/under a certain length.
I have tried samtools ampliconclip thinking I could cheat to perform what I want wihtout success...
I have tried Cycle based clipping from GATK but it remove all reads (WellformedReadFilter remove everything...)

Any idea?

bam trimming • 3.4k views
ADD COMMENT
0
Entering edit mode

@Pierre will likely have something either pre-written or ... Did you check his jvarkit (LINK) page?

ADD REPLY
1
Entering edit mode
2.9 years ago

I don't think modifying a bam this way sounds wise.

I'd remake the fastq from the bam, trim that, and realign.

ADD COMMENT
0
Entering edit mode

I perfectly agree that is does not sound wise, but it is exactly what I need to perform. I try to set up an original approach for a specific task.

I think I will try by using samtools view + awk to directly modify the sequences.

ADD REPLY
1
Entering edit mode

Just as a thought: That (trimming sequences) will completely mess up the synchronization between the CIGAR string and the sequences. I could well imagine that downstream tools might refuse to use these "corrupted" files, maybe even samtools as I think it runs some basic validation routines that might choke here, not sure about that, you'll see :)

ADD REPLY
1
Entering edit mode

And of course reads that run reverse will have their fronts chopped off, and this will mean their mapping position is incorrect, and if this is paired-end data, that means the mate data of the mate will have to be fixed as well. And the mapping quality scores will be wrong, since they were calculated with longer reads.

I hope there is no clipping going on.

ADD REPLY
0
Entering edit mode

Thanks for the thoughts. At the step I want to perform this task all my reads are in the same strand (single end). And I do not need the score anymore I have already filter by alignment score much more earlier.

ADD REPLY
1
Entering edit mode

Yes it is what I'm afraid of, it is why I wanted a proper tool to clean the CIGAR string accordingly in case.

ADD REPLY
0
Entering edit mode

It's far easier to document what you are doing if you use established tools to do established things, then to make your own tools and have to work out all the bugs and all the corner cases. There are tested tools to regenerate the fastqs, and tested tools to trim.

ADD REPLY
0
Entering edit mode

I agree it is whay I was looking for a known tool alredy performing this task. I'm surprised that it does not exist.

To be clear what I want to perform: All my reads start at the same position, same strand. I want an even coverage all along the investigated region so I want to trim the ends to shorten the region to a specific position.

----------------
------------
-------------

should become

------------
------------
------------
ADD REPLY
0
Entering edit mode
2.9 years ago
Juke34 8.5k

Using forcetrimright from reformat.sh in the bbmap suit it is possible to trim the reads to a choosen length

ADD COMMENT
0
Entering edit mode

Is this directly on the SAM file, and does it take care of the CIGAR?

ADD REPLY
2
Entering edit mode

Yes it does.

$ samtools view new.bam | head -2 
NS500177:19:H2HLYXXXX:1:11104:17976:3958        97      chr1    39850   13      5=1X4=1X1=1X18=2X1=1X27=1X3=91D1X7=2X   chr22   26122134        0       GAGCCGAGATTGCGCCACTGCACTCCAGCCTGGGCGACAGAGCGAGACTCTGTCTCAAAAAAAAAAAGGAGGGGGG  <<AAAFFF<FFFFAFF7FFAFAFFFFFFFFFFFFFFFFFF<FF..FAFFFAFFFFFFFFFFFFFFF.F<FFFFFFF    NM:i:10 AM:i:13 XS:A:+
NS500177:19:H2HLYXXXX:1:11103:20522:14693       129     chr1    61792   23      6=72D1=1X22=1X8=1I8=1X27=       chr22   26121634        0       CCCTCCTTTTTTTTTTTGAGACAGAGTCTCGCTCTGTCGCCCAGGCTGGAGTGCAGTGGCGCAATCTCGGCTCACT  AAAAAFFFFFFFFFF<FF<F.FFFFFFF<FA<<FFFAF7FFFFF.FFFFFFFF7AFFFFFFFFF..)FFFAFF.FF    NM:i:4  AM:i:23 XS:A:-

$ reformat.sh -Xmx5g in=new.bam out=trim.bam forcetrimright=50 ref=genome.fa 

$ samtools view trim.bam | head -2
NS500177:19:H2HLYXXXX:1:11104:17976:3958        97      chr1    39850   13      5=1X4=1X1=1X18=2X1=1X16=        chr22   26122134        0       GAGCCGAGATTGCGCCACTGCACTCCAGCCTGGGCGACAGAGCGAGACTCT   <<AAAFFF<FFFFAFF7FFAFAFFFFFFFFFFFFFFFFFF<FF..FAFFFA     XS:A:+
NS500177:19:H2HLYXXXX:1:11103:20522:14693       129     chr1    61792   23      6=72D1=1X22=1X8=1I8=1X2=        chr22   26121634        0       CCCTCCTTTTTTTTTTTGAGACAGAGTCTCGCTCTGTCGCCCAGGCTGGAG   AAAAAFFFFFFFFFF<FF<F.FFFFFFF<FA<<FFFAF7FFFFF.FFFFFF     XS:A:-
ADD REPLY

Login before adding your answer.

Traffic: 2406 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6