Filtration Of Reads With Length Lower Than 30 From Bam
2
5
Entering edit mode
8.4 years ago
filipzembol ▴ 150

Dear all, I have one question how could I filtrate the reads from bam file, which have length of read lower than 30 bp. If it is lower than 30 bp, this rows will be deleted from bam file. I think I could use :

samtools view -h /home/filip/Desktop/rozdeleny\ bed_009_QCfailed/Ionfiltrovany1.bam | perl -lane '$l = 10; $F[5] =~ s/(\d+)[MX=DN]/$l+=$1/eg; print if $l > 30 or /^@/' | samtools view -bS - > bar.bam

Tahnk you

read length bam • 13k views
ADD COMMENT
1
Entering edit mode

why looking at the cigar string $F[5] when you can just get the length of the SEQ ( $F[9] ) ? Is it really the LENGTH of the READ you're looking for ?

ADD REPLY
0
Entering edit mode

Oh I think it is my mistake. I think at 10th column is the all read (ACTCG...) and Could I use this syntax for the 9th column to filtrate the reads with length lower than 30 ?

ADD REPLY
0
Entering edit mode

yes, but again, why do you want to use the 9th column=CIGAR instead of the 10th=ATGC sequence ? I would understand if you only wanted the number of reference bases that the read covers, excluding padding.

ADD REPLY
0
Entering edit mode

No I would like to filter the reads which have the read length lower than 30 bp. In final bam file will be only the reads, which have length higher than 30bp. Or I badly understand of your question? I don't know if my script is ok. I am starting bioinformatician...

ADD REPLY
9
Entering edit mode
8.4 years ago
samtools view -h /home/filip/Desktop/rozdeleny\ bed_009_QCfailed/Ionfiltrovany1.bam | awk 'length($10) > 30 || $1 ~ /^@/' | samtools view -bS - > bar.bam

will give you bar.bam file with reads with length greater than 30 bp. Modified on 07/22 after dariober pointed out a bug.

ADD COMMENT
4
Entering edit mode

Shouldn't it be ... | awk 'length($10) > 30 || $1 ~ /^@/' | ... ?

ADD REPLY
0
Entering edit mode

Yup it was my bad. I should have double checked or tested it before submitting it. Thanks for pointing it out. I will modify it.

ADD REPLY
0
Entering edit mode

How would this be adjusted to give a BAM file with a range say 1-80bp for the insert size?

ADD REPLY
1
Entering edit mode
samtools view -h /home/filip/Desktop/rozdeleny\ bed_009_QCfailed/Ionfiltrovany1.bam | awk 'length($10) > 0 && length($10) < 80 || $1 ~ /^@/' | samtools view -bS - > bar.bam
ADD REPLY
2
Entering edit mode
9 weeks ago

2022 (8 years later)

samtools view -e 'length(seq)>30'  -O BAM -o out.bam in.bam
ADD COMMENT
0
Entering edit mode

Just wondering (answer is most likely yes because it's samtools), but does this take full care of CIGAR trimming etc?

By the way, for further read of filtering expressions in samtools: http://www.htslib.org/doc/samtools.html#FILTER_EXPRESSIONS

Users should also be aware that this is a releatively recent addition. Afaik it's version 1.14 that -e was added so be sure your version is recent.

ADD REPLY
1
Entering edit mode

just wondering (answer is most likely yes because it's samtools), but does this take full care of CIGAR trimming etc?

no, I think there are some other fields:

qlen    int Alignment length: no. query bases
rlen    int Alignment length: no. reference bases
ADD REPLY
0
Entering edit mode

But note also that the docs say:

qlen" and "rlen" are measured using the CIGAR string to count the number of query (sequence) and reference bases consumed. Note "qlen" may not exactly match the length of the "seq" field if the sequence is "*".


Not that it really matters, -e filtering should have been introduced in v1.12 - but I found out about it only now thanks to this post!

ADD REPLY

Login before adding your answer.

Traffic: 686 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6