Extract reads from bam/sam files using read id
2
2
Entering edit mode
6.5 years ago
Chirag Parsania ★ 1.9k

HI,

I have fastq header of 1000s of reads. I want to check presence all of those reads in another bam file. I have written following script 

while IFS='' read -r line || [[ -n "$line" ]]; do samtools view accepted_hits.bam | grep $line ;done < fastqHeaders.txt ;

But this is taking too much time as my .bam file is too big. does anyone know faster way to do this.

Kindly help 

Chirag 

alignment samtools bamfile fastq reads • 8.2k views
ADD COMMENT
6
Entering edit mode
6.5 years ago

use picard FilterSamReads: https://broadinstitute.github.io/picard/command-line-overview.html

READ_LIST_FILE (File) Read List File containing reads that will be included or excluded from the OUTPUT SAM or BAM file. Default value: null.

ADD COMMENT
0
Entering edit mode

After trying to play with grep and a sam file (it worked but was so sloooow), I found your answer and its much faster, thx.

ADD REPLY
0
Entering edit mode

any idea why FilterSamReads would result in

"ERROR: Invalid argument ' '.

code:

> java -jar $EBROOTPICARD/picard.jar FilterSamReads \ I=usritt.bam \ 
> O=output.bam \ READ_LIST_FILE=keep.txt \ FILTER=includeReadList
ADD REPLY
0
Entering edit mode

try to remove the "\" except those at the end of the lines. For instance:

java -jar $EBROOTPICARD/picard.jar FilterSamReads I=usritt.bam O=output.bam READ_LIST_FILE=keep.txt FILTER=includeReadList
ADD REPLY
2
Entering edit mode
6.5 years ago

That's how I would do it. I don't know how fast it is, but for a bam file of few 10s of millions of reads should be acceptable. It depends on python's pysam library. File fastqHeader.txt should have one read name per line.

!#/usr/bin/end python

import pysam

fq= open('fastqHeader.txt').readlines()
fq= [x.strip() for x in fq]
fq= set(fq)

infile= pysam.AlignmentFile('in.bam')
outfile= pysam.AlignmentFile('out.bam', template= infile, mode= 'wb')
for aln in samfile:
    if aln.query_name in fq:
        outfile.write(aln)

infile.close()
outfile.close()
ADD COMMENT
1
Entering edit mode

Thanks, this helped me!

For future users, it should be

for aln in infile: #in place of samfile
ADD REPLY
0
Entering edit mode

Hi dariober! Can you post a sample of what the fastqHeader.txt is supposed to look like? Thank you so much!!

ADD REPLY

Login before adding your answer.

Traffic: 2204 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6