Question: Extract reads from bam/sam files using read id
1
gravatar for Chirag Parsania
3.7 years ago by
Chirag Parsania1.4k
University of Macau
Chirag Parsania1.4k wrote:

HI,

I have fastq header of 1000s of reads. I want to check presence all of those reads in another bam file. I have written following script 

while IFS='' read -r line || [[ -n "$line" ]]; do samtools view accepted_hits.bam | grep $line ;done < fastqHeaders.txt ;

But this is taking too much time as my .bam file is too big. does anyone know faster way to do this.

Kindly help 

Chirag 

ADD COMMENTlink modified 3.7 years ago by Pierre Lindenbaum121k • written 3.7 years ago by Chirag Parsania1.4k
5
gravatar for Pierre Lindenbaum
3.7 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum121k wrote:

use picard FilterSamReads : https://broadinstitute.github.io/picard/command-line-overview.html

READ_LIST_FILE (File)    Read List File containing reads that will be included or excluded from the OUTPUT SAM or BAM file. Default value: null.

   
ADD COMMENTlink written 3.7 years ago by Pierre Lindenbaum121k

After trying to play with grep and a sam file (it worked but was so sloooow), I found your answer and its much faster, thx.

ADD REPLYlink written 23 months ago by Carlo Yague4.6k

any idea why FilterSamReads would result in

"ERROR: Invalid argument ' '.

code:

> java -jar $EBROOTPICARD/picard.jar FilterSamReads \ I=usritt.bam \ 
> O=output.bam \ READ_LIST_FILE=keep.txt \ FILTER=includeReadList
ADD REPLYlink written 4 months ago by max_19100

try to remove the "\" except those at the end of the lines. For instance:

java -jar $EBROOTPICARD/picard.jar FilterSamReads I=usritt.bam O=output.bam READ_LIST_FILE=keep.txt FILTER=includeReadList
ADD REPLYlink modified 4 months ago • written 4 months ago by Carlo Yague4.6k
0
gravatar for dariober
3.7 years ago by
dariober10k
WCIP | Glasgow | UK
dariober10k wrote:

That's how I would do it. I don't know how fast it is, but for a bam file of few 10s of millions of reads should be acceptable. It depends on python's pysam library. File fastqHeader.txt should have one read name per line.

!#/usr/bin/end python

import pysam

fq= open('fastqHeader.txt').readlines()
fq= [x.strip() for x in fq]
fq= set(fq)

infile= pysam.AlignmentFile('in.bam')
outfile= pysam.AlignmentFile('out.bam', template= infile, mode= 'wb')
for aln in samfile:
    if aln.query_name in fq:
        outfile.write(aln)

infile.close()
outfile.close()
ADD COMMENTlink modified 3.7 years ago • written 3.7 years ago by dariober10k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1701 users visited in the last hour