Question: Extract entire reads, rather than aligned segments, from a SAM/BAM file
0
gravatar for gbdias
2.8 years ago by
gbdias90
gbdias90 wrote:

Hello,

I am trying to extract long reads from the Blasr .sam output. However, I want to collect the entire raw reads that produced alignments, and not just the aligned segments.

Is there an easy way to do this?

Here is the blasr command I am using:

blasr <reads.fasta> <reference.fasta> -noSplitSubreads -clipping none -sam -out output.sam
blasr sam bam • 921 views
ADD COMMENTlink written 2.8 years ago by gbdias90
1

Grab the read ID's from your SAM file and then pull them out of the original fastq by using filterbyname.sh from BBMap.

ADD REPLYlink modified 2.8 years ago • written 2.8 years ago by genomax92k
3
gravatar for cschu181
2.8 years ago by
cschu1812.5k
cschu1812.5k wrote:

I'd assume that the .bam does not have that information. You'd have to get the read ids and get them from the fastq. Something like the following could work. That's what I used to do to extract paired-end reads (in that case the bam needs to be sorted by name).

grep -A3 -F -f <(samtools view <bamfile> | cut -f 1) <source-fastq> | grep -v "\-\-" > <extracted-reads-fastq>
ADD COMMENTlink written 2.8 years ago by cschu1812.5k

Thank you. This strategy worked.

ADD REPLYlink modified 2.8 years ago • written 2.8 years ago by gbdias90

An alternative I found is to use seqtk. It takes a list of sequence names in a file and extract the sequences from FASTA/Q.

seqtk subseq in.fq name.lst > out.fq
ADD REPLYlink modified 2.8 years ago • written 2.8 years ago by gbdias90
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1210 users visited in the last hour