Question: Extract entire reads, rather than aligned segments, from a SAM/BAM file
0
gravatar for gbdias
18 months ago by
gbdias80
gbdias80 wrote:

Hello,

I am trying to extract long reads from the Blasr .sam output. However, I want to collect the entire raw reads that produced alignments, and not just the aligned segments.

Is there an easy way to do this?

Here is the blasr command I am using:

blasr <reads.fasta> <reference.fasta> -noSplitSubreads -clipping none -sam -out output.sam
blasr sam bam • 620 views
ADD COMMENTlink written 18 months ago by gbdias80
1

Grab the read ID's from your SAM file and then pull them out of the original fastq by using filterbyname.sh from BBMap.

ADD REPLYlink modified 18 months ago • written 18 months ago by genomax69k
3
gravatar for cschu181
18 months ago by
cschu1811.7k
cschu1811.7k wrote:

I'd assume that the .bam does not have that information. You'd have to get the read ids and get them from the fastq. Something like the following could work. That's what I used to do to extract paired-end reads (in that case the bam needs to be sorted by name).

grep -A3 -F -f <(samtools view <bamfile> | cut -f 1) <source-fastq> | grep -v "\-\-" > <extracted-reads-fastq>
ADD COMMENTlink written 18 months ago by cschu1811.7k

Thank you. This strategy worked.

ADD REPLYlink modified 18 months ago • written 18 months ago by gbdias80

An alternative I found is to use seqtk. It takes a list of sequence names in a file and extract the sequences from FASTA/Q.

seqtk subseq in.fq name.lst > out.fq
ADD REPLYlink modified 17 months ago • written 17 months ago by gbdias80
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 616 users visited in the last hour