Get unmapped reads from cram/bam file
1
0
Entering edit mode
3.1 years ago
Palgrave ▴ 110

I am extracting unmapped reads from a cram file using the command:

   samtools  view -b -f 4 in.cram > out.cram

However, I wonder if there is a faster way of doing this. Either another program or using more threads.

alignment • 3.7k views
ADD COMMENT
0
Entering edit mode
   samtools  view --threads 10 --reference /path/to/reference.fa -O CRAM -f 4 in.cram > out.cram
ADD REPLY
1
Entering edit mode
3.1 years ago
ATpoint 82k

-@ parameter for more threads, beyond that probably no, samtools is top of the pops for these kinds of things. Be sure that the reference file you used for CRAM generation is available or explicitely specify it with -T as suggested by Pierre Lindenbaum

CRAM files rely on the reference file to decode the sequences.

ADD COMMENT
0
Entering edit mode

Be sure that the reference file you used for CRAM generation is available

well samtools is able to find the reference if it can find it in the SAM header or if some env variables are set. ${REF_PATH} ....

ADD REPLY
1
Entering edit mode

Yes, that is what I mean. The file must be available on the machine/network you work on/in. If it is then samtools may even find it implicitely.

ADD REPLY
0
Entering edit mode

But I don't need the reference file to extract the unmapped reads using my command?

ADD REPLY
0
Entering edit mode

Don't you? I mean, technically (from my understanding) the unmapped reads are not compressed against the reference, but doesn't samtools throw an error if there is no reference for a CRAM that contains at least some mapped reads? Not sure, maybe you're right and you do not need it at all. As said samtools can implicitely search for the reference on your disk so maybe it uses it without that you note it.

ADD REPLY
0
Entering edit mode

I have a crai file for each cram file is this enough, or do I need the references as well? I did not get an error, but its running for several hours for a 100GB file and so far the unmapped reads are about 800MB.

ADD REPLY
0
Entering edit mode

do I need the reference when I have the crai-file? It seems to run without references with my original command

ADD REPLY
0
Entering edit mode

do I need the reference when I have the crai-file? It seems to run without references with my original command

yes, unless samtools is able to find the REF by itself.

ADD REPLY
0
Entering edit mode

Why is it then running without specifying it? What will the output be?

ADD REPLY
0
Entering edit mode

why don't you just try ?

ADD REPLY
0
Entering edit mode

its been running for 24h using a 100GB cram as input. Not sure if its supposed to take that long. Now it has created a 1GB unmapped file. But don't finished.

ADD REPLY
0
Entering edit mode

instead of redirecting it to a CRAM file, just keep the SAM format and pipe the output in more

ADD REPLY

Login before adding your answer.

Traffic: 1986 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6