Samtools View Of A Sam File Entire Chromosome
10.6 years ago
dfernan ▴ 760

Hi,

I have what I think is a valid SAM file.

When I do:

samtools view -S F1i1F.C57.2.Aligned.out.sam chr19_C57

I get the following message:

[samopen] SAM header is present: 66 sequences.

[main_samview] random alignment retrieval only works for indexed BAM files.

Note: my chromosomes are chr1_C57, chr2_C57, etc.

As far as I remember I was able to see sam files by chromosomes when using an unsorted sam file but I may be wrong. I can transform it to bam, sort it and index it, but is it necessary?

10.6 years ago

Yes, it is necessary. Region queries work by looking up where the seqid is in the index, and the index can only be built for a coordinate-sorted BAM file. So convert to BAM, sort, and index.
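A minimal sketch of that route, assuming a samtools 1.x release (where `sort` takes `-o` and SAM input is auto-detected, so `-S` is no longer needed). The function name and the `.sorted.bam` output name are made up for illustration:

```shell
# Sketch only: assumes samtools 1.x is installed and on PATH.
# Usage: reads_on_chrom <input.sam> <chromosome>
reads_on_chrom() {
    in_sam=$1
    chrom=$2
    sorted_bam="${in_sam%.sam}.sorted.bam"   # hypothetical output name

    # Convert to BAM and coordinate-sort in one step.
    samtools sort -o "$sorted_bam" "$in_sam" || return 1

    # Build the index that region queries rely on.
    samtools index "$sorted_bam" || return 1

    # With the index in place, the region argument is accepted.
    samtools view "$sorted_bam" "$chrom"
}

# Example call, using the file and chromosome from the question:
# reads_on_chrom F1i1F.C57.2.Aligned.out.sam chr19_C57
```

Older 0.1.x releases, which the `[samopen]` message in the question suggests, use a different `sort` syntax, so check `samtools sort` usage on your installation first.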

10.6 years ago

As far as I remember I was able to see sam files by chromosomes when using an unsorted sam file but I may be wrong. I can transform it to bam, sort it and index it, but is it necessary?

Yes, using only samtools view, you'll need a sorted and indexed BAM file. However, you can also just use grep:

grep -w chr1_C57 F1i1F.C57.2.Aligned.out.sam

That gets the reads on that chromosome. It will miss the header, of course (apart from that chromosome's own @SQ line, which also contains the name). I can conceive of a few edge cases where this wouldn't work (namely, putting the chromosome in an auxiliary tag that also contains spaces), but you could just use a simple awk command if that's the case.
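The awk fallback mentioned above could look roughly like this. It is a sketch, not a tested recipe for every SAM file: it assumes tab-separated fields with the reference name in column 3, and it builds a tiny made-up `example.sam` so it can be run as-is.

```shell
# Hypothetical file name for illustration; substitute your own SAM file.
sam=example.sam

# Build a tiny tab-separated SAM file so the sketch is self-contained.
printf '@HD\tVN:1.0\tSO:unsorted\n' > "$sam"
printf '@SQ\tSN:chr1_C57\tLN:1000\n' >> "$sam"
printf '@SQ\tSN:chr19_C57\tLN:1000\n' >> "$sam"
printf 'r1\t0\tchr1_C57\t10\t60\t4M\t*\t0\t0\tACGT\tIIII\n' >> "$sam"
# r2 aligns to chr19_C57 but names chr1_C57 in column 7 (mate reference).
printf 'r2\t65\tchr19_C57\t20\t60\t4M\tchr1_C57\t30\t0\tACGT\tIIII\n' >> "$sam"
printf 'r3\t0\tchr19_C57\t40\t60\t4M\t*\t0\t0\tACGT\tIIII\n' >> "$sam"

# Keep header lines (first field is @ plus a two-letter code) and
# alignments whose third column (RNAME) is exactly the chromosome.
awk -F'\t' -v chr=chr1_C57 '$1 ~ /^@[A-Z][A-Z]$/ || $3 == chr' "$sam" > chr1_C57.sam
```

Because the comparison is on column 3 only, a read like `r2`, whose mate (not the read itself) is on chr1_C57, is left out, whereas a whole-line grep for the name would keep it.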

grep -P "\tchr1_C57\t" F1i1F.C57.2.Aligned.out.sam should work.

egrep '^@|chr1_C57' F1i1F.C57.2.Aligned.out.sam should get you the header too. This will only give meaningful results if the read names don't start with "@".

Good call, the regex is definitely a better method than using word match.
