8.9 years ago
dfernan ▴ 710

Hi,

I have what I think is a valid SAM file.

When I do:

samtools view -S F1i1F.C57.2.Aligned.out.sam chr19_C57


I get the following message:

[samopen] SAM header is present: 66 sequences.

[main_samview] random alignment retrieval only works for indexed BAM files.

Note: my chromosomes are chr1_C57, chr2_C57, etc.

As far as I remember I was able to see sam files by chromosomes when using an unsorted sam file but I may be wrong. I can transform it to bam, sort it and index it, but is it necessary?

8.9 years ago

You must sort and index your BAM file. Region queries work by looking up where the seqid is in the index, so without an index samtools has nothing to look up.
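For reference, the sort, index and query steps could look roughly like the sketch below. It assumes samtools 1.3 or newer (where `samtools sort -o` picks the output format from the file extension); the function name `sam_region` and the `.sorted.bam` output name are just illustrative choices, not anything from the original post.

```shell
# Sketch only: assumes samtools 1.3 or newer is on PATH.
# sam_region is a hypothetical helper name.
# Usage: sam_region IN.sam REGION
sam_region() {
    in_sam=$1
    region=$2
    sorted_bam="${in_sam%.sam}.sorted.bam"   # hypothetical output name

    samtools sort -o "$sorted_bam" "$in_sam" &&   # coordinate-sort, write BAM
    samtools index "$sorted_bam" &&               # build the index next to the BAM
    samtools view "$sorted_bam" "$region"         # region query now uses the index
}
```

With that defined, `sam_region F1i1F.C57.2.Aligned.out.sam chr19_C57` would print the alignments on chr19_C57.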

8.9 years ago

As far as I remember I was able to see sam files by chromosomes when using an unsorted sam file but I may be wrong. I can transform it to bam, sort it and index it, but is it necessary?

Yes, using only samtools view, you'll need a sorted and indexed BAM file. However, you can also just use grep:

grep -w chr1_C57 F1i1F.C57.2.Aligned.out.sam


That gets you the reads on that chromosome, though it will miss the header, of course. I can conceive of a few edge cases where this wouldn't work (namely, putting the chromosome in an auxiliary tag that also contains spaces), but you could just use a simple awk command if that's the case.
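A minimal sketch of such an awk command, assuming a standard tab-delimited SAM file where the reference name (RNAME) is column 3. The function name `reads_on_chrom` is made up for illustration. Unlike a plain text match, it compares only column 3, so a read whose mate maps to the chromosome (RNEXT, column 7) is not picked up by accident, and it keeps the header lines as well.

```shell
# reads_on_chrom is a hypothetical helper name.
# Usage: reads_on_chrom CHROM FILE.sam
# Prints header lines (starting with @) plus alignments whose
# RNAME field (column 3) is exactly CHROM.
reads_on_chrom() {
    awk -F'\t' -v chrom="$1" '/^@/ || $3 == chrom' "$2"
}
```

For example, `reads_on_chrom chr1_C57 F1i1F.C57.2.Aligned.out.sam > chr1_C57.sam` would write a smaller SAM file with the header intact. Drop the `/^@/ ||` part if you only want the alignment lines.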

grep -P "\tchr1_C57\t" F1i1F.C57.2.Aligned.out.sam should work.

Good call, the regex is definitely a better method than using word match.

