A Quick Way To View Or Check The Order Of Chromosome In Bam File
3
2
Entering edit mode
8.6 years ago
Tonyzeng ▴ 310

HI I have question here, if I just need to check or view the order of chromosome in BAM file after aligment with BWA, what tools can do this and how?

I used samtools tview sample.bam and could not help me find this information of chromosome order easily because " Within the tview of samtools, you can only jump to a new location by typing g: and a location, like g:chr1:10,000,000." from Wikipedia bam samtools • 11k views ADD COMMENT 0 Entering edit mode I cannot check it now, but is samtools idxstats aln.sorted.bam doing what you want? ADD REPLY 9 Entering edit mode 8.6 years ago The first thing to check is that the file is actually coordinate sorted. This can be done with samtools view -H sample.bam | grep SO, which should produce something like @HD VN:1.3 SO:coordinate if the file is in fact sorted. If it is already sorted, then the ordering of the reads and the ordering of the @SQ lines in the header should match. So, simply samtools view -H sample.bam | grep SQ | cut -f 2 | awk '{ sub(/^SN:/, ""); print;}' to quickly get the ordering without needing to go through the entire file. ADD COMMENT 0 Entering edit mode Thank you dpryan, that is so helpful! ADD REPLY 0 Entering edit mode Dpryan, Thank you! Could I ask one more question? what is the best way to check the chromosome order of reference file with .fa format and snp variant file with .vcf format? ADD REPLY 1 Entering edit mode Hmm, there's no super quick equivalent that I know of. For the order of a fasta file, you can either grep ">" reference.fa or, if the fasta file was already indexed with samtools faidx, then just cut -f 1 reference.fa.fai. To get the order of the SNPs, you have to process the whole file unless you've already tabix indexed it (this is part of many SNP calling pipelines). If you've already tabix indexed the file, then I think tabix -l snps.vcf.bgz will work. Otherwise, grep -v "^#" snps.vcf | cut -f 1 | uniq should work. ADD REPLY 0 Entering edit mode Thank you dpryan79, I am still having a question. Now I do know what is the "Chr" order of the header of BAM file but how about the real "Chr" order of the reads of BAM file? I ask this question because I found the my original SAM file (my BAM file was converted by this SAM file) has the different "Chr" order on the header and real reads. So I doubt that my BAM may has different chr order on the header and reads. ADD REPLY 0 Entering edit mode If the SAM file wasn't explicitly sorted, then they could be in any order. SAM and BAM files can either be coordinate-sorted, name-sorted, or unsorted. Generally things are unsorted when they come directly from an aligner (except tophat, which has an optionally disableable sorting step). ADD REPLY 0 Entering edit mode May I ask what the difference between three order ways? ADD REPLY 0 Entering edit mode Two are sorted in various ways, one is just whatever random order you get. ADD REPLY 0 Entering edit mode Be sure to sort before using uniq, or you may get spurious results. ADD REPLY 0 Entering edit mode The premise is that the files are already sorted, I think. If not, your perl solution below would be best in most cases. ADD REPLY 0 Entering edit mode For VCF, using BEDOPS:  vcf2bed < foo.vcf > foo.bed && bedextract --list-chr foo.bed


Documentation: vcf2bed and bedextract.

2
Entering edit mode
8.6 years ago

Duplicate of is my BAM file sorted ?

1
Entering edit mode
8.6 years ago

Assuming that chromosomes aren't interleaved, you could use a hash table lookup to preserve the discovered order, e.g., via Perl:

$samtools view reads.bam | cut -f3 | perl -ne 'print unless$seen{\$_}++' -
chr15
chr7
chr16
chr11
chr12
chr1
chr5
...