Hi all,
I have a BAM file which doesn't display when I load it into IGV. From searching, I gather the most common reason for this error is that the chromosome names in the BAM file and genome file don't match. I have checked, and they do match, but there might be an indexing problem. When I view the header for my sorted BAM file, the SQ lines appear in the order scaffold_1, scaffold_10, scaffold_11, scaffold_100, scaffold_2, ... (i.e. like Excel would sort these), not scaffold_1, scaffold_2, ... scaffold_10, scaffold_11, ..., scaffold_100 (i.e. sorted in true numerical order).
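To make the two orderings concrete, here is a minimal Python sketch contrasting a plain string sort with a "natural" numeric sort. The scaffold names are just the ones mentioned above, and `natural_key` is a hypothetical helper written for this illustration, not part of samtools, gsnap, or IGV.

```python
import re

def natural_key(name):
    # Split the name into digit and non-digit runs, so that
    # "scaffold_10" is compared as ["scaffold_", 10, ""] rather than as text.
    return [int(part) if part.isdigit() else part
            for part in re.split(r"(\d+)", name)]

# Illustrative names only (assumed, taken from the description above).
names = ["scaffold_1", "scaffold_2", "scaffold_10", "scaffold_11", "scaffold_100"]

# Plain string sort: compares character by character, so "10" sorts before "2".
lexicographic = sorted(names)

# Natural sort: compares the numeric suffix as an integer.
numeric = sorted(names, key=natural_key)

print("string order: ", lexicographic)
print("numeric order:", numeric)
```

In the string order, scaffold_10 lands before scaffold_2, which is the kind of ordering seen in the BAM header; the numeric order is what the .fai file shows.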
My question is: could this sort order then give rise to an indexing problem that explains why IGV can't load the BAM file properly?
I think this might be the case, because a colleague has given me another BAM file (same reference genome), where the scaffolds appear in correct numerical order, and that one displays OK in IGV.
And how do I fix this if so? There don't seem to be too many options for samtools sort. There are no extra characters (leading 00's, etc.) in the scaffold names in either of the BAM files, or the .fai file.
Thanks!
I went to the directory with ref.fa in it and did:
When I view ref.fa.fai the chromosomes appear in numerical order.
I used the same reference genome FASTA file to align the reads against in gsnap. By the alignment index, do you mean the .bai file?
I have no experience with gsnap, but you could try the following: get the order of the chromosomes as they appear in the BAM file (look at the BAM header), then rearrange the reference FASTA file to match that order. With the rearranged FASTA file, try creating the index and .genome file again.
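The rearranging step could be sketched roughly like this in Python. This is only an illustration under some assumptions: the header text is what `samtools view -H` would print, the FASTA is small enough to hold in memory, and the function names (`sq_names`, `read_fasta`, `reorder_fasta`) and the toy data are made up for the example.

```python
def sq_names(header_text):
    """Return sequence names from the @SQ lines, in the order they appear."""
    names = []
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names

def read_fasta(fasta_text):
    """Parse FASTA text into {name: [header line, sequence lines...]}."""
    records = {}
    current = None
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # Use the first word after '>' as the record name.
            current = line[1:].split()[0]
            records[current] = [line]
        elif current is not None and line:
            records[current].append(line)
    return records

def reorder_fasta(fasta_text, header_text):
    """Return the FASTA records rearranged to follow the BAM header order."""
    records = read_fasta(fasta_text)
    order = sq_names(header_text)
    missing = [name for name in order if name not in records]
    if missing:
        raise ValueError("names in BAM header but not in FASTA: %s" % missing)
    return "\n".join("\n".join(records[name]) for name in order) + "\n"

# Toy data (hypothetical), standing in for the real header and ref.fa.
header = (
    "@HD\tVN:1.0\tSO:coordinate\n"
    "@SQ\tSN:scaffold_1\tLN:4\n"
    "@SQ\tSN:scaffold_10\tLN:4\n"
    "@SQ\tSN:scaffold_2\tLN:4\n"
)
fasta = ">scaffold_1\nACGT\n>scaffold_2\nGGCC\n>scaffold_10\nTTAA\n"

reordered = reorder_fasta(fasta, header)
print(reordered, end="")
```

For a real genome you would read the two inputs from files instead of strings, write the result to a new FASTA file, and rebuild the index and .genome file from that.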
Does IGV complain (any warnings or errors) when loading the file? Or are you inferring it did not load properly because you can't see anything? Maybe you can try the suggestion from the thread "I loaded a BAM (RNA-seq) file into IGV but cant see anything!".
No complaints or warnings - inferring from the fact I can't see anything. Yes, I saw that thread and tried going to a specific location, but still couldn't see anything. It was that thread that gave me the idea there might be something wrong with the index.
Your previous post gave me the idea of trying to rebuild the genome database for gsnap with a different sort order (there are several options, though it's unclear what the default is). That's running now... I'll see if it works.
OK... it seems to be working now. Thanks!