I recently started grad school and have no background working with sequencing data :( As a first task, my supervisor asked me to look at two files, "XXX.bam" and "XXX.bam.bai", and figure out whether they are the same file (just one of them indexed) or different files. Since the files have exactly the same name, except that one has an additional ".bai" at the end, it looks like XXX.bam.bai is the index of XXX.bam, but I am not completely sure. Can someone please give me a hint on how to check whether the two files belong together? What program can I use to generate an indexed BAM? (SAM?! I have only just heard about it and never had a chance to work with it.) And what program can I use to visualize them?
Thank you so much....
Moreover, there's no need to compare anything. If you aren't sure whether a .bai file corresponds to a particular BAM file, just delete it and generate a new one as suggested.
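Yes, by naming convention XXX.bam.bai is the index of XXX.bam. To verify this, the simplest way is to move the index file into another directory and create a new index with:
    samtools index BAM_FILE_NAME
Then you can compare the md5 checksums of the two index files:
    md5sum ORIGINAL_INDEX_FILE
    md5sum NEW_INDEX_FILE
If both commands print the same checksum, the two index files are identical, so the original index was built from this BAM file. A differing checksum is not conclusive on its own, since an index written by a different tool or version may differ in its bytes even for the same BAM file.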
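In case it helps, here is a minimal sketch of those steps as a single shell function. The function name check_bai and the temporary directory layout are my own choices for illustration, not something samtools provides. It assumes samtools and md5sum are on your PATH, and that the existing index sits next to the BAM file as BAM_FILE_NAME.bai unless you pass another path as the second argument. Instead of moving the old index away, it writes the fresh index to a temporary location using the optional output argument of samtools index.

```shell
# Hypothetical helper: rebuild the index for a BAM file in a temporary
# directory and compare its md5 checksum with the existing .bai file.
# Prints "match" or "differ"; returns 0 on match, 1 on mismatch, 2 on error.
check_bai() {
    bam="$1"
    old_index="${2:-${bam}.bai}"   # default: index stored next to the BAM

    tmpdir="$(mktemp -d)" || return 2
    new_index="${tmpdir}/new.bai"

    # "samtools index IN.bam OUT.bai" writes the index to the given path,
    # so the existing index is left untouched.
    if ! samtools index "$bam" "$new_index"; then
        rm -rf "$tmpdir"
        return 2
    fi

    # Read from stdin so md5sum does not print the file names,
    # then keep only the checksum column.
    old_sum="$(md5sum < "$old_index" | cut -d' ' -f1)"
    new_sum="$(md5sum < "$new_index" | cut -d' ' -f1)"
    rm -rf "$tmpdir"

    if [ "$old_sum" = "$new_sum" ]; then
        echo "match"
        return 0
    else
        echo "differ"
        return 1
    fi
}
```

You would call it as: check_bai XXX.bam
A "match" means the existing index is byte-identical to a freshly built one. A "differ" only tells you the bytes are not the same, which can also happen when the original index came from another tool or version, so in that case simply keep the new index.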
Thanks for the quick reply. So you think that if I generate a new index for the BAM file and compare it to the original index, I can figure out whether both were made from the same .bam file? May I ask what software you suggest for comparing two BAM index files?
Thanks again, really appreciate your help.