Question: Genome MuSiC bam-list
6.4 years ago, will.lockwood wrote:

Hi All,

I apologize in advance if this is a stupid question. I am not very computer literate so this program has been difficult for me to get running.

I am having issues with my bam-list file for the bmr calc-covg command in Genome MuSiC. The description says it should be a tab-delimited file containing sample names and normal/tumor BAM locations [sample_name normal_bam tumor_bam], which (I think) I have created. My file has the following format:

85040004 /media/sf_OWC_Mercury_Elite_AL_Pro/Input/85040006.bam    /media/sf_OWC_Mercury_Elite_AL_Pro/Input/85040004.bam

However, every time I try to run it, I get an error saying that my BAM files cannot be found:

Normal BAM for 85040004 not found: "/media/sf_OWC_Mercury_Elite_AL_Pro/Input/85040006.bam"
Tumor BAM for 85040004 not found: "/media/sf_OWC_Mercury_Elite_AL_Pro/Input/85040004.bam"

I know the files are at these locations, so I am wondering why it says the BAMs are not found. Is the problem my bam-list file or the BAM files themselves?

On a side note, do my MAF, genome reference file and ROI file all have to use the same chromosome naming? For example, does my MAF need "chr" in front of the chromosome number if my ROI file and .fa file have it?

Again, I apologize in advance if I am doing something stupid to get this error.

Thanks a lot,

Will

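One way to be sure the list really contains tab separators and plain Unix line endings is to write it from the shell instead of a spreadsheet. The sketch below assumes bash and uses printf, which emits real tab characters and a plain LF at the end of each row; the function name write_bam_list_row and the output filename bam_list.tsv are made up for illustration, and the paths are the ones from the post.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append one "sample<TAB>normal_bam<TAB>tumor_bam" row
# to a bam-list file. printf writes real tab characters and ends the row
# with a plain LF (no carriage return), so nothing invisible sneaks in.
write_bam_list_row() {
  local list_file=$1 sample=$2 normal_bam=$3 tumor_bam=$4
  printf '%s\t%s\t%s\n' "$sample" "$normal_bam" "$tumor_bam" >> "$list_file"
}

# Example usage with the paths from the question:
# write_bam_list_row bam_list.tsv 85040004 \
#   /media/sf_OWC_Mercury_Elite_AL_Pro/Input/85040006.bam \
#   /media/sf_OWC_Mercury_Elite_AL_Pro/Input/85040004.bam
```

Because each call appends, rerunning it on an existing file adds duplicate rows; start from an empty file if you regenerate the list.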

I've never used it, but you can check a few things.

1) Run du -hs on the BAM files to check that they are present and aren't just symbolic links.

2) The manual says:

--bam-list Provide a file containing sample names and normal/tumor BAM locations for each. Use the tab-delimited format [sample_name normal_bam tumor_bam] per line. Additional columns like clinical data are allowed, but ignored. The sample_name must be the same as the tumor sample names used in the MAF file (16th column, with the header Tumor_Sample_Barcode).

Are you doing fine with the MAF file?

3) To check that a file is really a BAM, try samtools view file.bam | head. If it prints alignment records, the file is fine as well.

written 6.4 years ago by Sukhdeep Singh
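Sukhdeep's checks can be bundled into one function. This is a minimal sketch assuming bash with GNU coreutils (readlink -f, du -L), as on an Ubuntu VM; the samtools step only runs if samtools is on the PATH, and the name check_bam_path is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper running the basic checks suggested above on one path:
# does it exist, is it a symlink (and where does it point), is it readable,
# how big is the data, and (optionally) does samtools see alignment records.
check_bam_path() {
  local path=$1
  if [ ! -e "$path" ]; then
    # -e is false for missing files and for broken symlinks
    echo "missing: $path"
    return 1
  fi
  if [ -L "$path" ]; then
    # readlink -f resolves the whole chain of links (GNU coreutils)
    echo "symlink: $path -> $(readlink -f "$path")"
  fi
  if [ ! -r "$path" ]; then
    echo "not readable: $path"
    return 1
  fi
  # -L makes du report the size of the target, not of the link itself
  du -hL "$path"
  # Peek at the first records only if samtools is installed
  if command -v samtools >/dev/null 2>&1; then
    samtools view "$path" 2>/dev/null | head -n 2
  fi
  return 0
}

# Example: check_bam_path /media/sf_OWC_Mercury_Elite_AL_Pro/Input/85040004.bam
```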
6.4 years ago, Chris Miller (Washington University in St. Louis, MO) wrote:
  • The format of your bam list looks correct
  • This is fairly obvious, but have you actually checked that the BAM files exist at those locations?
  • I notice that you got errors because of VirtualBox permission problems in a previous question. Could that be the issue here: does your VM lack access to that disk?
  • Yes, the chromosome nomenclature has to match. Either use "chr" or don't, but don't mix the two.
written 6.4 years ago by Chris Miller
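To check which chromosome naming each input uses, a small awk-based sketch can report whether a column is "chr"-prefixed, unprefixed, or a mix of both. It assumes tab-delimited input; the column numbers in the usage comments (5 for the standard MAF Chromosome column, 1 for the ROI file) are assumptions to verify against your own files, and chr_style is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a tab-delimited file uses "chr"-prefixed
# chromosome names in the given (1-based) column. Prints "chr", "nochr" or
# "mixed". Lines starting with '#', empty values, and a header row whose value
# in that column is literally "Chromosome" are skipped.
chr_style() {
  local file=$1 col=$2
  awk -F'\t' -v c="$col" '
    /^#/ || $c == "Chromosome" || $c == "" { next }
    $c ~ /^chr/ { n_chr++; next }
    { n_plain++ }
    END {
      if (n_chr && n_plain) print "mixed"
      else if (n_chr)       print "chr"
      else                  print "nochr"
    }' "$file"
}

# Assumed column positions; adjust to your files:
# chr_style my.maf 5
# chr_style my_roi.txt 1
# grep '^>' reference.fa | head    # FASTA headers show the reference naming
```

Any "mixed" result, or a disagreement between files, is the kind of mismatch Chris warns about.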

Thanks for the help Sukhdeep and Chris.

I have checked, and the BAM files definitely exist at those locations. In addition, the VM certainly has access to the disk.

The files themselves are definitely BAMs. However, one thing I noticed (and this might be trivial) is that when I look at the file type of the BAM in OS X it says "document", while in Ubuntu it says "Gzip archive". Is there a reason these would be different? Are BAMs really archives? If so, do they need to be extracted somehow before being used in Genome MuSiC?

The only other issues I can think of are that the files are on an external hard drive (I don't know why this would be an issue) or that my VM is running out of memory. I'm not sure whether these are issues or whether they could be responsible for the errors I am getting.

Sorry to bother you with this. I am really eager to run the program so any help would be greatly appreciated.

Thanks,

Will

written 6.4 years ago by will.lockwood

What happens if you use a stupid-simple Perl script that tries to access the BAMs? Run this on your VM (replacing the path):

perl -e 'if( -s "/file/path"){print "file found\n"}else{print "file not found\n"}'

written 6.4 years ago by Chris Miller

It just spits out "file found".

written 6.4 years ago by will.lockwood

Copy it to a local drive and try again; that will help you narrow down the problem.

written 6.4 years ago by Sukhdeep Singh

I copied the files to a local drive and still have the same issue: it says the files are not found. Is there a special format the list file is supposed to be saved in (tab-delimited, Unicode, etc.)? This is driving me nuts. If anyone has a template bam-list file I could look at, it would be greatly appreciated.

Thanks,

Will

written 6.4 years ago by will.lockwood

This is the relevant line from the code:

print STDERR "Normal BAM for $sample not found: \"$normalbam\"\n" unless( -e $normalbam );

It's spitting back the path correctly, so it's parsing the bam-list just fine. It's not checking the format of the file either, just that it exists.

Does your path contain any symlinks that might not be followed correctly? Do all of the directories have appropriate permissions?

written 6.4 years ago by Chris Miller
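Since the quoted line only tests -e on each path, a debugging sketch can repeat that test outside MuSiC while printing every failing path with printf %q, which makes invisible characters such as a trailing \r visible. It assumes bash and the [sample_name normal_bam tumor_bam] layout from the manual; check_bam_list is a hypothetical name and this is not MuSiC's own code.

```shell
#!/usr/bin/env bash
# Hypothetical re-implementation of the existence check, for debugging only.
# For each line of a bam-list it tests the normal (column 2) and tumor
# (column 3) paths with -e, like the Perl line does, but prints failing paths
# with printf %q so hidden characters show up (e.g. $'/path/t.bam\r').
# Nothing is stripped or modified.
check_bam_list() {
  local list_file=$1 sample normal tumor rest status=0
  # "|| [ -n "$sample" ]" also handles a last line with no trailing newline
  while IFS=$'\t' read -r sample normal tumor rest || [ -n "$sample" ]; do
    [ -z "$sample" ] && continue
    if [ ! -e "$normal" ]; then
      printf 'Normal BAM for %s not found: %q\n' "$sample" "$normal"
      status=1
    fi
    if [ ! -e "$tumor" ]; then
      printf 'Tumor BAM for %s not found: %q\n' "$sample" "$tumor"
      status=1
    fi
  done < "$list_file"
  return "$status"
}

# Example: check_bam_list bam_list.txt
```

With Windows-style line endings the carriage return ends up glued to the last column, so only the tumor path fails in that case; other hidden characters can break both paths.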

BAMs are compressed archives (essentially a specialized gzip format called BGZF), but that's not your problem here. The Perl code can't even see that the file exists.

written 6.4 years ago by Chris Miller
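To confirm a file is a BAM without extracting anything, a quick sketch can check the gzip magic bytes (1f 8b) at the start of the raw file and the "BAM\1" magic (hex 42 41 4d 01) at the start of the decompressed stream; this works because BGZF is valid gzip. It assumes bash plus gzip, head and od; looks_like_bam is a hypothetical name, and it is only a sanity check (samtools quickcheck, in recent samtools releases, is more thorough).

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the file starts with the gzip magic bytes
# and its decompressed content starts with "BAM\1". No extraction to disk
# is needed; gzip -dc streams just enough data for head to read 4 bytes.
looks_like_bam() {
  local path=$1 magic bam_magic
  # First two raw bytes as hex, e.g. "1f8b"
  magic=$(head -c 2 "$path" | od -An -tx1 | tr -d ' \n')
  [ "$magic" = "1f8b" ] || return 1
  # First four decompressed bytes as hex; "BAM\1" is 42 41 4d 01
  bam_magic=$(gzip -dc "$path" 2>/dev/null | head -c 4 | od -An -tx1 | tr -d ' \n')
  [ "$bam_magic" = "42414d01" ]
}

# Example: looks_like_bam 85040004.bam && echo "looks like a BAM"
```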

Thanks again for your help Chris.

I think the paths are all correct and the problem is the format of my list file. I had been making my lists in Excel and saving them as tab-delimited .txt files. However, if I make the file directly in the gedit text editor in Ubuntu, with the same paths, it finds the BAMs and runs through everything properly.

I opened the files saved from Excel in the terminal using the "less File.txt" command and noticed that there are a lot of "hidden" characters everywhere. Does anyone know how to get rid of these or bypass this issue?

Thanks again,

Will

written 6.4 years ago by will.lockwood
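The "hidden" characters that spreadsheet exports leave behind are usually carriage returns (Windows or old Mac line endings) or NUL bytes (a file saved as UTF-16 "Unicode" text). A sketch that counts both, assuming bash and tr; hidden_char_report is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count the usual invisible culprits in a text file.
# Carriage returns (\r) come from Windows/Mac line endings; NUL bytes
# suggest the file was saved as UTF-16.
hidden_char_report() {
  local file=$1 cr nul
  cr=$(tr -cd '\r' < "$file" | wc -c | tr -d ' ')
  nul=$(tr -cd '\000' < "$file" | wc -c | tr -d ' ')
  echo "carriage returns: $cr"
  echo "NUL bytes: $nul"
}

# Other ways to look at the same file:
#   file bam_list.txt     # mentions e.g. CRLF line terminators or UTF-16
#   cat -A bam_list.txt   # GNU cat: shows \r as ^M and tabs as ^I
```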

Well, the obvious answer is "stop making those lists in Excel" :-) Seriously, Excel and bioinformatics don't go well together. It often changes gene names, has rounding problems that can skew results, etc. It's just generally bad news.

Perhaps more helpfully: is the problem that you have those extra Windows carriage-return characters at the end of lines? If so, a quick Google search will turn up lots of options for stripping them out of your files.

written 6.4 years ago by Chris Miller
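For the carriage-return case, one option is a sketch that writes a cleaned copy of the list using only tr and grep, assuming bash. It turns every CR into LF (covering both Windows CRLF and old Mac CR-only endings) and then drops the blank lines that CRLF pairs leave behind, so it also removes any genuinely empty lines, which a bam-list should not need. clean_list is a hypothetical name; dos2unix does a similar job for CRLF files if it is installed. If the export turns out to be UTF-16, convert it first, for example with iconv -f UTF-16 -t UTF-8.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a copy of a spreadsheet-exported list with
# carriage returns removed. Every CR becomes LF, then empty lines (the
# leftovers of CRLF pairs) are dropped.
clean_list() {
  local in=$1 out=$2
  tr '\r' '\n' < "$in" | grep -v '^$' > "$out"
}

# Example: clean_list bam_list_from_excel.txt bam_list.txt
```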

Thanks Chris.

I am pretty lazy and didn't want to spend the time figuring out how to strip them out, so I just opened the files in Google Docs and resaved them as plain text files. All the characters go away. No idea why, but it makes my life easy. Hopefully I can get this thing running now!

written 6.4 years ago by will.lockwood