Question: Output bedGraph from genomeCoverageBed is missing "chr"
1
gravatar for m93
11 months ago by
m93170
m93170 wrote:

I am trying to make a UCSC Genome Browser hub using bigWig files. My starting file format are BAM files. Briefly, my script does as follows:

  • Scale the BAM file according to a specificic pre-determined scaling factor (using samtools).
  • From scaled BAM, make a bedgraph representing coverage (using genomeCoverageBed)
  • From beGgraph file, make bigwig file (using bedGraphToBigwig)

My script looks as follows:

samtools view -s 0.75 -b file.bam > file.scaled.bam;
genomeCoverageBed -bg -split -ibam file.scaled.bam; -g hg38.chrom.sizes > file.bedgraph;
sort -k1,1 -k2,2n file.bedgraph > file.sorted.bedgraph;
bedGraphToBigWig file.sorted.bedgraph hg38.chrom.sizes file.bw

I am having an issue displaying my bigwig file and noticed this is most likely due to my bedGraph file not having a "chr" in front of each chromosome name (so I have "1" instead of "chr1"). I am confused as how this happened.

Initially, when first testing genomeCoverageBed, I kept getting the following warning:

*****WARNING: Genome (-g) files are ignored when BAM input is provided.

I therefore removed the -g argument in the genomeCoverageBed. However, I just noticed (I should have noticed before) that the "chr" is missing in my bedgraph file. Presumably, this is what is causing my bigwig file to then be ill-formatted.

I am confused, am I missing something? Why is genomeCoverageBed not outputting a bedGraph file with "chr" in it, like it should be doing?

Thanks

ADD COMMENTlink modified 11 months ago by RamRS23k • written 11 months ago by m93170
2

Output of samtools view -H file.bam ? Bet your reference genome simply was like 1,2,3 and not chr1,chr2,chr3.

ADD REPLYlink written 11 months ago by ATpoint21k
1
gravatar for jared.andrews07
11 months ago by
jared.andrews072.8k
St. Louis, MO
jared.andrews072.8k wrote:

Are they there in your SAM file? It's possible whatever reference it was aligned to just didn't have them. You can easily add them back via awk:

awk '{print "chr" $0}' file.bedgraph > newfile.bedgraph

Also your second line has an erroneous semicolon.

ADD COMMENTlink modified 11 months ago • written 11 months ago by jared.andrews072.8k

Yeah it seems to be the case. The BAM headers have the following tags:

@SQ SN:1     
@SQ SN:2     
etc

I tried doing this awk command you suggested but can't get it to work in the context of a shell script at the moment. Not sure how to get awk to recognize $0 as the whole line.

ADD REPLYlink modified 11 months ago • written 11 months ago by m93170

What is the error? The command should work fine on the bedGraph.

ADD REPLYlink written 11 months ago by ATpoint21k

I sorted it out I think. I was using this command within a shell script so instead of $0 being the whole line of the bedgraph, it was outputting the name of the script. I used this instead:

 awk -v LINE="$0" '{print "chr" $LINE}' $BEDGRAPH
ADD REPLYlink written 11 months ago by m93170

Don't make simple things complicated ;-D

ADD REPLYlink written 11 months ago by ATpoint21k

What do you mean? The awk command suggested by jared.andrews07 will not work within a script. I'm guessing this is maybe a simpler way than my awk command? I'm not very comfortable changing a BedGraph like this to be honest, I wish there was a way out outputting a bedGraph with what I want directly.

ADD REPLYlink written 11 months ago by m93170
1

There is, add the 'chr' to the headers of your reference genome FASTA file if desired.

ADD REPLYlink written 11 months ago by jared.andrews072.8k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1181 users visited in the last hour