Question: Output bedGraph from genomeCoverageBed is missing "chr"
0
gravatar for m93
5 months ago by
m93140
m93140 wrote:

I am trying to make a UCSC Genome Browser hub using bigWig files. My starting file format are BAM files. Briefly, my script does as follows:

  • Scale the BAM file according to a specificic pre-determined scaling factor (using samtools).
  • From scaled BAM, make a bedgraph representing coverage (using genomeCoverageBed)
  • From beGgraph file, make bigwig file (using bedGraphToBigwig)

My script looks as follows:

samtools view -s 0.75 -b file.bam > file.scaled.bam;
genomeCoverageBed -bg -split -ibam file.scaled.bam; -g hg38.chrom.sizes > file.bedgraph;
sort -k1,1 -k2,2n file.bedgraph > file.sorted.bedgraph;
bedGraphToBigWig file.sorted.bedgraph hg38.chrom.sizes file.bw

I am having an issue displaying my bigwig file and noticed this is most likely due to my bedGraph file not having a "chr" in front of each chromosome name (so I have "1" instead of "chr1"). I am confused as how this happened.

Initially, when first testing genomeCoverageBed, I kept getting the following warning:

*****WARNING: Genome (-g) files are ignored when BAM input is provided.

I therefore removed the -g argument in the genomeCoverageBed. However, I just noticed (I should have noticed before) that the "chr" is missing in my bedgraph file. Presumably, this is what is causing my bigwig file to then be ill-formatted.

I am confused, am I missing something? Why is genomeCoverageBed not outputting a bedGraph file with "chr" in it, like it should be doing?

Thanks

ADD COMMENTlink modified 5 months ago by RamRS20k • written 5 months ago by m93140
2

Output of samtools view -H file.bam ? Bet your reference genome simply was like 1,2,3 and not chr1,chr2,chr3.

ADD REPLYlink written 5 months ago by ATpoint13k
0
gravatar for jared.andrews07
5 months ago by
jared.andrews071.8k
St. Louis, MO
jared.andrews071.8k wrote:

Are they there in your SAM file? It's possible whatever reference it was aligned to just didn't have them. You can easily add them back via awk:

awk '{print "chr" $0}' file.bedgraph > newfile.bedgraph

Also your second line has an erroneous semicolon.

ADD COMMENTlink modified 5 months ago • written 5 months ago by jared.andrews071.8k

Yeah it seems to be the case. The BAM headers have the following tags:

@SQ SN:1     
@SQ SN:2     
etc

I tried doing this awk command you suggested but can't get it to work in the context of a shell script at the moment. Not sure how to get awk to recognize $0 as the whole line.

ADD REPLYlink modified 5 months ago • written 5 months ago by m93140

What is the error? The command should work fine on the bedGraph.

ADD REPLYlink written 5 months ago by ATpoint13k

I sorted it out I think. I was using this command within a shell script so instead of $0 being the whole line of the bedgraph, it was outputting the name of the script. I used this instead:

 awk -v LINE="$0" '{print "chr" $LINE}' $BEDGRAPH
ADD REPLYlink written 5 months ago by m93140

Don't make simple things complicated ;-D

ADD REPLYlink written 5 months ago by ATpoint13k

What do you mean? The awk command suggested by jared.andrews07 will not work within a script. I'm guessing this is maybe a simpler way than my awk command? I'm not very comfortable changing a BedGraph like this to be honest, I wish there was a way out outputting a bedGraph with what I want directly.

ADD REPLYlink written 5 months ago by m93140
1

There is, add the 'chr' to the headers of your reference genome FASTA file if desired.

ADD REPLYlink written 5 months ago by jared.andrews071.8k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1771 users visited in the last hour