The ENCODE data comes out, and luckily they provide both
.bam file and
.bigwig file. Thus, it occurs to me that I want to give a try to reproduce the data visualization with tool:
BEDtools and other related tools.
I'll first upload the difference between my-version and official version:
Top to Bottom:
- Black: my-version-POSitive-strand.bigwig
- Blue: Official-version-POSitive-strand.bigwig
- Red: Official-version-REVerse-strand.bigwig
- Grey: my-version-REVerse-strand.bigwig
From the image, we will find my-version-data and official-version-data roughly share the same peaks, however, my-version-peaks are somehow masked by certain uniform noises. And it drives me crazy.
Note that I know not all the bioinformatics works can be reproduces, but this issue dose not get involved with much algorithms, decisions, etc. Therefore, it's supposed to be reproducible, I think.
ENCODE/CSHL long RNA-seq Data set can be found here: http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeCshlLongRnaSeq/ And here I use K562-chromatin-subcellular fraction (Rep4) to explore as an example:
- BAM file ready for my Data Processing;
- bigWig Positive signal file ready for uploading to UCSC;
- bigwig Reverse signal file ready for uploading to UCSC.
samtools sort wgEncodeCshlLongRnaSeq/wgEncodeCshlLongRnaSeqK562ChromatinTotalAlnRep4.bam wgEncodeCshlLongRnaSeq/wgEncodeCshlLongRnaSeqK562ChromatinTotalAlnRep4.bam.sort
I refer to the standard manual of
BEDtools, I'll use forward strand as example, and the reverse strand signal is generated in the same way.
genomeCoverageBed -bg -ibam wgEncodeCshlLongRnaSeq/wgEncodeCshlLongRnaSeqK562ChromatinTotalAlnRep4.bam.sort -g hg19.chromInfo -strand + >K562-Chromatin-POS-4.bedgraph
Note that I've used
-strand flag to separate the two strands.
bedGraphToBigWig executive script available from UCSC exe list
bedGraphToBigWig K562-Chromatin-POS-4.bedgraph hg19.chromInfo K562-Chromatin-POS-4.bigwig
Upload to ftp and finally to UCSC genome browser.
I was wondering which filtering step I've missed.
I've checked whether all the reads in the
.bam file are unique mapped. As the reads are mapped to genome with a tool named, STAR.. According to the manual and common sense, the mapping quality in .sam file equaling 255 means unique mapped reads. Thus, all the reads in the .bam file are unique mapped after I've check the mapping quality.
Thus, any suggestions?