Question about converting a BAM file to a bedGraph file using STAR
6.4 years ago
qudgml0411 • 0

Hi, I'm learning how to analyze paired-end RNA-seq data.

I aligned the raw data (FASTQ format) to the reference using STAR and got the BAM file Aligned.sortedByCoord.out.bam. Two of its records look like this:

MG00HS12:1369:H2J2GBCX2:2:2112:4753:76032  419  1  3150900  1  101M  =  3150954  155  GTCTTGGCTACATTCTTTTCTCTCGCCACCTAGCCCCTCTTCTCTTCCAGGTTTCCAAAATGCCTTTCCAGGCTAGAACCCAGGTTGTGGTCTGCTGGCCA  DDDD@HHIIIIHHIGHIHHHHIFHIIIIIGHIIHHIIIHIIHIHIIEHIIIIIIIHHHIGHHHIHEHHIGHIHCHHEHHI0EFCDHHIIFHHE1CHHCG1<  NH:i:4  HI:i:2  AS:i:200  NM:i:0  MD:Z:101

MG00HS12:1369:H2J2GBCX2:2:2209:21298:39758  163  1  3150905  255  100M1S  =  3151181  377  GGCTACATTCTTTTCTCTCGCCACCTAGCCCCTCTTCTCTTCCAGGTTTCCAAAATGCCTTTCCAGGCTAGAACCCAGGTTGTGGTCTGCTGGCCGGACAA  AAABDIIHIHIIIIEHIHICHIIHIHIGIHEHHIIGHHIIIFHHHIHHIHIHHHHHHF?FHHHHEEE?FHCHHHIH@GH@FEECHG??GEHHCEHHCDD##  NH:i:1  HI:i:1  AS:i:189  NM:i:1  MD:Z:95A4

(Sorry for the clumsy formatting.)

To view the alignments in a genome browser, I converted this BAM file to bedGraph files using this command:

STAR   --runMode inputAlignmentsFromBAM   --inputBAMfile Aligned.sortedByCoord.out.bam   --outFileNamePrefix ../browser/   --outWigType bedGraph      --outWigStrand Stranded

However, one of the resulting bedGraph files, Signal.UniqueMultiple.str2.out.bg, looks like this:

<chr>   <start>   <end>  <score>
1       3150899 3150904 0.00346
1       3150904 3150934 0.01732
...

The two records shown above are the only reads in the BAM file that start at chr1:3150900 and chr1:3150905; no other reads start at these positions. (The positions in the BAM records above are 1-based, while bedGraph coordinates are 0-based.)
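To make the coordinate conversion concrete, here is a minimal sketch of how a 1-based SAM position plus a CIGAR string might be turned into a 0-based, half-open bedGraph-style interval. The function name `sam_to_bedgraph_interval` is hypothetical, and the sketch assumes that only the reference-consuming CIGAR operations (M, D, N, =, X) contribute to the covered span, so soft-clipped bases (S) are not counted.

```python
import re

# CIGAR operations that consume reference bases (per the SAM format).
REF_CONSUMING = set("MDN=X")

def sam_to_bedgraph_interval(pos, cigar):
    """Convert a 1-based SAM POS and CIGAR into a 0-based half-open (start, end)."""
    ref_len = sum(
        int(length)
        for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)
        if op in REF_CONSUMING
    )
    start = pos - 1  # 1-based inclusive -> 0-based
    return start, start + ref_len

# The two reads from the BAM records above.
print(sam_to_bedgraph_interval(3150900, "101M"))    # (3150899, 3151000)
print(sam_to_bedgraph_interval(3150905, "100M1S"))  # (3150904, 3151004)
```

Under this convention the trailing 1S in the second read does not extend the interval, so that read spans 100 reference bases rather than 101.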

Here is my question: how is the score in the bedGraph file calculated from the BAM file?

Since 3150904 lies between 3150899 and 3151000, the interval [3150904, 3150934] is covered by both reads, so its score should include the score of the interval [3150899, 3150904]. Why, then, is the score of [3150904, 3150934] not 0.00692 (double 0.00346)? Are the read intervals [3150899, 3151000] and [3150904, 3151005], which have the same length, treated differently, even though each corresponds to only one read in the alignment file?

Is there a particular method or consideration involved in calculating bedGraph scores from a BAM file in RNA-seq analysis?
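One way to check the arithmetic is sketched below. It rests on two assumptions that are not confirmed anywhere in this post: that in the UniqueMultiple signal each read contributes a weight of 1/NH (so the first read, with NH:i:4, counts as 0.25 and the second, with NH:i:1, counts as 1), and that every interval is then multiplied by one global scale factor (for example, a reads-per-million normalization). The function name `weighted_coverage` is hypothetical, and the read intervals are the 0-based spans implied by the CIGAR strings above.

```python
def weighted_coverage(reads):
    """Piecewise-constant coverage from (start, end, nh) reads.

    Intervals are 0-based half-open; each read adds 1/nh to every base it
    covers (an assumed weighting, see the lead-in).
    """
    events = {}
    for start, end, nh in reads:
        weight = 1.0 / nh
        events[start] = events.get(start, 0.0) + weight
        events[end] = events.get(end, 0.0) - weight

    segments = []
    depth = 0.0
    positions = sorted(events)
    for left, right in zip(positions, positions[1:]):
        depth += events[left]
        if depth > 1e-12:  # skip uncovered gaps
            segments.append((left, right, depth))
    return segments

# Read 1: 101M at POS 3150900, NH=4; read 2: 100M1S at POS 3150905, NH=1.
reads = [(3150899, 3151000, 4), (3150904, 3151004, 1)]
segments = weighted_coverage(reads)
for seg in segments:
    print(seg)

# Ratio between the overlapping segment and the first segment.
print(segments[1][2] / segments[0][2])  # 5.0
```

Under these assumptions the second segment would be 5 times the first rather than 2 times, which is close to the observed ratio 0.01732 / 0.00346. The end of the observed interval at 3150934 would then have to come from some other alignment not shown here.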

RNA-Seq sequence • 2.3k views