Hi, I'm learning how to analyze paired-end RNA-seq data.
I aligned the raw reads (FASTQ format) to the reference using STAR and got the BAM file Aligned.sortedByCoord.out.bam. Two of its records look like this:
MG00HS12:1369:H2J2GBCX2:2:2112:4753:76032 419 1 3150900 1 101M = 3150954 155 GTCTTGGCTACATTCTTTTCTCTCGCCACCTAGCCCCTCTTCTCTTCCAGGTTTCCAAAATGCCTTTCCAGGCTAGAACCCAGGTTGTGGTCTGCTGGCCA DDDD@HHIIIIHHIGHIHHHHIFHIIIIIGHIIHHIIIHIIHIHIIEHIIIIIIIHHHIGHHHIHEHHIGHIHCHHEHHI0EFCDHHIIFHHE1CHHCG1< NH:i:4 HI:i:2 AS:i:200 NM:i:0 MD:Z:101
MG00HS12:1369:H2J2GBCX2:2:2209:21298:39758 163 1 3150905 255 100M1S = 3151181 377 GGCTACATTCTTTTCTCTCGCCACCTAGCCCCTCTTCTCTTCCAGGTTTCCAAAATGCCTTTCCAGGCTAGAACCCAGGTTGTGGTCTGCTGGCCGGACAA AAABDIIHIHIIIIEHIHICHIIHIHIGIHEHHIIGHHIIIFHHHIHHIHIHHHHHHF?FHHHHEEE?FHCHHHIH@GH@FEECHG??GEHHCEHHCDD## NH:i:1 HI:i:1 AS:i:189 NM:i:1 MD:Z:95A4
(Sorry about the clumsy formatting.)
Now I want to view this in a genome browser, so I converted the BAM file into bedGraph files using this command:
STAR --runMode inputAlignmentsFromBAM --inputBAMfile Aligned.sortedByCoord.out.bam --outFileNamePrefix ../browser/ --outWigType bedGraph --outWigStrand Stranded
However, one of the resulting bedGraph files, Signal.UniqueMultiple.str2.out.bg, looks like this:
<chr> <start> <end> <score>
1 3150899 3150904 0.00346
1 3150904 3150934 0.01732
...
In the BAM file shown above, exactly one read starts at chr1:3150900 and exactly one at chr1:3150905; no other reads start at these positions. (Positions in the SAM text view of a BAM file are 1-based, while bedGraph coordinates are 0-based.)
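To make the coordinate conversion concrete, here is a minimal Python sketch that turns a 1-based POS and a CIGAR string into a 0-based, half-open reference interval. The function name `reference_span` is hypothetical and introduced only for illustration; the set of reference-consuming CIGAR operations follows the SAM specification. Note that a soft-clipped base (`S`) does not consume the reference.

```python
import re

# CIGAR operations that consume reference bases (per the SAM specification).
REF_CONSUMING = set("MDN=X")

def reference_span(pos_1based, cigar):
    """Return the 0-based, half-open [start, end) reference interval of an alignment.

    Hypothetical helper for illustration; for a spliced read (N in the CIGAR),
    the returned span includes the intron.
    """
    ref_len = sum(
        int(length)
        for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)
        if op in REF_CONSUMING
    )
    start = pos_1based - 1  # 1-based SAM position -> 0-based bedGraph start
    return start, start + ref_len

print(reference_span(3150900, "101M"))    # (3150899, 3151000)
print(reference_span(3150905, "100M1S"))  # (3150904, 3151004)
```

The starts 3150899 and 3150904 match the interval boundaries in the bedGraph lines shown above.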
Here is my question: how is the bedGraph score calculated from the BAM file?
In 0-based coordinates, the first read covers [3150899, 3151000] and the second covers roughly [3150904, 3151005]. The interval [3150899, 3150904] is therefore covered only by the first read, while [3150904, 3150934] is covered by both. So why is the score of [3150904, 3150934] not 0.00692 (double 0.00346)? Are the two reads, which have nearly the same length, treated differently even though each appears only once in the alignment file?
Is there a particular method, or are there special considerations, for calculating bedGraph scores from a BAM file in RNA-seq analysis?
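One way to reason about the numbers is to compute per-base coverage under two weighting schemes and compare the ratios with the observed scores. The sketch below is not STAR's implementation. It assumes, as a hypothesis that nothing in this post confirms, that each alignment contributes 1/NH to the signal, so the first read (NH:i:4) counts 0.25 and the second (NH:i:1) counts 1. The function name `coverage_intervals` is hypothetical, and only the two reads shown above are included.

```python
from collections import defaultdict

def coverage_intervals(reads):
    """Return runs (start, end, depth) of constant, non-zero depth.

    reads: iterable of (start, end, weight) using 0-based, half-open spans.
    Hypothetical helper; spliced reads would need their introns excluded.
    """
    delta = defaultdict(float)
    for start, end, weight in reads:
        delta[start] += weight  # depth rises where a read begins
        delta[end] -= weight    # and falls where it ends
    runs, depth, prev = [], 0.0, None
    for pos in sorted(delta):
        if prev is not None and depth > 1e-12:
            runs.append((prev, pos, depth))
        depth += delta[pos]
        prev = pos
    return runs

# Reference spans of the two reads shown above (0-based, half-open).
spans = [(3150899, 3151000, 4), (3150904, 3151004, 1)]  # (start, end, NH)

unweighted = coverage_intervals([(s, e, 1.0) for s, e, nh in spans])
weighted = coverage_intervals([(s, e, 1.0 / nh) for s, e, nh in spans])

print(unweighted[1][2] / unweighted[0][2])  # 2.0
print(weighted[1][2] / weighted[0][2])      # 5.0
```

The observed ratio, 0.01732 / 0.00346, is about 5, which is what the 1/NH assumption predicts and not the factor of 2 from counting each read once. If that assumption holds, the remaining constant factor would come from a normalization such as reads per million, which is also an assumption here. The run boundaries differ from the bedGraph file (which breaks at 3150934) because other reads in the real data are not included in this sketch.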