I would like to determine the length of all mRNAs. I have downloaded a bed file of exons and split the column in R so that all exons belonging to the same transcript will have the same character in V4.
I have also calculated the length of each exon in V7
exons[c(1:30),] V1 V2 V3 V4 V5 V6 V7 V8 1 chr1 67092175 67093604 NR_075077.1 0 - 1429 _0_0_chr1_67092176_r 2 chr1 67096251 67096321 NR_075077.1 0 - 70 _1_0_chr1_67096252_r 3 chr1 67103237 67103382 NR_075077.1 0 - 145 _2_0_chr1_67103238_r 4 chr1 67111576 67111644 NR_075077.1 0 - 68 _3_0_chr1_67111577_r 5 chr1 67113613 67113756 NR_075077.1 0 - 143 _4_0_chr1_67113614_r 6 chr1 67115351 67115464 NR_075077.1 0 - 113 _5_0_chr1_67115352_r 7 chr1 67125751 67125909 NR_075077.1 0 - 158 _6_0_chr1_67125752_r 8 chr1 67127165 67127257 NR_075077.1 0 - 92 _7_0_chr1_67127166_r 9 chr1 67131141 67131227 NR_075077.1 0 - 86 _8_0_chr1_67131142_r 10 chr1 67134929 67134971 NR_075077.1 0 - 42 _9_0_chr1_67134930_r 11 chr1 67092175 67093604 NM_001276352.1 0 - 1429 _0_0_chr1_67092176_r 12 chr1 67096251 67096321 NM_001276352.1 0 - 70 _1_0_chr1_67096252_r 13 chr1 67103237 67103382 NM_001276352.1 0 - 145 _2_0_chr1_67103238_r 14 chr1 67111576 67111644 NM_001276352.1 0 - 68 _3_0_chr1_67111577_r 15 chr1 67115351 67115464 NM_001276352.1 0 - 113 _4_0_chr1_67115352_r 16 chr1 67125751 67125909 NM_001276352.1 0 - 158 _5_0_chr1_67125752_r 17 chr1 67127165 67127257 NM_001276352.1 0 - 92 _6_0_chr1_67127166_r 18 chr1 67131141 67131227 NM_001276352.1 0 - 86 _7_0_chr1_67131142_r 19 chr1 67134929 67134971 NM_001276352.1 0 - 42 _8_0_chr1_67134930_r 20 chr1 67092175 67093604 NM_001276351.1 0 - 1429 _0_0_chr1_67092176_r 21 chr1 67095234 67095421 NM_001276351.1 0 - 187 _1_0_chr1_67095235_r 22 chr1 67096251 67096321 NM_001276351.1 0 - 70 _2_0_chr1_67096252_r 23 chr1 67115351 67115464 NM_001276351.1 0 - 113 _3_0_chr1_67115352_r 24 chr1 67125751 67125909 NM_001276351.1 0 - 158 _4_0_chr1_67125752_r 25 chr1 67127165 67127257 NM_001276351.1 0 - 92 _5_0_chr1_67127166_r 26 chr1 67131141 67131227 NM_001276351.1 0 - 86 _6_0_chr1_67131142_r 27 chr1 67134929 67134971 NM_001276351.1 0 - 42 _7_0_chr1_67134930_r 28 chr1 67092175 67093604 XM_011541469.1 0 - 1429 _0_0_chr1_67092176_r 29 chr1 67095234 67095421 XM_011541469.1 0 - 187 _1_0_chr1_67095235_r 30 chr1 67096251 67096321 XM_011541469.1 0 - 70 _2_0_chr1_67096252_r
What I want to do is sum all of the values of V7 where V4 values are equal
So where V4 is equal, like it is below, add all the values in V7 to get 2,346
V4 V5 V6 V7 V8
NR_075077.1 0 - 1429 _0_0_chr1_67092176_r
NR_075077.1 0 - 70 _1_0_chr1_67096252_r
NR_075077.1 0 - 145 _2_0_chr1_67103238_r
NR_075077.1 0 - 68 _3_0_chr1_67111577_r
NR_075077.1 0 - 143 _4_0_chr1_67113614_r
NR_075077.1 0 - 113 _5_0_chr1_67115352_r
NR_075077.1 0 - 158 _6_0_chr1_67125752_r
NR_075077.1 0 - 92 _7_0_chr1_67127166_r
NR_075077.1 0 - 86 _8_0_chr1_67131142_r
NR_075077.1 0 - 42 _9_0_chr1_67134930_r
This is completely unreadable - could you either make a (much) smaller example or put the data in a google sheet?
I have put a hypothetical example below