Aggregating coverage values exon-wise
4.4 years ago
Gene ▴ 20

I have exon ranges, for example:

Chr_exon start - exon end

1_3774167-1_3774281

1_3775281-1_3775430

1_3782281-1_3782564

1_3784537-1_3784617

1_3786168-1_3786339

1_3789035-1_3789136

1_3800070-1_3800305

I also have coverage values for all positions, for example:

chr_position coverage value

1_3774167 175.6

1_3774168 175.6

1_3774169 175.6

1_3774170 175.6

1_3774171 175.6

1_3774172 175.6

1_3774173 175.6

1_3774174 175.6

Is it possible to aggregate them exon-wise? For example, query by exon range (start-end) and find the mean of all values in that exon/range, using Unix tools or maybe Python? Thanks a lot in advance. I am a beginner, so sorry if the question is stupid.

ngs coverage aggregation Exon • 792 views

Since I am a bioinformatician from the 90s, I would write a script in R to do it. It could be 20 lines of code. But I am sure there are tools for that.


Thank you for your response. I am not that familiar with R, but I will try to code it in Python then.

4.4 years ago

To start, you may want to get these into standardised formats:

cat file1.txt 
1_3774167-1_3774281
1_3775281-1_3775430
1_3782281-1_3782564
1_3784537-1_3784617
1_3786168-1_3786339
1_3789035-1_3789136
1_3800070-1_3800305

sed 's/[_\\-]/\t/g' file1.txt | cut -f1,2,4 > file1.bed

cat file1.bed 
1   3774167 3774281
1   3775281 3775430
1   3782281 3782564
1   3784537 3784617
1   3786168 3786339
1   3789035 3789136
1   3800070 3800305


cat file2.txt 
1_3774167 175.6
1_3774168 175.6
1_3774169 175.6
1_3774170 175.6
1_3774171 175.6
1_3774172 175.6
1_3774173 175.6
1_3774174 175.6

sed 's/[_\\ ]/\t/g' file2.txt | awk '{print $1"\t"$2"\t"$2"\t"$3}' > file2.bed

cat file2.bed
1   3774167 3774167 175.6
1   3774168 3774168 175.6
1   3774169 3774169 175.6
1   3774170 3774170 175.6
1   3774171 3774171 175.6
1   3774172 3774172 175.6
1   3774173 3774173 175.6
1   3774174 3774174 175.6

After that, you may find a solution via BEDTools or via GenomicRanges in R. You will have to set rules about how to summarise coverage per exon.
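
Since Python was mentioned in the question, here is a minimal sketch of one possible summary rule: the mean of the coverage values that fall inside each exon. It reads the original `chr_start-chr_end` and `chr_position value` lines directly. Several things are assumptions rather than anything stated in the thread: exon ends are treated as inclusive, positions without a coverage value are skipped rather than counted as zero, chromosome names contain no `-`, and the function names are made up for illustration. It keeps all coverage values in memory, so it suits a gene panel better than a whole genome.

```python
from statistics import mean


def parse_exon(line):
    """Parse 'chr_start-chr_end' (e.g. '1_3774167-1_3774281') into (chrom, start, end)."""
    left, right = line.strip().split("-")
    chrom, start = left.rsplit("_", 1)
    end = right.rsplit("_", 1)[1]
    return chrom, int(start), int(end)


def parse_coverage(line):
    """Parse 'chr_position value' (e.g. '1_3774167 175.6') into ((chrom, pos), value)."""
    key, value = line.split()
    chrom, pos = key.rsplit("_", 1)
    return (chrom, int(pos)), float(value)


def mean_coverage_per_exon(exon_lines, coverage_lines):
    """Return {(chrom, start, end): mean coverage}, or None for exons with no values."""
    # Look-up table: (chrom, position) -> coverage value
    coverage = dict(parse_coverage(l) for l in coverage_lines if l.strip())
    result = {}
    for line in exon_lines:
        if not line.strip():
            continue
        chrom, start, end = parse_exon(line)
        # Assumption: exon end is inclusive; positions with no value are skipped
        values = [coverage[(chrom, p)]
                  for p in range(start, end + 1)
                  if (chrom, p) in coverage]
        result[(chrom, start, end)] = mean(values) if values else None
    return result


if __name__ == "__main__":
    # Small in-memory demo using lines from the question
    exons = ["1_3774167-1_3774281", "1_3775281-1_3775430"]
    cov = ["1_3774167 175.6", "1_3774168 175.6", "1_3774169 175.6"]
    for (chrom, start, end), avg in mean_coverage_per_exon(exons, cov).items():
        print(chrom, start, end, "NA" if avg is None else avg, sep="\t")
```

To run it on the real files, open file handles can be passed instead of lists, for example `mean_coverage_per_exon(open("file1.txt"), open("file2.txt"))`. If missing positions should count as zero coverage, divide the sum by the exon length (`end - start + 1`) instead of taking the mean of the values found.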

Kevin
