Knowing the number of exons in a range
3.4 years ago
zizigolu ★ 4.3k

Hi

I have copy number segments from exome sequencing, like this:

> head(CN)
# A tibble: 6 x 6
  file              Chromosome   Start      End Total_CN Minor_CN
  <chr>             <chr>        <int>    <int>    <int>    <int>
1 sample1            51479   817980        2        0

How can I find out how many exons fall within each Start-End range?

R exome • 928 views

If you are working with intervals, the answer is always bedtools.


I need the number of exons in each range.


then bedtools intersect -c?
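
For readers who want to see what that count means, here is a minimal pure-Python sketch of the same idea: for each segment, count the exons that overlap it by at least one base. It assumes 0-based, half-open coordinates (as in BED) and matching chromosome names in both inputs (for example "chr1" in both, not "1" in one and "chr1" in the other). The function name and the toy segments are made up for illustration; on real data the equivalent call would be along the lines of bedtools intersect -a segments.bed -b exons.bed -c, where both file names are placeholders.

```python
def count_overlaps(segments, exons):
    """Count, for each segment, the exons overlapping it by >= 1 bp.

    Both inputs are lists of (chrom, start, end) tuples in 0-based,
    half-open coordinates. This is a quadratic sketch, fine for small
    inputs; use bedtools or an interval tree for genome-scale data.
    """
    # Group exons by chromosome so each segment only scans its own chromosome.
    by_chrom = {}
    for chrom, start, end in exons:
        by_chrom.setdefault(chrom, []).append((start, end))

    counts = []
    for chrom, seg_start, seg_end in segments:
        n = sum(
            1
            for ex_start, ex_end in by_chrom.get(chrom, [])
            if ex_start < seg_end and ex_end > seg_start  # half-open overlap test
        )
        counts.append((chrom, seg_start, seg_end, n))
    return counts


# Toy exons (hypothetical input in BED-style coordinates).
exons = [("chr1", 11873, 12227), ("chr1", 12612, 12721), ("chr1", 13220, 14409)]
# Hypothetical copy number segments.
segments = [("chr1", 12000, 13000), ("chr1", 20000, 30000), ("chr2", 0, 50000)]

result = count_overlaps(segments, exons)
for row in result:
    print(*row, sep="\t")
```

Segments with no overlapping exon are kept with a count of 0, which is also how the -c option reports them.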


Search the forum to find out how to query for a list of exons and their genomic locations. Once you have that, you can use bedtools and R to get to your answer. You've been on the site long enough to know better than to ask for tailor-made solutions.


Does this code give me human exon coordinates?

curl  -s "http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/knownGene.txt.gz" | gunzip -c |\
 awk '{n=int($8); split($9,S,/,/);split($10,E,/,/); for(i=1;i<=n;++i) {printf("%s,%s,%s,%s,%s\n",$1,$2,$3,S[i],E[i]);} }'
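
To make the awk field numbers easier to check, the same per-line logic can be sketched in Python. This assumes the tab-separated column order name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds, which is what the use of $8, $9 and $10 implies; the table schema remains the authority. The sample line is illustrative only: its exon fields are assembled from coordinates shown in this thread, and the cds fields are placeholders.

```python
def exons_from_knowngene_line(line):
    """Split one knownGene-style row into one tuple per exon.

    Assumed columns (0-based index): 0 name, 1 chrom, 2 strand,
    7 exonCount, 8 exonStarts, 9 exonEnds (comma-separated, with a
    trailing comma).
    """
    fields = line.rstrip("\n").split("\t")
    name, chrom, strand = fields[0], fields[1], fields[2]
    exon_count = int(fields[7])
    starts = fields[8].rstrip(",").split(",")
    ends = fields[9].rstrip(",").split(",")
    return [
        (name, chrom, strand, int(starts[i]), int(ends[i]))
        for i in range(exon_count)
    ]


# Illustrative row; cdsStart/cdsEnd values here are placeholders.
sample = "uc001aaa.3\tchr1\t+\t11873\t14409\t11873\t11873\t3\t11873,12612,13220,\t12227,12721,14409,"

rows = exons_from_knowngene_line(sample)
for row in rows:
    print(*row, sep="\t")
```

Printing with tabs rather than commas keeps the output close to BED, which is convenient if the next step is bedtools.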

Are you expecting me to run the code and tell you, or to magically know the content of a file, how the code would alter it, and what the result would be?

You can run a few spot checks, right? Asking us for help is fine, but relying on us to do your job is just irresponsible.


No, I already ran the code and obtained a file like this:

uc001aaa.3  chr1    +   11873   12227
uc001aaa.3  chr1    +   12612   12721
uc001aaa.3  chr1    +   13220   14409
uc010nxr.1  chr1    +   11873   12227
uc010nxr.1  chr1    +   12645   12697
uc010nxr.1  chr1    +   13220   14409
uc010nxq.1  chr1    +   11873   12227
uc010nxq.1  chr1    +   12594   12721
uc010nxq.1  chr1    +   13402   14409

I am asking whether these are human exon coordinates or not.
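
One thing the listing shows regardless of the answer: the same interval (for example chr1 11873-12227) appears under several transcript IDs, so counting rows would count some exons more than once. A minimal sketch of two ways to handle that follows. Neither is part of the original command, and which one is appropriate depends on what "number of exons" should mean: distinct exon intervals, or merged exonic regions. The function name is hypothetical.

```python
# The nine rows from the listing: (transcript, chrom, strand, start, end).
rows = [
    ("uc001aaa.3", "chr1", "+", 11873, 12227),
    ("uc001aaa.3", "chr1", "+", 12612, 12721),
    ("uc001aaa.3", "chr1", "+", 13220, 14409),
    ("uc010nxr.1", "chr1", "+", 11873, 12227),
    ("uc010nxr.1", "chr1", "+", 12645, 12697),
    ("uc010nxr.1", "chr1", "+", 13220, 14409),
    ("uc010nxq.1", "chr1", "+", 11873, 12227),
    ("uc010nxq.1", "chr1", "+", 12594, 12721),
    ("uc010nxq.1", "chr1", "+", 13402, 14409),
]

# Option 1: drop exact duplicates, ignoring the transcript ID.
unique = sorted({(chrom, start, end) for _, chrom, _, start, end in rows})


def merge_intervals(intervals):
    """Option 2: merge overlapping or touching intervals per chromosome."""
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


merged = merge_intervals(unique)
print(len(rows), "rows,", len(unique), "distinct intervals,", len(merged), "merged regions")
```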


Please open a genome browser at NCBI, Ensembl, or UCSC, go to one of the coordinates and find the gene there, then look at its exons and see whether things match up.

Now, do the same thing for 2-3 genes in different regions. If everything looks OK, your dataset is fine. If not, it's not fine.

The above needs to be done by you, me, anyone if they wish to verify the dataset. Why would you rather we do it than you?


Read the tableSchema to know what you're actually downloading. As for whether the code is actually grabbing what you want, you are capable of verifying that.

As an aside, using exact code you got from elsewhere without understanding what it does is a recipe for mistakes. Always verify your output manually.


Another option is to use the GENCODE GTF file directly. That might be easier to reproduce than the query-based approach. Either way, you are perfectly capable of finding these solutions without asking us for help.
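
A minimal sketch of the GTF route, assuming the standard nine-column GTF layout (seqname, source, feature, start, end, score, strand, frame, attributes). GTF coordinates are 1-based and inclusive, so the start is shifted by one to get 0-based, half-open BED-style intervals. The function name and the toy lines are made up for illustration and are not real GENCODE records.

```python
import re


def gtf_exons_to_bed(lines):
    """Keep only 'exon' features and convert them to BED-style tuples.

    Returns (chrom, start0, end, transcript_id, strand), where start0 is
    the GTF start minus 1 and end is unchanged.
    """
    bed = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        transcript = match.group(1) if match else "."
        bed.append((fields[0], int(fields[3]) - 1, int(fields[4]), transcript, fields[6]))
    return bed


# Toy GTF content (invented IDs and coordinates).
gtf_lines = [
    '##description: toy example',
    'chr1\ttoy\tgene\t101\t500\t.\t+\t.\tgene_id "GENE1";',
    'chr1\ttoy\texon\t101\t200\t.\t+\t.\tgene_id "GENE1"; transcript_id "TX1";',
    'chr1\ttoy\texon\t301\t500\t.\t+\t.\tgene_id "GENE1"; transcript_id "TX1";',
]

bed_rows = gtf_exons_to_bed(gtf_lines)
for row in bed_rows:
    print(*row, sep="\t")
```

For a real file, the same function can be fed an open file handle (for example from gzip.open in text mode) instead of a list.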
