I want to download intron sequences from UCSC and then select specific introns by number, e.g. introns 2, 7, 8, 12, 16, 19, 20 of the gene BRCA1, taken from FoundationOneCDx.
Here are my steps.
1. Download all intron sequences from UCSC.
Tools -> Table Browser -> clade: Mammal; genome: Human; assembly: hg19; group: Genes and Gene Predictions; track: NCBI RefSeq; table: RefSeq All -> paste list: BRCA1 (for example) -> get output -> Sequence Retrieval Region Options: Introns (One FASTA record per region (exon, intron, etc.) with...) -> get sequence: FASTA file
2. Select the specified introns.
The FASTA file contains 130 intron sequences from different NM transcripts, all with "strand=-", and some introns share the same chromosome coordinates. My idea for picking out the specified introns:
extract the chromosome coordinates of all introns -> remove duplicates -> sort by chromosome coordinate
The filtered result:
1 range=chr17:41197800-41199679
2 range=chr17:41199701-41201157
3 range=chr17:41199701-41203099
4 range=chr17:41201192-41203099
5 range=chr17:41203115-41209088
6 range=chr17:41209133-41215369
7 range=chr17:41215371-41215910
8 range=chr17:41215949-41219644
9 range=chr17:41219693-41222964
10 range=chr17:41223236-41226367
11 range=chr17:41226519-41228524
12 range=chr17:41228609-41231370
13 range=chr17:41228609-41234440
14 range=chr17:41228612-41234440
15 range=chr17:41231397-41234440
16 range=chr17:41234573-41242980
17 range=chr17:41243030-41243471
18 range=chr17:41243030-41246780
19 range=chr17:41246858-41247882
20 range=chr17:41247920-41249280
21 range=chr17:41249287-41251811
22 range=chr17:41251875-41256158
23 range=chr17:41251878-41256158
24 range=chr17:41256259-41256904
25 range=chr17:41256954-41258492
26 range=chr17:41256954-41258514
27 range=chr17:41258531-41267762
28 range=chr17:41258531-41276053
29 range=chr17:41267777-41276053
30 range=chr17:41276113-41277218
31 range=chr17:41276113-41277307
32 range=chr17:41276113-41277313
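The extract, de-duplicate, and sort step above can be sketched roughly as follows. This is only a minimal sketch: it assumes each FASTA header carries a `range=chr:start-end` field as in the list above, and the function name and the example header prefixes (`tx_a_0` and so on) are made up for illustration.

```python
import re

# Matches the "range=chr17:41197800-41199679" field seen in the headers.
RANGE_RE = re.compile(r"range=(\S+?):(\d+)-(\d+)")

def unique_sorted_ranges(fasta_lines):
    """Collect the distinct (chrom, start, end) ranges from FASTA header lines."""
    seen = set()
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # skip sequence lines
        match = RANGE_RE.search(line)
        if match:
            chrom, start, end = match.groups()
            seen.add((chrom, int(start), int(end)))
    # Sort by chromosome name, then numerically by start and end.
    return sorted(seen)

# Hypothetical header lines; only the range= and strand= fields come from the post.
example = [
    ">tx_a_0 range=chr17:41199701-41201157 strand=-",
    "ACGT",
    ">tx_b_0 range=chr17:41197800-41199679 strand=-",
    ">tx_c_0 range=chr17:41199701-41201157 strand=-",  # duplicate coordinates
]
for chrom, start, end in unique_sorted_ranges(example):
    print(f"range={chrom}:{start}-{end}")
```

Note that merging ranges from every transcript like this throws away which transcript each intron belongs to, so the position in the sorted list is not necessarily the intron number in any one transcript.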
My questions are:
(1) Are lines 2, 7, 8, 12, 16, 19, 20 the introns 2, 7, 8, 12, 16, 19, 20 of BRCA1?
(2) Is there a difference between strand=+ and strand=- when retrieving intron sequences?
(3) How do I deal with the overlapping chromosome coordinates in the result above, e.g. lines 2, 3, 4?
Maybe my approach and steps are totally wrong; any help will be appreciated.
Just get the GTF file for your reference genome and then refer to "A: How to get intronic and intergenic sequences based on gff file?"
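As a rough illustration of the GTF route, here is a minimal sketch that derives introns from the exons of a single transcript and numbers them in transcript order. It assumes the exon coordinates (1-based, inclusive, as in GTF) have already been pulled out for one chosen transcript; the function name and the toy coordinates are hypothetical and are not real BRCA1 positions.

```python
def number_introns(exons, strand):
    """Return (intron_number, start, end) tuples for one transcript.

    exons  -- list of (start, end) pairs, 1-based inclusive, for ONE transcript
    strand -- "+" or "-"
    """
    exons = sorted(exons)
    # An intron is the gap between two neighbouring exons.
    introns = [
        (prev_end + 1, next_start - 1)
        for (_, prev_end), (next_start, _) in zip(exons, exons[1:])
    ]
    # On the minus strand the transcript runs from high to low coordinates,
    # so intron 1 is the gap with the highest genomic position.
    if strand == "-":
        introns.reverse()
    return [(i, start, end) for i, (start, end) in enumerate(introns, start=1)]

# Toy exons, not real gene coordinates.
toy_exons = [(100, 200), (300, 400), (500, 600)]
print(number_introns(toy_exons, "-"))
```

Working from one transcript at a time keeps the numbering unambiguous, because overlapping ranges such as the ones in lines 2, 3, and 4 of the list come from different transcripts.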