Question: Download selected intron sequences from UCSC
19 months ago
junhuili0 wrote:

I want to download the sequences of specific, numbered introns from UCSC, e.g. introns 2, 7, 8, 12, 16, 19 and 20 of the gene BRCA1, as listed for FoundationOneCDx.

Here are my steps.

  1. Download all intron sequences from UCSC.

    Tools -> Table Browser -> clade: Mammal; genome: Human; assembly: hg19; group: Genes and Gene Predictions; track: NCBI RefSeq; table: RefSeq All -> paste list: BRCA1 (for example) -> get output -> Sequence Retrieval Region Options: Introns (One FASTA record per region (exon, intron, etc.) with...) -> get sequence: FASTA file

  2. Select the specified introns.

The FASTA file contains 130 intron sequences from different NM transcripts, all with "strand=-", and some introns share the same chromosome coordinates. Here is my idea for getting the specified introns:

extract the chromosome coordinates of all introns -> remove duplicates -> sort by chromosome coordinate

  The filtered result:
  1 range=chr17:41197800-41199679
  2 range=chr17:41199701-41201157
  3 range=chr17:41199701-41203099
  4 range=chr17:41201192-41203099
  5 range=chr17:41203115-41209088
  6 range=chr17:41209133-41215369
  7 range=chr17:41215371-41215910
  8 range=chr17:41215949-41219644
  9 range=chr17:41219693-41222964
 10 range=chr17:41223236-41226367
 11 range=chr17:41226519-41228524
 12 range=chr17:41228609-41231370
 13 range=chr17:41228609-41234440
 14 range=chr17:41228612-41234440
 15 range=chr17:41231397-41234440
 16 range=chr17:41234573-41242980
 17 range=chr17:41243030-41243471
 18 range=chr17:41243030-41246780
 19 range=chr17:41246858-41247882
 20 range=chr17:41247920-41249280
 21 range=chr17:41249287-41251811
 22 range=chr17:41251875-41256158
 23 range=chr17:41251878-41256158
 24 range=chr17:41256259-41256904
 25 range=chr17:41256954-41258492
 26 range=chr17:41256954-41258514
 27 range=chr17:41258531-41267762
 28 range=chr17:41258531-41276053
 29 range=chr17:41267777-41276053
 30 range=chr17:41276113-41277218
 31 range=chr17:41276113-41277307
 32 range=chr17:41276113-41277313
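The extraction step in (2) could be sketched in Python as follows. This is a minimal sketch, assuming each FASTA header carries the transcript accession (NM_/NR_/XM_/XR_) in its first token plus the `range=chrN:start-end` and `strand=` fields seen in the list above; the function name `introns_by_transcript` is hypothetical. Unlike one merged list, it keeps each transcript's introns apart, because intron numbers are only meaningful within a single transcript.

```python
import re
from collections import defaultdict

# Assumed header shape (only the accession, "range=" and "strand=" are used), e.g.
# >hg19_ncbiRefSeq_NM_007294.4_intron_0_0_... range=chr17:41197800-41199679 ... strand=- ...
ACC_RE = re.compile(r"[NX][MR]_\d+(?:\.\d+)?")     # NM_/NR_/XM_/XR_ accession
RANGE_RE = re.compile(r"range=(\w+):(\d+)-(\d+)")  # chrom, start, end
STRAND_RE = re.compile(r"strand=([+-])")


def introns_by_transcript(fasta_lines):
    """Group intron coordinates by transcript, drop exact duplicates, sort by position."""
    groups = defaultdict(set)
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        tokens = line[1:].split()
        acc = ACC_RE.search(tokens[0]) if tokens else None
        rng = RANGE_RE.search(line)
        strand = STRAND_RE.search(line)
        if not (acc and rng and strand):
            continue  # header not in the assumed format
        chrom, start, end = rng.group(1), int(rng.group(2)), int(rng.group(3))
        groups[acc.group(0)].add((chrom, start, end, strand.group(1)))
    # Tuples sort by chrom, then start, then end
    return {tx: sorted(ivs) for tx, ivs in groups.items()}
```

Called on the lines of the downloaded FASTA file, it returns a dict keyed by accession, each value a sorted list of `(chrom, start, end, strand)` tuples for that transcript only.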

My questions are:

(1) Are lines 2, 7, 8, 12, 16, 19 and 20 introns 2, 7, 8, 12, 16, 19 and 20 of BRCA1?

(2) Is there a difference between strand=+ and strand=- when getting intron sequences?

(3) How should I deal with overlapping chromosome coordinates in the result above, e.g. lines 2, 3 and 4?

Maybe my approach and steps are totally wrong; any help will be appreciated.
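On (1) and (3): intron numbers are defined per transcript and counted in transcript (5' to 3') order, so line numbers in a merged, coordinate-sorted list built from many transcripts will generally not match intron numbers. Overlapping ranges such as lines 2, 3 and 4 most likely come from different transcripts: line 3 (41199701-41203099) spans line 2 and line 4 plus the short gap between them (41201157 to 41201192), which is consistent with an exon being skipped in one transcript. On (2): for a minus-strand gene like BRCA1, intron 1 lies at the high-coordinate end, so ascending coordinate order is the reverse of intron order, and an intron cut from the + strand reference by coordinates must be reverse-complemented. The sketch below uses hypothetical helper names and assumes you have already chosen the one transcript that the FoundationOneCDx numbering refers to, which needs checking separately.

```python
# Hypothetical helpers; they assume the introns all belong to ONE transcript.

def number_introns(introns, strand):
    """Number (start, end) introns of one transcript in 5'->3' transcript order."""
    # Minus-strand genes are transcribed from high to low coordinates,
    # so intron 1 is the one with the largest start coordinate.
    ordered = sorted(introns, key=lambda iv: iv[0], reverse=(strand == "-"))
    return {number: iv for number, iv in enumerate(ordered, start=1)}


def pick(numbered, wanted):
    """Keep only the requested intron numbers that exist for this transcript."""
    return {n: numbered[n] for n in wanted if n in numbered}


# Complement table (upper and lower case); N stays N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq):
    """Reverse complement, e.g. for a minus-strand intron cut from the + strand reference."""
    return seq.translate(_COMPLEMENT)[::-1]
```

For example, `pick(number_introns(introns, "-"), [2, 7, 8, 12, 16, 19, 20])`, where `introns` holds one transcript's `(start, end)` pairs, returns only the requested introns, keyed by their transcript-order number.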


Just get the GTF file for your reference genome and then refer to the answer to "how to get intronic and intergenic sequences based on gff file?"

written 19 months ago by ATpoint
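Following the GTF suggestion, here is a minimal sketch of deriving numbered introns for one transcript from a GTF. It assumes a standard 9-column, tab-separated GTF whose `exon` lines carry `transcript_id "..."` in the attribute column; the function name is hypothetical, and the transcript ID you pass in must be the one your intron numbering refers to. Working from a single transcript's exons sidesteps the duplicated and overlapping introns entirely.

```python
def introns_from_gtf(gtf_lines, transcript_id):
    """Return BED6 tuples (chrom, start0, end, name, score, strand) for the
    introns of one transcript, listed and named in transcript order."""
    exons = []
    chrom = strand = None
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # comment or blank line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        if f'transcript_id "{transcript_id}"' not in fields[8]:
            continue
        chrom, strand = fields[0], fields[6]
        exons.append((int(fields[3]), int(fields[4])))  # GTF: 1-based, inclusive
    exons.sort()
    # Intron between consecutive exons is 1-based [prev_end + 1, next_start - 1],
    # i.e. BED 0-based half-open [prev_end, next_start - 1].
    introns = [(a_end, b_start - 1) for (_, a_end), (b_start, _) in zip(exons, exons[1:])]
    if strand == "-":
        introns.reverse()  # intron 1 sits at the high-coordinate end on the minus strand
    return [(chrom, s, e, f"{transcript_id}_intron{i}", 0, strand)
            for i, (s, e) in enumerate(introns, start=1)]
```

The records are BED6 with 0-based starts, so written out tab-separated they could be passed to e.g. `bedtools getfasta -fi hg19.fa -bed introns.bed -s -name` (file names hypothetical), where `-s` reverse-complements minus-strand intervals and `-name` puts the intron name into each FASTA header.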
