Question: How To Find Intron Sequences From NCBI Records?
7.0 years ago by
hicsuntdrac0nis220 wrote:

How can I find intron sequences in the databases available through NCBI?

At one point I found a database (FlyBase) for Drosophila intron sequences, but it doesn't seem to be available any more.

I want them for human and mouse, if that is possible.

ncbi sequence intron • 14k views

What kind of record? A GenBank record? A Gene record? Or do you just want a place to download the sequences of those introns (and for which organism)?

written 7.0 years ago by Pierre Lindenbaum 122k
7.0 years ago by
Istvan Albert ♦♦ 81k
University Park, USA
Istvan Albert ♦♦ 81k wrote:

Well, you did post a very similar question in the past, "Database to find intron sequences?", and you got a good number of answers to it. I would recommend reading through those and following the links given there.

The short answer to your question is that you cannot directly download intronic sequences from NCBI. What you can do is post-process the files that you get from NCBI and extract that information. If you want to know how to do that, you should open a new, specific question along the lines of: "How to extract introns from a GenBank file?"
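
As a rough illustration of that post-processing idea, here is a minimal Python sketch that derives introns from the exon coordinates in a GenBank-style location string such as `join(1..5,11..16)`. The function names and the toy sequence are hypothetical and not part of any NCBI tool. It assumes introns are simply the gaps between consecutive exons, and it does not handle single-base exons or locations that point into other records.

```python
import re

# Translation table for complementing DNA (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def introns_from_join(location):
    """Return 1-based inclusive (start, end) intron intervals implied by a
    GenBank-style location such as 'join(100..200,301..400)'."""
    # Collect every 'start..end' pair; '<' and '>' mark partial ends.
    exons = [(int(s), int(e))
             for s, e in re.findall(r"<?(\d+)\.\.>?(\d+)", location)]
    exons.sort()
    introns = []
    for (_, prev_end), (next_start, _) in zip(exons, exons[1:]):
        # An intron is the gap between two consecutive exons.
        if next_start - prev_end > 1:
            introns.append((prev_end + 1, next_start - 1))
    return introns


def intron_sequences(sequence, location):
    """Slice the intron sequences out of the record's sequence.
    Introns on the complement strand are reverse-complemented and
    returned in transcript order."""
    minus = location.startswith("complement(")
    seqs = []
    for start, end in introns_from_join(location):
        sub = sequence[start - 1:end]  # convert to a 0-based slice
        seqs.append(reverse_complement(sub) if minus else sub)
    return seqs[::-1] if minus else seqs


if __name__ == "__main__":
    toy = "AAAAA" + "GTCAG" + "TTTTTT"  # exon, intron, exon
    print(intron_sequences(toy, "join(1..5,11..16)"))
```

For real records, a parser such as Biopython's `SeqIO` would be a more robust way to read the features; the sketch only shows the coordinate arithmetic.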


Yes, the other question helped, but a lot of the software links were not working, or the software was not compatible with newer Macs, which do not use PowerPC processors.

written 7.0 years ago by hicsuntdrac0nis 220
7.0 years ago by
Rm7.9k
Danville, PA
Rm7.9k wrote:
  1. Generate an intron BED file from UCSC as described here (you can use RefSeq introns if required).
  2. Download the reference genome.
  3. Use nucBed from bedtools to get the FASTA sequences:

    nucBed -s -fi Homo_sapiens.GRCh37.62.fa -bed hg19.introns.bed -seq | awk '(NR>1){print ">"$4 "| "$1":"$2"-"$3"\n" $16}' > hg19.introns.fasta

Output:

>uc001aaa.3_intron_0_0_1_12228_f| 1:12227-12612
GTAAGTAGTGCTTGTGCTCATCTCCTTGGCTGTGATACGTGGCCGGCCCTCGCTCCAGCAGCTGGACCCCTACCTGCCGTCTGCTGCCATCGGAGCCCAAAGCCGGGCTGTGACTGCTCAGACCAGCCGGCTGGAGGGAGGGGCTCAGCAGGTCTGGCTTTGGCCCTGGGAGAGCAGGTGGAAGATCAGGCAGGCCATCGCTGCCA
CAGAACCCAGTGGATTGGCCTAGGTGGGATCTCTGAGCTCAACAAGCCCTCTCTGGGTGGTAGGTGCAGAGACGGGAGGGGCAGAGCCGCAGGCACAGCCAAGAGGGCTGAAGAAATGGTAGAACGGAGCAGCTGGTGATGTGTGGGCCCACCGGCCCCAGGCTCCTGTCTCCCCCCAG
>uc001aaa.3_intron_1_0_1_12722_f| 1:12721-13220
GTGAGAGGAGAGTAGACAGTGAGTGGGAGTGGCGTCGCCCCTAGGGCTCTACGGGGCCGGCGTCTCCTGTCTCCTGGAGAGGCTTCGATGCCCCTCCACACCCTCTTGATCTTCCCTGTGATGTCATCTGGAGCCCTGCTGCTTGCGGTGGCCTATAAAGCCTCCTAGTCTGGCTCCAAGGCCTGGCAGAGTCTTTCCCAGGGAAA
GCTACAAGCAGCAAACAGTCTGCATGGGTCATCCCCTTCACTCCCAGCTCAGAGCCCAGGCCAGGGGCCCCCAAGAAAGGCTCTGGTGGAGAACCTGTGCATGAAGGCTGTCAACCAGTCCATAGGCAAGCCTGGCTGCCTCCAGCTGGGTCGACAGACAGGGGCTGGAGAAGGGGAGAAGAGGAAAGTGAGGTTGCCTGCCCTGT
CTCCTACCTGAGGCTGAGGAAGGAGAAGGGGATGCACTGTTGGGGAGGCAGCTGTAACTCAAAGCCTTAGCCTCTGTTCCCACGAAG
>uc010nxr.1_intron_0_0_1_12228_f| 1:12227-12645
GTAAGTAGTGCTTGTGCTCATCTCCTTGGCTGTGATACGTGGCCGGCCCTCGCTCCAGCAGCTGGACCCCTACCTGCCGTCTGCTGCCATCGGAGCCCAAAGCCGGGCTGTGACTGCTCAGACCAGCCGGCTGGAGGGAGGGGCTCAGCAGGTCTGGCTTTGGCCCTGGGAGAGCAGGTGGAAGATCAGGCAGGCCATCGCTGCCA
CAGAACCCAGTGGATTGGCCTAGGTGGGATCTCTGAGCTCAACAAGCCCTCTCTGGGTGGTAGGTGCAGAGACGGGAGGGGCAGAGCCGCAGGCACAGCCAAGAGGGCTGAAGAAATGGTAGAACGGAGCAGCTGGTGATGTGTGGGCCCACCGGCCCCAGGCTCCTGTCTCCCCCCAGGTGTGTGGTGATGCCAGGCATGCCCTT
CCCCAG
>uc010nxr.1_intron_1_0_1_12698_f| 1:12697-13220
GTGAGTGTCCCCAGTGTTGCAGAGGTGAGAGGAGAGTAGACAGTGAGTGGGAGTGGCGTCGCCCCTAGGGCTCTACGGGGCCGGCGTCTCCTGTCTCCTGGAGAGGCTTCGATGCCCCTCCACACCCTCTTGATCTTCCCTGTGATGTCATCTGGAGCCCTGCTGCTTGCGGTGGCCTATAAAGCCTCCTAGTCTGGCTCCAAGGC
CTGGCAGAGTCTTTCCCAGGGAAAGCTACAAGCAGCAAACAGTCTGCATGGGTCATCCCCTTCACTCCCAGCTCAGAGCCCAGGCCAGGGGCCCCCAAGAAAGGCTCTGGTGGAGAACCTGTGCATGAAGGCTGTCAACCAGTCCATAGGCAAGCCTGGCTGCCTCCAGCTGGGTCGACAGACAGGGGCTGGAGAAGGGGAGAAGA
GGAAAGTGAGGTTGCCTGCCCTGTCTCCTACCTGAGGCTGAGGAAGGAGAAGGGGATGCACTGTTGGGGAGGCAGCTGTAACTCAAAGCCTTAGCCTCTGTTCCCACGAAG
(...)
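
For readers without bedtools, the extraction step above can be sketched in plain Python. This is only an illustration under stated assumptions: `bed_to_fasta` is a hypothetical helper, the genome is a small in-memory dict rather than a real FASTA file, and the header format just mimics the awk output shown above. BED coordinates are 0-based and half-open, so they map directly onto a Python slice, and minus-strand intervals are reverse-complemented as with the `-s` flag.

```python
# Translation table for complementing DNA (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def bed_to_fasta(bed_lines, genome):
    """Turn BED intervals into FASTA records.

    bed_lines: iterable of tab-separated BED lines (3 to 6+ columns).
    genome:    dict mapping chromosome name -> sequence string
               (an assumption made to keep the sketch self-contained).
    """
    records = []
    for line in bed_lines:
        # Skip blank lines and UCSC header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        name = fields[3] if len(fields) > 3 else f"{chrom}:{start}-{end}"
        strand = fields[5] if len(fields) > 5 else "+"
        # BED is 0-based, half-open: identical to Python slicing.
        seq = genome[chrom][start:end]
        if strand == "-":
            seq = reverse_complement(seq)
        records.append(f">{name}| {chrom}:{start}-{end}\n{seq}")
    return records


if __name__ == "__main__":
    toy_genome = {"1": "ACGTACGTAC"}
    toy_bed = ["1\t2\t6\tintronA\t0\t+", "1\t1\t5\tintronB\t0\t-"]
    print("\n".join(bed_to_fasta(toy_bed, toy_genome)))
```

Note that the chromosome names in the BED file have to match those in the genome (for example "1" versus "chr1"); otherwise the lookup fails.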
7.0 years ago by
deanna.church1.1k
Bethesda, MD
deanna.church1.1k wrote:

GFF3 files are now available for the latest NCBI annotation. Here is the link to the human file: ftp://ftp.ncbi.nlm.nih.gov/genomes/H_sapiens/GFF/

and the mouse file: ftp://ftp.ncbi.nlm.nih.gov/genomes/M_musculus/GFF/
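
GFF3 files list exons rather than introns, so the introns have to be derived. A minimal Python sketch of one way to do that follows, assuming that exon rows carry a `Parent` attribute naming their transcript and that an intron is the gap between consecutive exons of the same parent. The function name and the toy rows are hypothetical; real NCBI files may need extra filtering.

```python
from collections import defaultdict


def introns_from_gff3(lines):
    """Return (seqid, start, end, strand, parent) tuples for introns,
    using 1-based inclusive GFF3 coordinates."""
    exons = defaultdict(list)
    for line in lines:
        # Skip comments, directives and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue
        seqid, start, end, strand = cols[0], int(cols[3]), int(cols[4]), cols[6]
        # Column 9 holds 'key=value' pairs separated by semicolons.
        attrs = dict(item.split("=", 1)
                     for item in cols[8].split(";") if "=" in item)
        # An exon may list several parents, separated by commas.
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                exons[(seqid, strand, parent)].append((start, end))

    introns = []
    for (seqid, strand, parent), spans in sorted(exons.items()):
        spans.sort()
        for (_, prev_end), (next_start, _) in zip(spans, spans[1:]):
            # Only report a gap if the exons are not adjacent.
            if next_start - prev_end > 1:
                introns.append((seqid, prev_end + 1, next_start - 1,
                                strand, parent))
    return introns


if __name__ == "__main__":
    toy = [
        "##gff-version 3",
        "\t".join(["chr1", "toy", "gene", "100", "400", ".", "+", ".", "ID=gene0"]),
        "\t".join(["chr1", "toy", "exon", "100", "200", ".", "+", ".", "ID=e1;Parent=rna0"]),
        "\t".join(["chr1", "toy", "exon", "301", "400", ".", "+", ".", "ID=e2;Parent=rna0"]),
    ]
    print(introns_from_gff3(toy))
```

The resulting intervals can be converted to BED (subtract 1 from the start) and passed to a sequence extraction step such as the bedtools command in the answer above.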

7.0 years ago by
yuyin1101101100 wrote:

If you know the gene name or accession, the UCSC Table Browser can help you. Beside the "identifiers" label, open "paste list", enter your gene names or accessions, click the "get output" button, and select the relevant options; you should get a satisfactory result.
