Question: Download all intronless human genes.
2.3 years ago, caggtaagtat wrote:

Hi,

I would like to download the FASTA files of all genes without an intron.

genome sequence R gene • 774 views
modified 2.3 years ago by cpad0112 • written 2.3 years ago by caggtaagtat
2.3 years ago, Nicolas Rosewick (Belgium, Brussels) wrote:

In R you could do something like this to get all transcripts with only one exon (and thus no introns):

library(biomaRt)
library(dplyr)

ensembl = useEnsembl(biomart="ensembl", dataset="hsapiens_gene_ensembl")
# will give you one line per exon
gene <- getBM(attributes=c('ensembl_gene_id','ensembl_transcript_id','hgnc_symbol','ensembl_exon_id'), mart = ensembl)

# keep only the transcripts that have exactly one exon
gene.oneExon <- 
  gene %>% 
  group_by(ensembl_transcript_id) %>%
  summarise(n=n()) %>%
  filter(n==1)

# get the sequence of each single-exon transcript
gene.oneExon.seq <- getSequence(id=gene.oneExon$ensembl_transcript_id,type="ensembl_transcript_id", seqType="gene_exon",mart=ensembl)
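Note that the filter above works per transcript, while the question asks about genes. The following is a minimal base-R sketch on made-up IDs, assuming a gene counts as intronless only if every one of its transcripts has a single exon. That definition is an assumption for illustration, not something stated in this thread, and the toy table only mimics the shape of the getBM() result.

```r
# Toy exon table shaped like the getBM() result (one row per exon).
# All IDs are made up for illustration.
gene <- data.frame(
  ensembl_gene_id       = c("G1", "G1", "G1", "G2", "G3", "G3", "G3"),
  ensembl_transcript_id = c("T1", "T1", "T2", "T3", "T4", "T5", "T5"),
  ensembl_exon_id       = c("E1", "E2", "E3", "E4", "E5", "E6", "E7"),
  stringsAsFactors = FALSE
)

# Number of exons per transcript.
n.exons <- table(gene$ensembl_transcript_id)

# Transcript-level result, equivalent to the filter(n == 1) step.
single.exon.tx <- names(n.exons)[n.exons == 1]

# Attach the exon count of its transcript to every row,
# then take the largest count among each gene's transcripts.
gene$n_exons <- as.integer(n.exons[gene$ensembl_transcript_id])
max.per.gene <- tapply(gene$n_exons, gene$ensembl_gene_id, max)

# Assumed definition: intronless gene = all transcripts have exactly one exon.
intronless.genes <- names(max.per.gene)[max.per.gene == 1]

print(single.exon.tx)
print(intronless.genes)
```

In this toy table G1 and G3 each have one single-exon transcript but also a two-exon transcript, so only G2 passes the gene-level filter.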
modified 2.3 years ago • written 2.3 years ago by Nicolas Rosewick

Thank you! I can't connect to the server from R outside the university clinic, I think because of my building's network restrictions. However, I will download all human exons manually from BioMart and select only those which have only one transcript!

written 2.3 years ago by caggtaagtat

OK, so you have to download the following attributes: "Transcript stable ID" and "Exon stable ID". Download them and then in R:

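library(readr)
library(dplyr)

# "file.txt" is the TSV exported from BioMart; skip = 1 drops its header row
# (assumes the export was saved with a header line)
gene <- read_tsv("file.txt", skip = 1, col_names = c("ensembl_transcript_id", "exon_id"))
# keep only the transcripts that have exactly one exon
gene.oneExon <-
  gene %>%
  group_by(ensembl_transcript_id) %>%
  summarise(n = n()) %>%
  filter(n == 1)
write.table(gene.oneExon$ensembl_transcript_id, "tid_oneExon.txt", sep = "\t", col.names = FALSE, row.names = FALSE, quote = FALSE)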

Then in BioMart, in the Filters tab, you can filter on Transcript stable ID by giving the tid_oneExon.txt file as input in "Input external references ID list". In the Attributes tab, you can then choose cDNA sequences.
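Once the FASTA export is saved, a quick sanity check is to count the records and compare them with the ID list. This is a base-R sketch, assuming the export is plain FASTA with one ">" header per transcript and at least one sequence line after each header. The file contents and IDs below are made up and written to a temporary file so the snippet runs offline.

```r
# Made-up FASTA content standing in for the BioMart export.
fasta.file <- tempfile(fileext = ".fa")
writeLines(c(">T2", "ATGGCCTAA", ">T3", "ATGAAA", "CCCTGA"), fasta.file)

lines <- readLines(fasta.file)
is.header <- grepl("^>", lines)

# Record names are the header lines without the leading ">".
ids <- sub("^>", "", lines[is.header])

# Number each line by the record it belongs to, then join the
# sequence lines of every record into one string.
record <- cumsum(is.header)
seqs <- tapply(lines[!is.header], record[!is.header], paste, collapse = "")
seqs <- setNames(as.character(seqs), ids)

# Toy stand-in for the IDs written to tid_oneExon.txt.
requested <- c("T2", "T3", "T4")

# IDs that were requested but did not come back in the export.
missing.ids <- setdiff(requested, names(seqs))

print(nchar(seqs))
print(missing.ids)
```

With real data, `requested` would be read from tid_oneExon.txt, for example with readLines().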

modified 2.3 years ago • written 2.3 years ago by Nicolas Rosewick

Hi, thank you for your fast reply!

I just went to the BioMart website and downloaded all coding sequences, after setting the filters so that there is only one transcript and the transcript is protein coding. So I think I now have what I wanted.

However, I didn't quite understand the difference between downloading the coding sequence and downloading the cDNA sequence. Does it make a difference for intronless genes? Thanks again!

written 2.3 years ago by caggtaagtat

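cDNA is the full spliced transcript sequence, including the 5' and 3' UTRs. The coding sequence (CDS) is only the part that is translated into protein, from the start codon to the stop codon. So yes, it makes a difference even for intronless genes: with no introns to splice out, the cDNA is the whole exon, while the CDS is that exon without its UTRs.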
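To make the difference concrete for a single-exon transcript, here is a small base-R sketch. The sequences are made up, and the UTR and CDS boundaries are assumed to be known; in practice they come from the annotation.

```r
# Made-up single-exon transcript: 5' UTR + coding sequence + 3' UTR.
utr5 <- "GGCACT"
cds  <- "ATGGCCAAATAA"   # starts with ATG, ends with the stop codon TAA
utr3 <- "CCGTTT"

# With no introns to remove, the cDNA is simply the whole exon.
cdna <- paste0(utr5, cds, utr3)

# The coding sequence is the stretch of the cDNA from start codon to stop codon.
cds.start <- nchar(utr5) + 1
cds.end   <- nchar(utr5) + nchar(cds)
extracted <- substr(cdna, cds.start, cds.end)

print(nchar(cdna))       # length of the cDNA, UTRs included
print(nchar(extracted))  # length of the coding sequence only
```

The cDNA download is therefore longer than the coding sequence download whenever the transcript has UTRs, with or without introns.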

written 2.3 years ago by Nicolas Rosewick