Accesing intron sequences using the GenomicFeatures package (R)?
1
0
Entering edit mode
3.7 years ago
carroll ▴ 30

Hi everyone,

I'd like to extract intronic sequences for my study species using R. For that, I was planning to use the GenomicFeatures package. I got to the point when I have a GRanges object that I produced using intronicParts() which, I suppose, contains intron ranges. How should I proceed to get intron sequences in a fasta format? I did not find an answer in the manual: https://bioconductor.org/packages/release/bioc/manuals/GenomicFeatures/man/GenomicFeatures.pdf

Thank you in advance!

R • 1.4k views
ADD COMMENT
2
Entering edit mode
3.7 years ago
caggtaagtat ★ 1.9k

Hi, you could check out the getSeq function from BSgenome. It takes a GRanges object and a genome as an input and gives you the sequences. You can then save them as fasta using the write.fasta function of seqinr.

ADD COMMENT
1
Entering edit mode

getSeq will return a Biostrings object, which you can save as a fasta using Biostrings::writeXStringSet, if you want to skip using seqinr.

ADD REPLY
0
Entering edit mode

Hi! I tried using getSeq but got the following error:

Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘getSeq’ for signature ‘"DNAStringSet"’

Isn't this weird given it requires an XStringSet object?

ADD REPLY
1
Entering edit mode

I don't know what to do about the error.

As a workaround, you can also do something like this:

#For every intron
for(i in 1:nrow(Intronlist)){

  #Get the strand the intron is located at
  strand <- Intronlist$Strand[i]

  #Get chromosome the intron is located at
  chromosome <-  Intronlist$Chromosome.scaffold.name[i]

  #Get Intron border coordinates
  upstream_end <- Intronlist$intron_upstream_end_cooordinate[i]
  downstream_end <- Intronlist$intron_downstream_end_cooordinate[i]

  #Get the respective genomic sequence (for introns of the minus strand, the upstrteam 
  test2 <- Views(DNAstringSet_of_Genome[[as.character(chromosome)]], start=upstream_end , end=downstream_end )

  #For introns of the minus strand, the coordinates have to be switched, since the VIEWS function only handels end coordinates, which are higher than the start coordinates
  if(strand == -1)  test2 <- Views((DNAstringSet_of_Genome[[as.character(chromosome)]], start=downstream_end, end=upstream_end)

  #Get the sequence from the Views object
  test <- as.character(test2)
  sequence_range <-  as.character(test2)
  test3 <- data.frame(test2@ranges)

  #Views extracts the sequence on the plus strand, so for minus strand introns, replace the intron sequence with the reverse complement
  sequence_range[strand=="-1"] <- gsub("A","t",sequence_range[strand=="-1"])
  sequence_range[strand=="-1"] <- gsub("C","g",sequence_range[strand=="-1"])
  sequence_range[strand=="-1"] <- gsub("T","a",sequence_range[strand=="-1"])
  sequence_range[strand=="-1"] <- gsub("G","c",sequence_range[strand=="-1"])
  sequence_range[strand=="-1"] <- toupper(sequence_range[strand=="-1"])
  sequence_range[strand=="-1"] <- reverse(sequence_range[strand=="-1"])

  #Store the intron sequence
  Intronlist$intron_seq[i] <- sequence_range

}
ADD REPLY
0
Entering edit mode

Thank you! Works well after adapting to my case.

ADD REPLY

Login before adding your answer.

Traffic: 1937 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6