Question

k-mer enrichment analysis

6.2 years ago

yuabrahamliu ▴ 60

Hi all, I have two groups of RNAs: an experimental group and a control group. I need to check whether any k-mer sequences are highly enriched in the experimental group compared to the control group, or vice versa. My questions are:

1) How can I get the RNA sequences from coordinates? I think it may be right to use bedtools to extract the FASTA sequence of each exon separately, concatenate them, and then convert the 'T' in DNA to 'U' in RNA, but is there a more efficient method?

2) I don't know of any tools that can do such a k-mer comparison between different RNA groups. Could anyone recommend some? Thank you so much.

RNA-Seq • 2.9k views

updated 6.2 years ago by Devon Ryan 104k • written 6.2 years ago by yuabrahamliu ▴ 60
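The exon-joining and T-to-U step described in point 1 can be sketched in a few lines of Python. This is a minimal illustration, not a tested pipeline: the helper names `reverse_complement` and `spliced_rna` are hypothetical, and it assumes the exon sequences have already been extracted as plus-strand genomic DNA, listed in genomic order. (bedtools getfasta also has a `-split` option for BED12 input that may do the concatenation directly; check the bedtools documentation before relying on it.)

```python
# Hypothetical helpers; exon sequences are assumed to be plus-strand
# genomic DNA strings, given in genomic (left-to-right) order.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string."""
    return dna.translate(_COMPLEMENT)[::-1]


def spliced_rna(exon_seqs, strand="+"):
    """Join exon DNA sequences and return the transcript as RNA (T -> U)."""
    dna = "".join(exon_seqs).upper()
    if strand == "-":
        # Minus-strand transcripts read the opposite strand, 5' to 3'.
        dna = reverse_complement(dna)
    return dna.replace("T", "U")


print(spliced_rna(["ATGGC", "TTAG"]))       # AUGGCUUAG
print(spliced_rna(["ATGGC", "TTAG"], "-"))  # CUAAGCCAU
```

If the sequences were extracted in a strand-aware way already, the reverse-complement step should be skipped to avoid flipping them twice.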

What is your end goal in the comparison?

6.2 years ago by Devon Ryan 104k

An RNA-binding protein can bind to the experimental group but not to the control group, so I want to see whether any k-mer sequence is enriched in the experimental group. It may be a binding motif of the protein.

6.2 years ago by yuabrahamliu ▴ 60

score 0 · Answer 1 · 2018-02-07

6.2 years ago

Devon Ryan 104k

The obvious choice would be to run khmer on the fastq files from both datasets to quantify k-mer abundance and then load that information into R, where one would basically treat the data as if it were RNAseq.

Having said that, at least with the protocol people are using here, I can see the binding motif already in FastQC.
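The idea of quantifying k-mer abundance in both groups and then comparing the counts could look roughly like the sketch below. It does not use khmer or R; it is a plain-Python stand-in, and the function names `count_kmers` and `log2_enrichment`, the pseudocount of 1, and the toy sequences are all assumptions made for illustration. A real analysis would want replicates and a proper statistical test rather than a bare log ratio.

```python
import math
from collections import Counter


def count_kmers(seqs, k):
    """Count every overlapping k-mer across a collection of sequences."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def log2_enrichment(exp_counts, ctl_counts, pseudocount=1.0):
    """log2 ratio of k-mer frequencies, experimental over control."""
    kmers = set(exp_counts) | set(ctl_counts)
    # The pseudocount keeps k-mers seen in only one group from dividing by zero.
    exp_total = sum(exp_counts.values()) + pseudocount * len(kmers)
    ctl_total = sum(ctl_counts.values()) + pseudocount * len(kmers)
    return {
        kmer: math.log2(
            ((exp_counts[kmer] + pseudocount) / exp_total)
            / ((ctl_counts[kmer] + pseudocount) / ctl_total)
        )
        for kmer in kmers
    }


# Toy sequences, made up for the example.
experimental = ["UGCAUGUU", "AAUGCAUG"]
control = ["AAAAAAAA", "CCUUGGAA"]
scores = log2_enrichment(count_kmers(experimental, 4), count_kmers(control, 4))
for kmer, score in sorted(scores.items(), key=lambda kv: -kv[1])[:3]:
    print(kmer, round(score, 2))
```

Positive scores mark k-mers that are relatively more frequent in the experimental group, negative scores the reverse.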

That's a good idea. The problem is that I have the exact list of gene names of the mRNAs able to bind the protein, but I don't have fastq files. So I want to do the k-mer analysis on the FASTA sequences of the binding fragments of the mRNAs. I have checked khmer, but it seems to work only on sequencing data, not on FASTA sequences. Is there any method to do this on FASTA sequences?

6.2 years ago by yuabrahamliu ▴ 60
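For k-mer analysis directly on FASTA sequences, one simple approach is to parse the records and count, for each k-mer, how many sequences contain it. The sketch below is an assumption-laden illustration rather than an established tool: `read_fasta` and `sequences_with_kmer` are hypothetical names, the records are made up, and counting presence per sequence (instead of total occurrences) is a design choice that keeps one long or repetitive transcript from dominating.

```python
from collections import Counter


def read_fasta(lines):
    """Yield (header, sequence) pairs from an open FASTA file or a list of lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def sequences_with_kmer(records, k):
    """For each k-mer, count how many sequences contain it at least once."""
    presence = Counter()
    for _, seq in records:
        seq = seq.upper().replace("T", "U")  # report k-mers as RNA
        # A set makes each k-mer count once per sequence.
        presence.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return presence


# Made-up records standing in for the binding fragments.
example = """>geneA
ATGCATG
CATT
>geneB
GGCATGA
""".splitlines()

presence = sequences_with_kmer(read_fasta(example), k=4)
print(presence["GCAU"])  # 2: found in both geneA and geneB
```

Running the same function on the experimental and control FASTA files gives two presence tables that can then be compared k-mer by k-mer.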

I'm not sure how useful that'll be unless the gene list is quite long, as there will be a LOT of spurious hits. The term you need to google for is "motif search".

6.2 years ago by Devon Ryan 104k