Question

How to get read counts that are in frame with a gene

0

9.3 years ago

monica_terrao • 0

Hi. I have generated a .bam file from the reads of a MiSeq run on a DNA library. I want to count the reads for each gene in a GFF file, but only the reads that are in frame with the CDS. Could someone help me with this?

Thanks. Monica.

DNA Seq • 2.6k views

ADD COMMENT • link updated 2.1 years ago by Ram 44k • written 9.3 years ago by monica_terrao • 0

2

You mean strand? Frame is related to translation, there is no translation here, and therefore reads don't have a frame.

In which respect would knowing the strand be beneficial given there is no strand specific DNA library?

ADD REPLY • link updated 2.1 years ago by Ram 44k • written 9.3 years ago by Michael 54k

1

Are you really sure that you want that? It's unclear what it would even mean for a read to be in frame. I could only guess that you only want alignments whose 5' or 3'-most position happens to be on base 1 of an in-frame codon.

ADD REPLY • link updated 2.1 years ago by Ram 44k • written 9.3 years ago by Devon Ryan 104k

0

Devon Ryan, yes, that is what I need.

I need to know which of the reads start in the frame in which the gene is translated. I only want the reads that are in frame with the gene.

ADD REPLY • link updated 2.1 years ago by Ram 44k • written 9.3 years ago by monica_terrao • 0

1

And you're aware that the frame a read happens to align to is typically random? Michael and I are getting at the exact same question here, namely "why do you want this?". Even if you think you want this information, it's really unlikely that you do unless you're just trying to show that there's no systematic bias in coverage (i.e., you'd expect 1/6th of alignments covering a given position to be "in frame").
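
One rough way to check for such a bias is an exact one-sided binomial test of the in-frame count against the 1/6 chance rate. The following is only a minimal Python sketch under that assumption; the function name and the counts are invented for illustration and do not come from any real data set.

```python
from math import comb


def binom_upper_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p), i.e. a one-sided enrichment p-value."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical counts for one position or gene (not real data):
n_alignments = 600  # alignments covering the region
n_in_frame = 110    # alignments classified as "in frame"

# Null hypothesis: 3 frames x 2 strands, so 1/6 of alignments are in frame by chance.
p_value = binom_upper_tail(n_in_frame, n_alignments, 1 / 6)
print(f"observed fraction = {n_in_frame / n_alignments:.3f}, p = {p_value:.3g}")
```

This plain-Python sum is fine for a few hundred alignments. For more than about a thousand, the float conversion of the binomial coefficient overflows, so a statistics library would be the better choice there.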

ADD REPLY • link updated 2.1 years ago by Ram 44k • written 9.3 years ago by Devon Ryan 104k

1

Exactly, trying to prove a bias is the only application I can imagine. Another possibility is that we are talking about a different type of library, say RNA-seq or ChIP-seq, rather than genomic DNA. But even then, the offset of a read with respect to the CDS should be completely random and therefore has no meaning. (Possibly you could use it to harvest entropy ;)

ADD REPLY • link updated 2.1 years ago by Ram 44k • written 9.3 years ago by Michael 54k

0

Yeah, this would make the most sense in a RIP-seq context when combined with an RNase (presuming whatever is pulled down might care about codons).

ADD REPLY • link updated 2.1 years ago by Ram 44k • written 9.3 years ago by Devon Ryan 104k

0

Or ribosome profiling (Ribo-seq)...

ADD REPLY • link updated 2.1 years ago by Ram 44k • written 9.3 years ago by Michael 54k

1

I suppose what Devon (and I) wanted to say politely is that it is very unlikely that you really need this information. I, at least, do not understand what this analysis is good for while the question is this ill-defined. So please define what you mean by the frame of a read.

ADD REPLY • link updated 2.1 years ago by Ram 44k • written 9.3 years ago by Michael 54k

0

Oh, OK. Maybe I need to explain better.

I have made a yeast two-hybrid library from fragmented genomic DNA and used an Illumina MiSeq to sequence the "positive" clones. Since this library contains the whole genome, I now need to work out which reads come from CDSs and which of those are in frame. Only if a read is in frame with the protein encoded in the genome can I be sure that the interaction comes from a real protein, and only in that case can I count the read.

ADD REPLY • link updated 2.1 years ago by Ram 44k • written 9.3 years ago by monica_terrao • 0

1

That reminds me of my youth :-D http://www.ncbi.nlm.nih.gov/pubmed/9682060 (I created this for Y2H). I'm not sure I understand your experiment: don't you have to detect the junction between the DNA from your library and the plasmid?

ADD REPLY • link 9.3 years ago by Pierre Lindenbaum 162k

0

Are the proteins you're using in your Y2H experiment known to bind to only the ends of a fragment? Otherwise, when you sequence the fragments the start position of the alignments will bear no relevance to whether the proteins bound to something that would eventually become in-frame (noting that that's not really a meaningful term in the context of DNA).

ADD REPLY • link 9.3 years ago by Devon Ryan 104k

0

Pierre Lindenbaum,

Yes, I have detected the junction between my library and the plasmid. The library was amplified by PCR using adapters in the vector/plasmid. I kept only the reads that contain these adapters, so the start of each read is the start of the insert in the plasmid.

Devon Ryan,

My protein can bind any part of the proteins/fragments in the Y2H library. The library contains all the proteins of my organism, and I don't know which of them interact. This is an initial screen; afterwards I will analyze the interactions between the proteins in more detail.

ADD REPLY • link updated 2.1 years ago by Ram 44k • written 9.3 years ago by monica_terrao • 0

0

You should post about 1000 reads spanning the plasmid/insert junction somewhere so we can have a look. The associated GTF/GFF would be nice too.

ADD REPLY • link 9.3 years ago by Pierre Lindenbaum 162k

0

Has anyone done the analysis this way? Can you cite a reference for this approach, or any reference where the offset of the read has some meaning?

Technically, by the way, it is easy to determine the offset between the CDS and the aligned reads, though I still doubt it makes sense. To check, align all reads against the CDS, then:

in.frame <- (cds.start-aligned.start) %% 3 == 0 | (cds.start-aligned.end) %% 3 == 0

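Then simply test whether in.frame is enriched; the result will most likely not be significant. Note: the expression accepts either end of the alignment, so if the read length (taken as aligned.end - aligned.start) is not a multiple of 3, the chance probability of in.frame is 2/3 rather than 1/3.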
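
The R expression above can be mirrored in Python to see what chance rate it implies. This is a sketch, not an accepted method: `in_frame` and `chance_rate` are names made up here, coordinates are assumed to be in the same CDS-relative system, and strand is ignored just as in the one-liner. It also shows that the 2/3 figure depends on whether the end coordinate is inclusive.

```python
def in_frame(cds_start: int, aligned_start: int, aligned_end: int) -> bool:
    """Python mirror of the R one-liner: True if either end of the alignment
    lies a multiple of 3 bases away from the CDS start."""
    return (cds_start - aligned_start) % 3 == 0 or (cds_start - aligned_end) % 3 == 0


def chance_rate(read_length: int, inclusive_end: bool = True) -> float:
    """Fraction of the three possible start offsets that pass in_frame()
    for a fixed read length, i.e. the rate expected with no bias at all."""
    hits = 0
    for start in range(3):
        # Inclusive end: last aligned base. Exclusive end: one past the last base.
        end = start + read_length - (1 if inclusive_end else 0)
        hits += in_frame(0, start, end)
    return hits / 3


for length in (150, 151, 152):
    print(length, chance_rate(length), chance_rate(length, inclusive_end=False))
```

With an inclusive end coordinate the two ends share a frame when `read_length - 1` is a multiple of 3; with an exclusive end, when `read_length` itself is. In either case the null rate to test against is 1/3 or 2/3 on a single strand, not 0.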

ADD REPLY • link 9.3 years ago by Michael 54k

0

It actually gets a bit more annoying than that when splicing is accounted for, since exon boundaries needn't fall on codon boundaries. At least Ensembl annotates CDS segments with their frame, so that could be used to facilitate the calculation.
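
If spliced CDS annotations do have to be handled, the phase column (column 8) of a GFF/GTF CDS line can be used to place a genomic position within its codon. A minimal Python sketch, assuming the GFF3 phase convention (the number of bases to skip from the 5' end of the segment to reach the first base of the next codon) and 1-based inclusive coordinates; the class, the function names and the example line are hypothetical.

```python
from typing import NamedTuple


class CdsSegment(NamedTuple):
    start: int   # 1-based, inclusive (GFF column 4)
    end: int     # 1-based, inclusive (GFF column 5)
    strand: str  # "+" or "-" (GFF column 7)
    phase: int   # 0, 1 or 2 (GFF column 8)


def parse_cds_line(line: str) -> CdsSegment:
    """Parse one tab-separated GFF/GTF line whose feature type is CDS."""
    fields = line.rstrip("\n").split("\t")
    if fields[2] != "CDS":
        raise ValueError("not a CDS feature line")
    return CdsSegment(int(fields[3]), int(fields[4]), fields[6], int(fields[7]))


def codon_position(seg: CdsSegment, pos: int) -> int:
    """Return 0, 1 or 2: the position within its codon of genomic base `pos`."""
    if not seg.start <= pos <= seg.end:
        raise ValueError("position lies outside the CDS segment")
    if seg.strand == "+":
        # Distance from the 5' end of the segment, shifted by the phase.
        return (pos - seg.start - seg.phase) % 3
    # On the minus strand the 5' end of the segment is its `end` coordinate.
    return (seg.end - pos - seg.phase) % 3


# Made-up example features, not taken from any real annotation:
plus = parse_cds_line("chrI\texample\tCDS\t1001\t1100\t.\t+\t2\tID=hypothetical_cds")
minus = CdsSegment(2000, 2090, "-", 1)
print(codon_position(plus, 1003), codon_position(minus, 2089))
```

A read whose first aligned base gets codon position 0 would be "in frame" in the sense discussed in this thread; whether that is meaningful is a separate question.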

ADD REPLY • link 9.3 years ago by Devon Ryan 104k

0

We should use the CDS only in this case, to make it simple.

ADD REPLY • link 9.3 years ago by Michael 54k

Ram · Answer 1 · 2015-02-20

0

9.3 years ago

Michael 54k

While one can easily calculate the offset and a 'pseudo-in-frame' condition based on reads aligned to all CDSs, my 'final conclusion' is that this analysis is not informative unless it is shown to be an accepted method in a paper or protocol.

ADD COMMENT • link updated 2.1 years ago by Ram 44k • written 9.3 years ago by Michael 54k

0

Thanks Michael..

I think I will try to analyze the data in a different way.

ADD REPLY • link updated 2.1 years ago by Ram 44k • written 9.3 years ago by monica_terrao • 0