9.3 years ago
monica_terrao
Hi. I have generated a .bam file from the reads of a MiSeq run on a DNA library. I want to count the reads for each gene in a GFF file, but I only want the reads that are in frame with the CDS. Could someone help me with this?
Thanks. Monica.
You mean strand? Frame is related to translation, there is no translation here, and therefore reads don't have a frame.
In which respect would knowing the strand be beneficial given there is no strand specific DNA library?
Are you really sure that you want that? It's unclear what it would even mean for a read to be in frame. I could only guess that you only want alignments whose 5' or 3'-most position happens to be on base 1 of an in-frame codon.
Devon Ryan, yes.. it is what I need.
What I need to know is which of the reads start in the frame in which the gene is translated. I just need the reads that are in frame with the gene.
And you're aware that the frame a read happens to align to is typically random? Michael and I are getting at the exact same question here, namely "why do you want this?". Even if you think you want this information, it's really unlikely that you do unless you're just trying to show that there's no systematic bias in coverage (i.e., you'd expect 1/6th of alignments covering a given position to be "in frame").
Exactly, trying to prove a bias is the only application I can imagine. Another possibility is that we are talking about a different type of library, say RNA-seq or ChIP-seq, rather than genomic DNA. But even then, the offset of a read with respect to the CDS should be completely random and therefore has no meaning. (Possibly you could use it to harvest entropy ;)
Yeah, this would make the most sense in a RIP-seq context when combined with an RNase (presuming whatever is pulled down might care about codons).
Or ribosome-sequencing...
I suppose what Devon (and I) might have wanted to say politely is that it is very unlikely that you really need this information, because I, at least, do not understand what this analysis is good for, and the question is a little ill-defined. So please define what you mean by the frame of a read.
Oh, ok. Maybe I need to explain better.
Well, I have made a yeast two-hybrid library using fragmented genomic DNA, and I have used Illumina MiSeq to sequence the "positive" clones. Since this library contains the total genome, I now need to separate which reads come from CDSs and which of those are in frame. A read has to be in frame with the protein encoded in the genome for me to be sure that the interaction comes from a real protein. Only in that case can I count the read.
That reminds me of my youth :-D http://www.ncbi.nlm.nih.gov/pubmed/9682060 (I created this for Y2H). I'm not sure I understand your experiment: don't you have to detect the junction between the DNA from your library and the plasmid?
Are the proteins you're using in your Y2H experiment known to bind to only the ends of a fragment? Otherwise, when you sequence the fragments the start position of the alignments will bear no relevance to whether the proteins bound to something that would eventually become in-frame (noting that that's not really a meaningful term in the context of DNA).
Pierre Lindenbaum,
Yes, I have detected the junction between my library and the plasmid. The library was amplified by PCR using adapters in the vector/plasmid. I then kept only the reads that have these adapters, so the start of the read is the start of the insert in the plasmid.
Devon Ryan,
My protein can bind any part of the "protein/fragment" present in the Y2H library. I have all the proteins of my organism in the library, and I don't know which proteins are interacting. This is an initial screening. After this I will analyze the interactions between the proteins in more detail.
You should post about 1000 reads at the plasmid/insert junction somewhere so we can have a look. The associated GTF/GFF would be nice too.
Has anyone done the analysis this way? Can you cite a reference for this approach? Can anyone point to a reference where the offset of the read has some meaning?
Technically, by the way, it is easy to determine the offset between the CDS and the aligned reads, though I still doubt that this makes sense. Just check: align all reads against the CDS, then:
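A minimal sketch of that check in Python, under stated assumptions: each reference sequence is one CDS starting at the first base of its start codon, "in frame" means that the alignment starts on the first base of a codon (as suggested earlier in the thread), and reads aligned antisense to the CDS are skipped. The function name `in_frame_flags`, the tuple layout, and the CDS names are hypothetical. With pysam, the same fields could be taken from `read.reference_name`, `read.reference_start` and `read.is_reverse`.

```python
def in_frame_flags(alignments):
    """Return one boolean per sense-strand alignment.

    alignments: iterable of (cds_name, start0, is_reverse), where start0 is
    the 0-based leftmost aligned position on the CDS reference.
    True means the alignment starts on the first base of a codon.
    """
    flags = []
    for cds_name, start0, is_reverse in alignments:
        if is_reverse:
            # Assumption: antisense alignments cannot encode the protein.
            continue
        flags.append(start0 % 3 == 0)
    return flags


# Toy input (made-up CDS names and positions, for illustration only).
alignments = [
    ("YFG1", 0, False),
    ("YFG1", 4, False),
    ("YFG1", 9, False),
    ("YFG2", 2, True),
]
in_frame = in_frame_flags(alignments)
print(sum(in_frame), "of", len(in_frame), "sense alignments start in frame")
```

Here `in_frame` plays the role of the in.frame vector: one flag per read, ready to be tested for enrichment.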
Then simply check whether in.frame is enriched; the result will most likely not be significant. Note: if the read length is not a multiple of 3, then the probability of in.frame is 2/3.
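One way to do the enrichment check is an exact one-sided binomial test. The sketch below is only an illustration: the helper name `binom_upper_tail` and the example counts are made up, and the null probability `p` is left as a parameter because it depends on how in-frame is defined (1/3 if only the start position on the sense strand is considered, which is an assumption here). It is meant for modest read counts; for large counts a library routine such as `scipy.stats.binomtest` would be the safer choice.

```python
from math import comb


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): the one-sided enrichment p-value."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical counts: 40 of 100 reads start in frame, null expectation 1/3.
p_value = binom_upper_tail(40, 100, 1 / 3)
print(f"one-sided p-value: {p_value:.4f}")
```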
It actually gets a bit more annoying than that when splicing is accounted for, since exon boundaries needn't be in frame. At least Ensembl will annotate exons with the CDS frame, so that could be used to facilitate the calculation.
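As a rough sketch of how the annotated frame could be used, the function below assumes GFF3 semantics for the phase column of a CDS line: the phase is the number of bases to skip from the 5' end of the segment to reach the first base of the next codon. Whether a particular annotation file follows this convention should be checked first. The function name `codon_position` and the coordinates in the example are hypothetical.

```python
def codon_position(pos, start, end, strand, phase):
    """Position within its codon (0 = first base) of a genomic coordinate.

    pos, start, end are 1-based and inclusive; start, end, strand and phase
    come from columns 4, 5, 7 and 8 of a GFF3 CDS line.
    """
    if not start <= pos <= end:
        raise ValueError("position lies outside the CDS segment")
    if strand == "+":
        offset = pos - start - phase  # 5' end of the segment is `start`
    else:
        offset = end - pos - phase    # 5' end of the segment is `end`
    return offset % 3


# A read whose 5' base maps to position 101 of a '+' CDS segment
# spanning 100..200 with phase 1 (made-up numbers).
print(codon_position(101, 100, 200, "+", 1) == 0)
```

Working per CDS segment in genomic coordinates avoids having to stitch exons together before taking the offset modulo 3.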
We should use the CDS only in this case, to make it simple.