RNAseq: Modify .fasta and .gtf files to add recombinant protein sequence
21 months ago
Pavlos • 0

Hi everyone!

The dataset that I am working with comes from a producer cell line that expresses a recombinant protein. I was therefore wondering how I could modify the reference .fasta and .gtf files to include the sequence of the recombinant molecule. Are there any convenient functions or packages for this in Python, R, or the terminal? Is there anything I need to be careful about? One candidate I have found is reform; does anyone have experience with this package?

reference protein modify recombinant • 1.3k views
21 months ago
Carambakaracho ★ 3.1k

Given you determined the insertion with high precision, you can simply add the sequence to the genome. Maybe there's a Biopython function for this, but I can't find one. However, if you're confident enough in your counting skills, you can do it with almost any text editor powerful enough to handle your genome. For the GFF, you then just renumber the elements after the insertion and add your annotation, with its coordinates offset by the position of the insertion site.
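To make the renumbering concrete, here is a minimal Python sketch of the idea, not a tested tool. It assumes the insertion goes immediately after a known 1-based position on one chromosome, that the chromosome is held in memory as a plain string, and that the annotation is tab-separated GTF/GFF with start and end in columns 4 and 5. The function names are made up for illustration.

```python
def insert_into_sequence(chrom_seq, insert_seq, pos):
    """Insert insert_seq right after 1-based position pos (pos=0 puts it at the start)."""
    return chrom_seq[:pos] + insert_seq + chrom_seq[pos:]


def shift_annotation_line(line, chrom, pos, insert_len):
    """Shift start/end of one GTF/GFF line on `chrom` if it lies downstream of pos.

    Features that span the insertion keep their start and get a shifted end,
    i.e. they are stretched across the insert. Whether that is what you want
    for a disrupted gene is a judgment call this sketch does not make.
    """
    if line.startswith("#") or not line.strip():
        return line  # comments and blank lines are passed through untouched
    fields = line.rstrip("\n").split("\t")
    if fields[0] != chrom:
        return line  # other chromosomes are unaffected
    start, end = int(fields[3]), int(fields[4])
    if start > pos:
        start += insert_len
    if end > pos:
        end += insert_len
    fields[3], fields[4] = str(start), str(end)
    return "\t".join(fields) + "\n"


if __name__ == "__main__":
    # Toy example: insert "GG" after position 4 of an 8 bp "chromosome".
    print(insert_into_sequence("AAAACCCC", "GG", 4))
    toy = 'chr1\tsrc\texon\t5\t8\t.\t+\t.\tgene_id "g1";\n'
    print(shift_annotation_line(toy, "chr1", 4, 2), end="")
```

The annotation for the insert itself would then be added with coordinates pos + 1 to pos + insert_len on the same chromosome.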

In theory, it's super simple; in practice it might be tedious and a bit error-prone. I would certainly recommend realigning some sequencing data to confirm you placed the insertion at the right spot, and of course verifying the annotation with a genome viewer like IGV.

P.S. A long, long time ago, people around me used Staden's Gap5 to do something similar (see the Staden package manual (2011) and the Babraham course). I never managed to get used to it, and chances are fairly high there's something better or newer.

Given you determined the insertion with high precision,

Where does OP say that? I don't know industrial terminology but is that implied by the following sentence?

From the original post: "from a producer cell line that expresses a recombinant protein"

Not at all, that's why I made it a precondition. There's at least one method that allows for that, and the company selling it has become pretty popular in the industry.

In case OP doesn't know precisely where the production vector inserted, I don't know how useful the effort is, but that's up to OP to decide. AFAIK, companies prepare for future questions from regulatory authorities about proof that cell line stability was unaffected, so they would need to know whether a gene was disrupted and, if so, which gene and where.

However, I guess I extrapolated too much from the familiar terms, which don't pop up so often here: all of my answer applies to DNA-seq-based methods. I don't know how OP determined the insertion site based on RNA-seq.

Thanks for your detailed reply. As you guessed, I have not confirmed the insertion site of the sequence. However, from this discussion I understand that it should be OK to add the sequence as an extra chromosome. What do you think? I have to admit that I am having major problems editing the .gtf file to add the extra line as suggested in the link I attached: I cannot find an editor that will open the file. I will also definitely check the Staden package.

If you add it as an additional chromosome, you don't need to modify the existing sequence, so I don't see a need for Staden or reform (btw, the link is barely visible in your original post).

You can append your sequence to the FASTA file you use and the annotation lines to the GTF/GFF. Just make sure the sequence identifiers match. You don't need to recalculate any coordinates, as you keep it separate.
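As a rough sketch of the "extra chromosome" approach in Python: the code below builds one FASTA record and matching GTF lines, then appends them to existing files. The identifiers `transgene` and `MyProtein`, the `custom` source label, and the choice of a single-exon gene on the plus strand spanning the whole sequence are all assumptions for illustration; adjust them to your construct and to whatever attributes your counting tool expects.

```python
def fasta_record(seq_id, sequence, width=60):
    """Format one FASTA record with fixed-width sequence lines."""
    chunks = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return f">{seq_id}\n" + "\n".join(chunks) + "\n"


def gtf_lines(seq_id, length, gene_id, source="custom"):
    """Build gene/transcript/exon GTF lines covering the whole added sequence.

    Assumes a single-exon transcript on the + strand (illustrative choice).
    """
    tx_id = f"{gene_id}_tx"  # hypothetical transcript naming scheme
    gene_attr = f'gene_id "{gene_id}";'
    tx_attr = f'gene_id "{gene_id}"; transcript_id "{tx_id}";'
    rows = [
        ("gene", gene_attr),
        ("transcript", tx_attr),
        ("exon", tx_attr),
    ]
    return [
        "\t".join([seq_id, source, feature, "1", str(length), ".", "+", ".", attr]) + "\n"
        for feature, attr in rows
    ]


def append_text(path, text):
    """Append text to an existing file, adding a newline first if the file lacks one."""
    with open(path, "rb+") as fh:
        fh.seek(0, 2)  # jump to end of file
        if fh.tell() > 0:
            fh.seek(-1, 2)
            if fh.read(1) != b"\n":
                fh.write(b"\n")
        fh.write(text.encode())


def add_extra_contig(fasta_path, gtf_path, seq_id, sequence, gene_id):
    """Append the sequence and its annotation under the same sequence identifier."""
    append_text(fasta_path, fasta_record(seq_id, sequence))
    append_text(gtf_path, "".join(gtf_lines(seq_id, len(sequence), gene_id)))
```

A call such as `add_extra_contig("genome.fa", "genes.gtf", "transgene", seq, "MyProtein")` (file names hypothetical) would modify the files in place, so work on copies. Because both the FASTA header and GTF column 1 come from the same `seq_id`, the identifiers match by construction, and any aligner index has to be rebuilt from the modified FASTA afterwards.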

Thanks, it's good to know that it can be done this way. I am currently getting some errors in the featureCounts step, so I might be back with more questions...

A GTF file is plain text, so as long as you use an editor like Notepad++ (or similar), you should be able to edit it. Be sure to keep the proper format (columns, tab separators, etc.) when you add the extra entry for the insert.
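If you want to double-check a hand-edited entry, a small Python sanity check along these lines can catch the usual slips (spaces instead of tabs, swapped coordinates, a missing gene_id). This is only a sketch of basic format checks under the standard nine-column GTF layout, not a full validator, and the function name is made up.

```python
def check_gtf_line(line):
    """Return a list of problems found in one GTF data line (empty list = looks fine)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        # Most often caused by an editor turning tabs into spaces.
        return [f"expected 9 tab-separated columns, found {len(fields)}"]

    problems = []
    start, end, strand, attributes = fields[3], fields[4], fields[6], fields[8]
    if not (start.isdigit() and end.isdigit()):
        problems.append("start and end must be integers")
    elif int(start) < 1 or int(start) > int(end):
        problems.append("need 1 <= start <= end")
    if strand not in {"+", "-", "."}:
        problems.append(f"unexpected strand {strand!r}")
    if 'gene_id "' not in attributes:
        problems.append("attribute column has no gene_id")
    return problems


if __name__ == "__main__":
    # Hypothetical entry for the insert; identifiers are placeholders.
    entry = 'transgene\tcustom\texon\t1\t80\t.\t+\t.\tgene_id "MyProtein"; transcript_id "MyProtein_tx";\n'
    print(check_gtf_line(entry) or "looks fine")
```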

Yes, indeed, it was easier than I thought when I used vim to append the new lines.