Hello! First of all, I have already read and tried several solutions posted on Biostars and other forums for my issue, but none of them seem to work. The ones I have invested the most time in are faSomeRecords and SeqFilter. The problems I had with them are purely technical (Python version, several required modules, a pipeline that doesn't work or works the wrong way because the author of the adapted version forgot to include some extra lines of code), and because I lack the knowledge to solve them, I gave up.
I have several limitations because I'm a newbie with a short deadline and I work on my university's server, so a quick and not very complicated solution would be nice.
So, here's what I have to do:
I have a FASTA file with several transcript sequences:
>TCONS_00002379 gene=ENSBTAG00000006648
ATGAATTGCAGCACGCCAGGCCTCCCTGTCCATCACCAACTCCTGGAGTTCACCCAGACTCACATCCATC.......
>TCONS_00000007 gene=RCAN1
GACAGCTTCTGGTAAAGGAACTCCATCCACTTGGGGCTCGACTGCGGGAGTCGCTGTAGCTCTCACTGCC.....
And I have a list with the IDs of the transcripts that I don't want in my output, so I want the sequences with those IDs removed from my output file:
TCONS_00002379 gene=ENSBTAG00000006648
TCONS_00000007 gene=RCAN1
TCONS_00002389 gene=RCAN1
TCONS_00002405 gene=ITSN1
TCONS_00002406 gene=ITSN1
TCONS_00002407 gene=ITSN1
TCONS_00002408 gene=ITSN1
TCONS_00002409 gene=ITSN1
TCONS_00002410 gene=ITSN1
TCONS_00002411 gene=ITSN1
TCONS_00002412 gene=ITSN1
TCONS_00002413 gene=ITSN1
TCONS_00000015 gene=CRYZL1
Currently I'm trying to run a pipeline with seqtk that I got from a post here, but it's taking a long time, and I don't even know if it's going to work:
seqtk subseq results_15m_200.fa $(grep ">" results_15m_200.fa | tr -d ">" | grep -v -w -f transcripts.with.orf_15m_full.name.txt) > finalfile_without_orf_15m.fa
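To make the goal concrete, here is a minimal sketch of the filter I am after, written with plain awk instead of seqtk. It is only an illustration under some assumptions: filter_fasta is a made-up helper name, the ID is taken to be the first whitespace-separated word of each line in the list and of each header (after the ">"), and it is untested on my real files.

```shell
# filter_fasta IDS FASTA
# Print every record of FASTA whose ID is NOT listed in IDS.
# (filter_fasta is a hypothetical helper name, not part of any tool.)
filter_fasta() {
    awk '
        # First file (ID list): remember the first word of each line.
        FILENAME == ARGV[1] { drop[$1] = 1; next }
        # Header line: strip the leading ">" and check the ID against the list.
        /^>/ { keep = !(substr($1, 2) in drop) }
        # Print header and sequence lines of records that are kept.
        keep
    ' "$1" "$2"
}
```

With my files it would be called as: filter_fasta transcripts.with.orf_15m_full.name.txt results_15m_200.fa > finalfile_without_orf_15m.fa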
The solution can also be in R!
Thanks in advance