Question: (Closed) extract headers of a file using a second file containing a list of IDs
paraskevopou (20) wrote, 22 months ago:

Dear all! Sorry if my question is trivial, but I am very new to the bioinformatics field. I have a txt file containing the headers of a fasta file (file1, 14000 headers) and a second txt file (file2) with the IDs I want to extract. My problem is that file2 contains only the TRINITY ID (without the rest of the header and without the >) and has fewer entries than file1. Here is my question: how can I extract the full header (everything that comes after the >) from file1, but only for the IDs that are present in file2?

file1 (total file can be found here: https://www.dropbox.com/sh/dt09ij88052epr9/AAAB9A1k20dHs6Ktc-pSEt6qa?dl=0)

>TRINITY_DN10008_c0_g1.p1 GENE.TRINITY_DN10008_c0_g1~~TRINITY_DN10008_c0_g1.p1  ORF type:complete len:404 (+),score=64.53,WDFY2_HUMAN|46.898|2.25e-148 TRINITY_DN10008_c0_g1:212-1423(+)
>TRINITY_DN10008_c0_g2.p1 GENE.TRINITY_DN10008_c0_g2~~TRINITY_DN10008_c0_g2.p1  ORF type:5prime_partial len:359 (+),score=54.01,WDFY2_HUMAN|48.045|4.13e-137 TRINITY_DN10008_c0_g2:3-1079(+)
>TRINITY_DN10009_c0_g1.p1 GENE.TRINITY_DN10009_c0_g1~~TRINITY_DN10009_c0_g1.p1  ORF type:complete len:996 (+),score=231.51,EXOC4_HUMAN|27.089|1.79e-91 TRINITY_DN10009_c0_g1:26-3013(+)
>TRINITY_DN1000_c0_g1.p1 GENE.TRINITY_DN1000_c0_g1~~TRINITY_DN1000_c0_g1.p1  ORF type:5prime_partial len:185 (+),score=16.01,ASI4B_DANRE|30.657|1.37e-12 TRINITY_DN1000_c0_g1:1-555(+)

file2 (total file can be found here: https://www.dropbox.com/sh/dt09ij88052epr9/AAAB9A1k20dHs6Ktc-pSEt6qa?dl=0)

TRINITY_DN10008_c0_g1.p1 
TRINITY_DN10008_c0_g2.p1

Thanks a lot in advance for any help!

rna-seq • 374 views

Hello paraskevopou!

We believe that this post does not fit the main topic of this site.

This has been addressed a number of times. Please search the site or, better, google: filter fasta + site:biostars.org

For this reason we have closed your question. This allows us to keep the site focused on the topics that the community can help with.

If you disagree please tell us why in a reply below, we'll be happy to talk about it.

Cheers!

written 22 months ago by RamRS (27k), modified 22 months ago

Examples from a recent thread: C: How do I extract Fasta Sequences based on a list of IDs?

written 22 months ago by genomax (83k)
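
For the specific case asked about (pulling full header lines rather than sequences), a minimal Python sketch follows. It is not taken from the linked thread. It assumes file1 has one header per line starting with >, file2 has one ID per line (possibly with trailing spaces, as in the example above), and that the ID is the first whitespace-delimited token of each header. The function names and the command-line file arguments are hypothetical.

```python
import sys


def load_ids(lines):
    """Collect IDs, one per line, ignoring blank lines and stray whitespace."""
    return {line.strip() for line in lines if line.strip()}


def matching_headers(header_lines, wanted_ids):
    """Yield each header (without the leading '>') whose first token is a wanted ID."""
    for line in header_lines:
        if not line.startswith(">"):
            continue  # skip anything that is not a FASTA header line
        header = line[1:].rstrip()
        parts = header.split(None, 1)  # first token is assumed to be the ID
        if parts and parts[0] in wanted_ids:
            yield header


def main(headers_path, ids_path):
    """Print the headers from headers_path whose ID is listed in ids_path."""
    with open(ids_path) as fh:
        wanted = load_ids(fh)
    with open(headers_path) as fh:
        for header in matching_headers(fh, wanted):
            print(header)


# Only runs when both paths are given, e.g.:
#   python extract_headers.py file1.txt file2.txt > selected_headers.txt
if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Comparing the whole first token against a set, rather than searching for the ID as a substring, keeps the lookup fast for 14000 headers and avoids an ID accidentally matching a longer ID that contains it.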