Off topic: extract headers of a file using a second file containing a list of IDs
5.7 years ago
paraskevopou ▴ 20

Dear all, sorry if my question is trivial, but I am very new to the bioinformatics field. I have a txt file (file1) containing the headers of a FASTA file (14000 headers) and a second txt file (file2) with the IDs I want to extract. My problem is that file2 contains only the TRINITY_ID (without the rest of the header and without the >) and has fewer entries than file1. So here is my question: how can I extract the full header (everything that comes after the >) from file1, but only for the entries that are present in file2?

file1 (the full file can be found here: https://www.dropbox.com/sh/dt09ij88052epr9/AAAB9A1k20dHs6Ktc-pSEt6qa?dl=0)

>TRINITY_DN10008_c0_g1.p1 GENE.TRINITY_DN10008_c0_g1~~TRINITY_DN10008_c0_g1.p1  ORF type:complete len:404 (+),score=64.53,WDFY2_HUMAN|46.898|2.25e-148 TRINITY_DN10008_c0_g1:212-1423(+)
>TRINITY_DN10008_c0_g2.p1 GENE.TRINITY_DN10008_c0_g2~~TRINITY_DN10008_c0_g2.p1  ORF type:5prime_partial len:359 (+),score=54.01,WDFY2_HUMAN|48.045|4.13e-137 TRINITY_DN10008_c0_g2:3-1079(+)
>TRINITY_DN10009_c0_g1.p1 GENE.TRINITY_DN10009_c0_g1~~TRINITY_DN10009_c0_g1.p1  ORF type:complete len:996 (+),score=231.51,EXOC4_HUMAN|27.089|1.79e-91 TRINITY_DN10009_c0_g1:26-3013(+)
>TRINITY_DN1000_c0_g1.p1 GENE.TRINITY_DN1000_c0_g1~~TRINITY_DN1000_c0_g1.p1  ORF type:5prime_partial len:185 (+),score=16.01,ASI4B_DANRE|30.657|1.37e-12 TRINITY_DN1000_c0_g1:1-555(+)

file2 (the full file can be found here: https://www.dropbox.com/sh/dt09ij88052epr9/AAAB9A1k20dHs6Ktc-pSEt6qa?dl=0)

TRINITY_DN10008_c0_g1.p1 
TRINITY_DN10008_c0_g2.p1

Thanks a lot in advance for any help!
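
One possible way to approach this, sketched in Python: read the IDs from file2 into a set, then keep every header from file1 whose first whitespace-separated field is in that set. The function names (load_ids, matching_headers, extract_headers) and the file names file1.txt and file2.txt are assumptions made for illustration; they are not given in the post. The sketch also assumes that the ID is everything between the > and the first space, as in the example lines above.

```python
def load_ids(lines):
    """Return the set of non-empty IDs, with surrounding whitespace removed."""
    return {line.strip() for line in lines if line.strip()}


def matching_headers(header_lines, wanted_ids):
    """Yield headers (without the leading '>') whose first field is a wanted ID."""
    for line in header_lines:
        header = line.strip()
        if header.startswith(">"):
            header = header[1:]
        if not header:
            continue
        # Assumption: the ID is everything up to the first whitespace.
        seq_id = header.split(None, 1)[0]
        if seq_id in wanted_ids:
            yield header


def extract_headers(headers_path, ids_path):
    """Hypothetical helper: read both files from disk and return the matches."""
    with open(ids_path) as ids_file:
        wanted = load_ids(ids_file)
    with open(headers_path) as headers_file:
        return list(matching_headers(headers_file, wanted))


if __name__ == "__main__":
    # In-memory demo with shortened versions of the header lines from the post.
    demo_headers = [
        ">TRINITY_DN10008_c0_g1.p1 GENE.TRINITY_DN10008_c0_g1 ORF type:complete",
        ">TRINITY_DN10009_c0_g1.p1 GENE.TRINITY_DN10009_c0_g1 ORF type:complete",
    ]
    demo_ids = ["TRINITY_DN10008_c0_g1.p1 ", "TRINITY_DN10008_c0_g2.p1"]
    for header in matching_headers(demo_headers, load_ids(demo_ids)):
        print(header)
```

With the real files, a call such as extract_headers("file1.txt", "file2.txt") would return the matching headers as a list. Comparing the whole first field against the set, rather than searching for a substring, avoids accidental partial matches between similar IDs, and stripping whitespace handles trailing spaces like the one after the first ID in file2.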

RNA-Seq • 698 views
This thread is not open. No new answers may be added