Question: Gene ID extraction
19 months ago, paraskevopou (20) wrote:

Dear all, I have two text files: file1.txt, with full-length IDs (e.g. TRINITY_DN14306_c0_g3_i4), and file2.txt, with the IDs of interest, in which the isoform information is missing (e.g. TRINITY_DN14306_c0_g3). File1 has 40000 records, while file2 has 5000. I would like to extract these 5000 from file1 along with their isoform information. I used the following command, but the output I get is empty.

while read line; do grep -e "${line}_" file1.txt; done < file2.txt > out.txt

Any suggestions will be helpful. Thanks a lot in advance! Sofia
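The thread never establishes why the output was empty. One common cause, which is only an assumption here, is that file2.txt was saved with Windows (CRLF) line endings: each `$line` then ends in an invisible carriage return, so `"${line}_"` never matches anything in file1.txt. The sketch below recreates that situation with tiny made-up files in a temporary directory, checks for carriage returns, and reruns the loop with the carriage return stripped. The file names `out_original.txt` and `out_fixed.txt` are hypothetical.

```shell
# Sketch only: the CRLF cause is an assumption, and the files are small stand-ins.
set -eu
workdir=$(mktemp -d)
cd "$workdir"

printf '%s\n' TRINITY_DN14306_c0_g3_i4 TRINITY_DN14306_c0_g3_i1 TRINITY_DN99999_c0_g1_i1 > file1.txt
# file2.txt is deliberately written with CRLF line endings
printf '%s\r\n' TRINITY_DN14306_c0_g3 > file2.txt

# A literal carriage return, obtained portably
cr=$(printf '\r')

# Count the lines of file2.txt that contain a carriage return (non-zero hints at CRLF)
crlf_lines=$(grep -c "$cr" file2.txt || true)
echo "lines with CR: $crlf_lines"

# Original loop: the hidden carriage return ends up inside the pattern, so nothing matches
while read line; do grep -e "${line}_" file1.txt; done < file2.txt > out_original.txt || true

# Same loop, with a trailing carriage return removed from each ID first
while IFS= read -r line; do
    line=${line%"$cr"}
    grep -e "${line}_" file1.txt || true
done < file2.txt > out_fixed.txt

cat out_fixed.txt
```

If the count of lines with a carriage return is zero on the real file2.txt, this explanation does not apply and the cause lies elsewhere.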


Please validate or comment on your previous questions:

If an answer was helpful you should upvote it, if the answer resolved your question you should mark it as accepted.


written 19 months ago by Pierre Lindenbaum (124k)

however the isoform information is missing (e.g. TRINITY_DN14306_c0_g3)

Please provide a sample of each file.

written 19 months ago by Pierre Lindenbaum (124k)

Thanks a lot for the comments. Here are my files. I want to extract from file1.txt only the names that are present in file2.txt. However, file1.txt also contains the isoform information (_i*), which I need in my output file as well.

file1.txt

TRINITY_DN12874_c0_g1_i1
TRINITY_DN12795_c0_g1_i2
TRINITY_DN12248_c0_g1_i1
TRINITY_DN12868_c0_g1_i1
TRINITY_DN12866_c0_g1_i1
TRINITY_DN12817_c1_g1_i1
TRINITY_DN12775_c1_g2_i2
TRINITY_DN12829_c0_g1_i1
TRINITY_DN12736_c0_g1_i1
TRINITY_DN12865_c0_g1_i1

file2.txt

TRINITY_DN12874_c0_g1
TRINITY_DN12248_c0_g1
TRINITY_DN12866_c0_g1
TRINITY_DN12817_c1_g1
modified 19 months ago • written 19 months ago by paraskevopou (20)
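With the samples above, one way to do the extraction in a single pass is to turn each ID in file2.txt into an anchored pattern (`^ID_i`) and hand the whole list to `grep -f`. This is a sketch, not a solution confirmed in the thread. The `patterns.txt` file name is hypothetical, and the `tr -d '\r'` step is a precaution in case file2.txt carries Windows line endings, which the thread does not confirm.

```shell
# Sketch using the sample data posted above, written to a temporary directory.
set -eu
workdir=$(mktemp -d)
cd "$workdir"

cat > file1.txt <<'EOF'
TRINITY_DN12874_c0_g1_i1
TRINITY_DN12795_c0_g1_i2
TRINITY_DN12248_c0_g1_i1
TRINITY_DN12868_c0_g1_i1
TRINITY_DN12866_c0_g1_i1
TRINITY_DN12817_c1_g1_i1
TRINITY_DN12775_c1_g2_i2
TRINITY_DN12829_c0_g1_i1
TRINITY_DN12736_c0_g1_i1
TRINITY_DN12865_c0_g1_i1
EOF

cat > file2.txt <<'EOF'
TRINITY_DN12874_c0_g1
TRINITY_DN12248_c0_g1
TRINITY_DN12866_c0_g1
TRINITY_DN12817_c1_g1
EOF

# Build one anchored pattern per gene ID: ^<gene>_i
# (the trailing _i keeps g1 from also matching g10, g11, ...)
tr -d '\r' < file2.txt | sed 's/^/^/; s/$/_i/' > patterns.txt

# Print every line of file1.txt that matches any of the patterns
grep -f patterns.txt file1.txt > out.txt
cat out.txt
```

On the sample data, out.txt should contain the four isoform IDs whose gene part is listed in file2.txt, in the order they appear in file1.txt.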

-e is for multiple pattern matching, while you have one per iteration. But I am not sure if this could matter.

modified 19 months ago • written 19 months ago by grant.hovhannisyan (1.8k)
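To illustrate the point above: `-e` marks the next argument as a pattern, so with a single pattern per iteration the output should be the same with or without it. It becomes necessary when several patterns are given in one call. The sketch below uses a small made-up `ids.txt` file; all file names are hypothetical.

```shell
# Sketch with made-up IDs in a temporary directory.
set -eu
workdir=$(mktemp -d)
cd "$workdir"

printf '%s\n' TRINITY_DN1_c0_g1_i1 TRINITY_DN2_c0_g1_i1 TRINITY_DN3_c0_g1_i2 > ids.txt

# One pattern: -e is optional
grep    "DN1_c0_g1_" ids.txt > without_e.txt
grep -e "DN1_c0_g1_" ids.txt > with_e.txt
cmp without_e.txt with_e.txt && echo "identical output"

# Several patterns: one -e per pattern; a line matching any of them is printed
grep -e "DN1_c0_g1_" -e "DN3_c0_g1_" ids.txt > multi.txt
cat multi.txt
```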

It is helpful if you have extended regular expressions, such as capture groups or even character classes in certain instances. I've rarely ever gone wrong with using an -e when it's not really needed. I've found that expected behavior is seen more often with -e (or -P) than without. Plus, it's easier to build on.

modified 19 months ago • written 19 months ago by RamRS (24k)
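It may help to separate the options mentioned in this exchange. In POSIX grep, lowercase `-e` only introduces a pattern, while extended regular expressions (unescaped groups and `|` alternation) are switched on by uppercase `-E`. `-P` (Perl-compatible patterns) is a GNU extension and is left out of this sketch because it may not be available everywhere. The IDs and the `ids.txt` file name below are made up for illustration.

```shell
# Sketch with made-up IDs in a temporary directory.
set -eu
workdir=$(mktemp -d)
cd "$workdir"

printf '%s\n' TRINITY_DN1_c0_g1_i1 TRINITY_DN1_c0_g2_i1 TRINITY_DN1_c0_g3_i1 > ids.txt

# -e alone: basic regular expression, so "|" is an ordinary character
# and no line contains the literal text "g1|g2"
bre_count=$(grep -c -e "g1|g2" ids.txt || true)

# -E with -e: extended regular expression, so "(1|2)" is a group with alternation
ere_count=$(grep -c -E -e "g(1|2)_" ids.txt)

echo "matches with -e only: $bre_count"
echo "matches with -E -e:   $ere_count"
```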