Question: Remove some IDs from an ID list
Janey wrote (10 weeks ago):

Hi

I have 40,000 IDs in a txt file; they correspond to redundant contigs. I want to remove these IDs from my main ID list, which is also in a txt file.

I have two txt files.

1. The main ID list (150,000 IDs)

2. The IDs to remove from the main list (40,000 IDs)

Thanks for your help

finswimmer wrote (10 weeks ago):

Hello Janey,

cat main.txt remove.txt | sort | uniq -u

should do it.
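
For the record, this works because concatenating the two files makes every ID on the remove list appear twice, and uniq -u keeps only lines that occur exactly once. It assumes that remove.txt is a subset of main.txt and that neither file contains duplicate IDs. A minimal sketch (kept.txt is just an assumed name for the output file):

$ cat main.txt remove.txt | sort | uniq -u > kept.txt   # kept.txt: assumed output name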

fin swimmer


Hi finswimmer

Your command worked great.

Thanks

— Janey

Glad it was helpful.

You should upvote posts that you find helpful and/or accept the answers that solve your problem. shenwei356's solution, together with cpad0112's comment, works fine as well and is probably the faster method, so you should upvote those posts too.


Now, shall we go on investigating your initial problem?

fin swimmer


Hi finswimmer

My unix system is very particular, so among all the suggestions in my previous posts, only your command was useful; it was completely compatible with my system. Can you suggest the same kind of command for filtering sequences by ID from fasta files?

Thanks

— Janey

Hello Janey,

What is so special about your unix system?

You have already started a thread about FASTA filtering. Have you tried what cpad0112 suggested? Did you have a look at the links Pierre posted?

If none of this helps, please explain in that thread what you mean by "doesn't work". That helps keep each discussion focused on the topic in its thread title.

fin swimmer

shenwei356 wrote (10 weeks ago):

You need to learn some basic shell commands, like grep.

For this post, you need:

grep -v -f small.txt big.txt > big-small.txt
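
Note that by default grep treats every line of small.txt as a pattern that can match anywhere in a line, so one ID can wrongly remove another ID that merely contains it. A small illustration (the contig names are made up):

$ printf 'contig1\n' > small.txt            # made-up ID to remove
$ printf 'contig1\ncontig10\n' > big.txt    # made-up main list
$ grep -v -f small.txt big.txt

No output at all: contig10 is removed too, because the pattern contig1 matches inside it. See cpad0112's comment below for the fix.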

And most important: try to provide enough information when asking a question. Some sample data can save you, and us, a lot of time.

I guess you are still trying to solve the same problem posted in Fasta file filtering and Remove sequence from fasta file by samtools. Please post several lines of your sequence and ID list files.


Taking one day to learn some shell basics can save you a lot of time.

Finding Things: http://swcarpentry.github.io/shell-novice/07-find/index.html

— shenwei356

@Janey: please add -w to be on the safe side.
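
A sketch of the combined command (adding -F, which makes grep treat the IDs as fixed strings instead of regular expressions, is my own addition; it avoids regex surprises and is usually much faster on long pattern lists):

$ grep -v -w -F -f small.txt big.txt > big-small.txt   # -F is an assumption, not part of the original suggestion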

— cpad0112

Yes, your solution is more beautiful than mine :)

— finswimmer

Not so sure. grep -f can be very, very slow. This can easily be done with cat or awk.
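
A sketch of the awk approach (file names assumed from the question): the first pass stores the IDs to remove as keys of an array, and the second pass prints only the main-list IDs that are not among them.

$ awk 'NR==FNR { skip[$0]; next } !($0 in skip)' remove.txt main.txt > kept.txt   # kept.txt: assumed output name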

— Kevin Blighe

cpad0112 wrote (10 weeks ago):

One can use 'join' or 'comm'. The -v option of join prints the unpairable lines from the chosen file, i.e. the IDs that have no match in the other file. In the example below the files are already sorted; in general, join requires its input files to be sorted.

For example:

input long_ids.txt:

$ cat long_ids.txt 
a
b
c
d
e
f
g
h

input short_ids.txt:

$ cat short_ids.txt 
a
b
c
d

output:

$ join -v 1  -1 1 -2 1 long_ids.txt short_ids.txt 
e
f
g
h
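
The comm variant mentioned above could look like this (a sketch; both files must be sorted, -2 suppresses the lines unique to short_ids.txt and -3 the lines common to both files, leaving only the IDs unique to long_ids.txt):

$ comm -23 long_ids.txt short_ids.txt
e
f
g
h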

I think diff would be the best tool for this case.
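
A possible diff-based sketch (assuming both files are sorted, as in the join example above; lines unique to main.txt appear in diff's output prefixed with "< ", which the sed strips and prints):

$ diff main.txt remove.txt | sed -n 's/^< //p'   # file names assumed from the question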

— 5heikki

There is an unlimited number of ways of doing it.

— Kevin Blighe