Create a new file keeping only the lines whose ID appears in a second file
4.8 years ago

Hello, I have a first file (a VCF) containing a lot of lines:

9   141016262   rs2278973   T   G   .   PASS    ENSG00000148408;ENST00000277551|ENST00000277549 GT  T|G G   G   T|G G   G   G   G   G   G   G   G   T|G G|G G   G   G   G   G   G   T   T|G G   T|G G   T|G G   T|G G   T|G G   G   G   T|G T|G G   G   G   G   G
9   141016271   rs201383337 C   T   .   PASS    ENSG00000148408;ENST00000277551|ENST00000277549 GT  C   C   C   C   C   C   C   C   C   C   C   C   C   C|C C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C
9   141016441   rs150679456 A   G   .   PASS    ENSG00000148408;ENST00000371372|ENST00000371363|ENST00000371357|ENST00000371355 GT  A   A   A   A   A   A   A   A   A   A   A   A   A   A|A A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A
10  225960  rs370081585 G   A   .   PASS    ENSG00000015171;ENST00000439456|ENST00000397962|ENST00000509513|ENST00000381591|ENST00000403354|ENST00000402736|ENST00000602682|ENST00000397955|ENST00000558098|ENST00000381607|ENST00000397959|ENST00000309776 GT  G   G   G   G|G G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G
10  292763  rs150625727 C   T   .   PASS    ENSG00000015171;ENST00000397962|ENST00000509513|ENST00000381591|ENST00000403354|ENST00000402736|ENST00000602682|ENST00000381584|ENST00000558098|ENST00000627286|ENST00000381607|ENST00000397959|ENST00000309776|ENST00000381604 GT  C   C   C   C|C C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C
10  294892  rs781271016 T   C   .   PASS    ENSG00000015171;ENST00000397962|ENST00000509513|ENST00000381591|ENST00000403354|ENST00000402736|ENST00000602682|ENST00000381584|ENST00000558098|ENST00000627286|ENST00000381607|ENST00000397959|ENST00000309776|ENST00000381604 GT  T   T   T   T|T T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T
10  327162  rs142438404 C   T   .   PASS    ENSG00000151240;ENST00000280886|ENST00000634311|ENST00000381496 GT  C   C   C   C|C C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C

However, I would like to keep only the lines whose identifier (the rs ID in the third column) is listed in my second file:

rs370081585
rs150625727

Desired result:

10  225960  rs370081585 G   A   .   PASS    ENSG00000015171;ENST00000439456|ENST00000397962|ENST00000509513|ENST00000381591|ENST00000403354|ENST00000402736|ENST00000602682|ENST00000397955|ENST00000558098|ENST00000381607|ENST00000397959|ENST00000309776 GT  G   G   G   G|G G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G
10  292763  rs150625727 C   T   .   PASS    ENSG00000015171;ENST00000397962|ENST00000509513|ENST00000381591|ENST00000403354|ENST00000402736|ENST00000602682|ENST00000381584|ENST00000558098|ENST00000627286|ENST00000381607|ENST00000397959|ENST00000309776|ENST00000381604 GT  C   C   C   C|C C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C

Code I tried (it prints nothing):

while read line; do awk '/$line/ { print $0 }' vcf.vcf; done < rs.txt
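The attempt prints nothing because the awk program is in single quotes, so the shell never substitutes $line. awk instead receives the regular expression /$line/, where $ is an end-of-line anchor followed by the literal text "line", so it effectively matches nothing. Below is a minimal corrected sketch of the same loop. It hands each ID to awk with -v and compares it with the ID column ($3) only. The file names vcf.vcf and rs.txt come from the question. The three-row sample data is trimmed and illustrative, and the script assumes whitespace-separated columns with the rs ID in column 3.

```shell
#!/usr/bin/env bash
# Work in a scratch directory so the sample files do not clobber real ones.
cd "$(mktemp -d)" || exit 1

# Trimmed, illustrative versions of the two files from the question.
printf '%s\n' \
  '9 141016262 rs2278973 T G . PASS' \
  '10 225960 rs370081585 G A . PASS' \
  '10 292763 rs150625727 C T . PASS' > vcf.vcf
printf '%s\n' rs370081585 rs150625727 > rs.txt

# Read one ID per line; IFS= and -r keep the line exactly as written.
while IFS= read -r id; do
  # -v passes the shell variable into awk; compare only the ID column.
  awk -v id="$id" '$3 == id' vcf.vcf
done < rs.txt > filtered.vcf
```

This still rereads vcf.vcf once per ID, and the output follows the order of rs.txt. For many IDs, a single-pass approach or a VCF-aware tool is much faster.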

Thank you


A very big thank you, I was completely stuck. Thank you!


I have moved the comments pointing you to the right solution into an answer, so you can mark it as accepted and thereby indicate that this thread is solved.

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted.
4.8 years ago
GenoMax

Take a look at the inline help for grep (grep --help or man grep), especially the -w and -f (a file with the patterns you want to search for) options.
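Here is a small illustration of why -w matters for rs IDs. It uses hypothetical data in which one ID is a prefix of another. A plain fixed-string search for rs1 also hits rs12. With -w, a match must form a whole word, meaning it cannot be directly preceded or followed by a letter, digit or underscore. The file names data.txt and ids.txt are made up for this sketch.

```shell
#!/usr/bin/env bash
cd "$(mktemp -d)" || exit 1

# Hypothetical data: rs1 is a prefix of rs12.
printf '%s\n' '1 100 rs1 A G' '1 200 rs12 C T' > data.txt
printf '%s\n' rs1 > ids.txt

# -F: treat patterns as fixed strings; -f ids.txt: read patterns from the file.
grep -Ff ids.txt data.txt > substring_hits.txt   # both lines match
# -w: the match must be a whole word, so rs12 no longer counts as a hit.
grep -Fwf ids.txt data.txt > word_hits.txt       # only the rs1 line matches
```

Note that -w still accepts the ID anywhere on the line, not only in the ID column.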

If you are filtering a VCF file, use tools meant for managing VCF files, such as bcftools and vcftools.


+1 on using bcftools. Your awk command would also benefit greatly from matching just one column ($3) instead of the entire line. But definitely go for bcftools view.
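Here is a sketch of the column-3 idea that also reads the VCF only once. While awk reads the first file, NR == FNR is true, so it stores each ID as an array key. For the second file, it prints every line whose third field is one of those keys. Two additions are assumptions not in the thread: header lines starting with # are passed through (in case the real file has a VCF header), and a trailing carriage return is stripped from each ID. The file names are the question's, and the sample rows are illustrative.

```shell
#!/usr/bin/env bash
cd "$(mktemp -d)" || exit 1

# Illustrative sample: one header line plus three trimmed data rows.
printf '%s\n' \
  '##fileformat=VCFv4.2' \
  '9 141016262 rs2278973 T G . PASS' \
  '10 225960 rs370081585 G A . PASS' \
  '10 292763 rs150625727 C T . PASS' > vcf.vcf
printf '%s\n' rs370081585 rs150625727 > rs.txt

awk '
  NR == FNR { sub(/\r$/, ""); ids[$1]; next }   # first file: remember each ID
  /^#/ || ($3 in ids)                            # second file: headers and listed IDs
' rs.txt vcf.vcf > filtered.vcf
```

One caveat of the NR == FNR idiom: if rs.txt is empty, awk treats the VCF as the ID file and prints nothing.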

4.8 years ago

You can try this, where one file (List_of_ids) contains only the list of IDs and the other (all_data) contains all the data, including the IDs:

grep -Fwf List_of_ids all_data

However, it only finds the ID on the last line of List_of_ids in all_data, not the whole list...


No, it shouldn't do that; it should work. Did you create List_of_ids on Windows? If so, run dos2unix on it, as the line endings might be different.
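Windows line endings would explain the reported symptom. Each line then ends in an invisible carriage return (\r), so -f reads the pattern rs370081585\r, which never occurs in all_data. Only the last ID can still match, presumably because the file had no line ending after it. The sketch below reproduces this with made-up data. It then strips the carriage returns with tr, which should have the same effect as dos2unix for this purpose.

```shell
#!/usr/bin/env bash
cd "$(mktemp -d)" || exit 1

printf '%s\n' '10 225960 rs370081585 G A' '10 292763 rs150625727 C T' > all_data
# Simulate a list saved on Windows: CRLF endings, nothing after the last ID.
printf 'rs370081585\r\nrs150625727' > List_of_ids

grep -Fwf List_of_ids all_data > before.txt        # only the rs150625727 line

# Delete every carriage return, then search again with the cleaned list.
tr -d '\r' < List_of_ids > List_of_ids.unix
grep -Fwf List_of_ids.unix all_data > after.txt    # both lines
```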


Thank you! This is exactly the solution to my problem: a line-ending problem!
