new file with line in common
21 months ago

Hello, I have a first file containing a lot of lines.

9   141016262   rs2278973   T   G   .   PASS    ENSG00000148408;ENST00000277551|ENST00000277549 GT  T|G G   G   T|G G   G   G   G   G   G   G   G   T|G G|G G   G   G   G   G   G   T   T|G G   T|G G   T|G G   T|G G   T|G G   G   G   T|G T|G G   G   G   G   G
9   141016271   rs201383337 C   T   .   PASS    ENSG00000148408;ENST00000277551|ENST00000277549 GT  C   C   C   C   C   C   C   C   C   C   C   C   C   C|C C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C
9   141016441   rs150679456 A   G   .   PASS    ENSG00000148408;ENST00000371372|ENST00000371363|ENST00000371357|ENST00000371355 GT  A   A   A   A   A   A   A   A   A   A   A   A   A   A|A A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A
10  225960  rs370081585 G   A   .   PASS    ENSG00000015171;ENST00000439456|ENST00000397962|ENST00000509513|ENST00000381591|ENST00000403354|ENST00000402736|ENST00000602682|ENST00000397955|ENST00000558098|ENST00000381607|ENST00000397959|ENST00000309776 GT  G   G   G   G|G G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G
10  292763  rs150625727 C   T   .   PASS    ENSG00000015171;ENST00000397962|ENST00000509513|ENST00000381591|ENST00000403354|ENST00000402736|ENST00000602682|ENST00000381584|ENST00000558098|ENST00000627286|ENST00000381607|ENST00000397959|ENST00000309776|ENST00000381604 GT  C   C   C   C|C C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C
10  294892  rs781271016 T   C   .   PASS    ENSG00000015171;ENST00000397962|ENST00000509513|ENST00000381591|ENST00000403354|ENST00000402736|ENST00000602682|ENST00000381584|ENST00000558098|ENST00000627286|ENST00000381607|ENST00000397959|ENST00000309776|ENST00000381604 GT  T   T   T   T|T T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T
10  327162  rs142438404 C   T   .   PASS    ENSG00000151240;ENST00000280886|ENST00000634311|ENST00000381496 GT  C   C   C   C|C C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C


However, I would like to keep only the lines whose identifier appears in my second file.

rs370081585
rs150625727


desired result:

10  225960  rs370081585 G   A   .   PASS    ENSG00000015171;ENST00000439456|ENST00000397962|ENST00000509513|ENST00000381591|ENST00000403354|ENST00000402736|ENST00000602682|ENST00000397955|ENST00000558098|ENST00000381607|ENST00000397959|ENST00000309776 GT  G   G   G   G|G G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G
10  292763  rs150625727 C   T   .   PASS    ENSG00000015171;ENST00000397962|ENST00000509513|ENST00000381591|ENST00000403354|ENST00000402736|ENST00000602682|ENST00000381584|ENST00000558098|ENST00000627286|ENST00000381607|ENST00000397959|ENST00000309776|ENST00000381604 GT  C   C   C   C|C C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C


code tried:

while read line; do awk '/$line/ { print$0 }' vcf.vcf; done < rs.txt


thank you

bash awk • 433 views

Thank you very much, I was completely stuck on this. Thank you!


I have moved the comments pointing you to the right solution to an "Answer", so you can mark it as accepted and thereby indicate that this thread is solved.

21 months ago
GenoMax 99k

Take a look at inline help for grep. Especially the -w and -f (a file with things you want to search for) options.

If you are filtering a VCF file then use tools meant for managing VCF files such as bcftools and vcftools.


+1 on using bcftools. Your awk command would benefit greatly from matching just one column ($3) instead of the entire line. But definitely go for bcftools view.
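
A minimal sketch of the single-column awk approach suggested above. In the loop from the question, `$line` sits inside single quotes, so the shell never expands it and awk receives the literal text `$line` rather than the ID. Reading both files in one awk call avoids the loop entirely. The tiny inputs below are hypothetical stand-ins for rs.txt and vcf.vcf, trimmed to five columns, and the sketch assumes whitespace-separated columns with the ID in column 3; VCF header lines starting with `#` would not be kept by this filter.

```shell
# Hypothetical small inputs standing in for rs.txt and vcf.vcf.
workdir=$(mktemp -d)
printf 'rs370081585\nrs150625727\n' > "$workdir/rs.txt"
printf '9\t141016262\trs2278973\tT\tG\n10\t225960\trs370081585\tG\tA\n10\t292763\trs150625727\tC\tT\n10\t294892\trs781271016\tT\tC\n' > "$workdir/vcf.vcf"

# While reading the first file (NR==FNR), store each ID as an array key.
# While reading the second file, print lines whose third column is a stored key.
matched=$(awk 'NR==FNR { ids[$1]; next } $3 in ids' "$workdir/rs.txt" "$workdir/vcf.vcf")
printf '%s\n' "$matched"
```

Because the comparison is an exact lookup on column 3, an ID that happens to appear inside another field cannot produce a false match.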

21 months ago

You can try this, where one file contains only the list of IDs and the other contains all the data, including the IDs:

grep -Fwf List_of_ids all_data
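
To illustrate what the flags in the command above do: `-F` treats each pattern as a fixed string, `-f` reads the patterns from a file, and `-w` requires the match to be a whole word. The sketch below uses hypothetical toy data (the ID `rs1506` is made up for illustration) to show why `-w` matters when one ID is a prefix of another.

```shell
# Hypothetical toy data: rs1506 is a prefix of rs150625727.
workdir=$(mktemp -d)
printf 'rs1506\n' > "$workdir/List_of_ids"
printf '10\t292763\trs150625727\tC\tT\n10\t999\trs1506\tA\tG\n' > "$workdir/all_data"

# Without -w, the fixed string also matches inside the longer ID.
without_w=$(grep -c -Ff "$workdir/List_of_ids" "$workdir/all_data")

# With -w, only the line containing rs1506 as a whole word is counted.
with_w=$(grep -c -Fwf "$workdir/List_of_ids" "$workdir/all_data")

echo "lines matched without -w: $without_w, with -w: $with_w"
```

Note that grep still searches the whole line rather than a specific column, so this relies on the IDs not appearing as whole words in other fields.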


However, it only looks for the last line of the List_of_ids file in the all_data file and not the whole list...


No, it shouldn't do that, it should work. Did you make the List_of_ids on Windows? If so, use dos2unix on it, as the line endings might be different.
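
A small sketch of the line-ending problem described above, using hypothetical toy files. With Windows (CRLF) endings, every ID that is followed by a line break carries a trailing carriage return, so it no longer matches the data; one plausible reason only the last ID matched is that the last line of the list had no line terminator at all. Here `tr -d '\r'` is used as a stand-in for dos2unix, in case that tool is not installed.

```shell
workdir=$(mktemp -d)

# ID list with CRLF endings; the last line has no terminator (assumed for illustration).
printf 'rs370081585\r\nrs150625727' > "$workdir/List_of_ids"
printf '10\t225960\trs370081585\tG\tA\n10\t292763\trs150625727\tC\tT\n' > "$workdir/all_data"

# Count matching lines with the unconverted list.
before=$(grep -c -Fwf "$workdir/List_of_ids" "$workdir/all_data")

# Strip carriage returns, then count again.
tr -d '\r' < "$workdir/List_of_ids" > "$workdir/List_of_ids.unix"
after=$(grep -c -Fwf "$workdir/List_of_ids.unix" "$workdir/all_data")

echo "matching lines before: $before, after: $after"
```

With this toy data, only the last ID should match before the conversion and both IDs afterwards. Running `file List_of_ids` is a quick way to check for the problem, since it typically reports "with CRLF line terminators" for such files.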


Thank you! This is exactly the solution to my problem: an encoding problem!