Question: new file with line in common
0
13 days ago by
amandinelecerfdefer20 wrote:

Hello, I have a first file containing a lot of lines.

9   141016262   rs2278973   T   G   .   PASS    ENSG00000148408;ENST00000277551|ENST00000277549 GT  T|G G   G   T|G G   G   G   G   G   G   G   G   T|G G|G G   G   G   G   G   G   T   T|G G   T|G G   T|G G   T|G G   T|G G   G   G   T|G T|G G   G   G   G   G
9   141016271   rs201383337 C   T   .   PASS    ENSG00000148408;ENST00000277551|ENST00000277549 GT  C   C   C   C   C   C   C   C   C   C   C   C   C   C|C C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C
9   141016441   rs150679456 A   G   .   PASS    ENSG00000148408;ENST00000371372|ENST00000371363|ENST00000371357|ENST00000371355 GT  A   A   A   A   A   A   A   A   A   A   A   A   A   A|A A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A
10  225960  rs370081585 G   A   .   PASS    ENSG00000015171;ENST00000439456|ENST00000397962|ENST00000509513|ENST00000381591|ENST00000403354|ENST00000402736|ENST00000602682|ENST00000397955|ENST00000558098|ENST00000381607|ENST00000397959|ENST00000309776 GT  G   G   G   G|G G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G
10  292763  rs150625727 C   T   .   PASS    ENSG00000015171;ENST00000397962|ENST00000509513|ENST00000381591|ENST00000403354|ENST00000402736|ENST00000602682|ENST00000381584|ENST00000558098|ENST00000627286|ENST00000381607|ENST00000397959|ENST00000309776|ENST00000381604 GT  C   C   C   C|C C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C
10  294892  rs781271016 T   C   .   PASS    ENSG00000015171;ENST00000397962|ENST00000509513|ENST00000381591|ENST00000403354|ENST00000402736|ENST00000602682|ENST00000381584|ENST00000558098|ENST00000627286|ENST00000381607|ENST00000397959|ENST00000309776|ENST00000381604 GT  T   T   T   T|T T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T
10  327162  rs142438404 C   T   .   PASS    ENSG00000151240;ENST00000280886|ENST00000634311|ENST00000381496 GT  C   C   C   C|C C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C

However, I would like to keep only the lines whose identifier appears in my second file.

rs370081585
rs150625727

desired result:

10  225960  rs370081585 G   A   .   PASS    ENSG00000015171;ENST00000439456|ENST00000397962|ENST00000509513|ENST00000381591|ENST00000403354|ENST00000402736|ENST00000602682|ENST00000397955|ENST00000558098|ENST00000381607|ENST00000397959|ENST00000309776 GT  G   G   G   G|G G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G
10  292763  rs150625727 C   T   .   PASS    ENSG00000015171;ENST00000397962|ENST00000509513|ENST00000381591|ENST00000403354|ENST00000402736|ENST00000602682|ENST00000381584|ENST00000558098|ENST00000627286|ENST00000381607|ENST00000397959|ENST00000309776|ENST00000381604 GT  C   C   C   C|C C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C

code tried:

while read line; do awk '/$line/ { print $0 }' vcf.vcf; done < rs.txt

thank you

awk bash • 149 views
ADD COMMENTlink modified 12 days ago • written 13 days ago by amandinelecerfdefer20

A very big thank you, I couldn't work it out on my own anymore. Thank you!

ADD REPLYlink written 12 days ago by amandinelecerfdefer20

I have moved the comments pointing you to the right solution to an "Answer", so you can mark them as accepted and thereby indicate that this thread is solved.

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted.

ADD REPLYlink written 12 days ago by WouterDeCoster40k
5
13 days ago by
genomax70k
United States
genomax70k wrote:

Take a look at the inline help for grep, especially the -w (match whole words only) and -f (read the search patterns from a file) options.

If you are filtering a VCF file then use tools meant for managing VCF files such as bcftools and vcftools.

ADD COMMENTlink modified 13 days ago • written 13 days ago by genomax70k

+1 on using bcftools. Your awk will benefit greatly from matching just one column ($3) instead of the entire line. But definitely go for bcftools view.
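
For reference, matching only column 3 in awk could look roughly like the sketch below. It uses the common two-file idiom: the ID list is read first into an array, then the data file is filtered against it. The file names rs.txt and vcf.vcf come from the question; the temporary directory and the shortened sample rows are assumptions made purely for illustration.

```shell
# Build two tiny stand-in files in a temp directory (shortened sample rows).
tmpdir=$(mktemp -d)
printf 'rs370081585\nrs150625727\n' > "$tmpdir/rs.txt"
printf '9\t141016262\trs2278973\tT\tG\n10\t225960\trs370081585\tG\tA\n10\t292763\trs150625727\tC\tT\n' > "$tmpdir/vcf.vcf"

# While reading the first file (NR==FNR), store each ID as an array key.
# While reading the second file, print rows whose third column is a stored ID.
awk_result=$(awk 'NR==FNR { ids[$1]; next } $3 in ids' "$tmpdir/rs.txt" "$tmpdir/vcf.vcf")
printf '%s\n' "$awk_result"
```

This reads each file once, unlike the loop in the question, which rescans the whole data file for every ID. The NR==FNR test assumes the ID list is not empty.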

ADD REPLYlink written 13 days ago by RamRS22k
3
13 days ago by
archana.bioinfo87160 wrote:

You can try this, where one file (List_of_ids) contains only the list of IDs and the other (all_data) contains all the data, including the IDs:

grep -Fwf List_of_ids all_data
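
To see what the -w flag contributes here, a small sketch on made-up data may help. The file names List_of_ids and all_data are the ones used above; the temporary directory, the shortened rows, and the extra ID rs1506257270 are assumptions added only to show a partial match.

```shell
# Tiny stand-in files; rs1506257270 is a made-up ID that contains a listed ID as a prefix.
tmpdir=$(mktemp -d)
printf 'rs370081585\nrs150625727\n' > "$tmpdir/List_of_ids"
printf '10\t225960\trs370081585\tG\tA\n10\t292763\trs150625727\tC\tT\n10\t300000\trs1506257270\tC\tT\n' > "$tmpdir/all_data"

# -F: treat patterns as fixed strings; -f: read patterns from a file; -c: count matching lines.
loose_count=$(grep -c -Ff "$tmpdir/List_of_ids" "$tmpdir/all_data")
# Adding -w requires each match to be a whole word, so the longer ID is rejected.
exact_count=$(grep -c -Fwf "$tmpdir/List_of_ids" "$tmpdir/all_data")
echo "without -w: $loose_count lines, with -w: $exact_count lines"
```

Note that even with -w, grep searches the whole line rather than just the ID column, so an ID appearing as a word in any other field would also match.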
ADD COMMENTlink written 13 days ago by archana.bioinfo87160

However, it only finds the last line of the List_of_ids file in the all_data file, not the whole list...

ADD REPLYlink written 12 days ago by amandinelecerfdefer20

No, it shouldn't do that; it should work. Did you create List_of_ids on Windows? If so, run dos2unix on it, as the line endings might be different.
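
A likely explanation for "only the last line matches", sketched below: if the list has Windows (CRLF) line endings, every pattern except an unterminated last line carries a trailing carriage return that never occurs in the data. The example assumes exactly that situation and uses tr -d '\r' as a stand-in for dos2unix in case it is not installed; the temporary files and shortened rows are illustrative only.

```shell
# Simulate a list saved on Windows: CRLF after the first ID, no terminator on the last one.
tmpdir=$(mktemp -d)
printf 'rs370081585\r\nrs150625727' > "$tmpdir/List_of_ids"
printf '10\t225960\trs370081585\tG\tA\n10\t292763\trs150625727\tC\tT\n' > "$tmpdir/all_data"

# With the carriage return still in the first pattern, only the last ID can match.
crlf_hits=$(grep -c -Fwf "$tmpdir/List_of_ids" "$tmpdir/all_data")

# Strip carriage returns (roughly what dos2unix does) and search again.
tr -d '\r' < "$tmpdir/List_of_ids" > "$tmpdir/List_of_ids.unix"
clean_hits=$(grep -c -Fwf "$tmpdir/List_of_ids.unix" "$tmpdir/all_data")
echo "before cleanup: $crlf_hits lines, after cleanup: $clean_hits lines"
```

Running the file command on the list is a quick way to check, since it typically reports CRLF line terminators when they are present.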

ADD REPLYlink written 12 days ago by WouterDeCoster40k

Thank you! This is exactly the solution to my problem: an encoding problem!

ADD REPLYlink written 12 days ago by amandinelecerfdefer20