Question: new file with line in common
8 months ago, amandinelecerfdefer wrote:

Hello, I have a first file containing a lot of lines.

9   141016262   rs2278973   T   G   .   PASS    ENSG00000148408;ENST00000277551|ENST00000277549 GT  T|G G   G   T|G G   G   G   G   G   G   G   G   T|G G|G G   G   G   G   G   G   T   T|G G   T|G G   T|G G   T|G G   T|G G   G   G   T|G T|G G   G   G   G   G
9   141016271   rs201383337 C   T   .   PASS    ENSG00000148408;ENST00000277551|ENST00000277549 GT  C   C   C   C   C   C   C   C   C   C   C   C   C   C|C C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C
9   141016441   rs150679456 A   G   .   PASS    ENSG00000148408;ENST00000371372|ENST00000371363|ENST00000371357|ENST00000371355 GT  A   A   A   A   A   A   A   A   A   A   A   A   A   A|A A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A   A
10  225960  rs370081585 G   A   .   PASS    ENSG00000015171;ENST00000439456|ENST00000397962|ENST00000509513|ENST00000381591|ENST00000403354|ENST00000402736|ENST00000602682|ENST00000397955|ENST00000558098|ENST00000381607|ENST00000397959|ENST00000309776 GT  G   G   G   G|G G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G
10  292763  rs150625727 C   T   .   PASS    ENSG00000015171;ENST00000397962|ENST00000509513|ENST00000381591|ENST00000403354|ENST00000402736|ENST00000602682|ENST00000381584|ENST00000558098|ENST00000627286|ENST00000381607|ENST00000397959|ENST00000309776|ENST00000381604 GT  C   C   C   C|C C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C
10  294892  rs781271016 T   C   .   PASS    ENSG00000015171;ENST00000397962|ENST00000509513|ENST00000381591|ENST00000403354|ENST00000402736|ENST00000602682|ENST00000381584|ENST00000558098|ENST00000627286|ENST00000381607|ENST00000397959|ENST00000309776|ENST00000381604 GT  T   T   T   T|T T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T   T
10  327162  rs142438404 C   T   .   PASS    ENSG00000151240;ENST00000280886|ENST00000634311|ENST00000381496 GT  C   C   C   C|C C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C

However, I would like to keep only the lines whose identifier is in my second file.

rs370081585
rs150625727

desired result:

10  225960  rs370081585 G   A   .   PASS    ENSG00000015171;ENST00000439456|ENST00000397962|ENST00000509513|ENST00000381591|ENST00000403354|ENST00000402736|ENST00000602682|ENST00000397955|ENST00000558098|ENST00000381607|ENST00000397959|ENST00000309776 GT  G   G   G   G|G G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G   G
10  292763  rs150625727 C   T   .   PASS    ENSG00000015171;ENST00000397962|ENST00000509513|ENST00000381591|ENST00000403354|ENST00000402736|ENST00000602682|ENST00000381584|ENST00000558098|ENST00000627286|ENST00000381607|ENST00000397959|ENST00000309776|ENST00000381604 GT  C   C   C   C|C C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C   C

code tried:

while read line; do awk '/$line/ { print $0 }' vcf.vcf; done < rs.txt

thank you

Tags: awk, bash

Thank you very much, I was completely stuck. Thank you!

written 8 months ago by amandinelecerfdefer

I have moved the comments pointing you to the right solution to an answer, so you can mark it as accepted and thereby indicate that this thread is solved.

If an answer was helpful, you should upvote it; if it resolved your question, you should mark it as accepted.

written 8 months ago by WouterDeCoster
8 months ago, genomax (United States) wrote:

Take a look at the inline help for grep, especially the -w (match whole words only) and -f (read the patterns to search for from a file) options.

If you are filtering a VCF file then use tools meant for managing VCF files such as bcftools and vcftools.

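As a rough illustration of the bcftools route suggested here, the sketch below filters a tiny, made-up sites-only VCF by ID. The file names demo.vcf and rs.txt and the header lines are assumptions for the demo, not the asker's real data, and the filter step is skipped if bcftools is not installed. It relies on the `ID=@file` form of bcftools filtering expressions, which keeps records whose ID appears in the given file (one ID per line).

```shell
#!/usr/bin/env bash
# Sketch only: demo.vcf and rs.txt are placeholder names created for this example.
workdir=$(mktemp -d)

# bcftools needs a proper VCF header, so build a minimal sites-only file.
printf '##fileformat=VCFv4.2\n##contig=<ID=10>\n' > "$workdir/demo.vcf"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' >> "$workdir/demo.vcf"
printf '10\t225960\trs370081585\tG\tA\t.\tPASS\t.\n' >> "$workdir/demo.vcf"
printf '10\t292763\trs150625727\tC\tT\t.\tPASS\t.\n' >> "$workdir/demo.vcf"
printf '10\t294892\trs781271016\tT\tC\t.\tPASS\t.\n' >> "$workdir/demo.vcf"

# The list of IDs to keep, one per line.
printf 'rs370081585\nrs150625727\n' > "$workdir/rs.txt"

kept=""
if command -v bcftools >/dev/null 2>&1; then
    # -H drops the header from the output; -i keeps records matching the expression.
    kept=$(cd "$workdir" && bcftools view -H -i 'ID=@rs.txt' demo.vcf)
    printf '%s\n' "$kept"
else
    echo "bcftools not found; skipping the filter step" >&2
fi
```

Unlike a plain text search, this compares only the ID column, so an identifier that happens to appear in another field cannot cause a false match.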

+1 on using bcftools. Your awk command will benefit greatly from matching just one column ($3) instead of the entire line. But definitely go for bcftools view.

written 8 months ago by RamRS
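For readers who still want an awk version, here is a minimal sketch of the column-matching idea. The loop in the question cannot work as written: inside single quotes the shell does not expand `$line`, so awk receives the literal regex `/$line/`, in which `$` is the end-of-line anchor, and nothing matches. The common two-file idiom below reads the ID list first and then tests only column 3 of the data file. The file contents are a cut-down, made-up version of the asker's data, and the sketch assumes the ID list is not empty.

```shell
#!/usr/bin/env bash
# Sketch only: small stand-ins for the asker's rs.txt and vcf.vcf.
workdir=$(mktemp -d)
printf 'rs370081585\nrs150625727\n' > "$workdir/rs.txt"
printf '10\t225960\trs370081585\tG\tA\n'  > "$workdir/vcf.vcf"
printf '10\t292763\trs150625727\tC\tT\n' >> "$workdir/vcf.vcf"
printf '10\t294892\trs781271016\tT\tC\n' >> "$workdir/vcf.vcf"

# While reading the first file (NR == FNR), store each ID as an array key.
# While reading the second file, print lines whose third column is a stored key.
filtered=$(awk 'NR == FNR { ids[$1]; next } $3 in ids' \
    "$workdir/rs.txt" "$workdir/vcf.vcf")
printf '%s\n' "$filtered"
```

This reads each file once, instead of rescanning the whole data file for every identifier as the loop does. Note that header lines starting with `#` in a real VCF would be dropped by this filter.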
8 months ago, archana.bioinfo87 wrote:

You can try this, where one file contains only the list of IDs and the other contains all the data, including the IDs:

grep -Fwf List_of_ids all_data
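To make the options in this command concrete, the sketch below runs it on a small made-up data set (the file contents are assumptions that mimic the asker's files). `-F` treats each pattern as a fixed string rather than a regular expression, `-w` requires the match to be a whole word, and `-f` reads the patterns from a file, one per line.

```shell
#!/usr/bin/env bash
# Sketch only: tiny stand-ins for List_of_ids and all_data.
workdir=$(mktemp -d)
printf 'rs370081585\nrs150625727\n' > "$workdir/List_of_ids"
printf '10\t225960\trs370081585\tG\tA\n'  > "$workdir/all_data"
printf '10\t292763\trs150625727\tC\tT\n' >> "$workdir/all_data"
printf '10\t294892\trs781271016\tT\tC\n' >> "$workdir/all_data"

# Keep every line of all_data that contains one of the listed IDs as a whole word.
matches=$(grep -Fwf "$workdir/List_of_ids" "$workdir/all_data")
printf '%s\n' "$matches"
```

Because grep searches the whole line, an ID that also occurs in another column would match as well; for this data the identifiers only appear in the third column, so the result is the same.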

However, it only looks for the last line of the List_of_ids file in the all_data file and not the whole list...

written 8 months ago by amandinelecerfdefer

No, it shouldn't do that; it should work. Did you create List_of_ids on Windows? If so, run dos2unix on it, as the line endings might be different (CRLF instead of LF).

written 8 months ago by WouterDeCoster
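One plausible explanation for the "only the last line matches" symptom is sketched below. With Windows line endings, every pattern followed by a line break carries an invisible trailing carriage return, so it never matches the data; a final line without a line break has no carriage return and still matches. The file names and contents are made up for the demo, and `tr -d '\r'` is used here as a stand-in for dos2unix in case that tool is not installed.

```shell
#!/usr/bin/env bash
# Sketch only: simulate an ID list saved on Windows (CRLF between lines,
# no line ending after the last ID).
workdir=$(mktemp -d)
printf 'rs370081585\r\nrs150625727' > "$workdir/ids_crlf"
printf '10\t225960\trs370081585\tG\tA\n'  > "$workdir/all_data"
printf '10\t292763\trs150625727\tC\tT\n' >> "$workdir/all_data"
printf '10\t294892\trs781271016\tT\tC\n' >> "$workdir/all_data"

# Count matching lines with the CRLF list: the first ID is silently missed.
before=$(grep -c -F -w -f "$workdir/ids_crlf" "$workdir/all_data")

# Strip the carriage returns, then count again: both IDs are found.
tr -d '\r' < "$workdir/ids_crlf" > "$workdir/ids_unix"
after=$(grep -c -F -w -f "$workdir/ids_unix" "$workdir/all_data")

echo "matching lines before: $before, after: $after"
```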

Thank you! This is exactly the solution to my problem: an encoding problem!

written 8 months ago by amandinelecerfdefer