Obtaining required rows from HG19 file with R or Linux
3.0 years ago
salman_96 ▴ 70

Hi, I have an hg19 SNPs file that contains some extra rows I do not need. It looks like this:

##INFO=<ID=COMMON,Number=1,Type=Integer,Description="RS is a common SNP.  A common SNP is one that has at least one 1000Genomes population with a minor allele of frequency >= 1% and for which 2 or more >
##INFO=<ID=TOPMED,Number=.,Type=String,Description="An ordered, comma delimited list of allele frequencies based on TOPMed, starting with the reference allele followed by alternate alleles as ordered in>
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
1       10019   rs775809821     TA      T       .       .       RS=775809821;RSPOS=10020;dbSNPBuildID=144;SSR=0;SAO=0;VP=0x050000020005000002000200;GENEINFO=DDX11L1:100287102;WGT=1;VC=DIV;R5;ASP
1       10039   rs978760828     A       C       .       .       RS=978760828;RSPOS=10039;dbSNPBuildID=150;SSR=0;SAO=0;VP=0x050000020005000002000100;GENEINFO=DDX11L1:100287102;WGT=1;VC=SNV;R5;ASP
1       10043   rs1008829651    T       A       .       .       RS=1008829651;RSPOS=10043;dbSNPBuildID=150;SSR=0;SAO=0;VP=0x050000020005000002000100;GENEINFO=DDX11L1:100287102;WGT=1;VC=SNV;R5;ASP
1       10051   rs1052373574    A       G       .       .       RS=1052373574;RSPOS=10051;dbSNPBuildID=150;SSR=0;SAO=0;VP=0x050000020005000002000100;GENEINFO=DDX11L1:100287102;WGT=1;VC=SNV;R5;ASP
1       10055   rs892501864     T       A       .       .       RS=892501864;RSPOS=10055;dbSNPBuildID=150;SSR=0;SAO=0;VP=0x050000020005000002000100;GENEINFO=DDX11L1:100287102;WGT=1;VC=SNV;R5;ASP

I only want to keep everything from this row onward, using either R or Linux:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
hg19 • 847 views

What have you tried? The logic you need is to exclude all lines that begin with ##. grep should help you achieve this. Use Google to find out how to exclude lines that start with a pattern using grep.
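
As a minimal sketch of that logic, assuming the meta-information lines are exactly the ones starting with ## and the column header starts with a single #: the file names demo.vcf and demo.filtered.vcf are hypothetical stand-ins, and the sample content is a shortened imitation of the file in the question.

```shell
# Build a tiny stand-in for the real file (demo.vcf is a hypothetical name).
printf '##INFO=<ID=COMMON>\n##INFO=<ID=TOPMED>\n#CHROM\tPOS\tID\n1\t10019\trs775809821\n' > demo.vcf

# -v inverts the match; ^## anchors the pattern to the start of the line.
# The column header (#CHROM ...) starts with a single #, so it is kept.
grep -v '^##' demo.vcf > demo.filtered.vcf

cat demo.filtered.vcf
```

The output goes to a new file, so the input is left unchanged and the command can be rerun if the pattern needs adjusting.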


I used sed to remove the first 55 rows:

sed -i 1,55d hg19-SNPs-annotation.txt

That approach has many pitfalls:

  1. You're editing your file in-place. If there's even the slightest typo in your command (which happens to everyone a lot of the time), your input file is permanently altered and you cannot go back to the original. Unless you're 100% sure a command works exactly as you want it to, do not use in-place editing.
  2. You had to manually count the number of lines to delete. This number will not necessarily be the same the next time you need this operation, for example with a different release of the file.
  3. Your command is not self-documenting. Reading your command, I can say you deleted the first 55 lines, but not why you deleted them, or what these 55 lines contained. A grep, by contrast, states what kind of content was deleted. The number of lines does not matter as long as the nature of the content is known, so that is what you should focus on documenting.
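
One possible way to address all three points, as a sketch rather than a definitive recipe: write to a new file, select lines by pattern instead of by count, and check that only ## lines were dropped. The file names demo_input.txt and demo_output.txt and the sample content are hypothetical.

```shell
# Tiny stand-in input; both file names are hypothetical.
printf '##meta1\n##meta2\n#CHROM\tPOS\n1\t10019\n1\t10039\n' > demo_input.txt

# Write to a new file rather than editing in place; the original stays untouched.
# The pattern itself documents what is being removed: lines starting with ##.
grep -v '^##' demo_input.txt > demo_output.txt

# Sanity check: the number of lines removed should equal the number of ## lines.
removed=$(( $(wc -l < demo_input.txt) - $(wc -l < demo_output.txt) ))
meta=$(grep -c '^##' demo_input.txt)
echo "removed=$removed meta=$meta"
```

If the two numbers differ, something other than the ## lines was dropped, and the original file is still available to investigate.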