Question: Grep Position List
0
gravatar for BAGeno
18 months ago by
BAGeno90
BAGeno90 wrote:

Hi,

I have VCF file and list of position file. I want to grep those records from VCF which have position corresponding to position present in position file. For this purpose I used this command.

grep -wfF Position.txt myVCF > insec.vcf

here is the list of position.

`POS
10248
10321
10327
10492
10583
12783
13116
13118
13273
13302
14354
14542
14907
14930
15029
15118
15190
15208
15211
15274
`

Here is the result

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  LP6005441-DNA_G11
chr1    10492   rs55998931      C       T       182.77  PASS    AC=1;AF=0.5;AN=2;BaseQRankSum=2.356;ClippingRankSum=0;DB;DP=4
chr1    10583   rs58108140      G       A       170.77  PASS    AC=1;AF=0.5;AN=2;BaseQRankSum=-3.006;ClippingRankSum=0;DB;DP=
chr1    12783   rs62635284      G       A       522.77  PASS    AC=1;AF=0.5;AN=2;BaseQRankSum=4.292;ClippingRankSum=0;DB;DP=2
chr1    13116   rs62635286      T       G       125.77  PASS    AC=1;AF=0.5;AN=2;BaseQRankSum=-0.301;ClippingRankSum=0;DB;DP=
chr1    13118   rs62028691      A       G       125.77  PASS    AC=1;AF=0.5;AN=2;BaseQRankSum=-1.009;ClippingRankSum=0;DB;DP=
chr1    13273   rs531730856     G       C       675.77  PASS    AC=1;AF=0.5;AN=2;BaseQRankSum=-2.577;ClippingRankSum=0;DB;DP=
chr1    13302   rs180734498     C       T       135.77  PASS    AC=1;AF=0.5;AN=2;BaseQRankSum=1.88;ClippingRankSum=0;DB;DP=41
chr1    13417   rs777038595     C       CGAGA   606.73  PASS   AC=1;AF=0.5;AN=2;BaseQRankSum=-2.053;ClippingRankSum=0;DB;DP=
chr1    13896   rs201696125     C       A       195.77  PASS    AC=1;AF=0.5;AN=2;BaseQRankSum=-1.19;ClippingRankSum=0;DB;DP=5
chr1    14699   rs62635298      C       G       160.77  PASS    AC=1;AF=0.5;AN=2;BaseQRankSum=1.278;ClippingRankSum=0;DB;DP=3
chr1    14907   rs6682375       A       G       922.77  PASS    AC=1;AF=0.5;AN=2;BaseQRankSum=-0.114;ClippingRankSum=0;DB;DP=

But it is giving some extra records as well like record containing 13415,13896. These positions are not present in list of positions. What shall I do to overcome this problem?

grep vcf • 1.1k views
ADD COMMENTlink modified 18 months ago by Maxime Lamontagne2.1k • written 18 months ago by BAGeno90

Hi,

Use same command (without -w) by including <tab>YOUR_POSITION<tab> in 'Position.txt' file. This might work.

Example to add tabs:

sed -i 's/^/\t/' Position.txt

sed -i 's/$/\t/' Position.txt
ADD REPLYlink modified 18 months ago • written 18 months ago by EagleEye5.9k

It worked :) Thank you

ADD REPLYlink written 18 months ago by BAGeno90
0
gravatar for Istvan Albert
18 months ago by
Istvan Albert ♦♦ 78k
University Park, USA
Istvan Albert ♦♦ 78k wrote:

Grep is not all that well suited for filtering VCF files, or in general, column-oriented files where you'd want to apply the filter on just one column. Perhaps you could extend the pattern to include the leading and trailing tabs.

See bcftools for a whole range of options for filtering that initially may look a little more complicated yet will serve you well as soon as you want to apply more diverse filters:

https://samtools.github.io/bcftools/bcftools.html

ADD COMMENTlink written 18 months ago by Istvan Albert ♦♦ 78k

I had also used bcftools isec but it gave empty intersection files. I want to compare two VCF. These files my have same position but different rsIDs on that position.

ADD REPLYlink written 18 months ago by BAGeno90

Have you tried vcf-compare from vcftools?

ADD REPLYlink written 18 months ago by dschika290

On the theme of filtering column-oriented files, if you are prepared to code stuff up yourself, consider awk within the shell, or modules like csv within python etc.

ADD REPLYlink written 18 months ago by jrj.healey8.7k
0
gravatar for dschika
18 months ago by
dschika290
European Union
dschika290 wrote:

What about vcftools:

--positions

Include or exclude a set of sites on the basis of a list of positions in a file. Each line of the input file should contain a (tab-separated) chromosome and position. The file can have comment lines that start with a "#", they will be ignored.

ADD COMMENTlink written 18 months ago by dschika290

I tried this command but it did not work.

 vcftools --vcf myNew.vcf --positions Chr.pos.tsv --out Filtered_File
ADD REPLYlink written 18 months ago by BAGeno90

Try adding:

--recode
ADD REPLYlink written 18 months ago by dschika290

This is giving VCF file with just header.

ADD REPLYlink written 18 months ago by BAGeno90

Hm...could you please double check:

How does Chr.pos.tsv look like? Have you included chromosomes? Are chromosome and position tab-separated?

How does myNew.vcf look like? Is there more than just the header-section? Are there chromosome-position pairs matching your Chr.pos.tsv file?

ADD REPLYlink written 18 months ago by dschika290

This is what myNew.vcf look like.

##bcftools_filterCommand=filter -i DP>=20 myNew.raw30.vcf
##bcftools_filterCommand=filter -i GQ>=30 New.DP20.vcf
##liftOverProgram=CrossMap(https://sourceforge.net/projects/crossmap/)
##liftOverTime=April01,2017
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  LP6005441-DNA_G11
chr1    10492   rs55998931      C       T       182.77  PASS    AC=1;AF=0.5;AN=2;BaseQRankSum=2.356;ClippingRankSum=0;DB;DP=42;ExcessHet=3.0103;FS=0;MLEAC=1;
chr1    10583   rs58108140      G       A       170.77  PASS    AC=1;AF=0.5;AN=2;BaseQRankSum=-3.006;ClippingRankSum=0;DB;DP=37;ExcessHet=3.0103;FS=0;MLEAC=1
chr1    12783   rs62635284      G       A       522.77  PASS    AC=1;AF=0.5;AN=2;BaseQRankSum=4.292;ClippingRankSum=0;DB;DP=29;ExcessHet=3.0103;FS=0;MLEAC=1;
chr1    13116   rs62635286      T       G       125.77  PASS    AC=1;AF=0.5;AN=2;BaseQRankSum=-0.301;ClippingRankSum=0;DB;DP=28;ExcessHet=3.0103;FS=8.671;MLE
chr1    13118   rs62028691      A       G       125.77  PASS    AC=1;AF=0.5;AN=2;BaseQRankSum=-1.009;ClippingRankSum=0;DB;DP=27;ExcessHet=3.0103;FS=5.521;MLE
chr1    13273   rs531730856     G       C       675.77  PASS    AC=1;AF=0.5;AN=2;BaseQRankSum=-2.577;ClippingRankSum=0;DB;DP=46;ExcessHet=3.0103;FS=6.947;MLE

This is what Chr.pos.tsv look like.

##reference=file:///groups/reich/reference-genomes/hs37d5/unzipped/hs37d5.fa
##bcftools_viewVersion=1.2+htslib-1.2.1
##bcftools_viewCommand=view -h /n/data1/hms/genetics/reich/1000Genomes/cteam_remap/A-samples/LP6005441-DNA_G11/annotvcf/LP6005441-DNA_G11.annotated.vcf.bgz
##liftOverProgram=CrossMap(https://sourceforge.net/projects/crossmap/)
##liftOverFile=/data1/Sarah/database/hg19ToHg38.over.chain.gz
##new_reference_genome=/data1/Sarah/database/new_hg38.fa
##liftOverTime=May22,2017
#CHROM  POS
1       10248
1       10321
1       10327
1       10492
1       10583
1       12783
1       13116
1       13118
1       13273
ADD REPLYlink written 18 months ago by BAGeno90

Well, "chr1" is not equal "1" ...

ADD REPLYlink written 18 months ago by dschika290
0
gravatar for Maxime Lamontagne
18 months ago by
Québec
Maxime Lamontagne2.1k wrote:

Why not use R with the merge command?

data1 <- read.table("VCF File", header=F, sep="\t")

data2 <- read.table("Position File", header=F, sep="\t") 
merge.results <- merge(data2, data1, by.x="V1", by.y="V2", all=F)

write.table(merge.results, "VCF Subset", col.names=F, row.names=F, quote=F, sep="\t")
ADD COMMENTlink modified 18 months ago by WouterDeCoster34k • written 18 months ago by Maxime Lamontagne2.1k

I added code markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button. When you compose or edit a post that button is in your toolbar, see image below:

101010 Button

ADD REPLYlink written 18 months ago by WouterDeCoster34k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1640 users visited in the last hour