Joining VCFs containing SNPs
2.9 years ago
fcarolinebe ▴ 40

Hi,

I called SNPs from 12 RNA-seq samples using GATK and FreeBayes. Now I need to combine the files into a single .vcf file, keeping the CHROM, POS, ID, REF and ALT columns and all the filter information for every sample.

Can somebody help me, please?

I used this script:

bcftools +missing2ref ffinal_tow_teste_vcf.gz | bcftools query -H  -f '%CHROM\t%POS\t%ID\t%REF\t%ALT[\t%TGT]\n' > final_all_2.vcf

but in the result every sample shows the REF allele instead of its own SNP:

# [1]CHROM      [2]POS  [3]ID   [4]REF  [5]ALT  [6]20:GT        [7]2:20:GT
LG1     43262   .       G       T       G       G
LG1     45686   .       C       T       C       C
LG1     46582   .       G       C       G       G
LG1     47066   .       T       C       T       T
LG1     48178   .       A       G       A       A
LG1     48716   .       G       A       G       G
LG1     48916   .       C       T       C       C
LG1     49053   .       C       T       C       C
LG1     49499   .       G       C       G       G
LG1     49563   .       T       C       T       T
LG1     49596   .       G       C       G       G
LG1     49702   .       G       A       G       G
LG1     50029   .       T       C       T       T
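
One possible reading of the output above, offered as an assumption rather than a diagnosis: the `+missing2ref` plugin replaces missing genotypes with the reference allele, so a sample whose genotype is missing (`.`) in the combined file would print REF under `%TGT`. The toy below emulates that effect with awk on a made-up haploid record; it does not call bcftools, and the helper names `translate` and `missing_to_ref` are hypothetical.

```shell
# Toy record (made up): CHROM POS REF ALT GT_sample1 GT_sample2
# Both genotypes are missing ("."), as in a combined file where
# neither sample has a call at this site.
record='LG1 43262 G T . .'

# Translate genotype indices to alleles: 0 -> REF, 1 -> ALT, "." stays ".".
translate() {
    printf '%s\n' "$1" | awk '{
        for (i = 5; i <= NF; i++) {
            if ($i == "0") $i = $3
            else if ($i == "1") $i = $4
        }
        print
    }'
}

# Emulate "set missing genotypes to reference": replace "." with 0.
missing_to_ref() {
    printf '%s\n' "$1" | awk '{
        for (i = 5; i <= NF; i++) if ($i == ".") $i = "0"
        print
    }'
}

as_is=$(translate "$record")
filled=$(translate "$(missing_to_ref "$record")")
echo "without filling: $as_is"
echo "with filling:    $filled"
```

If this is what happened here, running the `bcftools query` step without `+missing2ref` in front of it should show which genotypes are actually missing in the combined file.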

Are you familiar with bcftools? Why are you showing us the output of bcftools query when you want VCF output?


I get it! But my question is: how can I join the files, keeping the CHROM, POS, ID, REF and ALT columns, so that each sample shows the SNPs found in its own sequence? When I put the files together, the samples' alternative SNPs appear as the reference allele, as shown above.
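
For joining per-sample VCFs into one multi-sample VCF, a common route is `bcftools merge`, which expects bgzip-compressed, indexed inputs. The function below is a minimal sketch of that workflow, not a tested recipe for these particular files; the function name `merge_sample_vcfs` and all file names are hypothetical.

```shell
# Sketch: merge per-sample VCFs into one multi-sample VCF with bcftools.
# Usage (hypothetical file names):
#   merge_sample_vcfs merged.vcf.gz sample1.vcf sample2.vcf
merge_sample_vcfs() {
    if [ "$#" -lt 3 ]; then
        echo "usage: merge_sample_vcfs OUT.vcf.gz IN1.vcf IN2.vcf [...]" >&2
        return 2
    fi
    command -v bcftools >/dev/null 2>&1 || {
        echo "bcftools not found" >&2
        return 127
    }
    out=$1
    shift
    tmpdir=$(mktemp -d) || return 1
    list="$tmpdir/files.txt"
    : > "$list"
    i=0
    for vcf in "$@"; do
        i=$((i + 1))
        gz="$tmpdir/input_$i.vcf.gz"
        # bcftools merge needs bgzip-compressed, indexed inputs.
        bcftools view -Oz -o "$gz" "$vcf" || return 1
        bcftools index -t "$gz" || return 1
        echo "$gz" >> "$list"
    done
    # Each input contributes its own sample column(s); a site absent
    # from one input gets a missing genotype for that input's samples.
    bcftools merge -l "$list" -Oz -o "$out" || return 1
    bcftools index -t "$out" || return 1
    rm -rf "$tmpdir"
}
```

Note that `bcftools merge` stops when two inputs share a sample name unless `--force-samples` is given, in which case it prefixes the duplicate name with the file's index. The `2:20` column in the header above looks like that kind of renaming, so giving each file a distinct sample name first (for example with `bcftools reheader -s`) may make the merged columns easier to interpret.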


I'm sorry, I'm unable to understand your problem statement. Can you explain a little more and maybe show a couple of examples of what you need?


OK,

Initially I tried to join the VCF files. I did get a combined file, but in it it is impossible to identify which sample has the alternative SNP, as in this example:

#CHROM | POS  |   ID  |    REF   |  ALT |    QUAL  |  FILTER | INFO |   FORMAT | 20  |    2:20
LG1 | 27789417  |  .  |  C |  A,T  |     69.7962 |. |DPB=6;EPPR=3.0103;GTI=0;MQMR=60;NS=1;NUMALT=1;ODDS=2.75466;PAIREDR=0;PQR=0;PRO=0;QR=72;RO=2;RPPR=3.0103;SRF=0;SRP=7.35324;SRR=2;DP=22;AB=0.$;AB=0.666667,0.375;ABP=4.45795,5.18177;AF=0.5,0.5;AO=4,6;CIGAR=1X,1X;DPRA=0,0;EPP=3.0103,4.45795;LEN=1,1;MEANALT=1,1;MQM=60,60;PAIRED=0,0;PAO=0,0;PQA=0,0;QA=140,194;RPL=2,4;RPP=3.0103,4.45795;RPR=2,2;RU$$2,2;RUN=1,1;SAF=0,0;SAP=11.6962,16.0391;SAR=4,6;TYPE=snp,snp;technology.illumina=1,1;AN=0;AC=0,0   |    GT:DP:RO:QR:AO:QA:GL |   .:.:.:.:.:.:.  | .:.:.:.:.:.:.

PS: I know the result is hard to read when pasted this way, but I am unable to post a screenshot to make it easier to understand.

Then I tried bcftools query to separate the SNPs by sample. Even though the combined file shows an alternative allele different from the reference, when I run the script all samples appear as if they carried the reference allele.
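
In the record pasted above, both sample columns are entirely missing (`.:.:.:.:.:.:.`), so it may be worth checking how many sites each sample actually has a called genotype for. The sketch below counts called genotypes per sample column with awk on a made-up, tab-separated two-site VCF; it assumes GT is the first FORMAT field, and the helper name `count_called` is hypothetical.

```shell
# Count, per sample column, the sites whose GT is not missing.
# Assumes tab-separated VCF text on stdin with GT as the first FORMAT field.
count_called() {
    awk -F'\t' '
        /^##/ { next }                      # skip meta-information lines
        /^#CHROM/ {                         # remember the sample names
            for (i = 10; i <= NF; i++) name[i] = $i
            n = NF
            next
        }
        {
            for (i = 10; i <= NF; i++) {
                split($i, f, ":")
                # ".", "./." and ".|." all count as missing
                if (f[1] !~ "^[.]([/|][.])*$") called[i]++
            }
        }
        END {
            for (i = 10; i <= n; i++) printf "%s\t%d\n", name[i], called[i]
        }
    '
}

# Made-up data: sample "20" has one call, sample "2:20" has none.
summary=$(
    {
        printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t20\t2:20\n'
        printf 'LG1\t100\t.\tC\tA\t50\t.\t.\tGT:DP\t.:.\t.:.\n'
        printf 'LG1\t200\t.\tG\tT\t60\t.\t.\tGT:DP\t1:8\t.:.\n'
    } | count_called
)
echo "$summary"
```

On a real file, the same function could be fed the uncompressed text, for example from `bcftools view` on the combined VCF.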


We are having some major communication difficulties here. Screenshots of plain text are counterproductive; you're doing the right thing by pasting the content directly. See this post for plain text formatting tips: How to Use Biostars Part-3: Formatting Text and Using GitHub Gists

You should provide examples of your individual VCF entries, what you need as the final result (expected output), what you tried running and what you get (actual output). In my experience, when you prepare these details, you will often stumble upon the solution yourself. If that doesn't happen, it will at least make it extremely easy for others to help you.
