                    9.0 years ago
Jautis
    Hi, I have a vcf file and I would like to get a site-by-individual matrix of read depths (the DP label) and a second matrix of just the GQ scores.
What is the easiest way to do this? Thanks in advance!
Ex input:
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  samp1 samp2
chr1   100  .       C       T       3106.72 SnpCluster      .       GT:AD:DP:GQ:PL  0/0:1,0:1:3:0,3,42      0/0:3,0:3:9:0,9,132
chr1   120  .       C       G       3106.72 SnpCluster      .       GT:AD:DP:GQ:PL 0/1:3,1:4:30:30,0,123   1/1:0,1:1:3:45,3,0
Ex output for DP:
1    3
4    1
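For the full site-by-individual matrix, one option is a short script. The following is only a minimal sketch, not a definitive implementation: the function name `field_matrix` and the `EXAMPLE` lines are made up for illustration, and it assumes a tab-delimited VCF with a `#CHROM` header line and a FORMAT column. It looks up the position of the requested key (DP or GQ) in each record's FORMAT string rather than hard-coding a column, and writes `.` when a sample has no value for that key.

```python
def field_matrix(vcf_lines, field):
    """Return (sample_names, rows); each row holds `field` for every sample at one site."""
    samples, rows = [], []
    for line in vcf_lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        cols = line.split("\t")
        if line.startswith("#CHROM"):
            samples = cols[9:]  # sample names follow the FORMAT column
            continue
        keys = cols[8].split(":")  # e.g. GT:AD:DP:GQ:PL
        idx = keys.index(field) if field in keys else None
        row = []
        for sample in cols[9:]:
            values = sample.split(":")
            # Trailing fields may be dropped (e.g. "./."), so guard the index.
            if idx is None or idx >= len(values):
                row.append(".")
            else:
                row.append(values[idx])
        rows.append(row)
    return samples, rows


# Hypothetical in-memory copy of the example input (tab-delimited).
EXAMPLE = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsamp1\tsamp2",
    "chr1\t100\t.\tC\tT\t3106.72\tSnpCluster\t.\tGT:AD:DP:GQ:PL\t0/0:1,0:1:3:0,3,42\t0/0:3,0:3:9:0,9,132",
    "chr1\t120\t.\tC\tG\t3106.72\tSnpCluster\t.\tGT:AD:DP:GQ:PL\t0/1:3,1:4:30:30,0,123\t1/1:0,1:1:3:45,3,0",
]

for key in ("DP", "GQ"):
    names, matrix = field_matrix(EXAMPLE, key)
    print(key, *names, sep="\t")
    for row in matrix:
        print(*row, sep="\t")
```

To run it on a real file, pass an open file handle instead of `EXAMPLE`, for instance `field_matrix(open("test.vcf"), "DP")`. If bcftools is installed, `bcftools query -f '[%DP\t]\n' test.vcf` should give a similar matrix without any scripting.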
If you need the stats for just one sample (column),
grep -v '^#' test.vcf | cut -f10 | awk -F ':' '{print $3"\t"$4}'
should do. This assumes DP and GQ are the third and fourth FORMAT fields, as in your example. For statistics over multiple samples, I would write a script to parse out the details, which should be pretty straightforward.

Hi, I want to know what the "SnpCluster" displayed in the FILTER column of your VCF file means.
Weird that you would necropost a 6-year-old topic for this, but SnpCluster is a default filter in FreeBayes that filters out variants within a certain distance of one another. Typically a mis-modeled indel will show up as multiple mismatches within the same window. This is largely obviated by more modern local-assembly approaches and local realignment, as well as rank-sum annotations for mapping quality or strand direction, which also tend to correlate with "clustered" variants.
Thank you for your reply! I have recently been doing RNA-Seq-related research. Do you think FreeBayes can use transcriptome data for SNV calling?
It can - I refer you to https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1863-4 for a detailed discussion.
I see, thank you so much!