Question: bcftools roh: short way to put several samples from a merged VCF in the same column?
9 months ago, miqrom (0) wrote:

First, thanks to Kevin for his answer to my last question (I can't answer my post, I tried 3 times) about grouping columns from different samples (same individual) in a merged VCF.

The only command that puts all positions in the same column is bcftools roh, although its output is a plain-text file with two columns I would need to remove: the first (ST) and the second (the sample name).

I think it can be converted to VCF with bcftools convert --tsv2vcf

EDIT (@RamRS): Added context.

modified 9 months ago • written 9 months ago by miqrom (0)

Can you paste an example of your input and what you would like your output to be?

written 9 months ago by Kevin Blighe (42k)

bcftools concat doesn't work with samples aligned to different FASTA files (I indexed every GRCh37 chromosome FASTA with bwa index and samtools faidx). I don't know whether GATK CatVariants could run on my PC (3 GB RAM). I extracted all SNP and indel variants per chromosome with bcftools view -v snps,indels -o output.vcf input.bcf, and then I got the GTs with bcftools call -m --ploidy GRCh37 -Oz -o outputGT.vcf.gz input.vcf.

When Kevin told me about bcftools norm, I was frustrated because I couldn't find any option to place all loci in the same column. Then I tried bcftools roh merged.vcf and got almost everything I want: all positions in the same column (I have to remove 2,000 regions, or RT). Python has commands to remove the columns I don't need (samples, sites-regions), and I can merge the GT info after converting the text file to VCF with bcftools isec -p mergedROH roh.vcf.gz outputGT.vcf.gz

modified 9 months ago by RamRS (21k) • written 9 months ago by miqrom (0)

Do you understand what each of those bcftools sub-programs does? roh is not a formatter. What norm does can be called formatting if you stretch the meaning of the word, but roh does not merge anything. It detects runs of homozygosity. You might see output in a format you wish to see, but that does not mean you're getting an accurate result. I don't see how your use of bcftools isec is valid either.
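
To make the distinction concrete, here is a minimal Python sketch of what a "run of homozygosity" means. It is only a naive scan over a hypothetical list of genotype strings; bcftools roh itself uses a hidden Markov model, so this illustrates the concept and not the tool's method. The function name homozygous_runs and the min_len threshold are made up for this example.

```python
from typing import List, Tuple


def homozygous_runs(genotypes: List[str], min_len: int = 3) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) index pairs for every stretch of at
    least min_len consecutive homozygous, non-missing diploid calls."""
    runs = []
    start = None  # index where the current homozygous stretch began
    for i, gt in enumerate(genotypes):
        alleles = gt.replace("|", "/").split("/")
        is_hom = len(alleles) == 2 and alleles[0] == alleles[1] and alleles[0] != "."
        if is_hom:
            if start is None:
                start = i
        else:
            # A heterozygous or missing call ends the current stretch.
            if start is not None and i - start >= min_len:
                runs.append((start, i - 1))
            start = None
    # Close a stretch that reaches the end of the list.
    if start is not None and len(genotypes) - start >= min_len:
        runs.append((start, len(genotypes) - 1))
    return runs


gts = ["0/1", "1/1", "0/0", "1/1", "0/1", "0/0", "0/0"]
print(homozygous_runs(gts))  # [(1, 3)]
```

The output is a list of regions, not a reshaped set of genotypes, which is why roh cannot stand in for a merge or concat step.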

written 9 months ago by RamRS (21k)

Also, when you say

bcftools concat doesn't work with samples aligned to different fasta

I don't see that in the manual. Can you show me an error message? And please use the code formatting option to format your posts better.


written 9 months ago by RamRS (21k)

miqrom, you have never definitively explained what you want. Some test input and expected output would be a great help for both Ram and me. We are volunteers and are aiming to help you in our own free time.

written 9 months ago by Kevin Blighe (42k)

bcftools concat always shows: Different sample names in input.chr2.vcf.gz. Perhaps "bcftools merge" is what you are looking for? I tried "bcftools norm" on my merged VCF (from bcftools merge). Afterwards I ran "bcftools roh merged.norm.vcf" again and all regions had disappeared (I have 113,888,324 ST lines across the 22 merged autosomal chromosomes). I know that runs of autozygosity are not the right way to build a VCF of GTs, but after converting to VCF I can get an intersection: for every variant shared with merged.VCF (bcftools isec), the GT can easily be obtained in a new column.
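
For context, that message is what bcftools concat reports when the input files do not list the same samples. A rough Python sketch of checking this up front is below; it reads the sample names from the #CHROM header line of each file. The helper names, the file label input.chr1.vcf.gz, and the sample name chr2.sorted.bam are assumptions for illustration, and the "files" here are in-memory lists of lines, not real compressed VCFs.

```python
from typing import Dict, List


def sample_names(vcf_lines: List[str]) -> List[str]:
    """Return the sample columns from the #CHROM header line of a VCF."""
    for line in vcf_lines:
        if line.startswith("#CHROM"):
            # The first nine columns (CHROM..FORMAT) are fixed; samples follow.
            return line.rstrip("\n").split("\t")[9:]
    raise ValueError("no #CHROM header line found")


def files_with_other_samples(files: Dict[str, List[str]]) -> List[str]:
    """Names of files whose sample list differs from the first file's."""
    names = list(files)
    reference = sample_names(files[names[0]])
    return [n for n in names[1:] if sample_names(files[n]) != reference]


FIXED = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT"
files = {
    # Hypothetical per-chromosome files, each named after its own BAM.
    "input.chr1.vcf.gz": ["##fileformat=VCFv4.2", FIXED + "\tchr1.sorted.bam"],
    "input.chr2.vcf.gz": ["##fileformat=VCFv4.2", FIXED + "\tchr2.sorted.bam"],
}
print(files_with_other_samples(files))  # ['input.chr2.vcf.gz']
```

If every per-chromosome file carries a different sample name for the same individual, each file after the first would be flagged this way.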

written 9 months ago by miqrom (0)

Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized.

If you are posting this from China and are not able to use the ADD COMMENT/ADD REPLY buttons (the ones with a gray background highlight), then switch to the Chrome browser. People have said that has worked in the past.

modified 9 months ago • written 9 months ago by genomax (67k)

Randomly throwing bcftools sub-programs at the problem will not get you anywhere. Define exactly what you need, and we can help you find the tool for the task. To define what you need, you have to give us an overview of the salient points in your workflow, the exact step where you're facing problems in the workflow, sample input (that includes a few entries where you face a problem) and expected output. The actual output that you're getting at the moment would also help. Please give us these instead of telling us how you threw every bcftools sub-program at your files.

written 9 months ago by RamRS (21k)

I think that you need to do the following (assuming that you have separate VCF files):

  1. rename the sample ID in each file to have the same ID (bcftools reheader)
  2. concatenate all and remove duplicates (essentially a UNION of all variants in all files) (bcftools concat VCF1.vcf.gz VCF2.vcf.gz VCF3.vcf.gz ... --allow-overlaps --remove-duplicates)
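
The two steps above can be sketched on toy data in plain Python, purely to show what the result should look like. This is not how bcftools works internally: the function names, sample names, and records are made up, a duplicate is assumed to mean the same CHROM, POS, REF and ALT, and when a record appears in several files only the first copy is kept.

```python
from typing import List


def rename_sample(vcf_lines: List[str], new_name: str) -> List[str]:
    """Step 1: give a single-sample VCF a new sample name."""
    out = []
    for line in vcf_lines:
        if line.startswith("#CHROM"):
            fields = line.split("\t")
            if len(fields) != 10:
                raise ValueError("expected exactly one sample column")
            fields[9] = new_name
            line = "\t".join(fields)
        out.append(line)
    return out


def concat_unique(files: List[List[str]]) -> List[str]:
    """Step 2: union of all records, each (CHROM, POS, REF, ALT) kept once."""
    header = [l for l in files[0] if l.startswith("#")]
    chrom_rank = {}  # chromosome order of first appearance
    seen = set()
    records = []
    for lines in files:
        for line in lines:
            if line.startswith("#"):
                continue
            chrom, pos, _id, ref, alt = line.split("\t")[:5]
            key = (chrom, int(pos), ref, alt)
            if key in seen:
                continue  # duplicate record: keep the first copy only
            seen.add(key)
            chrom_rank.setdefault(chrom, len(chrom_rank))
            records.append((chrom_rank[chrom], int(pos), line))
    records.sort(key=lambda r: (r[0], r[1]))
    return header + [r[2] for r in records]


def rec(chrom, pos, ref, alt, gt):
    """Build one tab-separated toy VCF record."""
    return "\t".join([chrom, str(pos), ".", ref, alt, ".", ".", ".", "GT", gt])


HEADER = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t"
vcf_a = [HEADER + "a.sorted.bam", rec("1", 100, "A", "G", "0/1"), rec("1", 200, "C", "T", "1/1")]
vcf_b = [HEADER + "b.sorted.bam", rec("1", 100, "A", "G", "0/1"), rec("1", 150, "G", "A", "0/1")]

merged = concat_unique([rename_sample(f, "sample1") for f in (vcf_a, vcf_b)])
print([l.split("\t")[1] for l in merged[1:]])  # ['100', '150', '200']
```

The result has a single sample column holding every position once, which is the output the real reheader and concat commands are meant to produce.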

modified 9 months ago • written 9 months ago by Kevin Blighe (42k)

Thanks, Kevin, though I had many problems doing it. First, I tried with a vcf.gz input like the console's example says, but it didn't run right. Second, bcftools -h <file> is not an accepted command ("can't read any -h file to substitute header"). Finally, I ran bcftools -s <sample file> successfully. My sample file was only a text line with the first sample name: chr1.sorted.bam. I changed every chr.vcf and ran bcftools concat with uncompressed VCF inputs and -Oz -o output.vcf.gz. Thanks again.

modified 8 months ago by RamRS (21k) • written 8 months ago by miqrom (0)

Once again, use the code formatting option to format your posts. See the changes I made to your comment above to know how you can format your posts better.

written 8 months ago by RamRS (21k)

I mean this in the best possible way, but do you have any idea what you're trying to do?

I tried with a vcf.gz input like the console's example says, but it didn't run right.

This sentence makes no sense. What does "console's example" mean? What does "it didn't run right" mean?

Second, bcftools -h <file> is not an accepted command ("can't read any -h file to substitute header").

Where did you get the idea to run bcftools -h <file>? Did anyone suggest it?

Finally, I ran bcftools -s <sample file> successfully. My sample file was only a text line with the first sample name: chr1.sorted.bam. I changed every chr.vcf and ran bcftools concat with uncompressed VCF inputs and -Oz -o output.vcf.gz.

Again, what are you doing? Are you just typing random characters with bcftools as the first word? I hope you understand how command line programs work.

modified 8 months ago • written 8 months ago by RamRS (21k)

@OP, just because a command completes without an error doesn't mean it ran correctly or successfully...

written 8 months ago by jrj.healey (12k)

9 months ago, RamRS (21k), Houston, TX, wrote:

bcftools roh detects runs of homozygosity. It is not a formatting tool. You might want to check out GATK CatVariants or bcftools concat

modified 9 months ago • written 9 months ago by RamRS (21k)