Pedro Morell, 8.5 years ago
Hi, I'm trying to reduce a VCF file from 1000 Genomes to just a set of SNPs (around 40k). I have the list of dbSNP IDs stored in a pandas Series, and I'm trying to retrieve just those records with an "if record in series:" check, but it's not working. Any suggestions on how to extract only certain SNPs?
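A likely reason the "if record in series:" check fails is that `x in series` on a pandas Series tests the Series index, not its values. If "record" is a parsed record object rather than the ID string, it won't match either. Below is a minimal sketch in plain Python that streams the VCF and keeps only the records whose ID column (column 3) is in a set built from the wanted IDs. It assumes a standard tab-separated VCF, optionally gzip/bgzip-compressed. The helper names `open_vcf` and `filter_vcf_by_id` are hypothetical.

```python
import gzip
from typing import Iterable, Iterator, TextIO


def open_vcf(path: str) -> TextIO:
    """Open a plain or gzip/bgzip-compressed VCF for reading as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def filter_vcf_by_id(lines: Iterable[str], wanted_ids: Iterable[str]) -> Iterator[str]:
    """Yield every header line plus the records whose ID column is in wanted_ids."""
    # Build a set of the ID *values*: iterating a pandas Series yields its values,
    # whereas `x in series` would test the Series index instead.
    wanted = set(wanted_ids)
    for line in lines:
        if line.startswith("#"):
            # Keep ## meta-information lines and the #CHROM column header.
            yield line
            continue
        # Split off only the first three columns; column 3 (index 2) is ID.
        fields = line.split("\t", 3)
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        # The ID column may hold several IDs separated by ';'.
        if any(i in wanted for i in fields[2].rstrip("\n").split(";")):
            yield line
```

Usage, with hypothetical file names and a Series of rsIDs called `snp_series`: `with open_vcf("input.vcf.gz") as src, open("subset.vcf", "w") as dst: dst.writelines(filter_vcf_by_id(src, snp_series))`. A set makes each lookup O(1), which matters when testing millions of records against ~40k IDs.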
Thank you.
Thanks! I'm trying to use this method, but I'm getting an error with the input file. My file looks like this:
According to the description (plain text, a single ID per row), it should work. Any advice?
What is the error?
ERROR MESSAGE: Invalid argument value 'HCL.txt' at position 4.
I also tried with a toy list copied from the IDs in the VCF and got the same message, so the format is the issue.
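The file contents were not preserved in the post, so the actual problem can't be confirmed from here, and it may not be the cause at all. Still, a few common ways an "ID per line" file goes wrong are easy to check: Windows (CRLF) line endings, empty lines, and an extra header row or index column. pandas `Series.to_csv` writes both a header and the index by default unless called with `index=False, header=False`. The sketch below writes a clean list and flags suspicious lines. The function names are hypothetical.

```python
from typing import Iterable, List, Tuple


def clean_ids(ids: Iterable[object]) -> List[str]:
    """Strip whitespace, drop empty entries and duplicates, keep first-seen order."""
    stripped = (str(i).strip() for i in ids)
    return list(dict.fromkeys(i for i in stripped if i))


def write_id_list(ids: Iterable[object], path: str) -> int:
    """Write one ID per line with Unix newlines and no header or index column."""
    cleaned = clean_ids(ids)
    with open(path, "w", newline="\n") as fh:
        fh.writelines(i + "\n" for i in cleaned)
    return len(cleaned)


def id_list_problems(raw_lines: Iterable[bytes]) -> List[Tuple[int, str]]:
    """Report (line number, reason) for lines that are not a single bare ID."""
    problems: List[Tuple[int, str]] = []
    for n, raw in enumerate(raw_lines, start=1):
        if raw.endswith(b"\r\n"):
            problems.append((n, "Windows (CRLF) line ending"))
        text = raw.decode("utf-8", errors="replace").strip()
        if not text:
            problems.append((n, "empty line"))
        elif len(text.split()) != 1 or "," in text:
            problems.append((n, "more than one field (header, index or separator?)"))
    return problems
```

To check an existing list, open it in binary mode so line endings are preserved: `with open("HCL.txt", "rb") as fh: print(id_list_problems(fh))`. An empty result means every line holds exactly one bare ID.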
Can you post the full cmd line, please?
First, GATK cannot work without the option -R ref.fa. Second, this is not the cmd line you used, as there is no such argument value 'HCL.txt' at position 4.
I actually copy/pasted it, but decided to change the names for clarity. I tried with the reference genome, and now I'm getting this error instead:
This is the cmd line I used, this time with its original names:
http://gatkforums.broadinstitute.org/gatk/discussion/2396/input-files-known-and-reference-have-incompatible-contigs
Check the #CHROM column in your VCF and make sure it matches the FASTA file headers (chromosome/contig names).
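The check suggested above can be automated. The sketch below, which uses hypothetical helper names, collects contig names from the FASTA headers (the first whitespace-delimited token after `>`). It also collects them from the VCF, both from the `##contig=<ID=...>` header lines and from the CHROM column of the records. It then reports the VCF names that the reference does not contain. For a large reference, reading the first column of the `.fai` index produced by `samtools faidx` would be faster than scanning the whole FASTA.

```python
from typing import Iterable, List, Set

CONTIG_PREFIX = "##contig=<ID="


def fasta_contigs(lines: Iterable[str]) -> List[str]:
    """Contig names from FASTA headers: the first whitespace-delimited token after '>'."""
    return [line[1:].split()[0] for line in lines if line.startswith(">") and line[1:].split()]


def vcf_contigs(lines: Iterable[str]) -> Set[str]:
    """Contig names used by a VCF: ##contig header IDs plus CHROM values of records."""
    names: Set[str] = set()
    for line in lines:
        if line.startswith(CONTIG_PREFIX):
            # e.g. "##contig=<ID=1,length=...>" -> "1"
            names.add(line[len(CONTIG_PREFIX):].split(",")[0].rstrip(">\r\n"))
        elif not line.startswith("#") and line.strip():
            names.add(line.split("\t", 1)[0])
    return names


def missing_from_reference(vcf_lines: Iterable[str], fasta_lines: Iterable[str]) -> Set[str]:
    """VCF contig names that do not appear among the FASTA contig names."""
    return vcf_contigs(vcf_lines) - set(fasta_contigs(fasta_lines))
```

A non-empty result such as `{"1", "X"}` against a reference whose headers read `>chr1`, `>chrX` points to the naming mismatch behind the "incompatible contigs" error. Note that it covers the header lines as well as the records.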
I did this, but I still have the same problem. I've realized that, even though I changed my CHROM column, the contigs in the header are still labeled as 1, 2, 3, 4. How can I change them using gawk?
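The question asks for gawk, but the same rewrite can be sketched in Python. The idea is to apply one renaming function both to the `ID=` field of every `##contig` header line and to the CHROM column of every record, so the two stay consistent. The mapping `add_chr_prefix` is a hypothetical example ("1" to "chr1", "MT" to "chrM"). The real mapping must produce exactly the names in your FASTA headers. This only renames contigs, and it assumes the VCF and the reference come from the same assembly.

```python
import re
from typing import Callable, Iterable, Iterator

# Matches the ID field of a ##contig line, e.g. "##contig=<ID=1,length=...>".
CONTIG_ID = re.compile(r"^(##contig=<ID=)([^,>]+)")


def rename_contigs(lines: Iterable[str], rename: Callable[[str], str]) -> Iterator[str]:
    """Rewrite contig names in ##contig header lines and in the CHROM column."""
    for line in lines:
        if line.startswith("##contig=<ID="):
            yield CONTIG_ID.sub(lambda m: m.group(1) + rename(m.group(2)), line, count=1)
        elif line.startswith("#"):
            # Other meta lines and the #CHROM column header are left untouched.
            yield line
        else:
            # CHROM is the first tab-separated column of each record.
            chrom, sep, rest = line.partition("\t")
            yield rename(chrom) + sep + rest


def add_chr_prefix(name: str) -> str:
    """Hypothetical mapping: '1' -> 'chr1', 'MT' -> 'chrM'; 'chr'-prefixed names unchanged."""
    if name.startswith("chr"):
        return name
    return "chrM" if name == "MT" else "chr" + name
```

Usage, with hypothetical file names: `with open("in.vcf") as src, open("renamed.vcf", "w") as dst: dst.writelines(rename_contigs(src, add_chr_prefix))`. Rewriting the header and the records in the same pass avoids the situation described above, where the CHROM column was changed but the `##contig` lines were not.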