Tassel4 doesn't read my Flapjack files - Tassel3 does, on the other hand. Anyone else ever encounter this?
11.1 years ago

Good morning,

I'm currently trying to use TASSEL to generate a linkage disequilibrium (LD) plot using the "All" LD type, but because I have a lot of SNPs I'd like to use the "-retainRareAlleles false" option, which apparently only exists in TASSEL 4. I'm using a genotype file and a map file generated by the Export function of Flapjack 1.13.03.19.

Here's my XML file generated by TASSEL 4 (which, by the way, didn't include the <runfork1/> element, so nothing was ever run :( )

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<TasselPipeline>
    <fork1>
        <flapjack>
            <geno>a.genotype</geno>
            <map>a.map</map>
            <retainRareAlleles>false</retainRareAlleles>
        </flapjack>
        <ld>
            <ldType>All</ldType>
        </ld>
        <td_csv>ld_out.csv</td_csv>
        <ldd>svg
            <ldplotlabels>false</ldplotlabels>
            <o>ld_output.svg</o>
        </ldd>
    </fork1>
    <runfork1/>
</TasselPipeline>

generated by the command:

perl tassel4-standalone/run_pipeline.pl -createXML mycleanconf.xml -fork1 -flapjack -geno a.genotype -map a.map -retainRareAlleles false -ld -ldType All -td_csv ld_out.csv -ldd svg -ldplotlabels false -o ld_output.svg

and the XML is run using:

perl tassel4-standalone/run_pipeline.pl -configFile mycleanconf.xml
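
Since the generated XML reportedly lacked the `<runfork1/>` element, here is a minimal Python sketch for checking a config before running it. It assumes, based only on the XML shown above, that every `<forkN>` element needs a matching `<runforkN/>` sibling; the function names are hypothetical and not part of TASSEL.

```python
import xml.etree.ElementTree as ET


def missing_run_elements(root):
    """Return the runforkN tag names that have no element under root.

    Assumption: each <forkN> child needs a <runforkN/> sibling,
    as in the configuration shown in the post.
    """
    tags = [child.tag for child in root]
    forks = [tag for tag in tags if tag.startswith("fork")]
    return ["run" + tag for tag in forks if "run" + tag not in tags]


def add_missing_run_elements(root):
    """Append an empty <runforkN/> element for every fork that lacks one."""
    for tag in missing_run_elements(root):
        ET.SubElement(root, tag)
    return root
```

To use it on a file, parse it with `ET.parse("mycleanconf.xml")`, pass `tree.getroot()` to `add_missing_run_elements`, and write the result back with `tree.write(...)`.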

Now the problem: When I run the XML with TASSEL4, I get this:

net.maizegenetics.baseplugins.FlapjackLoadPlugin
   net.maizegenetics.baseplugins.LinkageDisequilibriumPlugin
      net.maizegenetics.baseplugins.TableDisplayPlugin
         net.maizegenetics.baseplugins.LinkageDiseqDisplayPlugin
[Thread-2] ERROR net.maizegenetics.baseplugins.FlapjackLoadPlugin - Flapjack files a.genotype and a.map failed to load. Make sure the import options are properly set.

The kicker is that TASSEL3 loads the files without complaining (after I removed the retainRareAlleles line) but has been running for about 5 days now without any result at all.

I know that using the ALL-function (comparing all SNPs against all) rightly takes forever (a sliding window of 50 just takes 1-2 hours) and isn't the best choice when it comes to my ~30,000 SNPs, but I'm still curious as to why TASSEL4 doesn't work here and whether anyone else has ever encountered this?
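
To put the runtime in perspective, here is a rough Python sketch that counts the site pairs compared in each mode. It assumes that the sliding window pairs each site with the next `window` sites and that runtime grows roughly linearly with the number of pairs; neither assumption has been checked against TASSEL's implementation.

```python
def ld_pair_count(n_sites, window=None):
    """Count the site pairs compared in an LD analysis.

    window=None means all-vs-all; otherwise each site is paired with
    up to `window` following sites (an assumed sliding-window model).
    """
    if window is None or window >= n_sites - 1:
        return n_sites * (n_sites - 1) // 2
    # The last `window` sites have fewer than `window` followers.
    return window * n_sites - window * (window + 1) // 2


all_pairs = ld_pair_count(30_000)         # all-vs-all
window_pairs = ld_pair_count(30_000, 50)  # sliding window of 50
ratio = all_pairs / window_pairs          # roughly 300
```

Under those assumptions, all-vs-all needs about 300 times as many comparisons as a window of 50, so 1-2 hours for the window would correspond to roughly 12 to 25 days for the full analysis.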

Edit: Or even better, does anyone know a faster alternative to do a full LD-analysis on such a large dataset? Or any alternative approaches? I'm a bit new to this LD-analysis-thing. Thanks!

10.6 years ago

Since this post seems to have come up a couple of times when I was googling for problems with TASSEL 3 input, I've decided to put my "tips" on how to get Flapjack-output into TASSEL 3 here. May some other poor soul one day stumble on this information and save some time. Tested with TASSEL Version 3.0 (Build: September 5, 2013) and Flapjack 1.13.03.19.

  1. TASSEL 3 wants "-" for missing alleles, not the empty string ("") that Flapjack exports. For example, split each line with Python's split("\t") and replace every "" with "-".
  2. TASSEL 3 wants every line to have the same number of elements, so check the length of each split("\t") result to find lines that are too long (Flapjack seems to ignore this).
  3. Commas are not allowed in the .map file (even though Flapjack exports the positions with commas in them). Use vim's ":%s/,//g" or similar to remove them.
  4. The first line of the genotype (or dat) file must not be '# fjFile = GENOTYPE'. Delete it so that the first line is the list of marker names.
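
The four tips above can be sketched in Python roughly as follows. This is a minimal sketch, not a tested converter: the function names are hypothetical, the header row of marker names is deliberately left untouched (an assumption, in case its first cell is meant to be empty), and lines of the wrong length are only reported, not repaired.

```python
def clean_genotype_lines(lines, missing="-"):
    """Apply tips 1, 2 and 4 to the lines of a Flapjack genotype export.

    Returns (cleaned_lines, ragged), where ragged lists
    (row_number, cell_count) for data rows whose length differs from
    the header row. Row numbers are 1-based in the cleaned output.
    """
    rows = []
    for line in lines:
        text = line.rstrip("\r\n")
        # Tip 4: drop the '# fjFile = GENOTYPE' line (and blank lines).
        if not text or text.startswith("# fjFile"):
            continue
        rows.append(text.split("\t"))
    if not rows:
        return [], []

    header, data = rows[0], rows[1:]
    cleaned = [header]  # assumption: the marker-name row is kept as-is
    ragged = []
    for row_number, cells in enumerate(data, start=2):
        # Tip 2: every row should have as many elements as the header.
        if len(cells) != len(header):
            ragged.append((row_number, len(cells)))
        # Tip 1: replace empty cells with the missing-allele symbol.
        cleaned.append([cell if cell != "" else missing for cell in cells])
    return ["\t".join(cells) for cells in cleaned], ragged


def clean_map_lines(lines):
    """Tip 3: strip commas (e.g. thousands separators) from the map file."""
    return [line.rstrip("\r\n").replace(",", "") for line in lines]
```

Read the exported files with `open(...).readlines()`, pass the lines through these functions, inspect `ragged` by hand, and write the cleaned lines back out joined with newlines.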

That is all I have so far. It has not escaped my notice that there's probably more! Enjoy.
