Ldhat input format
8.8 years ago

Dear all,

I am having trouble with the input data format for the LDhat program. I'd like to generate sites and locs files for full sequence data, but I don't understand how to make a locs file. The example lpl_fn.locs is as follows:

61 9.73 L
0.106
0.11
0.145
0.325
0.479
0.736
1.216
1.22
1.286
1.547
1.571
1.828
1.939
2.131
2.5
2.619
2.987
2.996
3.022
3.248
3.609
3.723
3.843
4.016
4.343
4.346
4.418
4.426
4.509
4.576
4.872
4.935
5.085
5.168
5.441
5.554
5.56
5.687
6.25
6.595
6.678
6.718
6.772
6.863
7.315
7.344
7.36
7.413
7.754
8.089
8.285
8.292
8.393
8.533
8.537
8.644
8.755
8.852
9.402
9.712
9.721

61 is the number of sites and L specifies the model, but what does 9.73 stand for?

Could you please explain what the column of numbers below the header stands for?

Thank you very much!

Best regards,
Yuan

8.8 years ago
jsgounot ▴ 170

9.73 is the total length of the region analyzed. You can find more information about LDhat input files in the manual (p.5).
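To make the header layout concrete, here is a minimal sketch of reading a locs file, assuming the header is "number of sites, total region length, model flag" followed by one position per line, as in the example above. The function name `parse_locs` is made up for illustration and is not part of LDhat.

```python
def parse_locs(text):
    """Parse a locs-style file: header 'n_sites total_length model',
    then one site position per line (assumed layout, see lead-in)."""
    tokens = text.split()
    n_sites = int(tokens[0])          # e.g. 61
    total_length = float(tokens[1])   # e.g. 9.73, same units as the positions
    model = tokens[2]                 # e.g. "L"
    positions = [float(t) for t in tokens[3:]]

    # Basic sanity checks on the assumed layout.
    if len(positions) != n_sites:
        raise ValueError(f"header says {n_sites} sites, found {len(positions)}")
    if positions and max(positions) > total_length:
        raise ValueError("a site position lies beyond the total region length")
    return n_sites, total_length, model, positions


# Shortened example using three positions taken from lpl_fn.locs.
example = "3 9.73 L\n0.106\n0.11\n9.721\n"
print(parse_locs(example))
```

In the full example file, the last position (9.721) sits just inside the total length (9.73), which fits reading 9.73 as the length of the region in the same units as the positions.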


Thanks.

The sites file is as follows,

>Seq1
CAGTTCCTCAGCACGATCGCTCGCAGCTCAATGTTAATTGTAACGAGTCGCATAATATAGG
>Seq2
CAGTTCCTCAGCACGATCGCTTGCAGCTCAATGTTAGTTGTAACGAGTCGCATAACATAGG
>Seq3
CAGTTCCTCAGCACGATCGCTTGCACCTCAATGTTAGTTGTAACGAGTCGCATAATATAGG
>Seq4
CAGTTCCTCAGCACGATCGCTTGCACCTCAATGTTAGTTGTAACGAGTCGCATAATATAGG
>Seq5
CAGTTCCTCAGCACGATCGCTTGCACCTCAATGTTAGTTGTAACGAGTCGCATAATATAGG
>Seq6
CAGTTCCTCAGCACGATCGCTTGCACCTCAATGCTAGTTGTAACGAGTCGCATAATATAGG
>Seq7
CAGTTCCTCAGCACGATCGCTTGCACCTCAATGCTAGTTGTAACGAGTCGCATAATATAGG
>Seq8
CAGTTCCTCAGCACGATCGCTTGCTCCTCAATGTTAGTTGTAACGAGTCGCATAATATAGG
>Seq9
CAGTTCCTCAGCACGATCGCTTGCACCTCAATGTTAGTTGTAACGAGTCGCATAATATAGG
>Seq10
CAGTTCCTCAGCACGATCGCTTGCACCTCAATGTTAGTTGTAACGAGTCGCATAATATAGG
>Seq11
CAGTTCCTCAGCACGATCGCTTGCACCTCAATGTTAGTTGTAACGAGTCGCATAATATAGG
>Seq12
CAGTTCCTCAGCACGATCGCTTGCACCTCAATGTTAGTTGTAACGAGTCGCATAATATAGG
>Seq13
CAGTTCCTCAGCACGATCGCTTGCACCTCAATGTTAGTTGTAACGAGTCGCATAATATAGG
>Seq14
CAGTTCCTCAGCACGATCGCTTGCACCTCAATGTTAGTTGTAACGAGTCGCATAATATAGG
>Seq15
CAGTTCCTCAGCACGATCGCTTGCACCTCAATGTTAGTTGTAACGAGTCGCATAATATAGG
>Seq16
CAGTTCCTCAGCACGATCGCTTGCACCTCAATGTTAGTTGTAACGAGTCGCATAATATAGG
>Seq17
CAGTTCCTCAGCACGATCGCTTGCACCTCAATGTTAGTTCTAACGAGTCGC?TAATATAGG
>Seq18
CAGTTCCTCAGCACGATCGCTTGCACCTCAATGTTAGTTCTAACGAGTCGCCTAATATAGG
>Seq19
CAGTTCCTCAGCACGATCGCTCGCACCTCAATGTTAATTGTAACGAGTCGCATAATATAGG
>Seq20
CAGTTCCTCAGCACGATCGCTTGCTCTTTAATGTTAGTTGTAACGAGTCGCCTAATATAGG
>Seq21
CAGTTCCTCAGCACGATCGCTTGCTCTTTAATGTTAGTTGTAACGAGTCGCCTAATATAGG
>Seq22
CAGTTCCTCAGCACGATCGCTTGCTCTTTAATGTTAGTTGTAACGAGTCGCCTAATATAGG
>Seq23
CAGTTCCTCAGCACGATCGCTTGCTCTTTAATGTTAGTTGTAACGAGTCGCCTAATATAGG
>Seq24
CAGTTTCTCACCACGATAGCTTGCACCTCAATGTTAGTTGTAACGAGTCGCATAATATAGG
>Seq25
CAGTTTCTCACCACGATAGCTTGCACCTCAATGTTAGTTGTAACGAGTCGCATAATATAGG
>Seq26
CAGTTTCTCACCACGATAGCTTGCACCTCAATGTTAGTTGTAACGAGTCGCATAATATAGG
>Seq27
CAGTTTCTCACCACGATAGCTTGCACCTCAATGTTAGTTGTAACGAGTCGCATAATATAGG
>Seq28
CAATTTCTCACCACAATCGCTTGCACCTCAATGCTAGTTGTAACGAGTCGCATAATATAGG
>Seq29
CAGTTTCTCACCACGATAGCTCACTCCTTAATGTTAGTTGTAACGAGTCGC?TAATATAGG
>Seq30
CAGTTTCTCACCACGATCGCTCACTCTTTAATGTTAGTTGTAACGAGTCGCCTAATATAGG
>Seq31
CAGTTTCTCACCACGATCGCTCACTCTTTAATGTTAGTTGTAACGAGTCGCCTAATATAGG
>Seq32
CAGTTTCTCACCACGATAGCTCACTGTTCAATGTTAGTTGTAACGAGTCGCCTAATATAGG
>Seq33
CAGTTTCTCACCACGATAGCTCACTGCTTAATGCTAGTTGTAACGAGCCGC?TAATATAGG
>Seq34
CAGTTTCTCACCACGATAGCTCACTCTTTAATGTTAGTTGTAACGAGTCGCCTAATATAGG
>Seq35
CAGTTTCTCACCACGATAGCTCACTCTTTAATGTTAGTCGTAACGAGTCGCCTAATATAGG
>Seq36
CAGTTTCTCACCACGATAGCTCACACCTTTGCGCTCGCCGGGACGAGTCTCAGCACGGAGG
>Seq37
CAGTTTCTCACCACGATAGCTCACACCTTTGCGCTCGCCGGGACGAGTCTCCGCACGGAGG
>Seq38
CAGTTCCTCAGCACGATCGCTTGCACCTTTGCGCTCGCCGGGACGAATCTCAGCACGGAAA
>Seq39
AAGTTTGCTACCATAAGCACTTGCACCTTTGCGCTCGCCGGGACGAGTCTCCGCACGGAGG
>Seq40
CAGCTTGCTACCATAAGCACTTGCACCTTAGCGCTCGCCGGGACGAGTCTCAGCACGGAGG
>Seq41
AAGTTTGCTACCATAAGCACTTGCTCCTTAGCGCCCGCCCGGATCAATAGCAGCATGGAAA
>Seq42
CAATTTCTCACCACGATCGCTTACAGCTTAGCGCTCGCTCGGATCAGTAGCAGCATGGAAA
>Seq43
CAATTTCTCACCACGATCGCTTGCAGCTTAGCGCTAG?TCGGATCAGTAGCAG??TGGAAA
>Seq44
CAGTTCCTCAGCACGATCGCTTGCAGCTCAG?GCCCGCTCGGATCAGTAGCAGCATGGAAA
>Seq45
CAGTTCCTCAGCACAATCGCTCGCACCTCAGCGC?CACTCGGATCAGTAGCAGCATGGAAA
>Seq46
CAGTTCCTCAGCACGATCGCTTGCACCTCAGCGCCCGCCCGGATCAATAGCAGCATGGAAA
>Seq47
CAGTTC??CAGCACGATCGCTTGCTCCTCAGCGCCCGCC?GGATCAATAGCAGCATGGAAA
>Seq48
CAGTTCCTCAGCACGATCGCTTGCTCCTTAGCGCCCGCTCGGATCAGTAGCAGCCTGGAAA

How can I know the total length of the region analyzed? I thought the total length was the length of the sequence, 61 bp. So I am still confused about what the number listed for each site stands for. Thank you very much.


You seem to misunderstand what sites are. Sites are the positions in your alignment where there are SNPs, so in your example you don't have 61 sites but 46 (I counted quickly, so this may be off). For example, there is no SNP at the 2nd position of your alignment (all bases are A), so there is no need to include this position in your input files. However, to keep that information, the total length of your alignment (61) must be written in the header, and that is what the total length of the analyzed region is for. You can also give the site positions in bp or in kb.
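As a rough illustration of the idea, the sketch below finds the variable columns of an alignment and writes a matching locs text, with positions in bp and the alignment length as the total length. It assumes that `?` marks missing data and should not count as an allele, and that the locs header is "number of sites, total length, model flag". The function names are hypothetical, and the sites file would need to be reduced to the same columns.

```python
def segregating_sites(seqs, missing="?"):
    """Return the 1-based alignment columns that carry more than one allele."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must all have the same length")
    sites = []
    for col in range(length):
        # Collect the distinct bases in this column, ignoring missing data.
        alleles = {s[col] for s in seqs} - set(missing)
        if len(alleles) > 1:
            sites.append(col + 1)
    return sites


def make_locs(seqs, model="L"):
    """Build locs text: header line, then one SNP position (bp) per line."""
    sites = segregating_sites(seqs)
    header = f"{len(sites)} {len(seqs[0])} {model}"
    return "\n".join([header] + [str(p) for p in sites]) + "\n"


# Toy alignment: columns 1 and 4 vary, column 2 is all A,
# column 3 is G plus one missing base.
toy = ["CAGT", "CAGC", "AAGT", "CA?T"]
print(make_locs(toy))
```

Running `segregating_sites` on the real alignment would give the exact count of variable positions instead of a quick estimate by eye.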


Thank you so much! I did misunderstand what sites are. I totally get it now. I am going to make the sites file by filtering out the invariant positions. Thanks for your patience!
