Question: How to obtain chromosome number from given scaffold number
Asked 3.5 years ago by Parimala Devi (India/Bangalore/Jubilant Biosys):

Hi,

I am working on Saccharomyces cerevisiae, strain Y55. I obtained my reference sequence from here. This is how the Y55_Stanford_2014_JRIF00000000.fsa sequence looks; its headers give only scaffold names, not the chromosome number.
Reference

>gi|696435221|gb|JRIF01000001.1| Saccharomyces cerevisiae Y55 scaffold-0, whole genome shotgun sequence [length=107844]
TTAAGCCTTCAAAGAAGAAGCTCTTCTCTTTCTGATTTCGGCCTTTTCAGCCTTTCTTTCAGACAATCTCTTAGCCAACA
ATTGAGCGTATTCGGCAGCAGCTTCTCTTTGAGCTTGAGCGTTTCTGACCTTCAAAGCTCTTTGGTGTCTCTTTCTTTGC


There's also a gatk-snv VCF included, which does have the chromosome number.
gatk.vcf

#CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO    FORMAT    Y55
chrI    111    .    C    T    155.12    PASS    AC=2;AF=1.00;AN=2;BaseQRankSum=3.331;DP=28;Dels=0.00;FS=0.000;HaplotypeScore=12.7288;MLEAC=2;MLEAF=1.00;MQ=28.02;MQ0=0;MQRankSum=0.666;QD=5.54;ReadPosRankSum=2.596;SB=-1.912e+01    GT:AD:DP:GQ:PL    1/1:14,14:28:5:186,5,0
chrI    136    .    G    A    336    PASS    AC=2;AF=1.00;AN=2;BaseQRankSum=1.625;DP=32;Dels=0.00;FS=7.270;HaplotypeScore=30.3526;MLEAC=2;MLEAF=1.00;MQ=28.13;MQ0=0;MQRankSum=-1.733;QD=10.50;ReadPosRankSum=1.083;SB=-9.901e+01    GT:AD:DP:GQ:PL    1/1:1,31:32:45:369,45,0
chrI    156    .    C    G    18.07    LowQual;SnpCluster;filter    AC=1;AF=0.500;AN=2;BaseQRankSum=1.146;DP=44;Dels=0.00;FS=7.776;HaplotypeScore=20.3077;MLEAC=1;MLEAF=0.500;MQ=30.62;MQ0=0;MQRankSum=2.856;QD=0.41;ReadPosRankSum=0.838;SB=-6.519e-03    GT:AD:DP:GQ:PL    0/1:38,6:44:48:48,0,469


Is there any way I can obtain the chromosome numbers so that I can call variants on my own data? I want to analyse SNPs and INDELs on each chromosome.

Thank you,

Parimala 

Answer by Steven Lakin (Fort Collins, CO, USA), written 3.5 years ago:

All of the information about the reference sequences is stored in the accompanying .gff file:

http://downloads.yeastgenome.org/sequence/strains/Y55/Y55_Stanford_2014_JRIF00000000/Y55_JRIF00000000.gff.gz

You'll have to find a way to parse it to your liking, but here is a quick example bash command (run it on the file after decompressing the download with gunzip) that prints each reference sequence ID followed by its chromosome:

cat Y55_JRIF00000000.gff | sed 's/,/\t/g' | awk '{print $1,$11}' | grep "^g" > RefChromosomes.txt
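
If the GFF lists many features per scaffold, RefChromosomes.txt will probably repeat the same pair many times. A minimal sketch for collapsing it to one line per reference ID, assuming the file really has two whitespace-separated columns (ID, then chromosome); the function name unique_scaffold_map is hypothetical:

```shell
# Hypothetical helper: keep the first chromosome seen for each reference ID.
# Assumes $1 is a two-column, whitespace-separated file (ID, chromosome).
unique_scaffold_map() {
    # NF >= 2 skips lines with a missing chromosome column;
    # !seen[$1]++ is true only the first time an ID appears.
    awk 'NF >= 2 && !seen[$1]++ { print $1 "\t" $2 }' "$1"
}
```

For example, `unique_scaffold_map RefChromosomes.txt > ScaffoldToChrom.tsv` would write a tab-separated table (the output file name is only an illustration).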

 

Look at the format of the .gff file if you want to parse it differently. Once you have the mapping, you can link the reference entries to their respective chromosomes in your own code (e.g. add the chromosome as another field in the reference header line), split the FASTA file by chromosome and call variants on each one individually, etc. I'm not sure whether a GFF can be incorporated directly into a variant-calling pipeline, but it may be possible.
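
As a rough sketch of the "add the chromosome as another field in the reference line" idea: the function below appends a chromosome tag to each FASTA header whose ID appears in a two-column mapping file. The function name annotate_fasta_headers and the "chromosome=" tag are hypothetical, and the sketch assumes the first word of each FASTA header matches the ID column of the mapping exactly and that the mapping file is not empty.

```shell
# Hypothetical helper: append the chromosome to matching FASTA header lines.
# $1: mapping file with two whitespace-separated columns (ID, chromosome)
# $2: FASTA file whose headers start with ">ID"
annotate_fasta_headers() {
    awk '
        NR == FNR { chrom[$1] = $2; next }   # first file: load ID -> chromosome
        /^>/ {
            id = substr($1, 2)               # first header word without ">"
            if (id in chrom) $0 = $0 " chromosome=" chrom[id]
        }
        { print }                            # sequence lines pass through unchanged
    ' "$1" "$2"
}
```

Headers with no entry in the mapping are left as they are. Many tools treat only the first word of a FASTA header as the sequence name, so a tag added this way is mainly informational; variant calls would likely still be reported per scaffold ID and need to be grouped by chromosome afterwards using the same mapping.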
