Hi, I'm currently writing a script in Python that will read in both FASTA and BAM files and output certain information.
Right now I need a BAM file and a matching FASTA file to play around with. As most BAM files are huge (e.g. for the human genome), I thought it would be good to play around with a yeast strain.
I have the necessary FASTA files for the chromosomes of S. cerevisiae: http://hgdownload.cse.ucsc.edu/goldenPath/sacCer2/chromosomes/
But now, I need a matching BAM file as well.
I've been googling around without any luck. Does anyone know where I can find a good BAM file to play around with for S. cerevisiae?
Why not just download a random yeast dataset from ENA and align it? You'll then have your BAM file.
I'm kinda new to sequencing and didn't know you could do that. Do you mean that I can download a dataset in fastq and align it using a tool such as samtools?
Is this the ENA you're referring to? http://www.ebi.ac.uk/ena
Yes, though you'll need to use an aligner, such as bowtie2/bwa/bbmap/etc.
Yes, that's the ENA that I mean.
Okay, cool. So I found this: http://www.ebi.ac.uk/ena/data/view/AEWK01000001-AEWK01004118
Is it the contigs I need to download? I'm a little confused, because they come in fasta format and not fastq.
You get the reference from the location above.
Then get sequence data from ENA. An example dataset (get the fastq files). Then align the fastq files against the reference (you will need to create a reference genome index first) to produce a BAM file, using one of the aligners mentioned by @Devon above.
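Since the script is being written in Python anyway, the steps above could be sketched roughly like this. It is only a minimal sketch under a few assumptions: bwa and samtools are installed and on the PATH, the per-chromosome FASTA files have already been concatenated into one reference file, and the reads are paired-end. The file names (sacCer2.fa, reads_1.fastq, reads_2.fastq, yeast) are hypothetical placeholders, not files from this thread. Note that bwa mem writes SAM to standard output, so samtools is used afterwards to sort it into a BAM file and index it.

```python
import subprocess


def build_commands(reference, fastq_files, prefix):
    """Build the commands for: index reference -> align -> sort to BAM -> index BAM.

    Returns a list of (command, stdout_path) tuples. stdout_path is None
    unless the command's standard output has to be saved to a file.
    All file names are supplied by the caller (hypothetical placeholders here).
    """
    sam = f"{prefix}.sam"
    bam = f"{prefix}.sorted.bam"
    return [
        # Create the reference genome index that the aligner needs.
        (["bwa", "index", reference], None),
        # Align the reads; bwa mem prints SAM to stdout, so capture it.
        (["bwa", "mem", reference, *fastq_files], sam),
        # Coordinate-sort the alignments and write them out as BAM.
        (["samtools", "sort", "-o", bam, sam], None),
        # Index the sorted BAM so it can be queried by region.
        (["samtools", "index", bam], None),
    ]


def run_pipeline(reference, fastq_files, prefix):
    """Run every step in order and return the path of the sorted BAM file."""
    for command, stdout_path in build_commands(reference, fastq_files, prefix):
        if stdout_path is None:
            subprocess.run(command, check=True)
        else:
            with open(stdout_path, "w") as handle:
                subprocess.run(command, stdout=handle, check=True)
    return f"{prefix}.sorted.bam"
```

Calling run_pipeline("sacCer2.fa", ["reads_1.fastq", "reads_2.fastq"], "yeast") would then be expected to leave yeast.sorted.bam and its index next to the reference. Any of the other aligners mentioned above could be swapped in by changing the first two commands.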
Thanks, I'll try it out and see how it goes!