Chromosome Specific Fasta file(s)
7.4 years ago
karl.sebby ▴ 100

Does anyone know of any handy smaller datasets for just playing with mapping tools? I would like to reduce the file sizes so I can run things on my laptop before running bigger jobs. I was thinking of truncating the GENCODE GTF file and the hg38 reference fasta to a single chromosome (say, chromosome 21 or 22). What I'd then be lacking is an experimental fasta (or a pair of fasta files for paired-end) that only contains reads from a single chromosome. Is there somewhere to find targeted experimental reads like this? I could make some synthetic ones, but I like the idea of using experimental data better. Thanks.
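For the truncation step, a minimal sketch in plain Python might look like the following. It assumes the FASTA headers and the first GTF column use the same UCSC-style names (e.g. `chr22`), which should be checked against the actual files; the function names `subset_fasta` and `subset_gtf` are made up for illustration.

```python
def subset_fasta(lines, chrom):
    """Yield the header and sequence lines of the record named `chrom`."""
    keep = False
    for line in lines:
        if line.startswith(">"):
            # The record name is the first whitespace-delimited token after '>'.
            fields = line[1:].split()
            keep = bool(fields) and fields[0] == chrom
        if keep:
            yield line


def subset_gtf(lines, chrom):
    """Yield GTF comment lines plus every feature whose seqname is `chrom`."""
    for line in lines:
        # Column 1 of a GTF line is the sequence name; '#' lines are comments.
        if line.startswith("#") or line.split("\t", 1)[0] == chrom:
            yield line
```

Both functions take any iterable of text lines, so they can be fed an open file handle (or `gzip.open(path, "rt")` for compressed input) and the result passed to `out.writelines(...)`. Matching on the exact name keeps `chr22` but drops alternate contigs whose names merely start with `chr22`.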

Assembly genome • 1.3k views

A handy small reference would be the human mitochondrial genome, roughly 17 kb. It is included in the hg19 assembly and can be downloaded from UCSC. If you search around a bit, finding some mitochondrial sequencing data should not be a problem. Alternatively, use the E. coli genome at about 5 Mb. Finding E. coli NGS data should be even easier than finding chrM reads.


chrM will not be a good test case for many things, since it is so different from nuclear DNA in composition, repetitive content, heteroplasmy (not diploid!), and so on.

7.4 years ago

Don't you mean fastq for read files? Anyway, I would take a WES or WGS dataset, map it to the genome and then extract those reads mapping to chr22.
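As a rough sketch of the extraction step, assuming the alignments are available as SAM text (for example from `samtools view -h`), the reads assigned to one chromosome can be turned back into FASTQ records in plain Python. The function name `sam_to_fastq` is hypothetical; in practice `samtools view` with a region followed by `samtools fastq` does the same job on BAM files.

```python
# Translation table for reverse-complementing reads stored on the reverse strand.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def sam_to_fastq(lines, chrom):
    """Yield FASTQ records for primary alignments to `chrom` from SAM text lines."""
    for line in lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        fields = line.rstrip("\n").split("\t")
        qname, flag, rname = fields[0], int(fields[1]), fields[2]
        seq, qual = fields[9], fields[10]
        # 0x100 = secondary, 0x800 = supplementary: skip to avoid duplicate reads.
        if rname != chrom or flag & 0x900 or seq == "*":
            continue
        if flag & 0x10:
            # SAM stores reverse-strand reads reverse-complemented; undo that.
            seq = seq.translate(_COMPLEMENT)[::-1]
            qual = qual[::-1]
        # Mark first/second mate so paired reads keep distinct names.
        if flag & 0x40:
            qname += "/1"
        elif flag & 0x80:
            qname += "/2"
        yield f"@{qname}\n{seq}\n+\n{qual}\n"
```

Note that this keeps only reads that themselves align to the chosen chromosome, so for paired-end data a mate that mapped elsewhere is dropped and the two output files would need re-pairing.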
