Where can I get FASTQ files with reads of length 150 or more?
2
0
3.9 years ago
AHW ▴ 50

I want to align human sequence reads against the reference genome. I need reads of length 150 or more (e.g. 500) to test an algorithm. Where can I find such reads, both single-end and paired-end? I got reads of around 100 bp from the 1000 Genomes Project, but I want longer reads.

alignment sequence • 1.2k views
3
3.9 years ago
Gungor Budak ▴ 250

Use the Sequence Read Archive (SRA) advanced search, setting the read length to 150 and the species to Homo sapiens. You can add more filters if you want. The query should look something like this:

(150[ReadLength]) AND "Homo sapiens"[orgn:__txid9606]
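
If you would rather run the same search from a script, here is a minimal sketch that only builds the request URL for the NCBI E-utilities `esearch` endpoint with `db=sra`. It does not send the request. The function name `build_sra_search_url` and the defaults are my own assumptions for illustration; the search term is the one shown above.

```python
from urllib.parse import urlencode

# NCBI E-utilities esearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_sra_search_url(read_length, retmax=20):
    """Build (but do not send) an esearch URL for the SRA query above.

    `build_sra_search_url` is a hypothetical helper, not part of any library.
    """
    # Same term as in the advanced search box, with the read length filled in.
    term = f'({read_length}[ReadLength]) AND "Homo sapiens"[orgn:__txid9606]'
    params = {"db": "sra", "term": term, "retmax": retmax, "retmode": "json"}
    # urlencode percent-encodes brackets, quotes and spaces in the term.
    return f"{ESEARCH_URL}?{urlencode(params)}"


url = build_sra_search_url(150)
print(url)
```

Fetching that URL (for example with `urllib.request.urlopen`) should return the matching SRA record IDs, which you can then use to look up the runs and download them.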

0

Agaz,

If PacBio transcriptome reads are of interest to you, you can access European Nucleotide Archive accession PRJEB3969 (https://www.ebi.ac.uk/ena/data/view/PRJEB3969).

1
3.9 years ago

This is a very basic requirement, and many tools are available to simulate artificial reads from the genome in question. Conveniently, you can also define the number, length and quality of the reads. One such well-documented program is ArtificialFastqGenerator.

Another very sophisticated tool is ART (courtesy: Biostars Handbook), which can mimic sequencing platforms very well.
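
To show what "define the number, length and quality of reads" means in practice, here is a toy sketch in plain Python. It is not ART or ArtificialFastqGenerator and does not model sequencing errors; it just samples fixed-length substrings from a reference and writes them as FASTQ records with a constant quality. The function name `simulate_reads`, the random toy reference and the constant quality character are all assumptions for illustration.

```python
import random


def simulate_reads(reference, n_reads, read_length, seed=0, quality_char="I"):
    """Sample error-free single-end reads uniformly from `reference`.

    Hypothetical toy helper: returns a list of FASTQ records (4 lines each).
    `quality_char="I"` corresponds to Phred 40 in Phred+33 encoding.
    """
    if read_length > len(reference):
        raise ValueError("read_length exceeds reference length")
    rng = random.Random(seed)
    records = []
    for i in range(n_reads):
        # Pick a start so that the whole read fits inside the reference.
        start = rng.randint(0, len(reference) - read_length)
        seq = reference[start:start + read_length]
        qual = quality_char * read_length
        records.append(f"@read{i}_pos{start}\n{seq}\n+\n{qual}\n")
    return records


# Toy 2 kb random reference; in practice you would load a real FASTA sequence.
ref_rng = random.Random(42)
reference = "".join(ref_rng.choice("ACGT") for _ in range(2000))

reads = simulate_reads(reference, n_reads=5, read_length=150)
print("".join(reads), end="")
```

For realistic error profiles, paired-end reads and platform-specific quality scores, use a dedicated simulator such as the ones mentioned above.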