Question: Working with genomes in FASTQ format from ENA
gravatar for viscardi.neander
3.7 years ago by
United States
viscardi.neander10 wrote:


I',m working with the wolf genome and want to retrieve a specific region. The ENA only provides me the FASTQ file If i had the SAM file i could simply use samtools to retrieve the specific region given a BED format. However starting from the FASTQ I`m not sure what to do. I believe that to do so I need first to index the genome reference from canFam3 with BWA or Bowtie2, align the whole wolf genome with the canFam3, convert to SAM and then retrieve the sequence of interest. Is that right? Or there is an shorter path to work?

bwa sam fastq bowtie2 genome • 1.2k views
ADD COMMENTlink modified 3.7 years ago • written 3.7 years ago by viscardi.neander10

If there's no wolf reference genome then your only options are (1) align to the most similar available genome, call variants, make a reference out of the results and extract from that or (2) assemble into contigs, blast the gene of interest against that and extract the resulting region.

ADD REPLYlink written 3.7 years ago by Devon Ryan94k

Can you show me te steps or softwares to use on each step?

After alignment i converted the file .sai to .sam with the command:

bwa samse -f out.sam hg19 input.sai input.fastq

then i`m thinking to converto to .bam to work faster

samtools view -bS out.sam -o output.bam

and then just work normally with samtools?

samtools view -h output.bam 20:7358932-7378248

Thanks a lot :)

ADD REPLYlink modified 3.7 years ago • written 3.7 years ago by viscardi.neander10

You'll be able to find plenty of tools by searching this site.

ADD REPLYlink written 3.7 years ago by Devon Ryan94k

I still cannot find the solution...

my steps.


 bwa index -p <prefix_name> -a bwtsw <referencegenome.fa>

2) align

 bwa aln <prefix> <genome_wolf.fastq.> > <out.sai>

3)convert to sam

bwa samse -f <wolf.sam> <prefix> <out.sai> <genome_wolf.fastq>

4) convert to bam

 samtools view -bS wolf.sam > wolf.bam

5) index wolf

samtools sort wolf.bam wolf.sorted.bam
samtools index aling_wolf_dog.sorted.bam
samtools view -h wolf.sorted.bam chr38:start-end > gene.fastq

6) I'm trying to assemble the paired-ends of this gene.fastq, because I ended with the short reads from this fastq file:

ERR868146.74788203  0   chr38   2442760 37  112M    *   0   GGCCGGGGACTGAGGGGATCTGCAAGTGCCTTTGCTTGGGAAACATCTGTTTGATTTGACTCCCTCTGAGCCAGGCCCAGCACAAGGGCTGGAAATGTGGCCAAAGGAGCAG    BBBFF7F7BFW]]]]]]V]]]]V]]]W$W]]]]\F]]]]]]]]]V]]]]]]]]]\]]]W]]]]]L]]]W]]]L]]R]]]WF9Z]]]]\\]]9V]ZV]Z]]<FF<FFFB<<BB    XT:A:U  NM:i:3  X0:i:1  X1:i:0  XM:i:3  XO:i:0  XG:i:0  MD:Z:81G20G0G8
ERR868146.42086682  16  chr38   2442812 37  114M    *   0   GATTTGACTCCCTCTGAGCCAGGCCTAGTGCAAGGGCTGGAAATGTGGCCGGAGGAGCAGGGTCTGCAGAGGCTCCACCTCTCAGCCCTTGAGGGCAGACCCATACTGTACCCT  FBBBFFFFFFFFFF]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]FFFFFFFFFBBB  XT:A:U  NM:i:2  X0:i:1  X1:i:0  XM:i:2  XO:i:0  XG:i:0  MD:Z:25C2C85
ERR868146.22544628  0   chr38   2443400 37  101M    *   0   AGTAAAGCCATGTCACAGCTAATGTTTGGACCGTGGGGATTCGAATCCTGACTCCATGTCTTACTCGTGGTGGCACTTTGGAGATGGCCCAATTATTTCAT   ]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]F   XT:A:U  NM:i:2  X0:i:1  X1:i:0  XM:i:2  XO:i:0  XG:i:0  MD:Z:91G8C0

but what i need is one fasta file to compared with other Dogs genome

Thanks again, and sorry for bothering you

ADD REPLYlink modified 3.7 years ago by genomax80k • written 3.7 years ago by viscardi.neander10

Step 6 will be "call variants" (e.g., with GATK, freeBayes, or samtools mpileup). Step 7 will be to take the VCF file produced in step 6 and produce a new fasta file (e.g., with GATK's FastaAlternateReferenceMaker). You can then samtools faidx that get the sequence.

ADD REPLYlink written 3.7 years ago by Devon Ryan94k

Hummm, I'm running it and then will follow your steps! Thanks a lot! =D

samtools mpileup -uf canFam3.fa aling_wolf_dog.sorted.bam | bcftools call -mv > wolf.raw.vcf
ADD REPLYlink written 3.7 years ago by viscardi.neander10
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1598 users visited in the last hour