Working with genomes in FASTQ format from ENA
0
0
Entering edit mode
7.7 years ago

Hi!

I',m working with the wolf genome and want to retrieve a specific region. The ENA only provides me the FASTQ file http://www.ebi.ac.uk/ena/data/view/ERS715549. If i had the SAM file i could simply use samtools to retrieve the specific region given a BED format. However starting from the FASTQ I`m not sure what to do. I believe that to do so I need first to index the genome reference from canFam3 with BWA or Bowtie2, align the whole wolf genome with the canFam3, convert to SAM and then retrieve the sequence of interest. Is that right? Or there is an shorter path to work?

genome fastq BWA BOWTIE2 SAM • 2.2k views
ADD COMMENT
1
Entering edit mode

If there's no wolf reference genome then your only options are (1) align to the most similar available genome, call variants, make a reference out of the results and extract from that or (2) assemble into contigs, blast the gene of interest against that and extract the resulting region.

ADD REPLY
0
Entering edit mode

Can you show me te steps or softwares to use on each step?

After alignment i converted the file .sai to .sam with the command:

bwa samse -f out.sam hg19 input.sai input.fastq

then i`m thinking to converto to .bam to work faster

samtools view -bS out.sam -o output.bam

and then just work normally with samtools?

samtools view -h output.bam 20:7358932-7378248

Thanks a lot :)

ADD REPLY
1
Entering edit mode

You'll be able to find plenty of tools by searching this site.

ADD REPLY
0
Entering edit mode

I still cannot find the solution...

my steps.

1)index

 bwa index -p <prefix_name> -a bwtsw <referencegenome.fa>

2) align

 bwa aln <prefix> <genome_wolf.fastq.> > <out.sai>

3)convert to sam

bwa samse -f <wolf.sam> <prefix> <out.sai> <genome_wolf.fastq>

4) convert to bam

 samtools view -bS wolf.sam > wolf.bam

5) index wolf

samtools sort wolf.bam wolf.sorted.bam
samtools index aling_wolf_dog.sorted.bam
samtools view -h wolf.sorted.bam chr38:start-end > gene.fastq

6) I'm trying to assemble the paired-ends of this gene.fastq, because I ended with the short reads from this fastq file:

ERR868146.74788203  0   chr38   2442760 37  112M    *   0   GGCCGGGGACTGAGGGGATCTGCAAGTGCCTTTGCTTGGGAAACATCTGTTTGATTTGACTCCCTCTGAGCCAGGCCCAGCACAAGGGCTGGAAATGTGGCCAAAGGAGCAG    BBBFF7F7BFW]]]]]]V]]]]V]]]W$W]]]]\F]]]]]]]]]V]]]]]]]]]\]]]W]]]]]L]]]W]]]L]]R]]]WF9Z]]]]\\]]9V]ZV]Z]]<FF<FFFB<<BB    XT:A:U  NM:i:3  X0:i:1  X1:i:0  XM:i:3  XO:i:0  XG:i:0  MD:Z:81G20G0G8
ERR868146.42086682  16  chr38   2442812 37  114M    *   0   GATTTGACTCCCTCTGAGCCAGGCCTAGTGCAAGGGCTGGAAATGTGGCCGGAGGAGCAGGGTCTGCAGAGGCTCCACCTCTCAGCCCTTGAGGGCAGACCCATACTGTACCCT  FBBBFFFFFFFFFF]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]FFFFFFFFFBBB  XT:A:U  NM:i:2  X0:i:1  X1:i:0  XM:i:2  XO:i:0  XG:i:0  MD:Z:25C2C85
ERR868146.22544628  0   chr38   2443400 37  101M    *   0   AGTAAAGCCATGTCACAGCTAATGTTTGGACCGTGGGGATTCGAATCCTGACTCCATGTCTTACTCGTGGTGGCACTTTGGAGATGGCCCAATTATTTCAT   ]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]F   XT:A:U  NM:i:2  X0:i:1  X1:i:0  XM:i:2  XO:i:0  XG:i:0  MD:Z:91G8C0

but what i need is one fasta file to compared with other Dogs genome

Thanks again, and sorry for bothering you

ADD REPLY
1
Entering edit mode

Step 6 will be "call variants" (e.g., with GATK, freeBayes, or samtools mpileup). Step 7 will be to take the VCF file produced in step 6 and produce a new fasta file (e.g., with GATK's FastaAlternateReferenceMaker). You can then samtools faidx that get the sequence.

ADD REPLY
0
Entering edit mode

Hummm, I'm running it and then will follow your steps! Thanks a lot! =D

samtools mpileup -uf canFam3.fa aling_wolf_dog.sorted.bam | bcftools call -mv > wolf.raw.vcf
ADD REPLY

Login before adding your answer.

Traffic: 3846 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6