Hi, Biostars.
Each of my original FASTQ reads has a restriction enzyme target sequence at its head
(e.g. GGCCTTATATACATCGATCAAGATA..., where GGCC is the restriction enzyme target site),
and the reads differ in length from one another.
I would like to extract random reads from the whole-genome FASTA.
Each random read should have the target site at its head and a read length corresponding to the original FASTQ reads.
(e.g. original : GGCCTTATATACATCGATCAAGATA
random : GGCCTTAAHCTAGATCGATCGCGT)
Currently, I do this random extraction with bedtools:
- - - - - - -
bedtools shuffle -i BED -g GENOME.FA > a.bed
bedtools getfasta -fi GENOME.FA -bed a.bed -fo b.fasta
Finally, I extract the FASTA reads with the restriction enzyme target site from b.fasta using grep.
- - - - - - -
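As a rough sketch of that last filtering step only: a plain grep for GGCC would also match reads that contain the site somewhere in the middle, so the example below keeps a record only when its sequence line begins with the site. It assumes each sequence sits on a single line after its header. The small b.fasta written here is made-up demo data standing in for the real getfasta output, and the names workdir and filtered.fasta are placeholders.

```shell
#!/bin/sh
# Sketch: keep only FASTA records whose sequence starts with the target site.
# A tiny made-up file stands in for the real b.fasta so the filter can be
# tried on its own (assumption: one sequence line per header).
site="GGCC"                      # restriction enzyme target site from the post
workdir=$(mktemp -d)             # hypothetical scratch directory

cat > "$workdir/b.fasta" <<'EOF'
>chr1:100-125
GGCCTTATATACATCGATCAAGATA
>chr1:300-320
ATGCGGCCTTAGCTAGCTAG
>chr2:50-70
ggccATCGATCGATTTACGA
EOF

# Remember each header line; print the record only if the sequence line
# that follows begins with the site (case-insensitive, in case the genome
# is soft-masked with lowercase bases).
awk -v site="$site" '
    /^>/ { header = $0; next }
    index(toupper($0), site) == 1 { print header; print $0 }
' "$workdir/b.fasta" > "$workdir/filtered.fasta"

cat "$workdir/filtered.fasta"
```

The second record is dropped because GGCC appears inside the sequence rather than at its head. Since the intervals are shuffled to random positions, most of them will not begin with the site, so many shuffled intervals may be needed to end up with enough reads.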
I mapped the original FASTQ reads with Bowtie2. The BED file is the mapped BED from Bowtie2.
So this BED has the sequence and length information corresponding to the original FASTQ reads.
But this whole process should not be necessary just to get reads that meet my needs.
Does anyone have a more efficient idea that fits my needs?
My English is not good, but I hope it doesn't cause any trouble.
Thanks.
Can you clarify what is inside the BED file? What do you mean by "finally extract fasta reads from b.fasta"? After running getfasta?
Thank you, Goutham Atla. Sorry for not explaining enough.
I mapped the original FASTQ reads to the whole genome with Bowtie2.
The BED you mentioned is this mapped BED file.
"What do you mean by "finally extract fasta reads from b.fasta"? After running getfasta?"
---Yes, after running getfasta, I extract them with grep.
Thank you