Converting FASTA/FASTQ file into GFF3/GTF
9 months ago
vedikaa96 • 0

I have tried to convert a FASTA/FASTQ file into a GFF3/GTF file. First, I converted the FASTA/FASTQ file into a BAM file (with samtools) and into a BED file, and then tried to convert those into a GFF file. But the commands are not converting properly. Kindly tell me how I can convert a FASTA/FASTQ file into a GFF3/GTF file to study gene structure in plants?

gff • 1.2k views

But the commands are not converting properly.

Well, that is your real problem. Tell us what you've done and what happened.


Firstly, I converted the FASTA file to a FASTQ file by the bwa command

bwa mem file.fa default_R1.fastq > default.bam 

then the bam file was indexed by

samtools index default_sorted.bam

But it gave me error:

[E::sam_index_build3] SAM file "default_sorted.bam" not BGZF compressed
samtools index: failed to create index for "default_sorted.bam"

After this, I compressed 'default_sorted.bam' with bgzip:

bgzip -c default_sorted.bam > default_sorted.bam.gz

But then also, it was unable to index the bam file.

Then, to convert the FASTA file to a bed file, I used

samtools faidx $fasta

and then,

bioawk -c fastx '{print $name"\t0\t"length($seq)}' file.fa

but this again didn't give me any appropriate results.


I'd like to clarify some concepts here.

You did not "convert" FASTA to FASTQ. You're misusing the term convert but that's a whole different battle. You used bwa to align a FASTQ to a FASTA and generate an alignment output (SAM here). You tried indexing an uncompressed SAM file, which doesn't work. You then BGZF compressed a SAM file instead of simply using a better format - BAM, which is technically fine but just ... odd. Also, from your descriptions, you did not intend to do this - you are indeed looking to work with BAMs.
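A quick way to see which of the two you actually have is to look at the first two bytes of the file. This is only a minimal sketch: `is_gzip_like` is a made-up helper name, and it assumes `head`, `od` and `tr` are available. BGZF (the compression BAM uses) is a gzip variant, so a real BAM starts with the gzip magic bytes `1f 8b`, while plain-text SAM does not.

```shell
# Hypothetical helper: succeed if the file starts with the gzip magic bytes (1f 8b).
# BAM is BGZF-compressed, and BGZF is a gzip variant, so a BAM file passes this check.
# Plain-text SAM output (what bwa mem writes to stdout) does not.
is_gzip_like() {
    # Read the first two bytes and turn them into a hex string such as "1f8b"
    magic=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "1f8b" ]
}

# Example call (file name taken from the thread):
#   is_gzip_like default.bam && echo "compressed" || echo "plain text, probably SAM"
```

Note that passing this check does not prove a file is BAM. Any gzip or bgzip output passes too, including a bgzipped SAM file, so treat it as a sanity check rather than a format validator.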

OK, the "convert" battle now. You cannot "convert" files with different information content. You can convert a CSV to a TSV - no information lost or gained there. I'm being a little pedantic here, but you cannot even convert a GenBank file to FASTA format. "Convert" is a convenient way to say it but you're essentially either extracting a subset of information and writing it to a new format (GenBank is more information rich than FASTA) or if you're going the other way (low info content to high info content), you'll generate fake information and/or leave optional fields empty.

When you talk about completely incompatible content formats such as FASTA (sequence content) and BED (coordinate content), "convert" makes absolutely no sense. From your bioawk command line, it looks like you're trying to extract and summarize some information to generate tab-delimited content, which is NOT bed.

Please understand data and file formats, they are a huge majority of what we work with.

9 months ago
ATpoint 82k

bwa outputs SAM, not BAM. To get BAM, pipe the output into samtools:

bwa mem (...) | samtools view -b -o out.bam -

Or sort directly:

bwa mem (...) | samtools sort -o out_sorted.bam

Sorted BAM files can then go into samtools index.
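Put together, the whole thing could look roughly like the sketch below. The function name `align_sort_index` and the file names in the comments are placeholders, and it assumes bwa and a reasonably recent samtools (1.3 or newer) are on your PATH.

```shell
# Sketch: align reads to a reference and write a sorted, indexed BAM.
# Assumes bwa and samtools (1.3 or newer) are installed; all names are placeholders.
align_sort_index() {
    ref=$1      # reference FASTA, e.g. file.fa
    reads=$2    # reads in FASTQ, e.g. default_R1.fastq
    out=$3      # output BAM, e.g. default_sorted.bam

    # Build the bwa index for the reference (only needed once per reference)
    bwa index "$ref" &&
    # bwa mem writes SAM to stdout; samtools sort reads it from stdin ("-")
    # and picks BAM as the output format from the .bam extension
    bwa mem "$ref" "$reads" | samtools sort -o "$out" - &&
    # Index the sorted BAM; this creates "$out.bai"
    samtools index "$out"
}

# Example call:
#   align_sort_index file.fa default_R1.fastq default_sorted.bam
```

The point of the pipe is that no intermediate SAM file is written to disk, and the file handed to samtools index is a real coordinate-sorted BAM.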

Then, to convert the FASTA file to a bed file, I used 'samtools faidx $fasta' and then, 'bioawk -c fastx '{print $name"\t0\t"length($seq)}' file.fa' but this again didn't give me any appropriate results.

What is the file supposed to look like: just the name, 0 as the 2nd column, and the length as the 3rd column?
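If those three columns are indeed what you want, here is a minimal sketch using plain awk instead of bioawk. `fasta_to_bed3` is a hypothetical helper name, and I am assuming the sequence name is the header text up to the first whitespace.

```shell
# Sketch: print one tab-separated line (name, 0, length) per FASTA record.
# fasta_to_bed3 is a hypothetical helper name, not an existing tool.
fasta_to_bed3() {
    awk '
        /^>/ {
            if (name != "") print name "\t0\t" len   # flush the previous record
            name = substr($1, 2)                      # header up to first whitespace, without ">"
            len = 0
            next
        }
        { len += length($0) }                         # a sequence may span several lines
        END { if (name != "") print name "\t0\t" len }
    ' "$1"
}

# Example call:
#   fasta_to_bed3 file.fa > file.bed
```

Since you already ran samtools faidx, the same numbers are also in the .fai index it wrote: the first column is the sequence name and the second is its length. Either way this only describes whole sequences; it says nothing about genes.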
