I have tried to convert a FASTA/FASTQ file into a GFF3/GTF file. First, I converted the FASTA/FASTQ file into BAM (with samtools) and into a BED file, and then converted those into a GFF file. But the commands are not converting properly. Kindly tell me how I can convert a FASTA/FASTQ file into a GFF3/GTF file to study gene structure in plants.
bwa outputs SAM, not BAM. To get BAM, pipe the output through samtools:
bwa mem (...) | samtools view -b -o out.bam
Or sort directly:
bwa mem (...) | samtools sort -o out_sorted.bam
Sorted BAM files can then be indexed with samtools index.
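samtools index needs a coordinate-sorted BAM. One quick sanity check is to look for `SO:coordinate` in the `@HD` header line, which `samtools sort` writes by default. The helper below is a hypothetical sketch (`has_coordinate_sort_tag` is my own name, not a samtools command); it assumes SAM header text arrives on stdin, e.g. from `samtools view -H`:

```sh
#!/bin/sh
# Hypothetical helper: exit 0 if the SAM header text on stdin declares
# coordinate sort order (SO:coordinate on the @HD line), exit 1 otherwise.
has_coordinate_sort_tag() {
  awk '
    BEGIN { FS = "\t" }                      # SAM header fields are tab-separated
    $1 == "@HD" {
      for (i = 2; i <= NF; i++) if ($i == "SO:coordinate") found = 1
    }
    END { exit (found ? 0 : 1) }
  '
}
```

For example: `samtools view -H out_sorted.bam | has_coordinate_sort_tag && samtools index out_sorted.bam`. The tag only records what the file claims; samtools index still checks the real record order and complains if it is wrong, so treat this as a hint rather than proof.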
Then, to convert the FASTA file to a BED file, I used `samtools faidx $fasta` and then `bioawk -c fastx '{print $name"\t0\t"length($seq)}' file.fa`, but this again didn't give me any appropriate results.
What is the file supposed to look like: just the name, 0 as the 2nd column and the length as the 3rd column?
Well, that is your real problem. Tell us what you've done and what happened.
First, I converted the FASTA file to a FASTQ file with the bwa command
Then the BAM file was indexed with
But it gave me an error:
After this, I compressed 'default_sorted.bam' into a zip file with
But even then, it was unable to index the BAM file.
Then, to convert the FASTA file to a BED file, I used `samtools faidx $fasta` and then `bioawk -c fastx '{print $name"\t0\t"length($seq)}' file.fa`, but this again didn't give me any appropriate results.
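For reference, the bioawk command quoted earlier prints one line per sequence: the name, a 0 and the sequence length. A minimal plain-awk sketch of the same output, assuming a (possibly multi-line) FASTA file as input, is below; the function name is made up for illustration:

```sh
#!/bin/sh
# Hypothetical plain-awk equivalent of
#   bioawk -c fastx '{print $name"\t0\t"length($seq)}' file.fa
# Prints "name<TAB>0<TAB>length" for every record in a FASTA file.
fasta_to_name_start_length() {
  awk '
    /^>/ {
      if (name != "") printf "%s\t0\t%d\n", name, len   # flush previous record
      name = substr($1, 2)      # header text up to the first whitespace, minus ">"
      len = 0
      next
    }
    { gsub(/[ \t\r]/, ""); len += length($0) }          # sum sequence line lengths
    END { if (name != "") printf "%s\t0\t%d\n", name, len }
  ' "$@"
}
```

If `samtools faidx` has already been run, the first two columns of the `.fai` index are the sequence name and length, so `awk 'BEGIN{OFS="\t"} {print $1, 0, $2}' file.fa.fai` gives the same three columns. Either way, the result only spans each sequence end to end; it says nothing about where genes are.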
I'd like to clarify some concepts here.
You did not "convert" FASTA to FASTQ. You're misusing the term convert, but that's a whole different battle. You used bwa to align a FASTQ to a FASTA and generate an alignment output (SAM here). You tried indexing an uncompressed SAM file, which doesn't work. You then BGZF-compressed a SAM file instead of simply using a better format (BAM), which is technically fine but just ... odd. Also, from your descriptions, you did not intend to do this; you are indeed looking to work with BAMs.
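To see which of these situations a given file is in, you can look at its first bytes. The sketch below is a hypothetical helper (`guess_container` is not a samtools command) that tells plain-text SAM, BGZF (the block-gzip container BAM is written in), ordinary gzip and ZIP apart. It assumes POSIX `od`, `tr` and `cut`, and it only inspects the container, not whether the contents are valid alignments:

```sh
#!/bin/sh
# Hypothetical helper: guess a file's container from its first 16 bytes.
guess_container() {
  # First 16 bytes as one lowercase hex string, e.g. "1f8b0804...".
  hex=$(od -A n -t x1 -N 16 "$1" | tr -d ' \n')
  case "$hex" in
    504b0304*) echo "ZIP archive (not indexable by samtools)" ;;
    1f8b*)
      # gzip header: byte 3 is FLG. BGZF sets FEXTRA (0x04), and its extra
      # field starts at byte 12 with the subfield ID "BC" (hex 42 43).
      flg=$(printf '%s' "$hex" | cut -c7-8)
      si=$(printf '%s' "$hex" | cut -c25-28)
      if [ -n "$flg" ] && [ $(( 0x$flg & 4 )) -ne 0 ] && [ "$si" = "4243" ]; then
        echo "BGZF (BAM or bgzip-compressed)"
      else
        echo "plain gzip (not BGZF)"
      fi
      ;;
    40484409*|40535109*) echo "SAM text (uncompressed)" ;;  # starts with "@HD\t" or "@SQ\t"
    *) echo "unknown" ;;
  esac
}
```

A headerless SAM file will show up as "unknown" here, since the check relies on an `@HD` or `@SQ` line coming first.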
OK, the "convert" battle now. You cannot "convert" files with different information content. You can convert a CSV to a TSV - no information lost or gained there. I'm being a little pedantic here, but you cannot even convert a GenBank file to FASTA format. "Convert" is a convenient way to say it but you're essentially either extracting a subset of information and writing it to a new format (GenBank is more information rich than FASTA) or if you're going the other way (low info content to high info content), you'll generate fake information and/or leave optional fields empty.
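To make the GenBank point concrete, here is a minimal awk sketch (hypothetical function name; it assumes the usual flat-file layout with LOCUS, ORIGIN and `//` lines) that keeps only the record name and the sequence. Everything under FEATURES, including gene and CDS coordinates, is simply dropped, and that is exactly the information a GFF3/GTF would need:

```sh
#!/bin/sh
# Hypothetical GenBank -> FASTA extractor: keeps the LOCUS name and the
# ORIGIN sequence, silently discards all annotation (FEATURES etc.).
genbank_to_fasta() {
  awk '
    /^LOCUS/  { name = $2 }                                # record name
    /^ORIGIN/ { inseq = 1; printf ">%s\n", name; next }    # sequence starts
    /^\/\//   { if (inseq) printf "\n"; inseq = 0; next }  # end of record
    inseq     { gsub(/[0-9 \t\r]/, ""); printf "%s", $0 }  # strip positions/spaces
  ' "$@"
}
```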
When you talk about completely incompatible content formats such as FASTA (sequence content) and BED (coordinate content), "convert" makes absolutely no sense. From your bioawk command line, it looks like you're trying to extract and summarize some information to generate tab-delimited content, which is NOT BED.
Please take the time to understand data and file formats; they are the vast majority of what we work with.