Hello.
I have converted my FASTQ file to a FASTA file using the tool seqtk. However, the FASTA file I received as output still shows the individual reads in it. I wanted a continuous FASTA file with no reads in it. I am attaching a picture of the FASTA format with my query. How can I get a FASTA file that is not split into reads?
The reads in a FASTQ file are distinct sequencing events, and the order in which they appear in the file says nothing about how they relate to each other. To make a single sequence (or at least several long sequences) from them, one has to assemble them first. So getting a "clean" FASTA file, as you call it, is not a simple matter of converting from FASTQ and removing all headers. The reads need to be assembled, and that is usually done from FASTQ files, as they contain base quality information which FASTA doesn't. Eventually, the assembly will result in a contiguous FASTA file such as those "we get from NCBI."
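To illustrate the point about conversion, here is a minimal Python sketch of what a FASTQ-to-FASTA conversion does. It is not seqtk's actual code, and the read names and sequences are made up for the example. It assumes simple 4-line FASTQ records.

```python
from io import StringIO


def fastq_to_fasta(fastq_handle):
    """Yield (name, sequence) pairs, one per read, from 4-line FASTQ records."""
    while True:
        header = fastq_handle.readline().rstrip("\n")
        if not header:
            break  # end of input (or a blank line)
        sequence = fastq_handle.readline().rstrip("\n")
        fastq_handle.readline()  # '+' separator line, ignored
        fastq_handle.readline()  # quality string, dropped because FASTA has no qualities
        if not header.startswith("@"):
            raise ValueError(f"unexpected FASTQ header: {header!r}")
        yield header[1:], sequence


# Two made-up reads, purely for illustration.
example_fastq = StringIO(
    "@read1\nACGTACGT\n+\nIIIIIIII\n"
    "@read2\nTTGACCAA\n+\nIIIIIIII\n"
)

records = list(fastq_to_fasta(example_fastq))
fasta_text = "".join(f">{name}\n{seq}\n" for name, seq in records)
print(fasta_text)
```

Every read comes out as its own FASTA record with its own header. Deleting the headers and gluing the sequences together would only give a string of unrelated fragments in arbitrary order, not a chromosome.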
Your original question betrays your level of understanding of this topic, so I am not inclined to spend a lot of time explaining it. On balance, it is not likely to be productive for either one of us. Let's just say that assembly is a straightforward but not a trivial process, so I think you need to read up on that topic before attempting to do it.
I am assuming that you are doing de novo assembly, so here is a short list of assemblers:
But like Mensur said, maybe you should try de novo genome assembly using Trinity, I guess. If there is a reference, you can use HISAT or STAR to align the reads rather than converting FASTQ to FASTA.
As I said, de novo assembly does not require a reference. A commonly used program, Trinity, can do this job. I can't help much unless I know what your purpose is and how you got your FASTQ files.
Actually, I was given a new FASTA sequence of chromosome 1 of a cow, but I was not provided with its FASTQ files, as they were destroyed when the server was formatted. So I built a FASTQ file with a quality score of 30 using a tool from the BBMap suite. I then ran FastQC on the file and found that it contains many overrepresented sequences (N's). So I trimmed the reads containing N's using Trimmomatic's SLIDINGWINDOW option. This gave me a new, improved FASTQ file with no N's in it. I wanted to convert this altered FASTQ file back to FASTA format (with no reads, just like NCBI) for the sake of learning. Now I do not know how to achieve this: should I assemble the reads in the FASTQ file using Trinity, or is the FASTA file I made using seqtk enough?
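For what it's worth, the round trip described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not what BBMap or seqtk actually run internally. The record name and sequence are hypothetical, and the qualities use the common Phred+33 encoding, where Q30 is the character `?`.

```python
def fake_fastq_record(name, sequence, phred=30):
    """Build a FASTQ record with one constant, made-up quality for every base."""
    quality_char = chr(phred + 33)  # Phred+33 encoding: Q30 -> '?'
    return f"@{name}\n{sequence}\n+\n{quality_char * len(sequence)}\n"


def fastq_record_to_fasta(record):
    """Convert a single 4-line FASTQ record back to a FASTA record."""
    header, sequence, _plus, _quality = record.rstrip("\n").split("\n")
    return f">{header[1:]}\n{sequence}\n"


# Hypothetical fragment; a real chromosome would be far longer.
original_name, original_seq = "chr1_fragment", "ACGTNNACGT"

fastq = fake_fastq_record(original_name, original_seq)
fasta = fastq_record_to_fasta(fastq)

print(fastq)
print(fasta)
```

The sequence that comes back is the one that went in, and the constant quality string carries no information that was measured by a sequencer. Without trimming, the round trip just reproduces the starting FASTA; with trimming, it gives the same sequence minus whatever was cut off.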
Let me get this straight first, because I'm a little confused.
You already had the FASTA sequence of cow chromosome 1 from the very beginning? How did you get it?
If you already have that FASTA, why bother re-assembling the cow genome? Besides, cow is a well-studied species. Ensembl or some other database must have the reference. You can directly download the genome FASTA and its annotation file for learning. For example here
It seems you have some corrupted FASTQ files. Did you rebuild the FASTQ from the corrupted FASTQ? The quality control steps are fine for alignment. But if the altered FASTQ was created from corrupted FASTQ, proceeding to genome assembly is discouraged. You may see a very low alignment rate against the reference. If you use these corrupted data for de novo assembly, the assembly may not be correct.
Are you trying to assemble your reads into one contig?
No, I just want a clean FASTA file that is not split into reads, like the ones we get from NCBI.