2.0 years ago
JZX • 0
We know that raw Iso-Seq subreads are stored in BAM format and can be used directly to run ccs, lima and cluster.
But data downloaded from the NCBI SRA database are in FASTA/FASTQ format, and I don't know how to process them. These FASTA/FASTQ reads still contain poly(A) tails and primer sequences.
I want to remove the primer sequences and orient the reads in the 5′-3′ direction, just as lima does.
Thanks a lot!
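For anyone wondering what "remove primers and orient 5′-3′" means in practice for plain FASTA/FASTQ reads, here is a minimal sketch in Python. It is not what lima does internally; it only illustrates the idea. The assumptions: a forward read looks like 5′ primer + transcript + poly(A) + 3′ adapter, a reverse read is the reverse complement of that, and primers are matched at the read ends by Hamming distance only (no indels). `FIVE_PRIME` and `THREE_PRIME_END` are placeholder sequences, not the real kit primers, and `orient_and_trim` is a hypothetical helper name.

```python
# Placeholder primer sequences (assumptions, replace with the primers of your kit).
FIVE_PRIME = "AAGCAGTGGTATCAACGC"       # expected at the start of a forward read
THREE_PRIME_END = "GTACTCTGCGTTGATACC"  # expected at the end of a forward read

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def has_prefix(read, primer, max_mismatches):
    return (len(read) >= len(primer)
            and hamming(read[:len(primer)], primer) <= max_mismatches)


def has_suffix(read, primer, max_mismatches):
    return (len(read) >= len(primer)
            and hamming(read[-len(primer):], primer) <= max_mismatches)


def orient_and_trim(read, five=FIVE_PRIME, three_end=THREE_PRIME_END,
                    max_mismatches=1, min_poly_a=8):
    """Return the transcript in 5'-3' orientation with primers and poly(A)
    removed, or None if the primer pair is not found in either orientation."""
    read = read.upper()
    if len(read) < len(five) + len(three_end):
        return None
    # Try the read as given, then its reverse complement.
    for candidate in (read, reverse_complement(read)):
        if (has_prefix(candidate, five, max_mismatches)
                and has_suffix(candidate, three_end, max_mismatches)):
            insert = candidate[len(five):len(candidate) - len(three_end)]
            # Drop a trailing poly(A) run only if it is long enough to be a tail.
            stripped = insert.rstrip("A")
            if len(insert) - len(stripped) >= min_poly_a:
                insert = stripped
            return insert
    return None


if __name__ == "__main__":
    transcript = "ATGGCCTTGACCGGTTAC"
    forward = FIVE_PRIME + transcript + "A" * 12 + THREE_PRIME_END
    print(orient_and_trim(forward))
    print(orient_and_trim(reverse_complement(forward)))
```

Real long reads contain insertions and deletions, so exact-position Hamming matching like this will miss many primers; an alignment-based adapter trimmer is the more realistic choice for actual data.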
However, the input for PacificBiosciences/IsoSeq has to be in BAM format. Its documentation doesn't say anything about how to handle FASTA input.
Sometimes submitters will submit the original PacBio BAM/BAX files. You can take a look at the "Original data" tab of the accession you are looking at. If you post the number(s) here, I can take a look as well.
Yes, I noticed that the original format is listed at the end of the SRA run browser.
In addition, I found that an early version of the SMRT pipeline (smrtanalysis-2.3.0) seems to accept input files in FASTA format.
Thanks a lot!
I'm also curious about converting FASTA to BAM. Is there a proper way? I see lots of folks trying to convert BAM to FASTA (in which case you could use Samtools), but I have built my de novo transcriptome assembly with Trinity, and it's in FASTA format. However, I'm interested in using BRAKER2 to train Augustus, and it requires the RNA-seq data as BAM. Any ideas?
The accession number is SRR3147054; there seem to be no original BAM files for it at NCBI.
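On the FASTA-to-BAM part, it helps to separate two things: changing the container format and producing alignments. A rough sketch of the first, in plain Python, is below: it writes each FASTA record as an unaligned SAM record (flag 4, no reference, no qualities), which Samtools can then turn into an unaligned BAM. The function names `read_fasta` and `fasta_to_unaligned_sam` are made up for this example. Note the limits: BRAKER expects RNA-seq reads aligned to the genome, so for that use case the FASTA would have to go through a spliced aligner rather than this kind of conversion, and as far as I know the PacBio tools expect PacBio-specific headers and tags that a generic unaligned BAM does not carry.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first whitespace-delimited token as the read name.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def fasta_to_unaligned_sam(lines):
    """Yield SAM text lines describing every FASTA record as an unmapped read."""
    yield "@HD\tVN:1.6\tSO:unknown"
    for name, seq in read_fasta(lines):
        # QNAME FLAG RNAME POS MAPQ CIGAR RNEXT PNEXT TLEN SEQ QUAL
        # FLAG 4 = segment unmapped; '*' = value not available.
        yield "\t".join([name, "4", "*", "0", "0", "*", "*", "0", "0", seq, "*"])


if __name__ == "__main__":
    example = [">tx1 len=8", "ACGT", "TTAA", ">tx2", "GG"]
    for sam_line in fasta_to_unaligned_sam(example):
        print(sam_line)
```

The resulting SAM text can be saved to a file and compressed to BAM with `samtools view -b`.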