I have a RNA seq pair-end data from a bacteria. I mapped the data against a well-known reference genome using bowtie2. Then, I converted and sorted the resulting .sam file to .bam using samtools and I assembled the transcripts using StringTie:
samtools view -u bowtie2.sam | samtools sort -o bowtie2.bam stringtie bowtie2.bam -G ../GCF_000203855.3_ASM20385v3_genomic.gff -o output.gff
Finally, I would like to get the .fasta sequence for each transcript. Searching in internet I came across gffread. However, this sequence is the one contained in the reference genome and I am interested in the consensus sequence of all overlapping reads mapped in this region.
I found in internet you can a consensus sequence out of .sam files and reference genome using the following command:
samtools mpileup -uf ../GCF_000203855.3_ASM20385v3_genomic.fna BEE1ST_sorted.bam | bcftools view -cg - | vcfutils.pl vcf2fq > cns.fq
However I got this error:
[warning] samtools mpileup option `u` is functional, but deprecated. Please switch to using bcftools mpileup in future. [mpileup] 1 samples in 1 input files Error: Could not parse --min-ac g Use of uninitialized value $l in numeric lt (<) at /usr/bin/vcfutils.pl line 566. Use of uninitialized value $l in numeric lt (<) at /usr/bin/vcfutils.pl line 566
I also found that you can get a consensus sequence using IGV software. After dealing a bit with it I managed to get a consensus sequence of a region containing my gene of interest (I was having trouble with selecting regions!). I blasted the consensus against refseq_genomes and it gives me the reference genome I used. The was good, a query coverage of 100% and 99% identity.
I would like to know if there is any command line-based tool where you can extract the consensus sequences for the assembled transcripts after mapping against a reference genome. Or what is you opinion on IGV sequence consensus method.