This is my first question on the forum, so I hope it's in the right place. I've looked at related threads on similar topics, but I couldn't find one that covers this issue or that was helpful to me.
I have a problem using Bowtie2 and Samtools. I've assembled paired-end reads using Trinity. To get longer and/or more complete contigs, I've mapped the reads back against these contigs using Bowtie2.
The problem comes when I convert the resulting SAM file into a FASTQ file. I use this command line (seen at http://samtools.sourceforge.net/mpileup.shtml):
samtools mpileup -uf ref.fa aln.bam | bcftools view -cg - | vcfutils.pl vcf2fq > cns.fq
I convert from fastq to fasta using:
seqtk fq2fa file.fastq > file.fasta
The problem is that too many of my transcripts contain N's. Some transcripts have none, some are full of N's, and some have many N's scattered along the sequence. So when I try to translate them into protein sequences, I get sequences full of X's.
I guess the reason is that bcftools and vcfutils detect all the variant calls and cannot decide which base is the right one.
How can I tell them to select the most frequent base at each position, since I don't want variant calls? Any other approach, such as one that avoids bcftools or vcfutils altogether (I can't think of other options...), is welcome.
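To put numbers on this, a short Python sketch like the one below could report the fraction of N's in each transcript. It is only an illustration: the function name `n_fractions` is my own choice, and it assumes a plain FASTA file in which the record name is the first word of the header line.

```python
def n_fractions(fasta_lines):
    """Return {record name: fraction of N/n characters} for FASTA lines."""
    lengths = {}
    n_counts = {}
    name = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header: keep only the first word as the record name
            parts = line[1:].split()
            name = parts[0] if parts else ""
            lengths[name] = 0
            n_counts[name] = 0
        elif name is not None:
            # Sequence line: add to the running totals for this record
            lengths[name] += len(line)
            n_counts[name] += line.upper().count("N")
    # Records with no sequence at all get a fraction of 0.0
    return {
        rec: (n_counts[rec] / lengths[rec] if lengths[rec] else 0.0)
        for rec in lengths
    }
```

For example, passing an open file handle for file.fasta to `n_fractions` gives one value per transcript, which makes it easy to separate the transcripts that are almost entirely N's from those with only a few.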
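To make clear what I mean by "most frequent base", here is a minimal Python sketch of the idea for a single position. It assumes the read-bases column of the plain-text pileup format (as printed by samtools mpileup without -u), where `.` and `,` mean a match to the reference, letters are mismatches, `^` plus one character marks a read start, `$` marks a read end, and `+n`/`-n` introduce indels. The function `majority_base` is hypothetical, not part of samtools or bcftools, and it ignores base qualities and indels.

```python
from collections import Counter


def majority_base(ref_base, read_bases):
    """Return the most frequent base in one pileup column ('N' if no coverage)."""
    counts = Counter()
    i = 0
    n = len(read_bases)
    while i < n:
        c = read_bases[i]
        if c == "^":
            # Read-start marker is followed by one mapping-quality character
            i += 2
            continue
        if c in "+-":
            # Indel: sign, length digits, then that many inserted/deleted bases
            j = i + 1
            while j < n and read_bases[j].isdigit():
                j += 1
            i = j + int(read_bases[i + 1:j])
            continue
        if c in ".,":
            counts[ref_base.upper()] += 1  # match on forward/reverse strand
        elif c.upper() in "ACGT":
            counts[c.upper()] += 1  # mismatch on either strand
        # '$', '*' and anything else are skipped
        i += 1
    if not counts:
        return "N"
    # On a tie, the base that was seen first wins
    return counts.most_common(1)[0][0]
```

So for a column with reference A and read bases `TTt.,` I would want T, not N or an ambiguity code.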
The versions of the software I'm using are:
bowtie2 -> BOWTIE/2.2.6
samtools -> SAMTOOLS/0.1.18
I hope the problem is well explained. Thanks in advance,