Question: Bowtie2-Samtools. Too many degenerate bases
2.7 years ago by
saabalde20
saabalde20 wrote:

Hi all,

This is my first question on the forum, so I hope it's in the right place. I've looked through related threads on similar topics, but I couldn't find one covering this issue that would have helped me.

I have a problem using Bowtie2 and Samtools. I've assembled paired-end reads using Trinity. To get longer and/or more complete contigs, I've mapped the reads back against these contigs using Bowtie2.

The problem comes when I convert the resulting SAM file into a FASTQ file. I use this command line (seen at http://samtools.sourceforge.net/mpileup.shtml):

samtools mpileup -uf ref.fa aln.bam | bcftools view -cg - | vcfutils.pl vcf2fq > cns.fq  

I convert from fastq to fasta using:

seqtk fq2fa file.fastq > file.fasta  

As a result, too many of my transcripts contain N's. Some transcripts have none, some are made up entirely of N's, and some have many N's scattered along the sequence. So when I try to translate them into protein sequences, I get sequences full of X's.
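
To put numbers on this, the N content of each transcript can be tallied directly from the FASTA output. The following is a minimal sketch using only awk; the function name `n_content` is made up for illustration, and it assumes a plain (possibly multi-line) FASTA file such as file.fasta above.

```shell
# Sketch: per-record N content of a FASTA file.
# Output columns (tab-separated): name, length, number of N/n, percent N.
# `n_content` is a hypothetical helper name, not part of samtools or seqtk.
n_content() {
    awk '
        function report() {
            if (name != "")
                printf "%s\t%d\t%d\t%.1f\n", name, len, n, (len ? 100 * n / len : 0)
        }
        /^>/ {                      # header line: flush previous record
            report()
            name = substr($1, 2)    # name = header up to first whitespace
            len = 0; n = 0
            next
        }
        {
            len += length($0)
            n += gsub(/[Nn]/, "")   # gsub returns the number of matches replaced
        }
        END { report() }
    ' "$1"
}

# Usage (file name taken from the command above):
#   n_content file.fasta | sort -k4,4nr | head
```

Sorting on the last column shows which transcripts are worst affected, which may help tell uncovered regions apart from scattered ambiguous calls.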

I guess the reason is that bcftools and vcfutils detect all the variant calls and cannot decide which base is the right one.

How can I tell them to select the most frequent base in each case, since I don't want variant calls? Any other approach, such as not using bcftools or vcfutils at all (I can't think of other options...), is welcome.

The software versions I'm using are:
bowtie2 -> BOWTIE/2.2.6
samtools -> SAMTOOLS/0.1.18

I hope the problem is well explained and thanks in advance,
Samu

modified 2.7 years ago by Devon Ryan85k • written 2.7 years ago by saabalde20

unrelated: your version of samtools is old

written 2.7 years ago by Pierre Lindenbaum113k

I know, but I work on a cluster without administrator permissions. Sometimes it's quite difficult to keep things up to date.

written 2.7 years ago by saabalde20

That's what the home directory is for :)

written 2.7 years ago by Devon Ryan85k

You don't need to be an administrator to install samtools in your home directory.

written 2.7 years ago by Pierre Lindenbaum113k

I know, there are no excuses for that.

written 2.7 years ago by saabalde20

Try creating a new folder for your reference (ref.fa in the samtools command), put it there on its own, index it there, and run your commands again with the new reference file, so that the only thing that changes is the location of the reference file. This might help.
 

modified 2.7 years ago • written 2.7 years ago by Max Ivon110

So, how could that give a different result, since all the files are the same? I'll try it; I'm just trying to understand.

I'll update you. Many thanks,

Samu

written 2.7 years ago by saabalde20

There is a bug in several versions of samtools (I don't know exactly which ones). It results in an improper mpileup: the reference is not recognised properly, and all or most of it is replaced with N. I don't know why this workaround helps, so I'm not sure whether it will help you or not. Maybe you've hit a different error, but it's not hard to test.
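
Whether or not this bug is involved, one cheap check is that the reference names in the alignment header match the record names in the reference FASTA. The sketch below is an illustration under assumptions, not a known fix: `missing_refs` is a hypothetical helper name, it expects SAM text (for a BAM file, the header can first be dumped with `samtools view -H aln.bam > header.sam`), and it compares FASTA names only up to the first whitespace.

```shell
# Sketch: print every @SQ reference name from a SAM header that has no
# identically named record in the FASTA file. No output means all names match.
# `missing_refs` is a hypothetical helper name.
missing_refs() {
    sam=$1
    fasta=$2
    awk '
        NR == FNR {                         # first file: the FASTA
            if (/^>/) seen[substr($1, 2)] = 1
            next
        }
        /^@SQ/ {                            # second file: SAM header lines
            for (i = 2; i <= NF; i++)
                if ($i ~ /^SN:/ && !(substr($i, 4) in seen))
                    print substr($i, 4)
            next
        }
        !/^@/ { exit }                      # header is over, stop reading
    ' "$fasta" "$sam"
}

# Usage (file names are placeholders):
#   missing_refs aln.sam ref.fa
```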
 

written 2.7 years ago by Max Ivon110
2.7 years ago by
Devon Ryan85k
Freiburg, Germany
Devon Ryan85k wrote:

There's no way to make samtools do what you want. Instead, give GATK's FastaAlternateReferenceMaker (or whatever it's called) a try. I don't know if that handles indels.
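
For reference, a hedged sketch of how that GATK tool might be invoked. It assumes the GATK4 `gatk` wrapper is on the PATH and that a VCF of variant calls against the contigs already exists; the function name `make_alt_reference` and all file names are placeholders. GATK also expects the reference to have a `.fai` index and a `.dict` sequence dictionary alongside it.

```shell
# Sketch: wrap GATK4's FastaAlternateReferenceMaker.
# `make_alt_reference` is a hypothetical helper; file names are placeholders.
# With DRY_RUN=1 the command is only printed, so it can be inspected first.
make_alt_reference() {
    ref=$1      # reference FASTA (the Trinity contigs)
    vcf=$2      # variant calls against that reference
    out=$3      # FASTA to write
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo gatk FastaAlternateReferenceMaker -R "$ref" -V "$vcf" -O "$out"
    else
        gatk FastaAlternateReferenceMaker -R "$ref" -V "$vcf" -O "$out"
    fi
}

# Usage:
#   DRY_RUN=1 make_alt_reference ref.fa variants.vcf consensus.fa
#   make_alt_reference ref.fa variants.vcf consensus.fa
```

Because this starts from the reference and only substitutes called alternate alleles, positions without a call keep the reference base rather than turning into N.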

written 2.7 years ago by Devon Ryan85k