subread keeps makes a weird .sam file, have i done something wrong?
0
0
Entering edit mode
8 months ago
amy_houseman ▴ 20

Hi all,

I have DNA Sanger sequence FASTA files from patient TGM1 genes, and have been trying to use subread to align them to the TGM1 coding reference sequence from NCBI.

I keep getting a weird red looking .sam file as the result from the alignment but it doesn't look correct at all. I've attached here: weird .sam file

So far the code I've used has been:

subread-buildindex -o TGM1_index /home/amyhouseman/Desktop/fasta_reference_TGM1_folder/TGM1_ref_sequence

and

subread-align -t 1 -i TGM1_index -r /home/amyhouseman/Desktop/katja_fasta_seq/fasta_trimmed_sequences/fasta_trimmed_1_all/fasta_trimmed_1_final.fasta -o my_results.sam

This doesn't look right to me, so I was wondering if anyone has seen an error like this before? My DNA sequences are only around 500bp, and I only used one to map with, so i could check if it worked before using all 42 of them.

Thanks! Amy *This is on Ubuntu terminal

subread rsubread alignment DNA reference • 311 views
ADD COMMENT
0
Entering edit mode

Cross-ref from : C: Best RNA-Seq aligner: A comparison of mapping tools

amy_houseman : subread may have generated a BAM file instead of a SAM file. BAM is a binary format so it will look like what you posted if you tried to view the file.

Can you post the output of this command file your_result.sam?

ADD REPLY
0
Entering edit mode

Hi!

The result of file my_results.sam is this: (which also doesn't look very promising): my_results.sam: Blocked GNU Zip Format (BGZF; gzip compatible), block length 246

Thanks again

ADD REPLY
0
Entering edit mode

Actually that is an indication that this is a BAM file. Do you have samtools installed? If so try samtools view my_results.sam | head -5. Post that result here.

You should move the name from my_results.sam to my_results.bam to avoid any future confusion once we confirm that this is a BAM file.

ADD REPLY
0
Entering edit mode

Hi,

So after installing samtools and doing the above command, I get this result:

samtools view I haven't changed the .sam to a .bam file yet, so I'll await your response till I do so. Thanks for all your help.

Amy

ADD REPLY
0
Entering edit mode

you don't need a screenshot to show us one line of bam.

Furthermore by using samtools , which reads SAM/BAM and removes the header by default, we cannot say anything about your file.

ADD REPLY
0
Entering edit mode

I did a bit of testing with subread. Even though the example in subread manual says .sam it seems to be producing a .bam format file by default. You should be able to see that in the job output it produces. Here is my example.

|| Function      : Read alignment (DNA-Seq)                                   ||
|| Input file    : db_dna.fa                                                  ||
|| Output file   : subread.sam (BAM)                                          ||
|| Index name    : subind

It can use a fasta file of reads as input and produce a BAM. So your output file should be valid and is in BAM format. Verify that you see a header when do samtools view -H my_results.sam.

ADD REPLY
0
Entering edit mode

Assuming everything is ok you can use these directions to call SNP: A: samtools SNP calling

If this was whole genome data and you are aligning against just TGM1 gene then that can be a problem.

ADD REPLY

Login before adding your answer.

Traffic: 2329 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6