Hi all,
I have DNA Sanger sequence FASTA files from patient TGM1 genes, and have been trying to use subread to align them to the TGM1 coding reference sequence from NCBI.
I keep getting a weird red looking .sam file as the result from the alignment but it doesn't look correct at all. I've attached here: weird .sam file
So far the code I've used has been:
subread-buildindex -o TGM1_index /home/amyhouseman/Desktop/fasta_reference_TGM1_folder/TGM1_ref_sequence
and
subread-align -t 1 -i TGM1_index -r /home/amyhouseman/Desktop/katja_fasta_seq/fasta_trimmed_sequences/fasta_trimmed_1_all/fasta_trimmed_1_final.fasta -o my_results.sam
This doesn't look right to me, so I was wondering if anyone has seen an error like this before? My DNA sequences are only around 500bp, and I only used one to map with, so i could check if it worked before using all 42 of them.
Thanks! Amy *This is on Ubuntu terminal
Cross-ref from : C: Best RNA-Seq aligner: A comparison of mapping tools
amy_houseman :
subread
may have generated a BAM file instead of a SAM file. BAM is a binary format so it will look like what you posted if you tried to view the file.Can you post the output of this command
file your_result.sam
?Hi!
The result of file my_results.sam is this: (which also doesn't look very promising):
my_results.sam: Blocked GNU Zip Format (BGZF; gzip compatible), block length 246
Thanks again
Actually that is an indication that this is a BAM file. Do you have
samtools
installed? If so trysamtools view my_results.sam | head -5
. Post that result here.You should move the name from
my_results.sam
tomy_results.bam
to avoid any future confusion once we confirm that this is a BAM file.Hi,
So after installing samtools and doing the above command, I get this result:
Amy
you don't need a screenshot to show us one line of bam.
Furthermore by using
samtools
, which reads SAM/BAM and removes the header by default, we cannot say anything about your file.I did a bit of testing with
subread
. Even though the example insubread
manual says.sam
it seems to be producing a.bam
format file by default. You should be able to see that in the job output it produces. Here is my example.It can use a fasta file of reads as input and produce a BAM. So your output file should be valid and is in BAM format. Verify that you see a header when do
samtools view -H my_results.sam
.Assuming everything is ok you can use these directions to call SNP: A: samtools SNP calling
If this was whole genome data and you are aligning against just TGM1 gene then that can be a problem.