I have DNA Sanger sequence FASTA files from patient TGM1 genes, and have been trying to use subread to align them to the TGM1 coding reference sequence from NCBI.
I keep getting a weird red looking .sam file as the result from the alignment but it doesn't look correct at all. I've attached here: weird .sam file
So far the code I've used has been:
subread-buildindex -o TGM1_index /home/amyhouseman/Desktop/fasta_reference_TGM1_folder/TGM1_ref_sequence
subread-align -t 1 -i TGM1_index -r /home/amyhouseman/Desktop/katja_fasta_seq/fasta_trimmed_sequences/fasta_trimmed_1_all/fasta_trimmed_1_final.fasta -o my_results.sam
This doesn't look right to me, so I was wondering if anyone has seen an error like this before? My DNA sequences are only around 500bp, and I only used one to map with, so i could check if it worked before using all 42 of them.
Thanks! Amy *This is on Ubuntu terminal
Cross-ref from : C: Best RNA-Seq aligner: A comparison of mapping tools
subreadmay have generated a BAM file instead of a SAM file. BAM is a binary format so it will look like what you posted if you tried to view the file.
Can you post the output of this command
The result of file my_results.sam is this: (which also doesn't look very promising):
my_results.sam: Blocked GNU Zip Format (BGZF; gzip compatible), block length 246
Actually that is an indication that this is a BAM file. Do you have
samtoolsinstalled? If so try
samtools view my_results.sam | head -5. Post that result here.
You should move the name from
my_results.bamto avoid any future confusion once we confirm that this is a BAM file.
So after installing samtools and doing the above command, I get this result:
I haven't changed the .sam to a .bam file yet, so I'll await your response till I do so. Thanks for all your help.
you don't need a screenshot to show us one line of bam.
Furthermore by using
samtools, which reads SAM/BAM and removes the header by default, we cannot say anything about your file.
I did a bit of testing with
subread. Even though the example in
.samit seems to be producing a
.bamformat file by default. You should be able to see that in the job output it produces. Here is my example.
It can use a fasta file of reads as input and produce a BAM. So your output file should be valid and is in BAM format. Verify that you see a header when do
samtools view -H my_results.sam.
Assuming everything is ok you can use these directions to call SNP: A: samtools SNP calling
If this was whole genome data and you are aligning against just TGM1 gene then that can be a problem.