Hello everyone, I am new to bioinformatics, and I am trying to retrieve a .fasta file of a gene located on human chr6 from a published Neandertal sequence. The file I am starting from is a .bam file containing only the chr6 sequencing data of one organism.
I searched and found that a useful tool for this is samtools, which I have already successfully installed. I tried to view the file with the following syntax:
samtools view filename.bam chr6:posxx-posxx+1
I got an error saying I should index the file, so I proceeded to do so, running the required sort command before the indexing, with the following syntax:
samtools sort filename.bam -o filename.sorted
samtools index filename.sorted
After sorting I got a single file with the .sorted extension, on which I then ran the index command. That gave me a filename.sorted.bai file, plus two other tmp.xxxx.bam files. I assumed filename.sorted.bai was the indexed file, since it was the result of my last command.
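A minimal sketch of the sort-then-index step, assuming the placeholder name filename.bam from this post and a samtools 1.x command line. The output name filename.sorted.bam is an assumption chosen here so that the sorted file keeps a .bam extension; the .bai file produced by the index step is only an index that sits beside the BAM, not a file to be viewed on its own.

```shell
#!/bin/sh
# Sketch only: "filename.bam" is the placeholder name used in this post.
in_bam="filename.bam"
# Assumed output name: strip ".bam" and append ".sorted.bam".
sorted_bam="${in_bam%.bam}.sorted.bam"

if command -v samtools >/dev/null 2>&1 && [ -f "$in_bam" ]; then
    # Coordinate-sort and write the result to an explicitly named BAM file.
    samtools sort -o "$sorted_bam" "$in_bam"
    # Index the sorted BAM; the index is written next to it with a .bai suffix.
    samtools index "$sorted_bam"
else
    echo "samtools or $in_bam not found; nothing done" >&2
fi
```

The guard around the two samtools calls only keeps the sketch from failing when the tool or the input file is missing.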
After this, I am trying to view the desired genic region to check whether it is present in the sequencing data, before applying mpileup to the files and retrieving my desired .fasta file from the sequence.
According to the first error message I got, an indexed file should be viewable through the view command, so I proceeded with the following command:
samtools view filename.sorted.bai chr6:xx-xx+1
and it gives me the following error:
[E::hts_hopen] Failed to open file filename.sorted.bai
[E::hts_open_format] Failed to open file filename.sorted.bai
samtools view: failed to open "filename.sorted.bai", for reading: Exec format error
I tried applying the same syntax to other files, such as the temporary files, and it returned the following error:
[main_samview] random alignment retrieval only works for indexed BAM or CRAM files
I have gone back and sorted and indexed the original filename.bam file again, this time asking for output in .bam format rather than .bai, which is supposed to be the default output format when indexing. Nothing has worked.
In the end I tried viewing the whole sequence instead of asking for a region, and it worked nicely with a filename.bam file that was only sorted, not indexed, and also with a filename.sorted.bam.temp.xxxx.bam file, using the following commands:
samtools view filename.bam
samtools view filename.sorted.bam.temp.xxxx.bam
Both commands returned a huge number of reads, with headers and everything, but when I narrow the command by giving a region it does not work!
Perhaps I am missing something with the REGION syntax? Do I have to run any process on the filename.bai file before trying to view a specific region? I would appreciate any help! I am using samtools version 1.6.
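For reference, a minimal sketch of how a region query is usually written, assuming a coordinate-sorted BAM named filename.sorted.bam with its .bai index beside it. The file name, the helper make_region, and the coordinates 1000-2000 are hypothetical placeholders, not values from my data. The region is given as NAME:START-END, and NAME has to match a reference name in the BAM header exactly (some files call the chromosome "6" rather than "chr6").

```shell
#!/bin/sh
# Sketch only: file name and coordinates are placeholders, not real values.
bam="filename.sorted.bam"

# Hypothetical helper: build a region string of the form NAME:START-END.
make_region() {
    printf '%s:%s-%s\n' "$1" "$2" "$3"
}

region=$(make_region chr6 1000 2000)

if command -v samtools >/dev/null 2>&1 && [ -f "$bam" ]; then
    # List the reference names declared in the header (@SQ lines);
    # the region must use one of these names exactly.
    samtools view -H "$bam" | grep '^@SQ'
    # Query the BAM itself, not the .bai; the index is picked up from beside it.
    samtools view "$bam" "$region" | head
fi
```

The query is run on the .bam file; the .bai file is never passed to samtools view directly.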
I apologize for the length of my post, but I tried to be as explicit as possible about the errors and the pipeline I have followed, for the sake of clarity.