Hi, I have RNA-seq results from the NextSeq 500 platform. I was given two types of files for the same samples, with exactly the same file names: blablabla.txt.gz and blablabla.bam. I viewed them in 010 Editor and they look the same. What is the difference between the .txt.gz and the .bam files? Are the BAM files the aligned/mapped reads? They were not labeled as "sorted"; could they be sorted anyway? If these are the aligned reads, how can I convert them into count matrices with tools like featureCounts or htseq-count? Thanks for your help.
A BAM file usually contains the aligned reads, which you can sort using samtools. For counting reads you can use featureCounts (recommended). The manual is quite clear, so reading it should be enough to figure out how to do it.
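Whether a BAM is sorted is usually recorded in its header: the SAM specification defines an SO (sort order) tag on the @HD line, with values such as coordinate, queryname, unsorted or unknown. A minimal Python sketch, assuming you already have the header as text (e.g. from `samtools view -H blablabla.bam`); the function name `sam_sort_order` is hypothetical:

```python
def sam_sort_order(header_text: str) -> str:
    """Return the SO (sort order) value from the @HD line of a SAM/BAM header.

    header_text is assumed to be the plain-text header, e.g. the output of
    `samtools view -H file.bam`. Returns "unknown" if no SO tag is found.
    """
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            # After the record type, header fields are tab-separated TAG:VALUE pairs.
            for field in line.split("\t")[1:]:
                tag, _, value = field.partition(":")
                if tag == "SO":
                    return value
    return "unknown"


header = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chr1\tLN:248956422\n"
print(sam_sort_order(header))  # coordinate
```

The tag only reflects what the writing tool declared; if it is missing or says unsorted, running `samtools sort` again is the safe option.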
I have no idea what your .txt.gz file looks like. Note that a BAM file can also contain "unaligned" reads, but with the information you provided here I can't tell.
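One likely reason the two files look alike in a hex editor: BAM is compressed with BGZF, a gzip-compatible format, so both a BAM and a .txt.gz start with the gzip magic bytes 1f 8b. The difference appears after decompression, where a BAM stream begins with the characters "BAM" followed by the byte 0x01. A rough Python sketch of that check using only the standard library (`sniff_file_type` is a hypothetical helper, and this is a heuristic, not a full validation):

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"   # first two bytes of any gzip (and therefore BGZF) file
BAM_MAGIC = b"BAM\x01"     # first four bytes of a decompressed BAM stream


def sniff_file_type(path: str) -> str:
    """Guess whether a file is a BAM, a gzipped (probably text) file, or neither."""
    with open(path, "rb") as fh:
        if fh.read(2) != GZIP_MAGIC:
            return "not gzip-compressed"
    # BGZF is a series of concatenated gzip members, so the gzip module can read it.
    with gzip.open(path, "rb") as gz:
        head = gz.read(4)
    if head == BAM_MAGIC:
        return "BAM"
    return "gzip-compressed (probably text)"
```

For example, `sniff_file_type("blablabla.bam")` and `sniff_file_type("blablabla.txt.gz")` should return different labels if one is a BAM and the other a compressed text file. Opening the file with samtools (e.g. `samtools view -H`) is a more reliable check for a valid BAM.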
If I were you, I'd try to get the raw data (e.g. FASTQ files), since this is a completely new platform, and run it through a pipeline along these lines:
- Alignment (STAR? BWA-MEM?)
- BAM conversion and sorting (samtools)
- Counting reads per gene with a GTF annotation (featureCounts, htseq-count, etc.)
- Differential expression (e.g. Degust)
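The first three steps could be scripted roughly as follows. This is a sketch only: it assumes STAR, samtools and featureCounts are installed and on your PATH, a prebuilt STAR index, and a GTF annotation; all file names are placeholders, and `build_pipeline`/`run_pipeline` are hypothetical helpers. By default it only prints the commands.

```python
from __future__ import annotations

import shlex
import subprocess


def build_pipeline(sample: str, fastqs: list[str], genome_dir: str,
                   gtf: str, threads: int = 4) -> list[list[str]]:
    """Build align -> sort -> index -> count command lines for one sample.

    All paths are hypothetical placeholders; adjust them to your own data.
    """
    prefix = f"{sample}_"
    unsorted_bam = f"{prefix}Aligned.out.bam"  # STAR's file name for an unsorted BAM
    sorted_bam = f"{sample}.sorted.bam"
    align = ["STAR", "--runThreadN", str(threads),
             "--genomeDir", genome_dir,
             "--readFilesIn", *fastqs,
             "--outSAMtype", "BAM", "Unsorted",
             "--outFileNamePrefix", prefix]
    if all(f.endswith(".gz") for f in fastqs):
        # Let STAR decompress gzipped FASTQ on the fly.
        align += ["--readFilesCommand", "zcat"]
    sort = ["samtools", "sort", "-@", str(threads), "-o", sorted_bam, unsorted_bam]
    index = ["samtools", "index", sorted_bam]
    count = ["featureCounts", "-T", str(threads), "-a", gtf,
             "-o", f"{sample}.counts.txt", sorted_bam]
    return [align, sort, index, count]


def run_pipeline(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step


run_pipeline(build_pipeline("s1", ["s1_R1.fastq.gz", "s1_R2.fastq.gz"],
                            "star_index", "genes.gtf"))
```

Passing several sorted BAMs to a single featureCounts call (`-o counts.txt s1.sorted.bam s2.sorted.bam ...`) gives one table with a column per sample, which is the kind of count matrix differential expression tools expect. For paired-end data featureCounts also needs `-p` (recent Subread releases additionally use `--countReadPairs` to count fragments rather than reads); check the manual for your version.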
Please tell us about your experience with the GeneReader. I have not yet seen data from this platform. What is the read length, for example?