Question

Feature Counts how to identify each sample

0

Entering edit mode

3.5 years ago

luzglongoria ▴ 50

Hi there,

I have run an alignment by using STAR by using the --readFilesManifest option since I have several samples and I wanted to run it at once:

STAR --runThreadN 20 --genomeDir /path/to/the/folder/ --readFilesManifest /path/to/the/file/samples.txt

Like this I have an unique .bam file with all the samples inside that are now aligned with my reference genome.

The .txt file is a tab limited file with all the samples that belong to 4 different groups .It looks like:

bm-R1_1.fp.fq.gz        bm-R1_2.rp.fq.gz        bm
bm-R2_1.fp.fq.gz        bm-R2_2.rp.fq.gz        bm
bm-R3_1.fp.fq.gz        bm-R3_2.rp.fq.gz        bm
dpi8-R1_1.fp.fq.gz  dpi8-R1_2.rp.fq.gz  dpi8
dpi8-R2_1.fp.fq.gz  dpi8-R2_2.rp.fq.gz  dpi8
dpi8-R3_1.fp.fq.gz  dpi8-R3_2.rp.fq.gz  dpi8
dpi12-R1_1.fp.fq.gz     dpi12-R1_2.rp.fq.gz     dpi12
dpi12-R2_1.fp.fq.gz     dpi12-R2_2.rp.fq.gz     dpi12
dpi12-R3_1.fp.fq.gz     dpi12-R3_2.rp.fq.gz     dpi12
dpi22-R1_1.fp.fq.gz     dpi22-R1_2.rp.fq.gz     dpi22
dpi22-R2_1.fp.fq.gz     dpi22-R2_2.rp.fq.gz     dpi22
dpi22-R3_1.fp.fq.gz     dpi22-R3_2.rp.fq.gz     dpi22
c1_1.fp.fq.gz   c1_2.rp.fq.gz   control
c2_1.fp.fq.gz   c2_2.rp.fq.gz   control
c3_1.fp.fq.gz   c3_2.rp.fq.gz   control
c4_1.fp.fq.gz   c4_2.rp.fq.gz   control
c5_1.fp.fq.gz   c5_2.rp.fq.gz   control
c6_1.fp.fq.gz   c6_2.rp.fq.gz   control
c7_1.fp.fq.gz   c7_2.rp.fq.gz   control
c8_1.fp.fq.gz   c8_2.rp.fq.gz   control
c9_1.fp.fq.gz   c9_2.rp.fq.gz   control
c10_1.fp.fq.gz  c10_2.rp.fq.gz  control
c11_1.fp.fq.gz  c11_2.rp.fq.gz  control
c12_1.fp.fq.gz  c12_2.rp.fq.gz  control

After some quality analyses I have run Feature Counts with the command:

featureCounts -a /path/to/the/folder/genome_reference/genomic.gtf -o Table.count sorted_aligment.bam

The output is:

head Table.count

Geneid  Chr Start   End Strand  Length  /Analysis/sorted_aligment.bam
gene-CpipJ_CPIJ020271   NW_001889829.1;NW_001889829.1;NW_001889829.1    7980;8477;12299 7992;8702;12311 +;+;+   252 0
gene-CpipJ_CPIJ020269   NW_001889825.1;NW_001889825.1;NW_001889825.1;NW_001889825.1;NW_001889825.1;NW_001889825.1;NW_001889825.1    2864;3147;11343;11611;11814;12144;12472 3065;3288;11540;11799;12078;12255;12515 +;+;+;+;+;+;+   1152    72680
gene-CpipJ_CPIJ039441   NW_001889824.1  242 324 +   83  0
gene-CpipJ_CPIJ039442   NW_001889824.1  976 1048    -   73  0
gene-CpipJ_CPIJ020194   NW_001889821.1  14098   14682   +   585 2952
gene-CpipJ_CPIJ020082   NW_001889820.1;NW_001889820.1   8810;11422  8815;11967  +;+ 552 1081
gene-CpipJ_CPIJ020329   NW_001889809.1;NW_001889809.1   492;8246    806;8257    -;- 327 1
gene-CpipJ_CPIJ020172   NW_001889802.1  9947    10045   +   99  0

But I want the expression not only by gene ID but also by sample. I have cut this file from FeatureCounts in order to keep the gene ID and the expression level by:

cut -f1,7-8 Table.count> Table.count_cut

And I get:

Geneid  /Analysis/sorted_aligment.bam
gene-CpipJ_CPIJ020271   0
gene-CpipJ_CPIJ020269   72680
gene-CpipJ_CPIJ039441   0
gene-CpipJ_CPIJ039442   0
gene-CpipJ_CPIJ020194   2952
gene-CpipJ_CPIJ020082   1081
gene-CpipJ_CPIJ020329   1
gene-CpipJ_CPIJ020172   0
gene-CpipJ_CPIJ020291   12
gene-CpipJ_CPIJ039423   0
gene-CpipJ_CPIJ020231   24587

I have read there is a way of have all the info I want together https://www.bioconductor.org/help/course-materials/2016/CSAMA/lab-3-rnaseq/rnaseq_gene_CSAMA2016.html#alternative---counting-with-featurecounts-in-rsubread but I don't understand very well how. I am sure there is a very simple way of doing but I don't find how.

Could anyone help me in including the info from the file sample.txt in the expression Table from FeatureCounts?? Like this I will see the expression of each gene in each sample and I could compare the expression level depending on the moment the sample was taken.

Any help it would be more than welcome.

Thank you so much.

featurecounts RNA genomereference RNA-Seq • 1.2k views

ADD COMMENT • link 3.5 years ago by luzglongoria ▴ 50

1

Entering edit mode

have read there is a way of have all the info I want together

Feed all of your BAM files to featureCounts at one time to get a matrix of read counts. Tip: Provide BAM file names in the same order as you want the columns to be (groups etc) to not have to manipulate them later.

featureCounts -a /path/to/the/folder/genome_reference/genomic.gtf -o Table.count sorted_aligment_1.bam sorted_aligment_2.bam sorted_aligment_3.bam sorted_aligment_4.bam ..

ADD REPLY • link 3.5 years ago by GenoMax 144k

0

Entering edit mode

Thank you for your answer.

The thing is that I have an only BAM file. I run STAR and created an unique BAM file with all the samples inside. Like this I saved time :) So I don't have these BAM file you refer to.

ADD REPLY • link 3.5 years ago by luzglongoria ▴ 50

1

Entering edit mode

Only way to recognize your individual samples would be to look at the RG tag. It appears that you did not run your alignment with the necessary STAR option. So you may have to re-do these alignments.

ADD REPLY • link 3.5 years ago by GenoMax 144k