How to get info from the GTEx SampleAttributesDS.txt file
3.8 years ago
rheab1230 ▴ 150

I am trying to run quality control on gene expression data. I am running a script to extract the sample IDs, tissues, and sequencing method from the GTEx SampleAttributesDS.txt file. The script:

#!/usr/bin/env python
from collections import defaultdict
import gzip

def get_ids_from_vcf(vcf_file):
    with gzip.open(vcf_file) as fh:
        for line in fh:
            line = line.decode('utf-8')
            if not line.startswith("##"):
                break
        sample_ids = line.strip().split()[9:]
        return set(sample_ids)

vcf_file = '/new_rna-seq/quality_Control/chr22_annotate.vcf.gz'
vcf_sample_ids = get_ids_from_vcf(vcf_file)
tissue_donors = defaultdict(list)

samples_file = '/new_rna-seq/quality_Control/GTEx_Analysis_v8_Annotations_SampleAttributesDS.txt'

with open(samples_file) as fh:
    hdr = fh.readline() # Drop header
    for line in fh:
        if line == '' or line == '\n':
            continue
        fields = line.strip().split('\t')
        sample_id = fields[0]
        tissue = fields[13]
        frz = fields[26]
        donor_id = '-'.join(sample_id.split('-')[:2])
        if donor_id in vcf_sample_ids and frz == 'RNASEQ':
            tissue_donors[tissue].append(sample_id + '\n')

for tissue in tissue_donors:
    with open(tissue.replace(' - ', '_').replace(' ', '_') + '_donors.txt', 'w') as fh:
        fh.writelines(tissue_donors[tissue])

When I run this script, I get the following error:

python get_donors_by_tissue.py
Traceback (most recent call last):
  File "/home/TWAS/data_download/new_rna-seq/quality_Control/get_donors_by_tissue.py", line 28, in <module>
    frz = fields[30]
IndexError: list index out of range
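
One possible cause, offered as a guess rather than a diagnosis: `line.strip()` removes trailing tabs as well as the newline, so any row whose last columns are empty splits into fewer fields than the header has. The row below is made up purely to show the effect; it is not taken from the GTEx file.

```python
# A made-up row with two empty trailing columns (hypothetical data)
row = "GTEX-XXXX-0001\tLung\t\t\n"

# strip() eats the trailing tabs, so the empty columns disappear
stripped = row.strip().split("\t")

# rstrip("\n") removes only the newline and keeps the empty columns
kept = row.rstrip("\n").split("\t")

print(len(stripped), len(kept))
```

If this is what is happening, indexing a high column number works on rows that are fully filled in and fails on the first row with empty trailing columns.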

The error relates to the file GTEx_Analysis_v8_Annotations_SampleAttributesDS.txt. I downloaded this attribute file from https://www.gtexportal.org/home/datasets using:

wget https://storage.googleapis.com/gtex_analysis_v8/annotations/GTEx_Analysis_v8_Annotations_SampleAttributesDS.txt

Could anyone help me with this? I don't understand how to extract this information from the file.
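
A minimal sketch of one way to avoid hard-coded column numbers: look the columns up by header name instead. It assumes the tissue and analysis-freeze columns are named `SMTSD` and `SMAFRZE` and that the sample ID column is `SAMPID`; please check these against the header line of your own copy of the file. The function name `group_samples_by_tissue` and the demo table are hypothetical and only there for illustration.

```python
import csv
import io
from collections import defaultdict

def group_samples_by_tissue(fh, donor_ids, tissue_col="SMTSD", freeze_col="SMAFRZE"):
    """Group RNASEQ sample IDs by tissue, finding columns by header name."""
    tissue_samples = defaultdict(list)
    # DictReader keys every row by the header line, so no numeric indices are needed
    reader = csv.DictReader(fh, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        sample_id = row["SAMPID"]
        # Donor ID is the first two dash-separated parts, as in the original script
        donor_id = "-".join(sample_id.split("-")[:2])
        # .get() returns None for columns missing from a short row instead of raising
        if donor_id in donor_ids and row.get(freeze_col) == "RNASEQ":
            tissue_samples[row[tissue_col]].append(sample_id)
    return dict(tissue_samples)

# Tiny made-up table standing in for the real annotation file
demo = io.StringIO(
    "SAMPID\tSMTSD\tSMAFRZE\n"
    "GTEX-AAAA-0001-SM-1\tWhole Blood\tRNASEQ\n"
    "GTEX-AAAA-0002-SM-2\tLung\t\n"
    "GTEX-BBBB-0001-SM-3\tLung\tRNASEQ\n"
)
result = group_samples_by_tissue(demo, {"GTEX-AAAA"})
print(result)
```

With the real file, the same function could be called as `group_samples_by_tissue(open(samples_file, newline=""), vcf_sample_ids)`, and the per-tissue lists written out as in the original loop.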
