Perl Programming Course for Bioinformatics and Internet

Data Persistence

Objective:
To create persistent storage for genomic data extracted from UniGene records.

Source files location:
Use the file ../data/unigene, if accessed from your home directory on bicourse.
You may also retrieve the file from this url: unigene
This file is a portion (200 IDs) of the Homo sapiens UniGene data available from ftp://ftp.ncbi.nih.gov/repository/UniGene/Hs.data.Z

Task:
  1. Parse the "unigene" file and extract the data associated with the fields ID, TITLE, CHROMOSOME, GENE and EXPRESS.
  2. Create a "DBM" database with this data, using the UniGene ID as the key.
  3. Write a simple script to query this database by CHROMOSOME, GENE or EXPRESS. For each hit, return the UniGene ID and the other data stored with it.
  4. The stored data can keep its original upper and lower case, but the search must be case insensitive.
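Steps 1 and 2 could start from something like the sketch below. It is only one possible approach and rests on assumptions that should be checked against the real file: each record is taken to hold one "FIELD value" pair per line and to end with a line starting with "//", and the four non-key fields are packed into a single tab-separated string, because a DBM file stores only one scalar value per key. The name parse_unigene and the sample record are hypothetical, made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Fields requested by the task; all other lines in a record are skipped.
my @FIELDS = qw(ID TITLE CHROMOSOME GENE EXPRESS);

# Read records from a filehandle and return a hash of
#   id => "TITLE\tCHROMOSOME\tGENE\tEXPRESS"
# Assumption: one "FIELD value" pair per line, records closed by "//".
sub parse_unigene {
    my ($fh) = @_;
    my %wanted = map { $_ => 1 } @FIELDS;
    my (%db, %rec);

    # Store the record collected so far (if it has an ID) and reset it.
    my $flush = sub {
        if (defined $rec{ID}) {
            $db{ $rec{ID} } = join "\t",
                map { defined $rec{$_} ? $rec{$_} : '' } @FIELDS[1 .. $#FIELDS];
        }
        %rec = ();
    };

    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ m{^//}) { $flush->(); next; }      # end of record
        my ($field, $value) = $line =~ /^(\S+)\s+(.*?)\s*$/ or next;
        # Keep only the first occurrence of each wanted field.
        $rec{$field} = $value if $wanted{$field} && !exists $rec{$field};
    }
    $flush->();    # last record, in case the file lacks a final "//"
    return %db;
}

# Hypothetical sample in the assumed layout (not real UniGene data).
my $sample = <<'END';
ID          Hs.0001
TITLE       Example gene one
GENE        EXG1
CHROMOSOME  8
EXPRESS     liver; colon
//
END

open my $in, '<', \$sample or die "Cannot read sample: $!\n";
my %records = parse_unigene($in);
close $in;

print "$_ => $records{$_}\n" for sort keys %records;
```

To load the real data, open ../data/unigene instead of the in-memory sample and copy the returned pairs into the tied hash. A tab is used as the separator on the assumption that no field value contains one.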

Perl modules to use:
  1. GDBM_File or DB_File

Important:
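For testing in a Windows environment, use dbmopen, since DB_File might not work there.
For this exercise the following two forms are equivalent. Note the use of "or" rather than "||": "||" binds more tightly than the comma, so "0666 || die ..." would never call die when tie fails.
# Using DB_File (requires "use DB_File;"):
tie %h, "DB_File", $dbFile, O_RDWR|O_CREAT, 0666 or die "Cannot open file $dbFile: $!\n";
untie %h;
# Using dbmopen:
dbmopen(%h, $dbFile, 0666) or die "Cannot open file $dbFile: $!\n";
dbmclose(%h);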
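
Once the hash is bound to the database by either call, the query step of the task might look like the following sketch. It assumes that each value was stored as a tab-separated string in the order TITLE, CHROMOSOME, GENE, EXPRESS; that layout is a choice made for this illustration, not something the DBM format requires. The sub search_db, the database name and the two records are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Assumed column order of the tab-separated value stored under each id.
my @COLUMNS = qw(TITLE CHROMOSOME GENE EXPRESS);

# Return the ids whose $field contains $term, ignoring case.
sub search_db {
    my ($db, $field, $term) = @_;
    my ($idx) = grep { $COLUMNS[$_] eq uc $field } 0 .. $#COLUMNS;
    die "Unknown field: $field\n" unless defined $idx;

    my @hits;
    while (my ($id, $packed) = each %$db) {
        my @values = split /\t/, $packed, -1;   # -1 keeps empty trailing fields
        push @hits, $id
            if defined $values[$idx] && $values[$idx] =~ /\Q$term\E/i;
    }
    my @sorted = sort @hits;
    return @sorted;
}

# Build a throw-away database with two made-up records.
my $dir    = tempdir(CLEANUP => 1);
my $dbFile = "$dir/unigene_demo";               # hypothetical database name

my %h;
dbmopen(%h, $dbFile, 0666) or die "Cannot open file $dbFile: $!\n";
$h{'Hs.0001'} = join "\t", 'Example gene one', '8', 'EXG1', 'liver; colon';
$h{'Hs.0002'} = join "\t", 'Example gene two', 'X', 'EXG2', 'brain';

# Case-insensitive query: 'exg1' matches the stored 'EXG1'.
for my $id (search_db(\%h, 'gene', 'exg1')) {
    my %row;
    @row{@COLUMNS} = split /\t/, $h{$id}, -1;
    print join("\t", $id, map { "$_=$row{$_}" } @COLUMNS), "\n";
}
dbmclose(%h);
```

The match is a substring match, so a CHROMOSOME query for "1" would also hit "11" or "21"; anchoring the pattern (for example /^\Q$term\E$/i) gives an exact match instead. Depending on which DBM library dbmopen selects on a given system, there may be a limit on the size of each stored record, which could matter for long EXPRESS values.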



Dr Jaime Prilusky, course@weizmann.ac.il.