The requirement for the MER database

A small collection of SWISS-PROT and EMBL entries relating to the mer operon, a gene cluster found in many bacteria that detoxifies mercury (Hg²⁺) ions, provides the raw data for a database called MER. The MER database contains four tables:

proteins - A table of protein structure details, extracted from a collection of SWISS-PROT entries.

dnas - A table of DNA sequence details, extracted from a collection of EMBL entries.

crossrefs - A table that links the extracted protein structures to the extracted DNA sequences.

citations - A table of literature citations extracted from both the SWISS-PROT and EMBL DNA entries.
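One way the four tables might be declared is sketched below, using Python's built-in sqlite3 module. The source does not give the actual schema, so every column name here (accession_number, code, entry_name, sequence_length, protein_id, dna_id, title, author, location) is a hypothetical placeholder. The crossrefs table is modelled as a link table holding one row per protein/DNA pair, which is a common way to represent a many-to-many relationship.

```python
import sqlite3

# Hypothetical schema for the MER database. Column names are illustrative
# assumptions, not the layout used in the original database.
SCHEMA = """
CREATE TABLE proteins (
    accession_number TEXT PRIMARY KEY,  -- SWISS-PROT accession
    code             TEXT,              -- SWISS-PROT entry name
    sequence_length  INTEGER            -- length in amino acids
);
CREATE TABLE dnas (
    accession_number TEXT PRIMARY KEY,  -- EMBL accession
    entry_name       TEXT,
    sequence_length  INTEGER            -- length in bases
);
-- Link table: one row per protein/DNA cross-reference pair.
CREATE TABLE crossrefs (
    protein_id TEXT REFERENCES proteins(accession_number),
    dna_id     TEXT REFERENCES dnas(accession_number),
    PRIMARY KEY (protein_id, dna_id)
);
-- Citations can come from either kind of entry, so record the source accession.
CREATE TABLE citations (
    accession_number TEXT,  -- SWISS-PROT or EMBL accession the citation came from
    title            TEXT,
    author           TEXT,
    location         TEXT
);
"""


def create_mer_database(path=":memory:"):
    """Create the four (hypothetical) MER tables and return an open connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


if __name__ == "__main__":
    conn = create_mer_database()
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    print(names)  # ['citations', 'crossrefs', 'dnas', 'proteins']
```

The composite primary key on crossrefs stops the same pair being recorded twice. Note that SQLite only enforces the REFERENCES clauses when foreign-key support is switched on with `PRAGMA foreign_keys = ON`.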

Once the raw data is in the database, SQL can be used to answer questions about the data, for instance:

1. How many protein structures in the database are longer than 200 amino acids?

2. How many DNA sequences in the database are longer than 4,000 bases?

3. What's the largest DNA sequence in the database?

4. Which protein structures are cross-referenced with which DNA sequences?

5. Which literature citations refer to the entries identified in the previous question?
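Each of these questions maps fairly directly onto a single SQL query. The sketch below assumes a cut-down, hypothetical version of the tables (only the columns the queries need, with invented names such as sequence_length, protein_id and dna_id) and loads a few synthetic placeholder rows so that it runs on its own; the identifiers and lengths are not real MER data.

```python
import sqlite3

# Cut-down, hypothetical MER tables with only the columns the queries need.
# All rows are synthetic placeholders, not real SWISS-PROT or EMBL entries.
SETUP = """
CREATE TABLE proteins  (accession_number TEXT PRIMARY KEY, sequence_length INTEGER);
CREATE TABLE dnas      (accession_number TEXT PRIMARY KEY, sequence_length INTEGER);
CREATE TABLE crossrefs (protein_id TEXT, dna_id TEXT);
CREATE TABLE citations (accession_number TEXT, title TEXT);
INSERT INTO proteins  VALUES ('PROT1', 150), ('PROT2', 561), ('PROT3', 212);
INSERT INTO dnas      VALUES ('DNA1', 1200), ('DNA2', 5400), ('DNA3', 4300);
INSERT INTO crossrefs VALUES ('PROT2', 'DNA2'), ('PROT3', 'DNA3');
INSERT INTO citations VALUES ('PROT2', 'Citation A'), ('DNA3', 'Citation B'),
                             ('DNA1', 'Citation C');
"""

QUERIES = {
    # 1. Proteins longer than 200 amino acids.
    "long_proteins": "SELECT COUNT(*) FROM proteins WHERE sequence_length > 200",
    # 2. DNA sequences longer than 4,000 bases.
    "long_dnas": "SELECT COUNT(*) FROM dnas WHERE sequence_length > 4000",
    # 3. The largest DNA sequence (if there is a tie, LIMIT 1 keeps just one row).
    "largest_dna": """SELECT accession_number, sequence_length FROM dnas
                      ORDER BY sequence_length DESC LIMIT 1""",
    # 4. Which proteins are cross-referenced with which DNA sequences.
    "crossrefs": "SELECT protein_id, dna_id FROM crossrefs ORDER BY protein_id",
    # 5. Citations attached to any protein or DNA entry that appears in a cross-reference.
    "citations": """SELECT DISTINCT c.accession_number, c.title
                    FROM citations AS c
                    JOIN crossrefs AS x
                      ON c.accession_number IN (x.protein_id, x.dna_id)
                    ORDER BY c.accession_number""",
}


def answer_questions(conn):
    """Run each query and return its result rows, keyed by query name."""
    return {name: conn.execute(sql).fetchall() for name, sql in QUERIES.items()}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    for name, rows in answer_questions(conn).items():
        print(name, rows)
```

Question 5 is answered here with a join against the link table rather than by re-running question 4 by hand; that is one reasonable reading of "the results from the previous question", not necessarily the query the original authors used.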

Of course, it is possible to determine answers to these questions manually, as follows:

• Print out all the SWISS-PROT and EMBL entries of interest.

• Sift through the printouts visually, noting the data of interest.

Depending on the number of entries examined, this is probably no more than a few hours' work. A computer program could be written to automate the collection of the interesting pieces of data, which would probably reduce the time required from hours to tens of minutes, depending on how complicated the program is and whether it has to be written from scratch. Compare those hours and tens of minutes with the time an SQL-capable database system takes to answer each of these questions: a few seconds at most.
