PBMdb database manual


Overview

Introduction

The PBM database is a sample tracking system for protein binding microarray (PBM) experiments. It has several components to record detailed experimental annotations (expression vectors, plasmids, binding domains, protein extracts, array layouts and PBM assays), calculate enrichment scores for N-mers, and perform quality control (QC). The typical workflow for an assay is as follows:

Enter plasmid
Enter protein extract
Enter assay info
Upload raw data
Calculate scores
Perform QC

The annotation steps (plasmid, protein extract and assay info) and the QC step are done through the web interface, while uploading raw data and calculating scores are performed using custom scripts on the same analysis station where the image quantification is done. Each step is described in more detail in the sections below.

Annotating experiments

Plasmid, protein extract and assay annotations can be viewed and modified through the "Manage Data" section in the main menu. Clicking on a category takes you to a table listing all entries for that category in the database. Details of an existing record can be viewed by clicking the view button associated with that record in the table view, and changes can be made by clicking its edit button. New entries can be made by clicking the "Add" button at the bottom of the table view, or by clicking the copy button to duplicate an existing record. A description of each field is shown in yellow when adding or modifying records.

In addition to adding single records, it is also possible to do a batch upload of several new records at once by clicking the "Batch" links associated with each category in the main menu.

When adding new plasmids, you will be presented with pull-down menus listing the available plasmid backbones, DNA binding domains and species. If an entry you're looking for is not present in the list, you can add it in the Domains, Plasmid Backbones, or Species tables. Before you do this, please make sure that the entry you're looking for is not listed under a slightly different name!

Uploading raw data and calculating enrichment scores

If you are not used to working with a Linux/Unix terminal, please have a look at this web page, which gives an overview of the basic commands you need to find your way around. Also make sure that you are familiar with the input file naming conventions.

Uploading raw data

Once you have quantified your arrays and entered the associated annotations into the database, you can load the probe intensity data into the database. First open a terminal window on the analysis station by double-clicking the Konsole icon on the desktop. Now change to the directory where you saved your files by entering the following command (replacing NAME with the name of the directory that holds your files):

      cd /mnt/microarray_data/pbm-project/quantified-data/NAME
   

Now enter the command below to upload your data files. Don't worry about files that have already been uploaded, since these are ignored by the script.

      pbm-loaddata-imagene *.txt
   
Calculating enrichment scores

Once all your data files have been uploaded, enter the following command to calculate Z-scores and E-scores and store the results in the database.

      sync-pbmdb-scores
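
The upload and scoring steps above can also be chained so that scores are only recalculated after a successful upload. The following is a minimal sketch, not part of the PBMdb scripts: the function name upload_and_score is hypothetical, and the sketch assumes that both commands exit with a non-zero status when they fail.

```shell
# Hypothetical helper: upload all quantified files in a directory, then score.
# Assumption: pbm-loaddata-imagene and sync-pbmdb-scores return a non-zero
# exit status on failure.
upload_and_score() (
    # The function body runs in a subshell, so the caller's working
    # directory is left unchanged.
    cd "$1" || exit 1
    # Files that were uploaded earlier are ignored by the script.
    pbm-loaddata-imagene *.txt || exit 1
    # Recalculate Z-scores and E-scores only once the upload has succeeded.
    sync-pbmdb-scores
)
```

After defining the function in the terminal, it would be called as 'upload_and_score /mnt/microarray_data/pbm-project/quantified-data/NAME', again replacing NAME with the name of your directory.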
   

Quality control

The last step in the PBM data analysis is to assess whether an experiment was successful. A basic QC analysis can be done from the assay table by clicking on the links in the 'QC' column, which give access to three types of QC analysis.

At the bottom of each QC report you will have the option to flag assays as 'passed' or 'failed'. Note that any assay that has no E-scores above 0.45 will automatically be flagged as 'failed' when the enrichment scores are uploaded to the database.
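
The automatic 'failed' rule can be illustrated with a small sketch. This is not the code used by the database: the function name qc_auto_flag and the input format (one E-score per line on standard input) are assumptions made for this example only.

```shell
# Hypothetical illustration of the automatic QC rule described above.
# Assumption: E-scores arrive one per line on standard input.
qc_auto_flag() {
    awk -v cutoff=0.45 '
        # Remember whether any score is strictly above the cutoff.
        $1 + 0 > cutoff { above = 1 }
        # Without such a score, the assay would be flagged as failed.
        END { print (above ? "not auto-failed" : "failed") }
    '
}
```

An assay that is not failed automatically still needs to be flagged as 'passed' or 'failed' by hand from its QC report.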

Exporting data

There are two ways to get the raw or score data for an assay. You can export data for individual assays by clicking the 'export' link in the assay view in the web interface. You can also do a bulk export using the 'pbm-export-scores' and 'pbm-export-rawdata' scripts on the analysis station. Because all the scores are precalculated, the export is very fast.

Here are some bulk export examples:

* Get E-scores for assay IDs 3, 10, and 100

      pbm-export-scores -type E -sys_name 3 10 100
   

* Same for Z-scores

      pbm-export-scores -type Z -sys_name 3 10 100
   

* Read assay identifiers from a file (one assay ID per line)

      pbm-export-scores -type E -sys_name -file file-with-assayIDs.txt
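
The assay ID file used in the last example can be written in any text editor or created directly in the terminal. A minimal sketch, reusing the assay IDs from the earlier examples:

```shell
# Write one assay ID per line to the file named in the example above.
printf '%s\n' 3 10 100 > file-with-assayIDs.txt
```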
   

What do I do if...

...I selected the wrong protein ID for an assay?

Go to the Assay view and open the assay for editing by clicking the edit button. Select the correct protein identifier from the pull-down menu and save the record. This will automatically update the experiment annotations.

...I need to re-quantify an image and upload new probe intensity data for an assay?

Go to the Assay view and open the assay in question for editing by clicking the edit button. Change the current filename to the name of the new quantified data file you want to replace it with, and save the record. You will be presented with a message explaining that the existing raw data and enrichment scores have been purged from the database. This will also reset the QC flag for the assay. You can now upload the new quantified data on the analysis station by running 'pbm-loaddata-imagene new-file-name.txt', followed by 'sync-pbmdb-scores' to recalculate the enrichment scores.

...I selected the wrong array layout for an assay?

Go to the Assay view and open the assay for editing by clicking the edit button. Correct the array layout and save the record. You will be presented with a message explaining that the existing score data for this assay have been purged from the database. This will also reset the QC flag for the assay. Now run 'sync-pbmdb-scores' on the analysis station to calculate and upload a new set of enrichment scores for the correct layout.

...I selected the wrong assay condition?

The procedure in this case is the same as for fixing an array layout.

...I need to add an assay that was done with more than one protein?

To deal with PBM assays that were done with more than one protein (e.g. heterodimers), you will need to define which proteins were assayed together as one set. To do this, go to the main menu, click on Protein sets, and then click Add to define a new set of proteins. You can assign a protein set ID along the lines of a protein identifier, where the number following the '.' indicates different versions/syntheses of the same protein pool. You can also add a description here that will make it easier for you to keep track of the various sets.

Once you have created a new protein set, go to Set assignments to associate the protein set with the plasmids that were used to create it. You will need to add a record for each protein/plasmid in the set. Once you have done the PBM assay, you can select the appropriate protein set ID from the pull-down menu when you enter the assay into the database.

...I need to upload a western gel?

It's possible to upload western blots for protein extracts, though this feature never really caught on. If you must, you can add a blot in the main menu. Before uploading, each lane in the western/gel image should be annotated with the protein identifier (please use the full protein identifier and not just the clone ID or gene name) and the size of the marker bands. Marking can be done by clearly labeling the dried gel or western image prior to scanning, or afterwards using Photoshop.

Please make sure that the image is no wider than 1000 pixels and has a resolution of around 150 dpi. Once the image has been annotated, save a copy in the pbm project folder on blackbox and then upload it to the PBM database.

Input file naming conventions

Array images

Arrays should be scanned in the default orientation on an Agilent scanner (make sure that 'split and rotate images' is not selected in the general preferences). The default slide orientation is BCleft, DNA back, i.e. it will result in an image with the barcode on the left.

In the image file name, YYYY-MM-DD is the image scan date, BARCODE is the complete Agilent barcode, S## is the scan ID (this number is incremented by the Agilent scanner if you make multiple scans of one slide), and R# is the number of times the slide has been reused (i.e. '0' for a new slide, incremented by one each time the slide has been stripped).

Quantified data

The first 4 fields of the quantified data filename correspond to the image file name. The G# field should be used to indicate the grid identifier (A, B, C or D) on a 4x44k slide if each grid was quantified separately. The DYE field should be used to indicate which dye channel the quantified data corresponds to (e.g. cy3 or cy5). Note that templates are available to make analyzing the Agilent 1M and 4x44k slides a bit easier and less error-prone.

Western Images

In the western image file name, YYYY-MM-DD is the date that the gel was run (year-month-day format), Name is the name of the person that ran the gel, and Gel-name is any other identifier that you want to give to the file, e.g. the number of the gel that you ran that day.