This directory contains information about S. cerevisiae chromosomal features
annotated in SGD, with files including chromosomal coordinates, ORF name to
ID mappings, annotation changes, etc.

For the Oracle database schema and specifications for SGD, see:

http://db.yeastgenome.org/schema/SgdSchema.html

Note that these files do not correspond exactly to the database tables
described in the schema, so that we can provide more complete information
within individual files.

The archive/ subdirectory contains previous versions of these files.

==============================================================================

scerevisiae_regulatory.gff:

This file contains transcription factor binding sites and regulatory
upstream ORFs (uORFs) that have been identified in Saccharomyces cerevisiae.
It is fully compatible with the Generic Feature Format Version 3
(http://song.sourceforge.net/gff3.shtml). This is a standard format used by
many groups. These sequence features can also be viewed in the GBROWSE
genome browser at SGD.

==============================================================================

saccharomyces_cerevisiae.gff :

This file contains sequence features of Saccharomyces cerevisiae and related
information such as GO annotation. It is fully compatible with the Generic
Feature Format Version 3 (http://song.sourceforge.net/gff3.shtml). It is
updated nightly. This is a standard format used by many groups. It is used
by SGD to load the GBROWSE resource.

NOTE: A resgen_primers.gff file containing primer sequences and sequence
coordinates is available from the following ftp directory:

ftp://ftp.yeastgenome.org/yeast/sequence/primer_sequences/

==============================================================================

SGD_features.tab

This file replaced the previous chromosomal_feature.tab file. This file is
updated weekly (Saturday).

Highlights of the changes include:

1. It contains information on current chromosomal features in SGD, including
   information about Dubious ORFs.
   It also contains the coordinates of introns, exons, and other subfeatures
   that are located within a chromosomal feature.

2. The relationship between subfeatures and the feature in which they are
   located is identified by the feature name in column #7 (parent feature).
   For example, the parent feature of the intron found in ACT1/YFL039C will
   be YFL039C. The parent feature of YFL039C is chromosome 6.

3. The coordinates of all features are in chromosomal coordinates.

4. Several feature types were replaced to be more consistent with GenBank
   files and other model organism databases. ORF is now gene, exon is now
   CDS.

Columns within SGD_features.tab:

1.   Primary SGDID (mandatory)
2.   Feature type (mandatory)
3.   Feature qualifier (optional)
4.   Feature name (optional)
5.   Standard gene name (optional)
6.   Alias (optional, multiples separated by |)
7.   Parent feature name (optional)
8.   Secondary SGDID (optional, multiples separated by |)
9.   Chromosome (optional)
10.  Start_coordinate (optional)
11.  Stop_coordinate (optional)
12.  Strand (optional)
13.  Genetic position (optional)
14.  Coordinate version (optional)
15.  Sequence version (optional)
16.  Description (optional)

The SGD_features.tab file is complemented by the GFF3 file
saccharomyces_cerevisiae.gff, described above.

==============================================================================

annotation_change.tab :

Contains information about annotation changes to the chromosomal features.
This file lists features that have been either removed from SGD or merged
into another feature. In the case of merged features, the merged feature
information is in the first 8 columns of the file, and the information about
the feature currently in the dataset is in columns 9 through 12. This file
also lists all changes that are made to the FeatureType of a chromosomal
feature. Therefore, an ORF that has been made Dubious will be listed in the
file. The Date column is the date that the annotation change occurred.
This file is updated weekly (Saturday).

The columns are:

1)  Merged or Deleted Feature (mandatory)
2)  FeatureType (mandatory, multiples separated by |)
3)  Chromosome (mandatory)
4)  StartCoord (optional)
5)  StopCoord (optional)
6)  Strand (mandatory)
7)  SGDID (optional)
8)  SecondarySGDID (optional, multiples separated by |)
9)  CurrentFeature (optional)
10) SGDID (optional)
11) Description (optional)
12) Note (optional, multiples separated by |)
13) Date (mandatory)

==============================================================================

clone.tab :

Contains information about yeast clones from Washington University in
St. Louis and the ATCC. It is updated weekly (Saturday), though the
underlying data are rarely altered/updated.

The columns are:

1) ATCC clone name (optional)
2) Washington University clone name (optional)
3) Chromosome (mandatory)
4) Start coordinate (mandatory)
5) Stop coordinate (mandatory)

scerevisiae_clonedata.gff:

Contains the above information in the Generic Feature Format Version 3
(http://song.sourceforge.net/gff3.shtml).

==============================================================================

The following two files map various other IDs to ORF names. The SGDID is the
recommended identifier for features in SGD.

==============================================================================

dbxref.tab :

Maps ORF names and SGDIDs to other IDs, including SwissProt, EC, etc.
Currently, NCBI GI numbers are not included but NCBI DNA, protein, and
RefSeq accession IDs are included. Please see below for more details. This
file contains all ORFs. Updated weekly (Saturday).

Columns are:

1) DBXREF ID
2) DBXREF ID source
3) DBXREF ID type
4) S. cerevisiae feature name
5) SGDID

A description of the IDs currently represented in this file:

DBXREF ID source: Candida DB
DBXREF ID type: Gene ID
Description: Gene ID of the Candida albicans ortholog of the S. cerevisiae
    gene, from CandidaDB at the Institut Pasteur.
Corresponding URL: http://genolist.pasteur.fr/CandidaDB/

DBXREF ID source: DIP
DBXREF ID type: Gene ID
Description: ID of the S. cerevisiae ORF used by the Database of Interacting
    Proteins (DIP).
Corresponding URL: http://dip.doe-mbi.ucla.edu/dip/

DBXREF ID source: EUROSCARF
DBXREF ID type: Gene ID
Description: S. cerevisiae ORF name used at the European Saccharomyces
    cerevisiae Archive for Functional Analysis (EUROSCARF), source for yeast
    deletion strains in Europe.
Corresponding URL: http://www.uni-frankfurt.de/fb15/mikro/euroscarf/

DBXREF ID source: GermOnline
DBXREF ID type: Gene ID
Description: ID of the S. cerevisiae ORF at GermOnline, a database of germ
    cell growth and gametogenesis.
Corresponding URL: http://germonline.yeastgenome.org/

DBXREF ID source: IUBMB
DBXREF ID type: EC number
Description: EC number of the reaction catalyzed by the S. cerevisiae
    protein. EC numbers are assigned to reactions by the International Union
    of Biochemistry and Molecular Biology.
Corresponding URL: http://www.expasy.ch/cgi-bin/nicezyme.pl?

DBXREF ID source: MetaCyc
DBXREF ID type: Pathway ID
Description: ID of the pathway in which the S. cerevisiae protein
    participates.
Corresponding URL: http://pathway.yeastgenome.org:8555/server.html

DBXREF ID source: NCBI
DBXREF ID type: DNA version ID
Description: ID representing an NCBI (DDBJ/EMBL-bank/GenBank) accession
    number for the DNA sequence of an S. cerevisiae chromosomal feature.
Corresponding URL: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Nucleotide

DBXREF ID source: NCBI
DBXREF ID type: Gene ID
Description: ID of the S. cerevisiae ORF at the NCBI Entrez Gene database.
Corresponding URL: http://www.ncbi.nlm.nih.gov:80/entrez/query.fcgi?db=gene

DBXREF ID source: NCBI
DBXREF ID type: Protein version ID
Description: ID representing an NCBI (DDBJ/EMBL-bank/GenBank) accession
    number for the protein sequence of an S. cerevisiae ORF.
Corresponding URL: http://www.ncbi.nlm.nih.gov:80/entrez/query.fcgi?db=Protein

DBXREF ID source: NCBI
DBXREF ID type: RefSeq Accession
Description: ID of the S. cerevisiae chromosome sequence in the NCBI RefSeq
    database.
Corresponding URL: http://www.ncbi.nlm.nih.gov/RefSeq/

DBXREF ID source: NCBI
DBXREF ID type: RefSeq protein version ID
Description: ID of the S. cerevisiae protein sequence in the NCBI RefSeq
    database.
Corresponding URL: http://www.ncbi.nlm.nih.gov:80/entrez/query.fcgi?db=Protein

DBXREF ID source: SIB
DBXREF ID type: Swiss-Prot ID
Description: ID representing the accession number at the Swiss-Prot protein
    database for an S. cerevisiae protein.
Corresponding URL: http://us.expasy.org/sprot/

==============================================================================

SGD_CDS_xref.txt :

This space-delimited file contains SGD xref update information for CDS
entries in GenBank/EMBL/DDBJ. It is updated every two months.

Columns are:

1) Accession Number
2) PROTEIN_ID of CDS
3) SGDID
4) Standard S. cerevisiae ORF Name

==============================================================================

genetic_map.tab :

Contains genetic mapping data submitted to SGD. This file is updated weekly
(Saturday), though additions/changes to these data are rare.

Columns are:

1)  two point experiment name
2)  parental ditype
3)  nonparental ditype
4)  tetratype
5)  first division
6)  second division
7)  distance
8)  standard error
9)  interference
10) interference standard error
11) note
12) gene1
13) gene1 ORF name
14) gene1 chromosome
15) gene1 genetic position
16) gene1 sgdid
17) gene2
18) gene2 ORF name
19) gene2 chromosome
20) gene2 genetic position
21) gene2 sgdid

==============================================================================

chromosome_length.tab :

Columns are:

1) chromosome
2) NCBI RefSeq accession number
3) length

Note that chromosome 17 is the mitochondrial chromosome.
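
As an illustration only, the column layouts listed above for
chromosome_length.tab and SGD_features.tab can be read with a small generic
reader. This is a minimal sketch, not an SGD-provided tool. It assumes the
.tab files are tab-delimited with no header row and that the chromosome
column holds plain numbers such as 17. The snake_case column names, the
helper names read_tab and chromosome_label, and the sample rows (including
the SGDID, accession, and coordinate values) are placeholders invented for
this example.

```python
import csv
import io

# Column names follow the lists in this README; the snake_case spellings
# are this sketch's own choice, not names defined by SGD.
CHROMOSOME_LENGTH_COLUMNS = ["chromosome", "refseq_accession", "length"]

SGD_FEATURES_COLUMNS = [
    "primary_sgdid", "feature_type", "feature_qualifier", "feature_name",
    "standard_gene_name", "alias", "parent_feature_name", "secondary_sgdid",
    "chromosome", "start_coordinate", "stop_coordinate", "strand",
    "genetic_position", "coordinate_version", "sequence_version",
    "description",
]

# Columns documented above as "multiples separated by |".
MULTI_VALUED = {"alias", "secondary_sgdid"}

# The README states that chromosome 17 is the mitochondrial chromosome.
MITOCHONDRIAL_CHROMOSOME = "17"


def read_tab(handle, columns):
    """Yield one dict per row (assumes tab-delimited text, no header row)."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue  # skip blank lines
        # Pad short rows so missing trailing optional columns become "".
        row = row + [""] * (len(columns) - len(row))
        record = dict(zip(columns, row))
        for name in MULTI_VALUED & record.keys():
            record[name] = record[name].split("|") if record[name] else []
        yield record


def chromosome_label(chromosome):
    """Return a readable label for a chromosome number given as a string."""
    if chromosome == MITOCHONDRIAL_CHROMOSOME:
        return "mitochondrial chromosome"
    return f"chromosome {chromosome}"


# Placeholder rows for demonstration; these are NOT real SGD data.
sample_lengths = io.StringIO("6\tPLACEHOLDER_ACC\t1000\n17\tPLACEHOLDER_ACC\t500\n")
for record in read_tab(sample_lengths, CHROMOSOME_LENGTH_COLUMNS):
    print(chromosome_label(record["chromosome"]), int(record["length"]))

sample_features = io.StringIO(
    "PLACEHOLDER_SGDID\tgene\t\tYFL039C\tACT1\tALIAS1|ALIAS2\tchromosome 6\n"
)
for record in read_tab(sample_features, SGD_FEATURES_COLUMNS):
    print(record["feature_name"], record["parent_feature_name"], record["alias"])
```

To read a downloaded file instead of the placeholder rows, pass an open
file handle, for example open("SGD_features.tab", newline=""), as the first
argument to read_tab. All values are returned as strings (or lists of
strings for the pipe-separated columns), so coordinates need an explicit
int() conversion where they are present.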
==============================================================================

******************************************************************************
******************************************************************************
*****   T H E   F O L L O W I N G   F I L E S   A R E   O B S O L E T E   ******
******************************************************************************
******************************************************************************

==============================================================================

external_id.tab :

This file is now obsolete and has been replaced with dbxref.tab, see above.
Last updated August 28, 2004.

September 2004: This file has been moved to the directory:
data_download/obsolete/

==============================================================================

chromosomal_feature.tab :

This file is now obsolete and has been replaced with SGD_features.tab, see
above. Last updated August 28, 2004.

September 2004: This file has been moved to the directory:
data_download/obsolete/

==============================================================================

chromosomal_feature.last_week :

This file was used by internal scripts and was just a copy of the
chromosomal_feature.tab file. This file is no longer provided.

==============================================================================

chromosomal_feature.previous_format :

This file is the last version of the previous format of the
chromosomal_feature.tab file. This format only contained the first 12
columns described above. This file was last updated on July 15, 2002.

November 2003: This file was moved to the directory:
data_download/obsolete/

==============================================================================

intron_exon.tab :

September 2004: This file has been moved to the directory:
data_download/obsolete/

==============================================================================