Projekt

Obecné

Profil

Akce

Database structure documentation

Author: Martin Berka

This document describes tables and columns of the underlying database for Qualitas Corpus.

{TODO} add mg_acl_* tables

  • corpus - Contains all present corpus releases.
    • corpus_id - unique numeric identifier of given corpus release
    • corpus_release_name - "human" (and filesystem) name of given corpus release (example: 20130901e)
  • corpus_systems - Intersection table assigning systems to corpus releases.
    • corpus_id - ID of the corpus release
    • system_id - ID of the system present in the corpus release
    • sysvercount - number of versions of given system in the corpus release
  • corpus_system_versions - Intersection table assigning system versions to corpus releases.
    • corpus_id - ID of the corpus release
    • system_version_id - ID of the system version present in the corpus release
    • distribution - Which distribution(s) of the corpus does this system version appear in: 'e' - evolution, 'r' - recent, 'f' - full. Extracted from the corpus metadata file during import.
  • country - Holds all countries, usable for user's registration.
    • country_id - unique numeric identifier of the country
    • name - official name of the country.
    • iso - ISO code of the country. See ISO 3166-1.
  • java_version - Holds all java versions used in metadata of present systems.
    • java_version_id - unique numeric identifier of the java version
    • major - major version of the java version
    • minor - minor version of the java version
    • update - update version of the java version
    • architecture - specific architecture of the java version, if specified
  • license - Holds all licenses used in metadata of present systems.
    • license_id - unique numeric identifier of the license
    • name - Either name or description of the license. Taken from summary.csv
    • url - URL to full license. Set to NULL by default, when imported from summary.csv.
  • system - Contains all present systems.
    • system_id - unique numeric identifier of the system
    • description - a short description of the system
    • status - an indication of the development status of the system (for example: active)
  • system_version - Contains all present versions of existing systems and their metadata parsed from summary.csv.
    • system_version_id - unique numeric identifier of the system version
    • system_id - a foreign key to `system`.`system_id`
    • sysver - A unique identifier for the version of the system that is in the corpus. This identifier follows the naming conventions. Taken from summary.csv
    • fullname - The full name of the system the version belongs to. This is often the system attribute, but can be different when the full name is too awkward to use as the system identifier. This is not a system level attribute to accommodate systems that change names. Taken from summary.csv
    • domain - An indication of the purpose of the system. Taken from summary.csv
    • java_version_id - A foreign key to `java_version`.`java_version_id`. The earliest JRE version that the system version depends on.
    • license_id - a foreign key to `license`.`license_id`
    • license_file - Path to license file, relative to system version folder. Taken from summary.csv
    • released_date - The release date of the sysver. This date is determined ideally by a notice in the distribution or on the system website, but sometimes has to be guessed by looking at dates of files in the distribution. Taken from summary.csv
    • sourcepackages - Records the decision we have made regarding what is in a system (and not, for example, third-party library types). This is a space-separated list of prefixes of packages of Java types. Taken from summary.csv
    • n_bin - The number of types found in the contents details metadata for all types that are in the source packages, and that appear in the bin. This is considered the definitive set of types for the system, as this is what is actually deployed. Taken from summary.csv
    • n_both - The number of types found in the contents details metadata for all types that are in the source packages, and that appear in both src and bin.
      This is another means to measure the number of types in the system. Taken from summary.csv
    • n_files - The number of source files found in the contents details metadata for all types that are in the source packages, and that appear in the src. Taken from summary.csv
    • n_top(bin) - The number of top-level types found in the contents details metadata for all types that are in the source packages, and that appear in the bin. Taken from summary.csv
    • loc(both) - The sum of the LOC values from the contents details metadata for all types that are in the source packages, and that appear in both src and bin. Taken from summary.csv
    • ncloc(both) - The sum of the NCLOC values from the contents details metadata for all types that are in the source packages, and that appear in both src and bin. Taken from summary.csv
    • url - A web site for the version. Typically this is the system's home page, but as this may change over time this is kept at the version level. Taken from summary.csv
  • system_version_metadata - Contains metadata of specific system versions, parsed from their contents.csv files.
    • system_version_metadata_id - unique numeric identifier of the metadata
    • system_version_id - a foreign key to `system_version`.`system_version_id`
    • type_name - Fully-qualified type name, e.g. org.apache.commons.logging.impl.AvalonLogger. Taken from contents.csv
    • loc_bin - Location in bin. Taken from contents.csv
    • loc_src - Location in src. Taken from contents.csv
    • in_src - 0=in src pkg (user-defined), 1=not in src pkg. See 'Source Packages'. Taken from contents.csv
    • bin_src - 0=both bin and src, 1=bin only, 2=src only. Taken from contents.csv
    • distributed - 0=distributed, 1 = not distributed. Source code in 'Distributed'. Taken from contents.csv
    • level - 0=top-level public, 4=top-level non-public, 1=top-level different name, 2=nested, 3=probably haven't seen source, -1 no source, or not parsed. Taken from contents.csv
    • loc - Physical lines in file (for public type only). Taken from contents.csv
    • ncloc - Non-commented lines of code in file (for public type only). Taken from contents.csv
  • tag - Holds existing (unique) tags.
    • tag_id - unique numeric identified of the tag
    • tag - the tag itself
  • tag_systems - Intersection table assigning tags to specific system versions.
    • tag_id - a foreign key to `tag`.`tag_id`
    • system_id - a foreign key to `system`.`system_id`, if this tag belongs to a specific system
    • system_version_id - a foreign key to `system_version`.`system_version_id`, if this tag belongs to a specific system version
    • user_id - a foreign key to the table of users, specifying the author of the tag assignment to this system/system version {TODO} update reference when mg_acl_* tables are added to this document

Aktualizováno uživatelem Martin Berka před více než 8 roky(ů) · 7 revizí