MetadataParser

The MetadataParser class (defined in MetadataParser.php) is responsible for locating, extracting and parsing metadata from the .csv files in the underlying corpus release and for transferring this information to the database. The most important public functions of the class are described below. Note that this is only a basic overview; further documentation and explanation (including input parameters and return types) can be found in the relevant source file.

MetadataParser::transferToDB(String $corpusRelease)

Function that transfers all available metadata contained in the summary.csv and contents.csv files into the underlying database. The $corpusRelease parameter is the name of the folder holding the specific corpus release on the file system, without the QualitasCorpus- prefix (e.g. if the corpus release is in a folder named QualitasCorpus-20130901r, the parameter passed to this method should be 20130901r). If conflicting data is already present in the database, it is overwritten (i.e. the information in the CSV files takes precedence over the information in the database).

Note: depending on the amount of data in the specified corpus release, this method can take a very long time to finish (on the order of tens of minutes).
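
The naming rule for $corpusRelease can be illustrated with a minimal sketch. The helper releaseNameFromFolder() and the /data path are hypothetical and not part of MetadataParser; the commented-out call at the end only assumes that transferToDB() can be invoked statically, as the notation on this page suggests.

```php
<?php
// Hypothetical helper (not part of MetadataParser): derives the $corpusRelease
// argument from a release folder name by dropping the "QualitasCorpus-" prefix.
function releaseNameFromFolder(string $folder): string
{
    $prefix = 'QualitasCorpus-';
    $name = basename(rtrim($folder, '/'));
    if (strncmp($name, $prefix, strlen($prefix)) !== 0) {
        throw new InvalidArgumentException("Unexpected release folder name: $name");
    }
    return substr($name, strlen($prefix));
}

// The path below is made up for the example.
$release = releaseNameFromFolder('/data/QualitasCorpus-20130901r');
echo $release, PHP_EOL; // prints 20130901r

// Assumed call shape, following the signature documented above:
// MetadataParser::transferToDB($release);
```

Because the transfer can run for tens of minutes, a script calling it would probably be started from the command line rather than from a web request.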

MetadataParser::getCorpusMetadata(String $corpusRelease, integer $limit = 0, array[String] $requiredColumns = null)

Function returning a Metadata object (see Metadata.php) containing the metadata extracted from the summary.csv file found in the metadata folder of the specified corpus release. It is currently used only by the transferToDB() function, but it may be useful to other scripts that need the parsed metadata from the underlying CSV files.
The $limit parameter limits the number of rows returned from the file. The $requiredColumns parameter restricts the result to the columns with the specified names; if it is null, all columns are returned.
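
The effect of $limit and $requiredColumns can be sketched on rows that are already parsed into associative arrays. This is an illustration only, not the actual implementation: selectRows() is a hypothetical function, the column names and values are invented, and treating a $limit of 0 as "no limit" is an assumption based on the default value.

```php
<?php
// Illustrative only: mimics the documented effect of $limit and
// $requiredColumns on already parsed rows.
function selectRows(array $rows, int $limit = 0, ?array $requiredColumns = null): array
{
    // Assumption: a $limit of 0 (the default) means "return every row".
    if ($limit > 0) {
        $rows = array_slice($rows, 0, $limit);
    }
    // null means that all columns are kept, as described above.
    if ($requiredColumns === null) {
        return $rows;
    }
    $wanted = array_flip($requiredColumns);
    return array_map(
        function (array $row) use ($wanted): array {
            return array_intersect_key($row, $wanted);
        },
        $rows
    );
}

// Invented sample data; the real column names come from summary.csv.
$rows = [
    ['system' => 'ant',   'version' => '1.8.4', 'license' => 'Apache'],
    ['system' => 'junit', 'version' => '4.10',  'license' => 'CPL'],
];

print_r(selectRows($rows, 1, ['system', 'version']));
```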

MetadataParser::getSystemMetadata(String $corpusRelease, String $sysver, integer $limit = 0, array[String] $requiredColumns = null)

Function returning a Metadata object (see Metadata.php) containing the metadata extracted from the contents.csv file found in the metadata folder of the specified system version within the given corpus release. It works like the getCorpusMetadata() function, except that it reads the contents.csv file of a specific system version. It is currently used only by the transferToDB() function, but it may be useful to other scripts that need the parsed metadata from the underlying CSV files.
The $limit parameter limits the number of rows returned from the file. The $requiredColumns parameter restricts the result to the columns with the specified names; if it is null, all columns are returned.

MetadataParser::getCorpusMetadataColumns($corpusRelease)

Function that parses and returns an array of the column names found in the summary.csv file of the specified corpus release. Note that this function relies on the column names being on the first row that does not begin with the # character. It is currently used only by the transferToDB() function, but it may be useful to other scripts that need the column names from the underlying CSV files.
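
The header rule described above (the first row not starting with #) could look roughly like the following sketch. The function firstNonCommentRow() is hypothetical, the comma delimiter and the skipping of blank lines are assumptions, and the sample text is invented.

```php
<?php
// Sketch: return the fields of the first line that does not start with '#'.
// Assumptions: fields are comma-separated and blank lines are skipped.
function firstNonCommentRow(string $csvText): ?array
{
    foreach (preg_split('/\R/', $csvText) as $line) {
        if ($line === '' || $line[0] === '#') {
            continue;
        }
        // Delimiter, enclosure and escape are passed explicitly.
        return array_map('trim', str_getcsv($line, ',', '"', '\\'));
    }
    return null; // no header row found
}

// Invented file content, for illustration only.
$sample = "# generated file\n# another comment\nsystem,version,license\nant,1.8.4,Apache\n";
print_r(firstNonCommentRow($sample));
```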

MetadataParser::getSystemMetadataColumns()

Function returning an array of the column names used in the contents.csv files of system versions. Because the contents.csv files do not contain a header row with the column names (and parsing them from the comment lines would be difficult), this function simply returns a hard-coded array of column names specified in the parser's config.php file (located in the same folder). The function can be modified if the format of the contents.csv files changes.
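
As a rough illustration of why a hard-coded list is enough, the sketch below pairs a configured list of names with the fields of a headerless row. The constant, the labelRow() function, the column names and the sample row are all hypothetical; the real list is the one in config.php.

```php
<?php
// Hypothetical column names; the real list is defined in the parser's config.php.
const SYSTEM_METADATA_COLUMNS = ['name', 'kind', 'size'];

// Attach the configured names to the fields of one headerless CSV row.
function labelRow(array $columns, array $fields): array
{
    if (count($columns) !== count($fields)) {
        throw new LengthException('Row does not match the configured column count');
    }
    return array_combine($columns, $fields);
}

// Invented row, for illustration only.
$labelled = labelRow(SYSTEM_METADATA_COLUMNS, ['Example.java', 'class', '120']);
print_r($labelled);
```

Checking the field count before combining makes a change in the file format show up as an error instead of silently shifting values into the wrong columns.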

Updated by Martin Berka more than 8 years ago · 3 revisions