Revision 7 (Petr Hlaváč, 2020-05-27 09:03) → Revision 8/10 (Petr Hlaváč, 2020-05-27 09:06)

h1. DatasetProcessing 

This folder contains the processor implementations for the individual datasets. Processors are imported dynamically, so each file must follow the naming convention *"dataset-name"_processor.py*.
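
The loader itself is not shown on this page, so the following is only a minimal sketch of how a processor could be looked up through the naming convention. It assumes the processors live in an importable package named DatasetProcessing; the helper names processor_module_name and load_processor are hypothetical and not part of the project.

```python
import importlib


def processor_module_name(dataset_name, package="DatasetProcessing"):
    """Build the dotted module path implied by the naming convention."""
    return f"{package}.{dataset_name}_processor"


def load_processor(dataset_name, package="DatasetProcessing"):
    """Import <package>/<dataset_name>_processor.py and return its process_file.

    Raises ModuleNotFoundError when the file is missing or misnamed,
    which is why the naming convention has to be followed exactly.
    """
    module = importlib.import_module(processor_module_name(dataset_name, package))
    return module.process_file
```

With a file DatasetProcessing/EXAMPLE_processor.py in place, load_processor("EXAMPLE") would return that module's process_file function.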

Fill the prepared date_dict as follows:

date_dict key -> date in the format YYYY-mm-dd-hh
date_dict value -> data_dict (a nested dictionary)
data_dict key -> device name
data_dict value -> CSVUtils.CSVDataline
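
The nested layout described above can be sketched as follows. DataLineStub is a hypothetical stand-in for the project's CSV data line class (its real constructor and validation are not shown here), and hour_key is an illustrative helper, not a project function.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class DataLineStub:
    """Hypothetical stand-in for the project's CSV data line class."""
    name: str
    date: str
    occurrence: int


def hour_key(moment):
    """Format a datetime as a YYYY-mm-dd-hh key, e.g. 2018-04-08-15."""
    return moment.strftime("%Y-%m-%d-%H")


key = hour_key(datetime(2018, 4, 8, 15, 42))

# date_dict: hour key -> data_dict; data_dict: device name -> data line
date_dict = {
    key: {
        "device-A": DataLineStub("device-A", key, 3),
        "device-B": DataLineStub("device-B", key, 1),
    }
}

# Walk the nested structure and flatten it into (date, device, occurrence) rows.
rows = [
    (date, name, line.occurrence)
    for date, data_dict in date_dict.items()
    for name, line in data_dict.items()
]
print(rows)
```

Every record that falls into the same hour shares one date_dict key, so each device has at most one data line per hour.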

*Data validity is checked when a CSVUtils.CSVDataline is created.
When the data is exported to CSV, it is then checked whether the objects really are instances of the CSVUtils.CSVDataline class!*
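
The project's own checks are not reproduced on this page. As a rough sketch of what such a pre-export check could look like, the hypothetical helper find_problems below verifies the key format and the type of every stored object; DataLineStub again only stands in for the real data line class.

```python
import re

# Keys are expected in the form YYYY-mm-dd-hh, e.g. 2018-04-08-15.
HOUR_KEY = re.compile(r"\d{4}-\d{2}-\d{2}-\d{2}")


class DataLineStub:
    """Hypothetical stand-in for the project's CSV data line class."""


def find_problems(date_dict, line_type):
    """Return a list of human-readable problems found in date_dict."""
    problems = []
    for date, data_dict in date_dict.items():
        if not HOUR_KEY.fullmatch(str(date)):
            problems.append(f"bad date key: {date!r}")
        for name, line in data_dict.items():
            if not isinstance(line, line_type):
                problems.append(
                    f"{date}/{name}: expected {line_type.__name__}, "
                    f"got {type(line).__name__}"
                )
    return problems
```

Running a check like this on a processor's output before export makes a wrong key format or a plain string stored in place of a data line visible early.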

After implementing the method, change *return None* to *return date_dict*.

h2. Generated processor

 <pre> 
from Utilities.CSV import csv_data_line


def process_file(filename):
    """
    Takes the path to a crawled file and returns a date dictionary.

    The date dictionary maps dates in the format YYYY-mm-dd-hh (e.g. 2018-04-08-15)
    to dictionaries whose keys are devices (specified in the configuration file)
    and whose values are csv_data_line.CSVDataLine objects holding the device,
    date and occurrence.

    Args:
        filename: name of the processed file

    Returns:
        None if not implemented
        date_dict when implemented
    """
    date_dict = dict()

    # with open(filename, "r") as file:
    print("You must implement the process_file method first!")
    return None

 </pre> 

h2. Sample processor implementation

 <pre> 
from Utilities.CSV import csv_data_line
from Utilities import date_formating


def process_file(filename):
    """
    Takes the path to a crawled file and returns a date dictionary.

    The date dictionary maps dates in the format YYYY-mm-dd-hh (e.g. 2018-04-08-15)
    to dictionaries whose keys are devices (specified in the configuration file)
    and whose values are csv_data_line.CSVDataLine objects holding the device,
    date and occurrence.

    Args:
        filename: name of the processed file

    Returns:
        None if not implemented
        date_dict when implemented
    """
    date_dict = dict()

    with open(filename, "r") as file:
        for line in file:
            # Fields are separated by semicolons.
            array = line.split(";")

            # [1:-1] drops the first and last character of a field
            # (presumably the quotes surrounding each value).
            date = date_formating.date_time_formatter(array[0][1:-1])
            name = array[1][1:-1]

            # First record for this date: create its data_dict.
            if date not in date_dict:
                date_dict[date] = dict()

            # Count one more occurrence of the device, or start at 1.
            if name in date_dict[date]:
                date_dict[date][name].occurrence += 1
            else:
                date_dict[date][name] = csv_data_line.CSVDataLine(name, date, 1)

    return date_dict
 </pre>
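
When writing a new processor, it can help to keep the counting step separate from the file parsing so that it can be tried without a crawled file. The sketch below is one possible way to do that, not the project's code: count_occurrences is a hypothetical helper, and DataLineStub stands in for the real data line class.

```python
from dataclasses import dataclass


@dataclass
class DataLineStub:
    """Hypothetical stand-in for the project's CSV data line class."""
    name: str
    date: str
    occurrence: int


def count_occurrences(pairs):
    """Build a date_dict from already parsed (date, device name) pairs."""
    date_dict = {}
    for date, name in pairs:
        # setdefault creates the inner data_dict on first use of a date.
        data_dict = date_dict.setdefault(date, {})
        if name in data_dict:
            data_dict[name].occurrence += 1
        else:
            data_dict[name] = DataLineStub(name, date, 1)
    return date_dict
```

A process_file implementation would then only need to parse each line into a (date, name) pair and pass the pairs to this function.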