Petr Hlaváč, 2020-05-27 09:03
h1. DatasetProcessing

This folder contains the processor implementations for the individual datasets. Processors are imported dynamically, so the naming convention *"dataset-name"_processor.py* must be followed.

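The naming convention matters because a dynamic import typically builds the module name from the dataset name. The following is only a minimal sketch of that idea using the standard @importlib@ module; the helper @load_processor@ and the @DEMO@ dataset are hypothetical and are not taken from the project, whose loader may work differently.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def load_processor(dataset_name):
    """Import '<dataset_name>_processor' by name (hypothetical helper)."""
    # Make sure freshly created files are visible to the import system.
    importlib.invalidate_caches()
    return importlib.import_module(f"{dataset_name}_processor")


# Demo: write a throwaway processor module and load it by dataset name.
demo_dir = tempfile.mkdtemp()
Path(demo_dir, "DEMO_processor.py").write_text(
    "def process_file(filename):\n"
    "    return {}\n"
)
sys.path.insert(0, demo_dir)

processor = load_processor("DEMO")
print(processor.__name__)
```

A file that does not follow the @"dataset-name"_processor.py@ pattern would not be found by such a lookup, and @importlib.import_module@ would raise @ModuleNotFoundError@.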
Fill the prepared date_dict as follows:

date_dict key -> date in the format YYYY-mm-dd-hh
date_dict value -> data_dict (a nested dictionary)
data_dict key -> device name
data_dict value -> CSVUtils.CSVDataline

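The nested structure described above can be pictured as follows. This is an illustrative sketch only: the @CSVDataLine@ dataclass is a stand-in for the project's class (its three fields mirror the name, date and occurrence used in the sample processor), and the device names and counts are made up.

```python
from dataclasses import dataclass


@dataclass
class CSVDataLine:
    """Stand-in for the project's class; not the real implementation."""
    name: str
    date: str
    occurrence: int


# Outer key: date string. Inner key: device name. Inner value: one data line.
date_dict = {
    "2018-04-08-15": {
        "device-A": CSVDataLine("device-A", "2018-04-08-15", 3),
        "device-B": CSVDataLine("device-B", "2018-04-08-15", 1),
    },
    "2018-04-08-16": {
        "device-A": CSVDataLine("device-A", "2018-04-08-16", 2),
    },
}

# Walk the structure: first by date, then by device.
for date, data_dict in date_dict.items():
    for device, line in data_dict.items():
        print(date, device, line.occurrence)
```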
*The validity of the data is checked when a CSVUtils.CSVDataline is created.
When the data is exported to CSV, it is additionally checked that the objects really are instances of the CSVUtils.CSVDataline class!*

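The two checks could look roughly like the sketch below. The page does not say which rules the real class enforces or how the export is written, so everything here is an assumption: the stand-in @CSVDataLine@, the rule that the occurrence must be a non-negative integer, the hypothetical @export_to_csv@ function, and the semicolon-separated output layout.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class CSVDataLine:
    """Stand-in class; the validation rule below is assumed, not the project's."""
    name: str
    date: str
    occurrence: int

    def __post_init__(self):
        # Check 1: validate the data at construction time.
        if not isinstance(self.occurrence, int) or self.occurrence < 0:
            raise ValueError(f"invalid occurrence: {self.occurrence!r}")


def export_to_csv(date_dict, stream):
    """Write all data lines to stream (hypothetical export function)."""
    writer = csv.writer(stream, delimiter=";")
    for date in sorted(date_dict):
        for device, line in sorted(date_dict[date].items()):
            # Check 2: refuse anything that is not a CSVDataLine instance.
            if not isinstance(line, CSVDataLine):
                raise TypeError(f"{device!r} on {date!r} is not a CSVDataLine")
            writer.writerow([line.date, line.name, line.occurrence])


buffer = io.StringIO()
export_to_csv({"2018-04-08-15": {"dev": CSVDataLine("dev", "2018-04-08-15", 2)}}, buffer)
print(buffer.getvalue())
```

Storing a plain tuple or dictionary in data_dict instead of a data line object would make such an export fail, which is why the processor has to create the objects through the class.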
Once the method is implemented, *return None* must be changed to *return date_dict*.

h2. Generated processor

<pre>
from Utilities.CSV import csv_data_line


def process_file(filename):
    """
    Method that takes the path to a crawled file and outputs a date dictionary:
    The date dictionary is a dictionary where keys are dates in format ddmmYYYYhh (0804201815)
    and each value is a dictionary where keys are devices (specified in the configuration file)
    and each value is a csv_data_line.CSVDataLine with device, date and occurrence

    Args:
        filename: name of processed file

    Returns:
        None if not implemented
        date_dict when implemented
    """
    date_dict = dict()

    # with open(filename, "r") as file:
    print("You must implement the process_file method first!")
    return None

</pre>

h2. Sample processor implementation

<pre>
from Utilities.CSV import csv_data_line
from Utilities import date_formating


def process_file(filename):
    """
    Method that takes the path to a crawled file and outputs a date dictionary:
    The date dictionary is a dictionary where keys are dates in format ddmmYYYYhh (0804201815)
    and each value is a dictionary where keys are devices (specified in the configuration file)
    and each value is a csv_data_line.CSVDataLine with device, date and occurrence

    Args:
        filename: name of processed file

    Returns:
        None if not implemented
        date_dict when implemented
    """
    date_dict = dict()

    with open(filename, "r") as file:

        for line in file:

            # Fields are separated by semicolons
            array = line.split(";")

            # [1:-1] drops the first and last character of each field
            date = date_formating.date_time_formatter(array[0][1:-1])
            name = array[1][1:-1]

            if date not in date_dict:
                date_dict[date] = dict()

            # Each line counts as one occurrence of the device on that date
            if name in date_dict[date]:
                date_dict[date][name].occurrence += 1
            else:
                date_dict[date][name] = csv_data_line.CSVDataLine(name, date, 1)

    return date_dict
</pre>
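To see what such a processor returns, the same counting logic can be run on a small sample file. The sketch below is self-contained and therefore replaces the project modules with stand-ins: the @CSVDataLine@ dataclass, the @date_time_formatter@ function, the quoted semicolon-separated input layout and the timestamp format are all assumptions chosen for illustration, not the behaviour of the real @Utilities@ package.

```python
import tempfile
from dataclasses import dataclass
from datetime import datetime


@dataclass
class CSVDataLine:
    """Stand-in for csv_data_line.CSVDataLine."""
    name: str
    date: str
    occurrence: int


def date_time_formatter(raw):
    """Stand-in formatter: assumes 'YYYY-mm-dd HH:MM:SS' input."""
    return datetime.strptime(raw, "%Y-%m-%d %H:%M:%S").strftime("%Y-%m-%d-%H")


def process_file(filename):
    """Count occurrences per date (hour resolution) and device."""
    date_dict = {}
    with open(filename, "r") as file:
        for line in file:
            array = line.split(";")
            date = date_time_formatter(array[0][1:-1])
            name = array[1][1:-1]
            data_dict = date_dict.setdefault(date, {})
            if name in data_dict:
                data_dict[name].occurrence += 1
            else:
                data_dict[name] = CSVDataLine(name, date, 1)
    return date_dict


# Made-up sample data: two records for device-A in hour 15, one for device-B in hour 16.
sample = (
    '"2018-04-08 15:05:00";"device-A";"x"\n'
    '"2018-04-08 15:40:00";"device-A";"x"\n'
    '"2018-04-08 16:10:00";"device-B";"x"\n'
)
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write(sample)

result = process_file(tmp.name)
for date, data_dict in result.items():
    for name, line in data_dict.items():
        print(date, name, line.occurrence)
```

With this sample input, the two device-A records from the same hour end up in a single data line with an occurrence of 2, while device-B gets its own entry under the following hour.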