DatasetProcessing » History » Version 8
Petr Hlaváč, 2020-05-27 09:06
h1. DatasetProcessing

This folder contains the processor implementations for the individual datasets. Processors are imported dynamically, so the file naming convention *"dataset-name"_processor.py* must be followed.

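The naming convention matters because a dynamic import has to derive the module name from the dataset name. The following is only a minimal sketch of how such a lookup could work; the package name @DatasetProcessing@ as an import path, the dataset name @example@, and both helper functions are assumptions made for illustration, not the project's actual loader.

```python
import importlib


def processor_module_name(dataset_name, package="DatasetProcessing"):
    """Build the dotted module path for a dataset's processor.

    The package name is an assumption; the "<dataset-name>_processor"
    suffix follows the naming convention described above.
    """
    return f"{package}.{dataset_name}_processor"


def load_processor(dataset_name, package="DatasetProcessing"):
    """Import the processor module and return its process_file function.

    Raises ModuleNotFoundError when no file matches the naming convention.
    """
    module = importlib.import_module(processor_module_name(dataset_name, package))
    return module.process_file


# "example" is a hypothetical dataset name.
print(processor_module_name("example"))
```

A file that does not follow the convention is simply never found by a lookup of this kind, which is why the name has to match exactly.
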
Fill the prepared date_dict as follows:

date_dict key -> date in the format YYYY-mm-dd-hh
date_dict value -> data_dict (another dictionary)
data_dict key -> device name
data_dict value -> CSVUtils.CSVDataline

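The nested structure described above can be illustrated with a small self-contained sketch. The @CSVDataLine@ dataclass here is only a stand-in with assumed fields (name, date, occurrence); the real class lives in the project and also validates its data. The device names and the @iter_rows@ helper are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class CSVDataLine:
    """Stand-in for the project's data-line class (fields are assumed)."""
    name: str
    date: str
    occurrence: int


# Keys of date_dict use the YYYY-mm-dd-hh format.
key = datetime(2018, 4, 8, 15, 23).strftime("%Y-%m-%d-%H")

# Outer dictionary: date -> data_dict; inner dictionary: device -> data line.
date_dict = {
    key: {
        "device-a": CSVDataLine("device-a", key, 2),
    },
    "2018-04-08-16": {
        "device-a": CSVDataLine("device-a", "2018-04-08-16", 1),
        "device-b": CSVDataLine("device-b", "2018-04-08-16", 4),
    },
}


def iter_rows(date_dict):
    """Yield (date, device, occurrence) tuples in sorted order."""
    for date in sorted(date_dict):
        for device in sorted(date_dict[date]):
            yield date, device, date_dict[date][device].occurrence


for row in iter_rows(date_dict):
    print(row)
```
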
*When a CSVUtils.CSVDataline is created, the validity of the data is checked.
When the data is exported to CSV, each object is then checked to confirm that it really is an instance of the CSVUtils.csv_data_line class!*

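A type check of this kind could look roughly like the sketch below. It is not the project's export code: the @CSVDataLine@ dataclass is a stand-in with assumed fields, and @find_invalid_entries@ is a hypothetical helper that only shows the idea of rejecting values of the wrong type.

```python
from dataclasses import dataclass


@dataclass
class CSVDataLine:
    """Stand-in for the project's data-line class (fields are assumed)."""
    name: str
    date: str
    occurrence: int


def find_invalid_entries(date_dict):
    """Return (date, device) pairs whose value is not a CSVDataLine."""
    return [
        (date, device)
        for date, data_dict in date_dict.items()
        for device, value in data_dict.items()
        if not isinstance(value, CSVDataLine)
    ]


date_dict = {
    "2018-04-08-15": {
        "device-a": CSVDataLine("device-a", "2018-04-08-15", 3),
        # A plain tuple instead of a data line: this entry should be reported.
        "device-b": ("device-b", "2018-04-08-15", 1),
    }
}

print(find_invalid_entries(date_dict))
```

Running a check like this inside a processor before returning date_dict makes a wrongly typed value visible early, rather than only at export time.
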
After implementing the method, change *return None* to *return date_dict*.

h2. Generated processor

<pre>
from Utilities.CSV import csv_data_line


def process_file(filename):
    """
    Method that takes the path to a crawled file and outputs a date dictionary:
    Date dictionary is a dictionary where keys are dates in format YYYY-mm-dd-hh (2018-04-08-15)
    and value is a dictionary where keys are devices (specified in configuration file)
    and value is csv_data_line.CSVDataLine with device, date and occurrence

    Args:
        filename: name of processed file

    Returns:
        None if not implemented
        date_dict when implemented
    """
    date_dict = dict()

    #with open(filename, "r") as file:
    print("You must implement the process_file method first!")
    return None
</pre>

h2. Sample processor implementation

<pre>
from Utilities.CSV import csv_data_line
from Utilities import date_formating


def process_file(filename):
    """
    Method that takes the path to a crawled file and outputs a date dictionary:
    Date dictionary is a dictionary where keys are dates in format YYYY-mm-dd-hh (2018-04-08-15)
    and value is a dictionary where keys are devices (specified in configuration file)
    and value is csv_data_line.CSVDataLine with device, date and occurrence

    Args:
        filename: name of processed file

    Returns:
        None if not implemented
        date_dict when implemented
    """
    date_dict = dict()

    with open(filename, "r") as file:
        for line in file:
            # Each line is a semicolon-separated record
            array = line.split(";")

            # Drop the first and last character of each field
            # (presumably the surrounding quotes)
            date = date_formating.date_time_formatter(array[0][1:-1])
            name = array[1][1:-1]

            if date not in date_dict:
                date_dict[date] = dict()

            # Count one more occurrence of the device in this hour
            if name in date_dict[date]:
                date_dict[date][name].occurrence += 1
            else:
                date_dict[date][name] = csv_data_line.CSVDataLine(name, date, 1)

    return date_dict
</pre>
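
To see what the sample processor produces, the same counting logic can be run on a few in-memory lines. This is a sketch under stated assumptions: @CSVDataLine@ and @date_time_formatter@ are stand-ins for the project's own classes and helpers, and the input timestamp format and device names are invented for the example.

```python
import io
from dataclasses import dataclass
from datetime import datetime


@dataclass
class CSVDataLine:
    """Stand-in for csv_data_line.CSVDataLine (fields are assumed)."""
    name: str
    date: str
    occurrence: int


def date_time_formatter(raw):
    """Hypothetical stand-in: assumes input such as '2018-04-08 15:05:00'."""
    return datetime.strptime(raw, "%Y-%m-%d %H:%M:%S").strftime("%Y-%m-%d-%H")


def process_lines(lines):
    """Same counting logic as the sample processor, over any iterable of lines."""
    date_dict = {}
    for line in lines:
        # Strip the newline first, because here the name is the last field.
        array = line.rstrip("\n").split(";")
        date = date_time_formatter(array[0][1:-1])
        name = array[1][1:-1]
        devices = date_dict.setdefault(date, {})
        if name in devices:
            devices[name].occurrence += 1
        else:
            devices[name] = CSVDataLine(name, date, 1)
    return date_dict


sample = io.StringIO(
    '"2018-04-08 15:05:00";"device-a"\n'
    '"2018-04-08 15:40:00";"device-a"\n'
    '"2018-04-08 16:10:00";"device-b"\n'
)
result = process_lines(sample)
for date in sorted(result):
    for name, line in result[date].items():
        print(date, name, line.occurrence)
```

Note that slicing with @[1:-1]@ only removes the closing quote when the field is not followed by a newline. If the device name were the last field on a line, the line would need to be stripped first, as the sketch does.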