Revision 2d129043
Added by Petr Hlaváč about 4 years ago
| Old | New | modules/crawler/Pipeline.py |
|---|---|---|
| 1 | 1 | `from Utilities import FolderProcessor, ConfigureFunctions` |
| 2 | 2 | `from Utilities.Database import DatabaseLoader` |
| | 3 | `from Utilities.CSV import CSVutils` |
| 3 | 4 | |
| 4 | 5 | `import logging` |
| 5 | 6 | `from datetime import date` |
| ... | ... | |
| 96 | 97 | `logging.info(dataset_name + " has downloaded " + str(len(not_processed_files)) + " new files")` |
| 97 | 98 | |
| 98 | 99 | `for not_processed_file in not_processed_files:` |
| 99 | | `process_file_func(CRAWLED_DATA_PATH + dataset_path + not_processed_file)` |
| | 100 | `path = CRAWLED_DATA_PATH + dataset_path + not_processed_file` |
| | 101 | `date_dic = process_file_func(path)` |
| | 102 | `CSVutils.export_data_to_csv(path, date_dic)` |
| 100 | 103 | `FolderProcessor.update_ignore_set(CRAWLED_DATA_PATH + dataset_path, not_processed_file)` |
| 101 | 104 | |
| 102 | 105 | `logging.info(dataset_name + " has processed " + str(len(not_processed_files)) + " newly crawled files")` |
Re #7939
- modified the structure of the processors in the pipeline
- added data validity checks
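
For illustration only, the reworked loop in `Pipeline.py` can be sketched as follows: each file is processed into a dictionary, the dictionary is exported to CSV, and the file is then marked as handled. This is a minimal sketch under stated assumptions, not the project's implementation. `count_rows_per_date`, `process_new_files`, the `.processed.csv` output name, the `;` delimiter, and the in-memory `ignore_set` are all hypothetical; the real `CSVutils.export_data_to_csv` and `FolderProcessor.update_ignore_set` are not shown in this revision and may behave differently.

```python
import csv
import os
import tempfile


def count_rows_per_date(path):
    """Hypothetical processor: maps each first-column value to its row count."""
    date_dic = {}
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.reader(handle, delimiter=";"):
            if row:
                date_dic[row[0]] = date_dic.get(row[0], 0) + 1
    return date_dic


def export_data_to_csv(path, date_dic):
    """Hypothetical stand-in for CSVutils.export_data_to_csv."""
    out_path = path + ".processed.csv"  # assumed naming scheme
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter=";")
        for key in sorted(date_dic):
            writer.writerow([key, date_dic[key]])
    return out_path


def process_new_files(data_dir, file_names, process_file_func, ignore_set):
    """Process each file, export the result, then mark the file as handled."""
    exported = []
    for file_name in file_names:
        path = os.path.join(data_dir, file_name)
        date_dic = process_file_func(path)
        exported.append(export_data_to_csv(path, date_dic))
        # In-memory stand-in for FolderProcessor.update_ignore_set
        ignore_set.add(file_name)
    return exported


def demo():
    """Run the loop on one temporary sample file and return the exported lines."""
    with tempfile.TemporaryDirectory() as data_dir:
        sample = os.path.join(data_dir, "sample.csv")
        with open(sample, "w", encoding="utf-8") as handle:
            handle.write("2020-04-01;a\n2020-04-01;b\n2020-04-02;c\n")
        ignore_set = set()
        exported = process_new_files(
            data_dir, ["sample.csv"], count_rows_per_date, ignore_set
        )
        with open(exported[0], encoding="utf-8") as handle:
            return handle.read().splitlines(), ignore_set
```

Returning the dictionary from the processor, instead of having the processor write its own output, keeps the export step in one place, which appears to be the intent of the change at new lines 100 to 102.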