Revision 2d129043

Added by Petr Hlaváč about 4 years ago

Re #7939
- restructured the processors in the pipeline
- added a data validity check

View differences:

modules/crawler/Pipeline.py
 from Utilities import FolderProcessor, ConfigureFunctions
 from Utilities.Database import DatabaseLoader
+from Utilities.CSV import CSVutils

 import logging
 from datetime import date
......
     logging.info(dataset_name + " has downloaded " + str(len(not_processed_files)) + " new files")

     for not_processed_file in not_processed_files:
-        process_file_func(CRAWLED_DATA_PATH + dataset_path + not_processed_file)
+        path = CRAWLED_DATA_PATH + dataset_path + not_processed_file
+        date_dic = process_file_func(path)
+        CSVutils.export_data_to_csv(path, date_dic)
         FolderProcessor.update_ignore_set(CRAWLED_DATA_PATH + dataset_path, not_processed_file)

     logging.info(dataset_name + " has processed " + str(len(not_processed_files)) + " newly crawled files")
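
The change above alters the processor contract: the processor now returns a dictionary, and the pipeline hands that dictionary to a CSV exporter instead of letting the processor write its own output. The following is a minimal, self-contained sketch of how that flow might look. Everything in it is an assumption made for illustration: the `date;count` input format, the validity rules, the in-place overwrite, and the names `process_file`, `export_data_to_csv` and `run_pipeline` are hypothetical stand-ins and not the actual implementation of `CSVutils` or of the dataset processors.

```python
import csv
import os
from datetime import date


def process_file(path):
    """Hypothetical processor: read 'date;count' rows and return {date: total}.

    Rows that fail the validity check are skipped (assumed behaviour).
    """
    date_dict = {}
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f, delimiter=";"):
            # Validity check: exactly two fields, a non-negative integer count.
            if len(row) != 2 or not row[1].strip().isdigit():
                continue
            # Validity check: the first field must be an ISO date.
            try:
                date.fromisoformat(row[0])
            except ValueError:
                continue
            date_dict[row[0]] = date_dict.get(row[0], 0) + int(row[1])
    return date_dict


def export_data_to_csv(path, date_dict):
    """Hypothetical stand-in for CSVutils.export_data_to_csv.

    Overwrites the file at `path` with the processed data (an assumption).
    """
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter=";")
        for day in sorted(date_dict):
            writer.writerow([day, date_dict[day]])


def run_pipeline(folder, file_names, process_file_func):
    """Mirror the loop in the diff: process each file, then export the result."""
    processed = []
    for name in file_names:
        path = os.path.join(folder, name)
        date_dict = process_file_func(path)
        export_data_to_csv(path, date_dict)
        processed.append(name)
    return processed
```

Keeping the export step in the pipeline rather than in each processor means every dataset shares one output format, and a processor only has to produce a dictionary.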
