Revision 04a2b5a4

Added by Petr Hlaváč about 4 years ago

Re #7939
- added documentation for methods and classes
- fixed errors in variable names
- added information for generated scripts

View differences:

python-module/DatasetCrawler/JISCrawler.py
 from Utilities import FolderProcessor
 from Utilities.Crawler import BasicCrawler
 
+# Path to crawled data
+CRAWLED_DATA_PATH = "CrawledData/"
 
-def crawl(config):
 
+def crawl(config):
+    """
+    Implement crawl method that downloads new data to path_for_files
+    For keeping the project structure
+    url , regex, and dataset_name from config
+    You can use already implemented functions from Utilities/Crawler/BasicCrawlerFunctions.py
+
+    Args:
+        config: loaded configuration file of dataset
+    """
     dataset_name = config["dataset-name"]
     url = config['url']
     regex = config['regex']
+    path_for_files = CRAWLED_DATA_PATH + dataset_name + '/'
 
     first_level_links = BasicCrawler.get_all_links(url)
     filtered_first_level_links = BasicCrawler.filter_links(first_level_links, "^OD_ZCU")
......
             files.append(file_link)
 
     for file in files:
-        BasicCrawler.download_file_from_url(file, "CrawledData/" + dataset_name + "/", dataset_name)
+        BasicCrawler.download_file_from_url(file, path_for_files, dataset_name)
 
-    FolderProcessor.unzip_all_csv_zip_files_in_folder("CrawledData/" + dataset_name + "/")
+    FolderProcessor.unzip_all_csv_zip_files_in_folder(path_for_files)
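
The core of this change is computing the dataset folder once and reusing it, instead of repeating the "CrawledData/" + dataset_name + "/" concatenation at each call site. The sketch below illustrates that idea in isolation. It is not code from the repository: build_path_for_files and filter_links are hypothetical helpers, posixpath.join is swapped in for the string concatenation used in the commit, and the real behavior of BasicCrawler.filter_links is assumed (the diff shows only how it is called).

```python
import posixpath
import re

# Mirrors the constant introduced in this revision.
CRAWLED_DATA_PATH = "CrawledData/"


def build_path_for_files(dataset_name):
    """Hypothetical helper: dataset folder under CRAWLED_DATA_PATH.

    Joining with an empty last component keeps the trailing slash,
    matching the value produced by the concatenation in the commit.
    """
    return posixpath.join(CRAWLED_DATA_PATH, dataset_name, "")


def filter_links(links, regex):
    """Hypothetical stand-in for BasicCrawler.filter_links.

    Assumption: a link is kept when the pattern matches anywhere in it
    (an anchored pattern such as "^OD_ZCU" then matches only at the start).
    """
    pattern = re.compile(regex)
    return [link for link in links if pattern.search(link)]


if __name__ == "__main__":
    # "JIS" and the link names are illustrative values, not real data.
    print(build_path_for_files("JIS"))
    print(filter_links(["OD_ZCU_01.zip", "index.html"], "^OD_ZCU"))
```

Building the path in one place means a later change to the storage layout touches a single line, which is the same benefit the revision gets from path_for_files.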
