Revision 04a2b5a4

Added by Petr Hlaváč about 4 years ago

Re #7939
- added documentation for methods and classes
- fixed errors in variable names
- added information for generated scripts

View differences:

python-module/DatasetCrawler/KOLOBEZKYCrawler.py
 from Utilities import FolderProcessor
 from Utilities.Crawler import BasicCrawler
 
+# Path to crawled data
+CRAWLED_DATA_PATH = "CrawledData/"
 
-def crawl(config):
 
+def crawl(config):
+    """
+    Implement crawl method that downloads new data to path_for_files
+    For keeping the project structure
+    url , regex, and dataset_name from config
+    You can use already implemented functions from Utilities/Crawler/BasicCrawlerFunctions.py
+
+    Args:
+        config: loaded configuration file of dataset
+    """
     dataset_name = config["dataset-name"]
     url = config['url']
     regex = config['regex']
+    path_for_files = CRAWLED_DATA_PATH + dataset_name + '/'
 
     first_level_links = BasicCrawler.get_all_links(url)
     filtered_first_level_links = BasicCrawler.filter_links(first_level_links, "^OD_ZCU")
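
The two steps this hunk touches, building `path_for_files` and filtering links with the `"^OD_ZCU"` pattern, can be sketched roughly as follows. This is a minimal illustration, not the project's code: `filter_links_sketch` is a hypothetical stand-in for `BasicCrawler.filter_links` (whose implementation is not shown in this revision and may differ, for example in how it applies the pattern), and the config values and link names are made up.

```python
import re

# Same constant the revision adds at module level.
CRAWLED_DATA_PATH = "CrawledData/"


def build_path_for_files(config):
    """Build the per-dataset download folder the same way the diff does."""
    return CRAWLED_DATA_PATH + config["dataset-name"] + "/"


def filter_links_sketch(links, pattern):
    """Hypothetical stand-in for BasicCrawler.filter_links.

    Assumption: keeps only the links in which the regex pattern is found.
    """
    compiled = re.compile(pattern)
    return [link for link in links if compiled.search(link)]


# Made-up configuration and links, for illustration only.
example_config = {
    "dataset-name": "KOLOBEZKY",
    "url": "https://example.org/",
    "regex": ".*",
}
example_links = [
    "OD_ZCU_KOLOBEZKY_00_2019.zip",
    "index.html",
    "archive/OD_ZCU_old.zip",
]

path_for_files = build_path_for_files(example_config)
filtered = filter_links_sketch(example_links, "^OD_ZCU")

print(path_for_files)
print(filtered)
```

Because the pattern is anchored with `^`, a link such as `archive/OD_ZCU_old.zip` is dropped in this sketch even though it contains `OD_ZCU`; only links that start with the prefix survive.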
