Revision 1187e871

Added by Petr Hlaváč about 4 years ago

Re #7966
- Created helper scripts for dataset management

View differences:

python-module/Utilities/Crawler/BasicCrawlerFunctions.py
@@ -5,6 +5,8 @@
 
 # Path to crawler logs
 CRAWLER_LOGS_PATH = "CrawlerLogs/"
+# Path to crawled data
+CRAWLED_DATA_PATH = "CrawledData/"
 
 
 def get_all_links(url):
@@ -98,10 +100,11 @@
     url_parts = url.split("/")
     file_name = url_parts[len(url_parts)-1]
 
-    path = CRAWLER_LOGS_PATH + dataset_name + '/'
+    log_path = CRAWLER_LOGS_PATH + dataset_name + '/'
+    data_path = CRAWLED_DATA_PATH + dataset_name + '/'
 
     # download file chunk by chunk so we can download large files
-    with open(path + file_name, "wb") as file:
+    with open(data_path + file_name, "wb") as file:
         for chunk in r.iter_content(chunk_size=1024):
 
             # writing one chunk at a time to file
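
The change above redirects downloaded files from the log directory to a separate data directory. The following is a minimal standalone sketch of that chunked write, not the project's actual function: the names `file_name_from_url` and `save_in_chunks`, the `base_path` parameter, and the directory creation step are assumptions added for illustration. It assumes `response` exposes `iter_content(chunk_size=...)`, as a `requests` response fetched with `stream=True` does.

```python
import os

# Same constant the diff introduces for crawled data.
CRAWLED_DATA_PATH = "CrawledData/"


def file_name_from_url(url):
    """Return the last path segment of the URL (hypothetical helper)."""
    return url.split("/")[-1]


def save_in_chunks(response, dataset_name, url,
                   base_path=CRAWLED_DATA_PATH, chunk_size=1024):
    """Write a streamed response to <base_path>/<dataset_name>/<file name>.

    `response` is assumed to provide iter_content(chunk_size=...).
    Returns the path of the written file.
    """
    data_path = os.path.join(base_path, dataset_name)
    # Assumption: the dataset directory may not exist yet, so create it.
    os.makedirs(data_path, exist_ok=True)

    target = os.path.join(data_path, file_name_from_url(url))
    # Write chunk by chunk so large files never sit fully in memory.
    with open(target, "wb") as file:
        for chunk in response.iter_content(chunk_size=chunk_size):
            if chunk:  # skip empty chunks
                file.write(chunk)
    return target
```

With `requests` installed, a call might look like `save_in_chunks(requests.get(url, stream=True), dataset_name, url)`. Keeping the response as a parameter lets the write logic be exercised without network access.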
