Revision 04a2b5a4
Added by Petr Hlaváč about 4 years ago
python-module/DatasetCrawler/WIFICrawler.py

--- a/python-module/DatasetCrawler/WIFICrawler.py
+++ b/python-module/DatasetCrawler/WIFICrawler.py
@@ -1,12 +1,24 @@
 from Utilities import FolderProcessor
 from Utilities.Crawler import BasicCrawler
 
+# Path to crawled data
+CRAWLED_DATA_PATH = "CrawledData/"
 
-def crawl(config):
 
+def crawl(config):
+    """
+    Implement crawl method that downloads new data to path_for_files
+    For keeping the project structure
+    url, regex and dataset_name from config
+    You can use already implemented functions from Utilities/Crawler/BasicCrawlerFunctions.py
+
+    Args:
+        config: loaded configuration file of dataset
+    """
     dataset_name = config["dataset-name"]
     url = config['url']
     regex = config['regex']
+    path_for_files = CRAWLED_DATA_PATH + dataset_name + '/'
 
     first_level_links = BasicCrawler.get_all_links(url)
     filtered_first_level_links = BasicCrawler.filter_links(first_level_links, "^OD_ZCU")
@@ -24,6 +36,6 @@
         files.append(file_link)
 
     for file in files:
-        BasicCrawler.download_file_from_url(file, "CrawledData/" + dataset_name + "/", dataset_name)
+        BasicCrawler.download_file_from_url(file, path_for_files, dataset_name)
 
-    FolderProcessor.unzip_all_csv_zip_files_in_folder("CrawledData/" + dataset_name + "/")
+    FolderProcessor.unzip_all_csv_zip_files_in_folder(path_for_files)
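The diff calls helpers from Utilities/Crawler/BasicCrawlerFunctions.py that are not shown in this revision. A minimal sketch of what a `filter_links(links, "^OD_ZCU")` call could do, assuming it keeps links matching a regular expression (the name and signature follow the call site in the diff; the actual project implementation may differ):

import re

def filter_links(links, pattern):
    # Keep only the links whose text matches the given regular expression.
    # Mirrors the call BasicCrawler.filter_links(first_level_links, "^OD_ZCU")
    # seen above; this is an illustrative sketch, not the project's code.
    return [link for link in links if re.search(pattern, link)]

links = ["OD_ZCU_WIFI_2021.zip", "index.html", "OD_ZCU_PARKING.zip"]
print(filter_links(links, "^OD_ZCU"))
# -> ['OD_ZCU_WIFI_2021.zip', 'OD_ZCU_PARKING.zip']

Anchoring the pattern with `^` means only names that begin with `OD_ZCU` pass the filter, which matches how the diff narrows the first-level links before descending further.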
Re #7939
- added documentation of methods and classes
- corrected errors in variable names
- added information for generated scripts