Revision 24 (Alex Konig, 2021-04-21 15:02) → Revision 25/36 (Alex Konig, 2021-04-21 15:05)
h1. Data sources
h2. ZČU open data
All ZČU data can be downloaded in XML, CSV and JSON formats.
As discussed in later chapters, some data sources do not provide data of sufficient granularity or volume. The data may contain more suitable datasets in the future, and this possibility should at least be acknowledged. This is more a topic for [[Prediction models]], where it is discussed further. Standard university tags are used throughout the data; however, in some cases there is no source explaining what they mean (for example "parkoviště" or "STUD-PRA1"), so we had to assume which locations they refer to.
To display correct predictions, we need to split this data by the building it belongs to. The buildings are:
Buildings on campus:
* Fakulta strojní + ekonomická
* Fakulta designu a umění
* Fakulta aplikovaných věd
* Fakulta elektrotechnická
* Rektorát ZČU
* Menza
* Library
* CIV, ZV, UCV, IPC
* Univerzitní 14
Dorms:
* Koleje armabeton
* Koleje Bory
* Koleje Lochotín
* Koleje klatovská
Buildings in the city:
* Dominikánská 9
* Husova 11
* Chodské náměstí 1
* Jungmannova 1, 3
* Klatovská 51
* Kollárova 19
* Riegrova 11, 17
* Sady Pětatřicátníků 14, 16
* Sedláčkova 15, 19, 31, 38-40, Veleslavínova 27-29, 42
* TESLOVA 5, 9, 9a, 11 - objekty C, F, G, H v areálu VTP Plzeň
* Tylova 59
* Veleslavínova 42
Classroom prefixes can be divided in the following way:
|_. Building |_. Abbreviation |_. Room prefixes |
| Fakulta strojní + ekonomická | FST+FEK | UV, UU, UK, UL, UP, UF, UT |
| Fakulta designu a umění | FDU | LS |
| Fakulta aplikovaných věd | FAV | UN, UC, US |
| Fakulta elektrotechnická | FEL | EU, EK, EL, EP, ES, ET, EH, EZ |
| Rektorát ZČU | REK | UR |
| Menza | MENZA | - |
| Library | LIB | UB |
| CIV, ZV, UCV, IPC | CIV | UI |
| Univerzitní 14 | U14 | UT |
| Univerzitní 22 | U22 | UH, UD, UX |
| | | |
| Dominikánská 9 | D9 | DD |
| Husova 11 | H11 | HJ |
| Chodské náměstí 1 | CH1 | CH |
| Jungmannova 1, 3 | J1+3 | JJ |
| Klatovská 51 | K51 | KL |
| Kollárova 19 | K19 | KO |
| Riegrova 11, 17 | R11+17 | RJ, RS |
| Sady Pětatřicátníků 14, 16 | SP14+16 | PC, PS |
| Sedláčkova 15, 19, 31, 38-40, Veleslavínova 27-29, 42 | S15-40+V27-42 | SP, SD, ST, SO |
| TESLOVA 5, 9, 9a, 11 - objekty C, F, G, H v areálu VTP Plzeň | T5-11 | T, TF, TG, TH |
| Tylova 59 | T59 | TY, TS |
| | | |
| Koleje armabeton | KA | - |
| Koleje Bory | KB | - |
| Koleje Lochotín | KL | - |
| Koleje klatovská | KK | - |
Source for building and room abbreviations: https://ps.zcu.cz/strediska/budovy-plzen.html
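The prefix table above can be turned into a lookup, sketched here in Python. The function name @candidate_buildings@ and the assumption that a room name starts with its prefix (for example "UC-305") are illustrative and are not taken from the data. The table lists the prefix UT for both FST+FEK and U14, so the lookup returns every matching building instead of a single one.

```python
from collections import defaultdict

# Building abbreviation -> room prefixes, copied from the table above.
# Buildings without prefixes (MENZA, dorms) are left out.
BUILDING_PREFIXES = {
    "FST+FEK": ["UV", "UU", "UK", "UL", "UP", "UF", "UT"],
    "FDU": ["LS"],
    "FAV": ["UN", "UC", "US"],
    "FEL": ["EU", "EK", "EL", "EP", "ES", "ET", "EH", "EZ"],
    "REK": ["UR"],
    "LIB": ["UB"],
    "CIV": ["UI"],
    "U14": ["UT"],
    "U22": ["UH", "UD", "UX"],
    "D9": ["DD"],
    "H11": ["HJ"],
    "CH1": ["CH"],
    "J1+3": ["JJ"],
    "K51": ["KL"],
    "K19": ["KO"],
    "R11+17": ["RJ", "RS"],
    "SP14+16": ["PC", "PS"],
    "S15-40+V27-42": ["SP", "SD", "ST", "SO"],
    "T5-11": ["T", "TF", "TG", "TH"],
    "T59": ["TY", "TS"],
}

# Invert the table: prefix -> list of buildings that use it.
PREFIX_TO_BUILDINGS = defaultdict(list)
for building, prefixes in BUILDING_PREFIXES.items():
    for prefix in prefixes:
        PREFIX_TO_BUILDINGS[prefix].append(building)


def candidate_buildings(room_name):
    """Return all buildings whose prefix matches the start of room_name.

    Assumption: the room name begins with the prefix. Two-letter
    prefixes are tried first, then the single-letter prefix "T".
    """
    name = room_name.strip().upper()
    for length in (2, 1):
        prefix = name[:length]
        if prefix in PREFIX_TO_BUILDINGS:
            return list(PREFIX_TO_BUILDINGS[prefix])
    return []
```

An empty list means the prefix is not in the table; a list with two entries (as for UT) means the building has to be resolved by other means.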
h3. Data timescale
Collection of each dataset started on a different date, so the amount of available data differs between datasets.
JIS data has been recorded since 8.4.2018.
Log-ins have been recorded since 20.10.2011.
Weather data has been recorded since 30.4.2019.
Since JIS and log-in data seem to follow the same trends in every recorded year, we decided to use only the period for which weather data is available, that is from 30.4.2019 onward.
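The choice of the common period amounts to taking the latest of the three start dates. A minimal Python sketch, where the dictionary keys and the helper names are illustrative only:

```python
from datetime import date, datetime

# Start dates as listed above (day.month.year).
DATASET_STARTS = {
    "jis": "8.4.2018",
    "log-ins": "20.10.2011",
    "weather": "30.4.2019",
}


def parse_day(text):
    """Parse a day.month.year string into a date."""
    return datetime.strptime(text, "%d.%m.%Y").date()


# All three datasets overlap only from the latest start date onward.
COMMON_START = max(parse_day(text) for text in DATASET_STARTS.values())


def in_common_period(day):
    """True if records from this day exist in every dataset."""
    return day >= COMMON_START
```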
h3. Historical weather data
Link to data: http://opendata.zcu.cz/Energeticky-dispecink.html
Data contains:
* datum_a_cas - date and time, time at which the values were measured with hour accuracy
* teplota - average temperature in given time slot (°C)
* vitr - average wind speed in given time slot (m/s)
* dest - value signifying rain (1) and no rain (0)
* svetelnost - average illuminance (k lux)
For further processing, illuminance will be translated to the terms "sunny", "overcast" and "cloudy". The 2019 data contains values between 0 and 83.2 k lux.
Lux values can be understood using the following table:
|_. Conditions |_. Value (lux) |
| Sunlight | 107527 |
| Full Daylight | 10752 |
| Overcast Day | 1075 |
| Very Dark Day | 107 |
| Twilight | 10.8 |
| Deep Twilight | 1.08 |
| Full Moon | 0.108 |
| Quarter Moon | 0.0108 |
| Starlight | 0.0011 |
| Overcast Night | 0.0001 |
Source: https://www.engineeringtoolbox.com/light-level-rooms-d_708.html
However, comparing the values in the data with archived weather predictions suggests that the following table is more appropriate:
|_. Conditions |_. Value (k lux) |
| Direct sunlight | >60 |
| Sunny | 40-60 |
| Overcast | 20-40 |
| Cloudy | 0-20 |
| Night | 0 |
Used weather archive: https://www.in-pocasi.cz/archiv/archiv.php?historie=2019-12-01&region=9
More detailed data analysis in [[Weather at ZCU]]
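The second table could be applied to the svetelnost column roughly as follows. This is a sketch only: the function name is hypothetical, and the table does not say which category a boundary value (20, 40 or 60 k lux) belongs to, so the choice made here is an assumption.

```python
def classify_illuminance(klux):
    """Map an illuminance value in k lux to a weather term.

    Thresholds follow the second table above. Boundary handling
    (20 -> overcast, 40 and 60 -> sunny) is an assumption.
    """
    if klux <= 0:
        return "night"
    if klux < 20:
        return "cloudy"
    if klux < 40:
        return "overcast"
    if klux <= 60:
        return "sunny"
    return "direct sunlight"
```

For example, the maximum value found in the 2019 data, 83.2 k lux, falls into the "direct sunlight" category.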
h3. JIS data
Link to data: http://opendata.zcu.cz/Snimace-JIS.html
Data contains:
* datum_a_cas - timestamp of JIS authentication (accuracy in milliseconds)
* pocet_logu - number of authenticated users at the given time
* popis_objektu - description of object according to standard ZČU tagging
The linked page states that " ... Data about dorms, entry to laboratories and other spaces with restricted access, information about the university canteen, checkouts in the university library, access to copy machines etc. can be interesting for students ...". However, not all of these places can be found in the data. The 2019 data contains only 46 different places, most of which are dorms, parking lots and buffets.
The number of logged places may increase in the future; however, it is also possible that the data was affected by GDPR and that more detailed data will no longer be made public.
A possible solution is to assign the provided places to buildings. A more detailed analysis of the data is in [[Jis activity - graphs]].
|_\3. Dorms and gyms |
|_. Dorm |_. Building |_. Location |
| A1, A2-Hlavni vchod, A3, A2 | KA | on Borská street |
| B3-LEVY, B3-LevyVytah, B3-PRAVY, B3-PravyVytah, B3 | KB | on Baarova street |
| M16, M14 | KB | on Máchova street |
| L1, L2, L1L2-vchod | KL | on Bolevecká street |
| L-Posilovna | KL | in Bolevecká dorm |
| KL-Posilovna, K1 | KK | on Klatovská street |
|_\3. Parking lots |
|_. Place |_. Building |_. Notes |
| Zavora-FEL | FEL | |
| Zavora-Kaplirova | ? | on Kaplířova street |
| US 005 - závora vjezd, US 005 - mříž vjezd | FAV | |
| Zavora-FDU | FDU | |
| Parkoviste-vjezd, Parkoviste-vyjezd | all on campus | |
| Zavora-NTIS-vjezd, Zavora-NTIS-vyjezd | FAV | |
| VC-VJEZD, VC-VYJEZD | S15-40+V27-42 | on Veleslavínova street |
| KolaBory-vnejsi, KolaBory-vnitrni | ? | |
| EXT/kola | FST+FEK | |
| EXT/kola-B | FAV | |
| B3-kolarna | KB | on Baarova street |
|_\3. Food courts |
|_. Name |_. Building |_. Notes |
| EP-BUFET | FEL | |
| NTIS-BUFET | FAV | |
| UV1-Bufet | FST+FEK | |
| MenzaKL-vydej | MENZA | |
| Menza4-kasa{x} | MENZA | x in range <1, 5> |
| Menza1-kasa-l, Menza1-kasa-p | MENZA | |
|_\3. Study rooms |
|_. Room |_. Building |_. Notes |
| STUD_VC53 | S15-40+V27-42 | on Veleslavínova street |
| STUD_KL20, STUD_KL87 | K51 | on Klatovská street |
| STUD_PRA1 | SP14+16 | |
| STUD_UB113, STUD_UB211 | LIB | in the on-campus library |
| STUD_ST407 | S15-40+V27-42 | |
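The assignment in the tables above can be expressed as a lookup on popis_objektu. The Python sketch below covers only part of the tables, and it assumes that the cash registers appear literally as "Menza4-kasa1" to "Menza4-kasa5"; the function name is hypothetical. Places marked "?" in the tables are not mapped and return @None@.

```python
import re

# Place (popis_objektu) -> building abbreviation; a subset of the tables above.
PLACE_TO_BUILDING = {
    # Dorms and gyms
    "A1": "KA", "A2": "KA", "A3": "KA", "A2-Hlavni vchod": "KA",
    "B3": "KB", "B3-LEVY": "KB", "B3-PRAVY": "KB", "M14": "KB", "M16": "KB",
    "L1": "KL", "L2": "KL", "L1L2-vchod": "KL", "L-Posilovna": "KL",
    "K1": "KK", "KL-Posilovna": "KK",
    # Parking lots
    "Zavora-FEL": "FEL", "Zavora-FDU": "FDU", "EXT/kola": "FST+FEK",
    # Food courts
    "EP-BUFET": "FEL", "NTIS-BUFET": "FAV", "UV1-Bufet": "FST+FEK",
    "MenzaKL-vydej": "MENZA",
    # Study rooms
    "STUD_UB113": "LIB", "STUD_UB211": "LIB", "STUD_KL20": "K51",
}

# Assumed literal form of the "Menza4-kasa{x}" entries, x in <1, 5>.
MENZA4_KASA = re.compile(r"Menza4-kasa[1-5]")


def building_for_place(popis_objektu):
    """Return the building abbreviation for a JIS place, or None if unknown."""
    place = popis_objektu.strip()
    if MENZA4_KASA.fullmatch(place):
        return "MENZA"
    return PLACE_TO_BUILDING.get(place)
```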
h3. WebAuth data
Link to data: http://opendata.zcu.cz/Autentizacni-system.html
Data contains:
* datum - date of access
* budova - building tag
* hodina_zacatek - start of lecture
* hodina_konec - end of lecture
* pocet_prihlaseni - number of successful sign-ins to the given computer in the given lecture
* stroj_hostname - name of specific computer
* typ_objektu - type of object (classroom, laboratory, lecture room, other)
* ucebna_nazev - specific name of room
* vyucovaci_hodina - number of lecture (according to the timetable)
The linked page states that "... Signing in using orion login and password can also help track sign-ins to computers at ZČU and corresponding activity in computer laboratories ...". However, it seems questionable whether all computer log-ins are really included in this data. The 2019 data contains only 106 different rooms for all of ZČU, which seems suspicious, especially since some rooms that we know are equipped with computers and are used (at least sometimes) are missing.
It would again be possible to assign those rooms to the appropriate buildings using the table at the beginning of the ZČU open data chapter, on the assumption that a similar set of students attends lessons in the same building (which is often the case, at least with KIV lectures).
More detailed data analysis in [[Login activity at ZCU - graphs]].
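Since the data has one record per computer and lecture, a per-building figure requires summing pocet_prihlaseni over all computers. The following Python sketch groups by date, building and lecture number. It assumes that records are available as dictionaries keyed by the column names above and that budova can be used directly as the grouping key; the sample rows are made up for illustration and are not real data.

```python
from collections import defaultdict


def sign_ins_per_building(rows):
    """Sum sign-ins over all computers for each (date, building, lecture)."""
    totals = defaultdict(int)
    for row in rows:
        key = (row["datum"], row["budova"], int(row["vyucovaci_hodina"]))
        totals[key] += int(row["pocet_prihlaseni"])
    return dict(totals)


# Made-up sample records; the values and formats are hypothetical.
sample_rows = [
    {"datum": "2019-10-01", "budova": "UC", "vyucovaci_hodina": "3",
     "stroj_hostname": "pc-01", "pocet_prihlaseni": "2"},
    {"datum": "2019-10-01", "budova": "UC", "vyucovaci_hodina": "3",
     "stroj_hostname": "pc-02", "pocet_prihlaseni": "1"},
    {"datum": "2019-10-01", "budova": "EU", "vyucovaci_hodina": "3",
     "stroj_hostname": "pc-07", "pocet_prihlaseni": "4"},
]

totals = sign_ins_per_building(sample_rows)
```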
h3. Occupancy data
Link to data: http://opendata.zcu.cz/Obsazeni-mistnosti.html
Data contains:
* rok_platnosti - year
* budova - building tag
* ucebna_nazev - room name
* typ_objektu - type of room (učebna/laboratoř/posluchárna/jiné)
* kapacita_objektu - maximum capacity of room
* obsazeni - number of students enrolled
* predmet - abbreviation of timetable action
* typ_akce - type of lecture (seminář/přednáška/cvičení)
* vyucovaci_hodina - lesson number (according to the timetable)
* hodina_zacatek - lesson start
* hodina_konec - lesson end
* semestr - semester (Letní semestr/Zimní semestr)
* tyden - week (S (even), L (odd), K (every), J (other))
* tyden_v_roce - week in the year
* datum - date
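The tyden codes can be interpreted with a small helper, sketched here in Python. The function name is hypothetical, and the source does not say which week numbering the even/odd codes refer to, so treating the number passed in (for example tyden_v_roce) as the basis for parity is an assumption.

```python
def lesson_runs_in_week(tyden, week_number):
    """Decide whether a lesson with the given tyden code runs in a week.

    Returns True or False for K (every), S (even) and L (odd).
    Returns None for J (other) and unknown codes, because their
    schedule cannot be derived from the week number alone.
    """
    code = tyden.strip().upper()
    if code == "K":
        return True
    if code == "S":
        return week_number % 2 == 0
    if code == "L":
        return week_number % 2 == 1
    return None
```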
It seems possible that not all lessons taught at ZČU are included in this data. The data from 2019 and 2020 contains only 1202 unique lesson instances.
There are also some instances without an assigned building and room name; however, this should not be an issue, since lessons are usually looked up by their abbreviation, not by room.
How to work with lessons that are not included in these datasets is rather a topic for either [[Prediction models]] or the handling of user input.
h2. Weather data
Link to data: http://wttr.in/Plzen,czechia?format=j1
The data is in JSON format and contains a detailed weather forecast for Pilsen, CZ. The following fields are the most useful for this application:
Current weather:
* localObsDateTime - date and time
* cloudcover - amount of clouds (values in range <0, 100>)
* weatherDesc - weather description
* temp_C - temperature (°C)
* precipMM - rainfall (mm)
* humidity - humidity (values in range <0, 100>)
* windspeedKmph - wind (km/h)
Prediction:
* avgtempC - average temperature (°C)
* date - date
The prediction also contains an hourly forecast with the following fields:
* WindGustKmph - wind gusts (km/h)
* chanceofrain - chance of rain (0-100%)
* chanceofsnow - chance of snow (0-100%)
* cloudcover - amount of clouds (values in range <0, 100>)
* humidity - humidity (values in range <0, 100>)
Rain is indicated by precipMM in the current data and by chanceofrain in the prediction. Values such as sunny, overcast and cloudy can be estimated from cloudcover.
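A minimal Python sketch of reading these fields from an already parsed response. Several things here are assumptions and not confirmed by this page: that current values sit in a @current_condition@ list and forecasts in a @weather@ list with an @hourly@ list per day, that values arrive as strings, and the cloud cover thresholds 30 and 70, which are placeholders. The sample dictionary is made up and only mimics that layout.

```python
def cloudcover_to_condition(cloudcover):
    """Map cloud cover (0 to 100) to a term; thresholds are placeholders."""
    if cloudcover < 30:
        return "sunny"
    if cloudcover < 70:
        return "overcast"
    return "cloudy"


def summarize_current(report):
    """Extract the current-weather fields used by the application."""
    current = report["current_condition"][0]
    return {
        "observed": current["localObsDateTime"],
        "temp_c": float(current["temp_C"]),
        "rain": float(current["precipMM"]) > 0,
        "condition": cloudcover_to_condition(int(current["cloudcover"])),
    }


def hourly_rain_chance(report):
    """Return {date: [chance of rain in % for each hourly entry]}."""
    return {
        day["date"]: [int(hour["chanceofrain"]) for hour in day["hourly"]]
        for day in report["weather"]
    }


# Made-up sample in the assumed layout; not a real response.
sample_report = {
    "current_condition": [{
        "localObsDateTime": "2021-04-21 03:05 PM",
        "temp_C": "11",
        "precipMM": "0.0",
        "cloudcover": "25",
    }],
    "weather": [{
        "date": "2021-04-21",
        "avgtempC": "9",
        "hourly": [{"chanceofrain": "0"}, {"chanceofrain": "80"}],
    }],
}

summary = summarize_current(sample_report)
rain_chances = hourly_rain_chance(sample_report)
```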