Six days before the summer term commences–our final term in this course of studies, for those of us who are left. Yesterday the hated scramble for the practicum groups took place once more, and despite the campus system acting up again, with the screen freezing several times, this time my practicum partner and I both got into our favourite groups in both courses.
Fact is, there’s almost always one group that’s obviously more attractive than the others. If there are three groups meeting in consecutive weeks, as with most regular lectures, the second group is by universal agreement the best choice. For the first group are the guinea pigs: they get the least time to work on each new assignment when it comes out, and the professor’s expectations are often adjusted, usually in our favour, after the first group has submitted their solutions. The third group, on the other hand, has its final meeting uncomfortably close to the exams, so that often, deep into the exam preparation phase, they’re still working on the final practicum assignment and don’t yet know whether they will get the PVL (the prerequisite credit for being admitted to the exam).
With a “compulsory choice” module the setup is often, as it is this time, a full day every week instead: within a fortnight, one of the mornings is the lecture, and the other three half-days belong to the three practicum groups. In this setup, evidently, the most attractive group is the one meeting in the afternoon right after the lecture, for then you have the whole day off the following week.
Given that, I am quite surprised that not only the two of us but practically everybody else we know got into exactly those two most attractive groups. In the IT security practicum, basically all of our remaining core group is in the second practicum group. But there we’ll be working in pairs, as is usually the case in courses offered by professors from the technical CS program, so it really makes no difference, other than that it will feel good to be among people we know one last time. Only four of us from our semester group, however, are in our “compulsory choice” module “certified tester”, as those modules are attended by students from different semesters. Here we are supposed to work in groups of three to four students, which is usually a lot less comfortable than working in pairs: a lot more overhead and friction, and much more leeway for things to go wrong. And the other two from our semester who are in that course are two of our “dual” students, whom we don’t know all that well. In fact, they have just been away for a full term, doing a six-month internship at their firm, so they’re a bit of an unknown quantity. Which makes for some unease.
But all in all I think we are now set up for a more or less bearable last term. True to form for this course of studies, yet again both professors are new to us, but at least not new in a general sense; quite to the contrary, in fact: they’ve been at UAS virtually forever. Which means that, at least this time, we won’t be the guinea pigs, and there will be plenty of Altklausuren (old exam papers).
Speaking of Klausuren (exams), we also finally got the remaining grades for those in the winter term, distributed systems already a couple of weeks ago, but process mining just yesterday. That was a long wait–almost seven weeks! Apparently distributed systems was graded very kindly, which is often the case when an exam is taken by a lot of people who didn’t take it in previous terms in spite of having attended the lecture, or took it and failed. Nearly 70 people took the exam this time, and it seems everyone passed, which necessarily means the grades were rather cheap. Small wonder practically everyone in our core group got an A or A+, the latter including me. I don’t know anything at all about process mining yet, other than my own grade, but since that exam didn’t go very well (for one thing it was way too long), I suppose my getting an A+ nevertheless must mean the professor lowered her expectations dramatically when grading the exam.
I do hope that’s a promising sign for her grading of my bachelor’s thesis in the summer. Though right now, ever getting there seems a remote prospect. You remember the petrol retailer, a customer of my employer, who was originally interested in my doing machine-learning-based price optimization for them? The colleague in charge of that contact will make one more attempt to see whether they will perhaps let me have some very old real data, but it’s a long shot. So I’ll most likely end up inventing my own data–at least the crucial sales figures, without which I won’t have a predicted variable and hence can’t use machine learning at all. But never mind, I already have a pretty good idea for a model there. In theory.
But that’s numbers and business logic and some creative simulation. I’m good at that. No worries. It’s going to be fun.
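To make that concrete for myself, here’s a toy sketch of what such a simulation might look like–a simple price-elasticity model for one station, where every name and number (base volume, elasticity, noise level) is invented purely for illustration:

```python
import random

def simulate_daily_sales(price, competitor_price, base_volume=10_000.0,
                         elasticity=-8.0, noise_sd=0.05):
    """Toy model: litres sold per day at one petrol station.

    Demand falls as our price rises above the competitors' average;
    `base_volume` and `elasticity` are invented placeholder values.
    """
    relative_gap = (price - competitor_price) / competitor_price
    volume = base_volume * (1.0 + elasticity * relative_gap)
    volume *= 1.0 + random.gauss(0.0, noise_sd)  # day-to-day noise
    return max(volume, 0.0)  # sales can't go negative

random.seed(42)  # reproducible "data"
# Five fictional days with our price creeping up past the competitors' 1.52
rows = [(round(1.50 + 0.01 * i, 2),
         simulate_daily_sales(1.50 + 0.01 * i, competitor_price=1.52))
        for i in range(5)]
```

Whether anything like those parameters matches reality is exactly what I don’t know yet–but a curve like this would at least give the model a predicted variable to learn.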
What scares the hell out of me is the technological side. Part of the project is figuring out a setup for doing the number crunching in our company’s cloud. While a few million real numbers at best (retail prices, crude oil prices, stock market indicators, geographical information such as population density or average income in the vicinity, weather data, traffic, and so on) are not exactly Big Data, training a neural network on this amount of input, at least in halfway competitive time, is likely to overtax my laptop. And besides, if we ever want to do this for real, cloud computing will be a necessity–quite apart from its being the obvious future. In fact this, rather than retail price optimization, is probably what makes the project worthwhile for my employer: getting some expertise in a new field that by all accounts is likely to gain a lot of prominence in the next few years.
In theory I sympathize. In practice, my being the one who has to acquire this expertise means I’ve already spent several days wading, disoriented, through confusing and at times contradictory articles, tutorials, and forum posts, trying to figure out how to do machine learning in the cloud–specifically in your own cloud, rather than on a big cloud provider’s platform. Apparently it’s all possible, but since right now it’s a bleeding-edge kind of thing, there aren’t any really introductory tutorials: everything there is has been written by practitioners for practitioners. I am royally confused right now. So yes, you can run Apache Spark (a number-crunching framework used for machine learning tasks) on Kubernetes (a container orchestration framework for the cloud), but the feature is officially “experimental”. And where to put the data? You need a database, evidently. Which kind? Where does it run? Also in containers? There seem to be three ways of achieving that, none of which makes much sense to me, and the fourth position is “don’t do it, ever”. Or on the server, but not in a container? And where does the application itself run? How does it find Spark? How does Spark find the data? In spite of days of reading, and being reasonably quick on the uptake, I am still perfectly clueless about all those vital conceptual questions. And now consider that I’m really just a student who so far hasn’t used Kubernetes much at all, and only in a team with other people who knew what they were doing. Good luck!
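For my own notes, the one fragment I think I have pieced together so far: Spark’s (experimental) Kubernetes support means pointing spark-submit at the cluster’s API server, which then spawns the driver and the executors as pods. A sketch of that invocation, written as a Python list so I can annotate each flag–every concrete value here (server address, image name, job name, script path) is a made-up placeholder:

```python
# spark-submit invocation for Spark-on-Kubernetes, assembled as a Python
# list so each flag can carry a comment. All concrete values below are
# invented placeholders, not a working configuration.
spark_submit_cmd = [
    "spark-submit",
    "--master", "k8s://https://<api-server>:6443",  # the Kubernetes API server
    "--deploy-mode", "cluster",                     # the driver runs in a pod, too
    "--name", "price-model-training",               # made-up job name
    "--conf", "spark.executor.instances=2",         # number of executor pods
    "--conf",
    "spark.kubernetes.container.image=<registry>/<spark-image>",  # image with Spark + deps
    "local:///opt/spark/work-dir/train.py",         # application path *inside* that image
]
# One would then hand this to subprocess.run(spark_submit_cmd).
```

That much I can reconstruct from the documentation; where the database lives and how the application finds it are still the open questions.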
Suddenly six months (in fact, five and a half) no longer seem so long to figure this all out. Particularly not with the term restarting in a few days.