A Bachelor Thesis Project, and a Certificate

Only a week ago I was decidedly in the doldrums because I still didn’t have anything even remotely resembling a realistic idea for a bachelor thesis. With my team manager having been absent for more than a fortnight, things were at a complete impasse. Time was running away from me. I desperately wanted to complete the thesis before the end of the summer term, which means in August. By then we were already in the last days of February, and the allotted time for a thesis is six months.

In fact I was just about ready to give up the entire plan of writing the thesis at work. Since my betters seemed unable to come up with a good project I was starting to consider just going back to UAS and asking one of our professors for a subject for a thesis. True, that would have wrecked the cunning concept of getting paid for writing the thesis, but at the time that seemed a lesser price to pay than idly waiting for another couple of months and probably re-registering for an otherwise unneeded seventh term in fall.

Then last Monday my boss was finally back at work (not in the office, to be sure, but with a customer), and his chat status showed him as available. For a fortnight I had studiously refrained from pestering him with my problems (the man was officially sick, after all), but now I contacted him and asked for a meeting as soon as possible, to see, as I said, whether there was still a chance for me to write my thesis at work. The ill-advised, slightly pissed-off tone was luckily lost on him. We had a very good meeting two days later and quickly agreed that the best idea for me would be to continue working on petrol retail price optimization, independent of the customer who had originally sparked our interest in that project but then gotten cold feet. Compared to the alternatives, such as optimizing our tiny in-house workflow system or setting up a process mining project at a similarly reluctant customer, a huge telecommunication firm, it was by far the most realistic and concrete idea. And anyway, after a week or two of research and planning I was already quite deep into it.

I spent the afternoon writing an outline of the project. There are basically two major stumbling blocks: identifying and setting up a useful technology stack, and getting either real or realistic data to work with. The rest is just standard machine learning stuff. Now both of those are tricky, the second more so than the first, and both could conceivably kill or seriously delay the project. But since it’s a bachelor thesis rather than an actual enterprise application, my thinking is that I will just turn overcoming those obstacles into an explicit element of the project, so the written thesis will be partly a discussion of how I managed to solve them and to what degree. That’s actually quite all right for an academic paper.

My team manager greenlighted the outline, and the next day I met with my supervisor at UAS and she too approved readily. Truth be told, I had picked her exactly because of her relaxed attitude. But then the outline is rather well thought out and I think it’s a good project. It’s not trivial, but manageable, I hope; and if it turns out it isn’t, there are ways of scaling it down and they’re already described in the outline. If I just approach this iteratively and solicit feedback from my two supervisors once a month or so, it should be quite easy to keep the project on track and finish in time.

On the technology front I met with the young colleague who administers our cloud server, to discuss my team manager’s idea of running the project in the cloud. It’s big data, after all, and apparently hardly anybody is using our fancy new tool yet. The talk was a bit scary, because the administrator readily promised me my very own private Kubernetes cluster with plenty of disk space and memory, but said that whatever I wanted to run on it I had better install on my own. Now I was and am thinking of using a Scala-based machine learning library on top of Apache Spark for this project, rather than the more obvious and simpler Python with its wealth of data science tools, both for performance reasons and to learn something new. But Spark support for Kubernetes is officially labeled as “experimental”, and I am not the most machine-savvy person on the block. In fact Kubernetes alone scares the hell out of me. I hope when I run into trouble they’ll help me out.

On the other hand, if this turns out to be a major problem, I can just skip it, downsize the project if necessary, and try to run the machine learning locally in Python, as I did in two study projects at UAS. What’s not so easily solved, or avoided, is getting good data. Since the whole purpose of the thesis is to find a model for predicting the sales volume at different retail prices under a range of external factors, reasonably realistic data are crucial. And therein lies the rub, for the customer we originally had in mind is extremely paranoid about sharing their actual data. In fact it’s probably that, rather than the “supervision” issue, that made them back out.

In the last few days I had several talks with colleagues who are working for this customer and with their data, and they all agreed the chances are slim of my getting my hands on the real thing, in spite of non-disclosure agreements and my only using those data for creating the internal model. So the second best idea now is to somehow obfuscate the data so as to make them meaningless for outsiders, or to generate fake data based on the structure of the real data. Though just yesterday I read about “synthetic” data used in machine learning to create models in the absence of real data. If you validate the model against the assumptions that went into creating the data, apparently this makes it fit for transferring the learning to live data later. Or so the theory goes. If that’s true, then maybe I can just make up my own data, at least for purposes of the bachelor thesis. That’s where I am right now, and in fact I am starting to find it exciting. In any case, solving this problem will decidedly be one major part of the thesis.
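To make the synthetic data idea a bit more tangible, here is a minimal sketch in Python, using only the standard library. Everything in it is an assumption made up for illustration: the constant-elasticity demand curve, the elasticity of -1.5, the price range of 1.20 to 1.80, the noise level and the function names. None of it comes from real data or from the actual project. The point is only the validation step: generate data from a known assumption, fit a model, and check that the model recovers what was put in.

```python
import math
import random

# Hypothetical assumption: volume follows a constant-elasticity demand curve,
#   volume = base_volume * price ** elasticity * noise
TRUE_ELASTICITY = -1.5


def make_synthetic_sales(n, elasticity=TRUE_ELASTICITY, base_volume=10_000.0,
                         noise_sd=0.05, seed=42):
    """Generate n (price, volume) pairs from the assumed demand curve."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        price = rng.uniform(1.20, 1.80)              # made-up price range
        noise = math.exp(rng.gauss(0.0, noise_sd))   # multiplicative noise
        rows.append((price, base_volume * price ** elasticity * noise))
    return rows


def fit_elasticity(rows):
    """Least-squares slope of log(volume) on log(price)."""
    xs = [math.log(price) for price, _ in rows]
    ys = [math.log(volume) for _, volume in rows]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


rows = make_synthetic_sales(2000)
estimated = fit_elasticity(rows)
print(f"assumed elasticity: {TRUE_ELASTICITY}, recovered: {estimated:.3f}")
```

If the recovered value lands close to the assumed one, the fitting procedure at least does what it should on data whose structure is known. Whether that carries over to live data is a separate question.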

So yeah, it’s going to be demanding and tricky and probably a lot of effort and trouble. But compared to where I was a week ago, at least now I have a project and I can start working on it. That feels so much better than just waiting. In fact I am quite looking forward to it now. And maybe I’ll still finish within the summer term. On the other hand, if I don’t, it really doesn’t matter either. It won’t be another full term, just a couple of months or so more. And while that means I’ll have to pay the tuition fee for another term, come on, it’s really ridiculously low (though my cost-aware wife will still mind) and I’ll get free public transport for another six months in return …

For the past two weeks I had also regularly studied for the certification exam I finally took yesterday. That was a very strange experience. Around noon I went to the testing center in a downtown office area, which turned out to be run, in a veritable Russian doll model, by a local firm taking a license from the international testing center firm (among others) taking a license from one of the certifying bodies acting for the qualification board. You got that? They’re third tier sub-contractors. Huge office space, very young employees, friendly but slightly out of their depth, and security beyond what you get when boarding a commercial airliner: to prevent fraud, they made me roll up my pant legs and turn my pockets inside out, passed a metal detector over most of my body parts, and checked my glasses for hidden recording devices! Then I could finally sit down at a computer workplace and start the test.

I am a conscientious person and I study hard. And even though the original idea had been to just take this exam as a windfall profit because the contents had been covered in our architecture lecture in the winter term anyway, I had, of course, studiously written and reviewed 144 flash cards to memorize the entire content of not one, but two official preparation books so as to make doubly sure I would pass. It would simply have been too embarrassing to fail when my employer covered the exam fee (and as a result everyone at work would know). I went there reasonably sure of knowing everything they could possibly ask, and since I thought I knew the exam would be multiple choice, I trusted my common sense and my passive knowledge of quite a few more things to at least guess well.

Yet the exam was unexpectedly tough. I have signed an agreement that prevents me from ever disclosing any of the actual questions on pain of sudden death by dismemberment (well, not quite, but you get the idea), but suffice it to say that most questions were not actually multiple choice but in the style of “pick the best three” out of seven or eight or, worse, “check false or true” on six or seven statements, and then both would yield one or at best two points for the entire question if you got everything right. Go figure the number of possible combinations, and then keep in mind that a lot of the statements were decidedly vague, in some cases even undeniably ambiguous. In at least one case you could, on honest reflection, justify picking true just as well as false, depending on how you chose to interpret a single adjective in the statement. And on that interpretation then hinges the single point for the entire question, even if you got all three other statements right. Finally, out of 39 questions, at least three covered things that had never even been mentioned in the official preparation books.
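For the record, the number of combinations is easy to work out. The short Python sketch below only uses the question formats described above (three out of seven or eight, true or false on six or seven statements). The assumption that every combination is equally likely applies to a completely blind guess, which is of course the worst case.

```python
from math import comb

# Number of distinct ways to answer one question in each format.
answer_patterns = {
    "pick 3 of 7": comb(7, 3),          # 35
    "pick 3 of 8": comb(8, 3),          # 56
    "true/false on 6 statements": 2 ** 6,  # 64
    "true/false on 7 statements": 2 ** 7,  # 128
}

for question_format, count in answer_patterns.items():
    # Chance of getting the whole question right by blind guessing.
    print(f"{question_format}: {count} patterns, "
          f"blind-guess chance {1 / count:.1%}")
```

So even with solid knowledge of most statements, a single doubtful checkbox halves the chance of scoring on a true or false question.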

I had 75 minutes and was basically done with the bulk of the questions after 10, but then took another 30 to ponder every single checkbox I had ticked, just in case. When I finally, reluctantly, decided I could not possibly improve on my responses because there was very often simply no way of divining what the creators of the test had in mind, I was not sure of my replies to more than about half of the questions. That was really not what I had planned for or expected when spending about a week of net working hours on preparing for this single exam. With a very mixed feeling I hit the button for ending the exam. I wasn’t quite sure what to expect next, but after a moment of hesitation a new screen came up, saying I had passed with exactly 80 per cent of the questions answered correctly. Since 60 per cent suffice for passing, this was OK, but certainly not the comfortable margin I had worked for.

In truth, I would not grade the setup of this exam very highly. Since you never get feedback on your responses I have no way of knowing what saved the day, but my guess is I knew for sure at most 50 per cent and then guessed the rest as best I could, not because I didn’t know the answer, but because the wording of the possible responses was so awfully vague and ambiguous. Fortunately my guesses were right on a slight majority of the remaining questions, but I might just as well have guessed wrong some more and then the outcome would have been close. There must be ways of making this more accurate and less of a gamble. Start with wording the questions more carefully.

But anyway, I officially have a certificate now and can proudly (?) call myself, if I choose, “Certified Professional for Software Architecture (Foundation Level)”, for what it’s worth. On coming back to work I reported my success to my boss, who made me update my qualification profile and informed the editors of our in-house newsletter. As I said, consulting firms attach great importance to certificates. Or rather, I suppose, their customers do.

