#26: Machine Learning and Data Science with Jesper Dramsch

Follow the show on Apple Podcasts, Spotify, Overcast, PocketCasts, Stitcher, Breaker, Castbox, Google Podcasts, Anchor, RadioPublic, or copy the master RSS to paste into your favorite player, or subscribe via e-mail here.

On the season finale of Software World, I welcome Jesper Dramsch, geophysicist and Scientist for Machine Learning at the European Centre for Medium-Range Weather Forecasts (ECMWF).

In our conversation, Jesper talks about the differences between machine learning and data science, how to enter the field, and how life scientists shift their focus to data science.

We talk about how businesses should approach data, as the processes and methodologies are completely different from software engineering.

Transcript

[00:00:00] Candost Dagdeviren: In the final episode of the season, my guest is Jesper Dramsch, a scientist for machine learning at the European Centre for Medium-Range Weather Forecasts. Jesper has a PhD in machine learning applied to geoscience and has a master's and a bachelor's degree in geophysics.

[00:00:45] In the past, I had an interest in machine learning and data science but could not pursue that dream. The field became one of the endeavors I dropped out of in my life. However, I see many life scientists change their careers and get into machine learning, like Jesper. I asked them about the career change and its difficulties, because I see many people pursuing the same dream. We talked about the differences between software engineering and data science in development processes and why data projects fail in companies.

[00:01:18] They gave their recommendations and tips for scientists who are looking to switch into the tech industry but don't know where to start. We also talked about writing, online courses, and being a creator. Now it's time to listen to the season finale of Software World and learn.

[00:01:40] Welcome, Jesper. When I looked at your personal website, I only found one usage of the term artificial intelligence. That was interesting for me because, I mean, everyone has been calling everything artificial intelligence for the last couple of years.

[00:01:57] Can we or maybe should we stop calling everything artificial intelligence?

[00:02:01] Jesper Dramsch: That's an interesting question. I've started using the term artificial intelligence a little bit more after working in industry. But before, when I was doing my PhD and all that work, I was very, very allergic to the term artificial intelligence, because I feel that just because you're applying a neural network to something, that's just a bunch of numbers being multiplied with each other, and it has nothing to do with artificial intelligence, actually.

[00:02:31] But people like to stretch their imagination and think that it is something fancy that you do. But a lot of times we just apply linear regression to a problem, right? And that has nothing to do with independent thinking or any understanding of the problems. So that's why I rarely use artificial intelligence. And I think that's a remnant from me being more of a physicist: I applied these kinds of models to actual problems. There's no thinking involved, and that's kind of the distinction that I make.

[00:03:13] What is your opinion? How do you see the distinction between artificial intelligence, machine learning, and data science, if that has stood out to you? How do you see that, actually?

[00:03:26] Candost Dagdeviren: For me, most of what people call artificial intelligence, all these projects, is statistics, sometimes not even machine learning. They apply some statistical modeling and start saying, "Hey, we are using AI technologies," et cetera, which I really don't like.

[00:03:48] On the other side, I very much like the idea, or at least the development, of machine learning, because it's not new. It has been there for many, many years, and it's still developing. And also, I kind of like the idea of, I don't know, making it a hype, because when you make something a hype, it becomes reality after a while.

[00:04:11] That's why I'm okay with using artificial intelligence or mentioning it from time to time, but not everywhere. I mean, let's be honest. For me, when you use machine learning or statistical modeling, just say, "Hey, we are using machine learning." That should be enough. You shouldn't call everything artificial intelligence.

[00:04:32] Jesper Dramsch: That makes a lot of sense. Yeah.

[00:04:34] I agree. And I think AI sells, so it's a great marketing term. And honestly, most people don't know what either artificial intelligence or machine learning is, so you can use them interchangeably. Artificial intelligence is a little bit more emotive, but what it actually does, most people don't know.

[00:05:00] So it's not a technical term, and I've seen machine learning used non-technically a lot more since the hype started. And now, yeah,

[00:05:14] everything is called machine learning as well. So we're kind of not at that technical distinction anymore, either.

[00:05:21] Candost Dagdeviren: The tricky part for me is data science, because I cannot really differentiate what I should call data science. That is the part I'm curious to learn a little bit more about from you, because, yeah, I think machine learning is part of data science, but I really cannot draw a line around data science to put a limit on it. So can you enlighten me on that one?

[00:05:51] Jesper Dramsch: I think machine learning is a technique used in data science. So if you look at it,

[00:05:58] I think data science is a workflow. Essentially, you get the data, you have to load the data, clean the data. Then you want to explore the data and find patterns in it. And then you have to do some kind of modeling, and machine learning sits well in that modeling part, but it could also be classic statistical modeling or causal modeling, whatever modeling you do with your data. If you do more of the customer thing, then you can apply some classic churn models, for example, and it doesn't have to be machine learning. You just have to model your data in some kind of way.

[00:06:37] And then you go on and you do the visualization and the reporting. So really you have this workflow that is relatively established at this point, I'd say, and machine learning fits into that modeling part, but really nowhere else. Of course, you can use a little bit in the exploration. Ian Ozsvald has this cool thing.

[00:07:01] It's a little library in Python where you use a random forest to predict the other variables to get a kind of non-linear cross-correlation matrix. But that's kind of the extent of it, right? Using machine learning to explore, using clustering. But the serious machine learning I would mostly put in the modeling part.
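The random-forest idea described above can be sketched roughly as follows. This is not the library mentioned in the conversation; it is a minimal illustration that assumes scikit-learn and NumPy are installed, and the function name `relationship_matrix` and the toy columns `x`, `y`, `z` are made up for this example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score


def relationship_matrix(data, seed=0):
    """Score how well each column predicts each other column.

    Entry [i, j] is the cross-validated R^2 of a random forest that
    predicts column j from column i alone, clipped at 0.
    """
    n = data.shape[1]
    scores = np.eye(n)  # a column trivially predicts itself
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            model = RandomForestRegressor(n_estimators=50, random_state=seed)
            r2 = cross_val_score(model, data[:, [i]], data[:, j], cv=3, scoring="r2")
            scores[i, j] = max(0.0, r2.mean())
    return scores


# Toy data (an assumption for illustration): y depends on x non-linearly,
# z is unrelated noise.
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=300)
y = x ** 2 + rng.normal(0, 0.1, size=300)
z = rng.normal(size=300)
data = np.column_stack([x, y, z])

matrix = relationship_matrix(data)           # non-linear, not symmetric
linear = np.corrcoef(data, rowvar=False)     # classic Pearson correlation
print(np.round(matrix, 2))
print(np.round(linear, 2))
```

In this toy setup the Pearson correlation between `x` and `y` is close to zero even though `y` is almost fully determined by `x`, while the forest-based score picks the relationship up. Unlike a correlation matrix, the result is not symmetric: knowing `y` does not reveal the sign of `x`.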

[00:07:22] Also, there are always the feedback loops in data science, right? If you find something in your exploration that doesn't fit, where you realize that your cleaning hasn't worked, you always have to go back. So the machine learning modeling part can always feed back into your cleaning, into your feature generation, all that stuff.

[00:07:42] So I think that makes it very complicated, right?

[00:07:47] Candost Dagdeviren: Yeah. I mean, for me, I still cannot really grasp which part is more difficult than the others, because when I think about it, cleaning the data, modeling, then getting the feedback, then reporting and everything around it, every part seems extremely complex. That might be related to my ignorance on the topic, but I have some knowledge in machine learning, and I finished some courses, et cetera, and still, for me, it's very difficult to apply these models to data.

[00:08:25] But from your perspective, which part is the most difficult in this whole process, from gathering and cleaning the data to the reporting part?

[00:08:38] Jesper Dramsch: Honestly, I have to agree with you. All of them are really, really hard, and I can't really say which one is the hardest, because if you don't gather your data right, you know how they say: garbage in, garbage out. So you can spend thousands or millions of dollars to collect data, and your data is biased, like the data that we get before elections, where you call people and only the people that actually want to talk to you answer. That data collection is incredibly biased, and we're slowly realizing that, right? Or in physical data, when you go out on these super expensive ships in marine surveys and you realize, "Oh, I forgot to plug in this one cable."

[00:09:24] And that means I didn't have any power source on my collection. So you're paying for the entire crew, the ship, all that equipment, and you had an error in there. Or you decide to keep only the wrong parts. My favorite example is CERN, the Large Hadron Collider, where they only keep, I think, a percent of the data that is actually generated, and they have to decide beforehand which data to keep for their specific experiment.

[00:09:51] And I mean, that happens everywhere. You always have to talk to experts about what data you keep. And it's critical, because your data might be complete garbage afterwards if you clean your data in the wrong way. If you remove certain outliers, you might not know about these extremely important events that might actually be indicators for black swan events or something like that.

[00:10:15] But because you're just doing the scrubbing and throwing out everything that is more than two sigma away from the mean, you're just saying, "Oh yeah, no, I'm getting rid of all of those." So that can be super dangerous. In my understanding, you have to have this whole-system understanding, or systems thinking, that people talk about, where you really have to understand, from the very beginning and at every step, what you're doing and what effect it has on the subsequent steps. So when you do your cleaning, you already have to have in mind what kind of machine learning models you can use after that. The same goes for your feature generation: if you're suddenly generating thousands and thousands of features, you can't apply some machine learning models, because they don't work on these super wide tables, essentially.
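The two-sigma scrubbing described above can be sketched in a few lines of standard-library Python. The readings are invented for illustration and `drop_beyond_sigma` is a hypothetical helper, not something from the conversation; the point is only that the one extreme event is exactly what the filter discards.

```python
import statistics


def drop_beyond_sigma(values, n_sigma=2.0):
    """Split values into (kept, dropped) by distance from the mean."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    kept, dropped = [], []
    for v in values:
        # Anything further than n_sigma standard deviations is "scrubbed".
        (kept if abs(v - mean) <= n_sigma * std else dropped).append(v)
    return kept, dropped


# Hypothetical readings: mostly ordinary, plus one extreme event.
readings = [10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 9.7, 10.4, 9.9, 25.0]
kept, dropped = drop_beyond_sigma(readings)
print(dropped)  # [25.0]
```

Keeping the dropped values in a separate list, rather than deleting them silently, at least leaves a record that a domain expert can review before the rare event disappears from the analysis.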

[00:11:11] And so it is super difficult. I always stop at machine learning, but then what kind of visualizations can you do so that you don't lie to the people you're reporting to, and what kind of dashboards can you actually build from this? I gave a talk at PyData two days ago, and someone had a really interesting question, because that person does these analyses and communicates the results.

[00:11:40] My talk was about communicating machine learning results, and sometimes they do an analysis on a subset of data, so just a part of the users, something like that. We often have to do that because either the computers are too small or our data set isn't collected well enough yet. So we take a subset, but you have to put a lot of disclaimers with it, because it only has statistical validity for these users and users like them.

[00:12:08] But this often gets swept under the carpet a little bit when you then show it to management or marketing, where they're like, "Oh, these are amazing results." And then the disclaimers take an hour to relay, because people don't understand what a disclaimer actually means. Sometimes it's also cultural in companies, where they just run with results and don't really take caveats.

[00:12:32] I've seen that outside of data science and machine learning before, where you build this model, and this model only works on this one specific thing, and then it's applied to everything a year after. It can be endemic to how we treat problems, how we treat solutions. And I think there has to be a communication shift in that, but yeah, that's generally also really hard and really important. So that's a really long-winded way, I'm sorry, to say I completely agree with you that all of these parts are critical and all of these parts are very hard. And I think the only way to really get it is by having an understanding of the entire system that you want.

[00:13:15] Candost Dagdeviren: Yeah. So one thing about CERN and leaving out 98%, or something like that, of the data: I talked with a scientist working at CERN. It was purely coincidental. I was at a mobile application developer conference, and they invited a physicist from CERN, which I found weird at that time, but now I'm really glad that I had the chance.

[00:13:43] And that person mentioned that they decide somehow, I think again using machine learning, which data they are going to keep and which data they are going to drop, because terabytes of data are produced from running just one experiment. They have no storage for that, and they have no processing power to handle that much data.

[00:14:09] When I think about all those things, it's still puzzling to me how you can use machine learning to decide which data you're going to keep and which data you're going to leave out. Do you have any idea or perspective on that?

[00:14:26] Jesper Dramsch: Only tangentially. I heard the same thing and I read a little bit about it. The thing with the machine learning that is used is that it's for very quick decisions, right? Because if you have a random forest or a decision tree or something like that, you get these kinds of decisions very fast, because they're computed very fast.

[00:14:49] As far as I understand it, we have a very good understanding of the physics of particles already. So what CERN is searching for is irregularities in those. When they were looking for the Higgs boson, they already knew which frequency range they had to look at for this particular experiment, so they could take everything that is outside of that frequency space and just discard it, because we already have such a good understanding of what's normal.

[00:15:24] We know where to look for irregularities, for anomalies. And I know some of the machine learning is used for making modeling faster. I don't know if they're still using it, but they have published some research on GANs for modeling how the rays are essentially going.

[00:15:45] It's really interesting. And I find it fascinating as well, because you're using machine learning before the actual analysis. So you have to be very, very sure that your machine learning doesn't do anything weird, right? It has to be super solid. And that's fascinating, because a lot of people don't have that kind of trust in machine learning systems.

[00:16:08] And I would also be very careful to be like, "Oh yeah, this works," because it has to be bulletproof, right?

[00:16:16] Candost Dagdeviren: Yeah. I mean, at that time your algorithm is deciding, and usually these algorithms are like a black box, and you rarely know what's going on inside. And this was, for me, the really puzzling part: how can you be so sure that you're not leaving something out that you need at the end?

[00:16:35] And also, the person admitted that they are leaving things out sometimes, but that's the cost of running these experiments, and that's the trade-off they're making, but at least they are aware of it. That was really eye-opening for me. And when I think about these academic experiments, and about academia and business, I'm more on the business side.

[00:16:58] Of course, I just have a Bachelor of Science degree in computer engineering, and I don't have any master's or doctoral research education or anything like that. So I see a lot of people from academic backgrounds joining the business side more and more, especially for data science and everything around it.

[00:17:21] And I have friends who are currently getting a PhD, for example, and they have a background in molecular biology and genetics. I have some friends who are pharmacists, and people like you, geophysicists, and people from other related fields, getting into data science.

[00:17:41] And I really don't understand why these people are getting into data science. I see that there are some overlaps between what they are doing and how they are working. Why does data science attract these people? What's the catch here?

[00:17:59] Jesper Dramsch: I think there are multiple reasons. So one very easy reason is that the salaries are really good.

[00:18:06] Candost Dagdeviren: That should be it.

[00:18:08] Yeah.

[00:18:09] Jesper Dramsch: It's hard to beat the salary. Even if you're working at Deliveroo or something like that, you're getting a pay increase of 20,000 pounds in the UK over a good academic salary.

[00:18:25] Right. In the position I'm in right now, I'm getting almost twice that of someone in classic academia. And I'm very happy with that, to be honest, because it's a very good, livable wage. And in academia, especially in the UK, the salaries are so low. I don't know, most people do it out of passion.

[00:18:53] So that brings me to the next point, where I myself felt a little bit disillusioned after a while. That is in part because if I had kept working in geophysics, my path would have led me into oil and gas, and honestly, Generation Z kind of showed me that I could just maybe not do that and try to apply my brain to something positive in the world.

[00:19:22] And now I'm working in weather and climate modeling, which is amazing. I'm still very proud of that. And I feel like a lot of other people are feeling the same. Academic systems are set up to have a strong hierarchy, and you will mostly be working with men who have been in those positions for decades and have no intention of leaving, and you have to be very lucky and have no work-life balance to make professor, in a lot of ways.

[00:19:55] And then, of course, you're never working on geophysics as a whole. You're working on this subset of subsets of subsets. You're working on this super small thing. And some people slowly realize that the work they're doing barely has a real-world impact; maybe three other people care about the exact thing you are working on.

[00:20:19] And I feel like, especially in my generation, I'm a millennial, and in Generation Z, they're looking to have an impact with what they do. So after working 10 years in that field, you're like, "Okay, I think I'm kind of done. I know everything there is. I can only make incremental solutions to this super small thing.

[00:20:42] I'm ready to work on something that actually matters, on something that affects people, something that maybe even has a positive impact." So I feel like, yeah,

[00:20:55] salary is on the one hand. And on the other hand, it's having an impact on society and doing something that is interesting, and it's also different.

[00:21:05] And so, on the side of how they're changing, I personally think data scientists can really benefit from this kind of extraneous education. If you're in biology or in geoscience, you've touched dirty, messy, real-life data. This data that fits in no table, this data where every data point feels like an outlier because it's, again, something really, really messy.

[00:21:37] And that uniquely positions you, in that you have already worked with real-life data. You've done this statistical analysis in some kind of way. Most people have worked with principal component analysis, PCA, and don't even know that it essentially belongs to the machine learning class, but biologists have to use it, geologists have to use it, to make sense of their data, because it's so high dimensional.

[00:22:05] And then they come to data science and they're like, "Oh wow, all my data fits into tables. Awesome." So they learn a little bit of R, a little bit of Python, and learn some statistics. You will never get away without any statistics, but most of the statistics you need for data science is fairly intuitive.

[00:22:22] You're like, "Okay, I have this data set and I have to split it into a training data set and a validation data set, so I can actually test my hypothesis on data that the model has never seen." And that just makes sense, right? Most of the things in data science are kind of intuitive. They all have statistical details, and you can read about the central limit theorem and all the stuff that is behind it, but you don't necessarily have to. Sometimes you can just apply scikit-learn to a thing and be happy with it.
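The training and validation split mentioned above is simple enough to sketch with the standard library alone. The helper `train_validation_split` and the 80/20 ratio are assumptions made for this illustration; in practice, scikit-learn ships a `train_test_split` function that serves the same purpose.

```python
import random


def train_validation_split(rows, validation_fraction=0.2, seed=42):
    """Shuffle rows and hold out a fraction the model never trains on."""
    if not 0.0 < validation_fraction < 1.0:
        raise ValueError("validation_fraction must be between 0 and 1")
    shuffled = list(rows)  # copy, so the caller's data is left untouched
    random.Random(seed).shuffle(shuffled)
    n_validation = max(1, round(len(shuffled) * validation_fraction))
    # First slice is held out for validation, the remainder is for training.
    return shuffled[n_validation:], shuffled[:n_validation]


rows = list(range(100))  # stand-in for 100 records
train, validation = train_validation_split(rows)
print(len(train), len(validation))  # 80 20
```

Fixing the seed keeps the split reproducible, so the same held-out rows are used every time a model is evaluated.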

[00:22:53] So that's one of the reasons they can change. Why they want to change and why they can change are very intertwined, in a way.

[00:23:00] Candost Dagdeviren: But what does this change look like, then? For example, you're an academic and you want to move to the business side. What I see, or at least observe, is that the business side has more coding stuff, like, I don't know, debugging things and coding, while academia has a different pace, which I'm not very familiar with.

[00:23:24] What is this change in the workday? Let's say, how does your workday change?

[00:23:30] Jesper Dramsch: It's very interesting, because you're definitely right with this, I think. First of all, as an academic you have much more freedom. No one cares if you're in the office or not. And that's very different from business. In business it's nine to five; if you're lucky, it's flex time.

[00:23:46] So that definitely changes. And then, from the application standpoint, you have to work with others. Academia is solitary work. It's extremely lonely at every point; you're mostly doing your own thing. So that mostly means that people have a collection of scripts, and those only have to work on their machine.

[00:24:12] There are initiatives like the SSI, the Software Sustainability Institute, and the RSEs, the research software engineers. They're trying to make this a little bit less pronounced, and I think that's a really good way to go. But in the end, in academia, you have never deployed a model to an API, and you have never had to share your code. Of course, that is changing a little bit, and there are exceptions to the rule. But a lot of people get away with just working in Jupyter notebooks, having no documentation, no comments, no nothing, barely using git, if at all. So they really don't know these software practices. But I mean, you can learn those in a month, like writing better documentation, learning git, and all that stuff. It's essentially a short course, a crash course, a bootcamp to get you started. I think that's fair enough. The rest you can kind of learn on the job, to be honest, like interacting with others and code reviews and all that stuff. You get it the first time you do it. You look over someone's shoulder, especially if you're doing pair programming, which can be so valuable.

[00:25:25] Right? You see how people comment code, you see how people write docstrings, you find the tools. In VS Code, you have the autoDocstring generator, and it just fills out everything for you, and you just have to write three sentences on what your function is actually doing. So, yeah.

[00:25:43] I don't think it's that hard to transition.

[00:25:46] The hard part is to get a job because you essentially have to translate your academic experience into something business people understand, but yeah.

[00:25:55] How do you see that actually?

[00:25:57] Candost Dagdeviren: For me, when I look at my friends who are looking into data science positions in the industry, I see them struggling first to figure out how they can get in. As you said, the difficult part is getting onto the business side and getting industry experience. I think, as you said, academic life is so lonely that I often see people failing to translate their experiences into things they can explain in the interview.

[00:26:37] They often think their experience is not related. They have imposter syndrome. I have seen that many times, and they are doing amazing things, really. Sometimes my mind is blown when I hear them talking about what they are doing. It's like, how can you do these things? And how can you not mention these things in an interview?

[00:27:04] For me, this is the difficult part: getting this mindset switch from saying, "Hey, I'm a lonely person who is working on a really, really small thing, which," as you said, "only three people in the world care about," to translating this into text or communication that explains to the industry experts what you are doing and which metrics you are using.

[00:27:33] This is the part where I see many people struggle. And when they do this, the rest usually, I don't know, follows as you learn how to work in a company, because many companies have different ways of working, and sometimes you don't even have to learn git because the company is using a different thing. The rest is usually the easy part.

[00:27:59] And yet people see the rest as the difficult part. But from my outside perspective, I see that having the right mindset first is the difficult part. The rest comes after you sign the contract; then you are going to learn. Even if you know these things right now, you're going to relearn them, because you never know how one company works, how they code, how they do code reviews.

[00:28:25] It's so different from one company to another: how they approach pair programming, how they approach documentation and everything. This is why I think the main difference is in this first application mindset, in applying for a job. That's how I see it.

[00:28:46] Jesper Dramsch: To riff on that, I think most companies also know who they're hiring. They don't hire you if you're not fit for the job. They know it's going to take you a while. I always say a month, but a month is very short. I have been in my current position for three months now and I'm slowly getting the hang of it, but everyone's like, "Hey, this is really hard.

[00:29:10] We're doing world-class modeling here. If you understood it in a month, that would actually be more worrisome than you taking the time to actually get up to speed." Companies know what kind of work they do and how hard it is. And the manager that is hiring you is aware that you're coming from outside.

[00:29:30] And usually they completely understand that it will take you, first of all, two weeks to get into the computer systems, because you have to get all your security set up and get a desk and everything, if you're in the office or whatever. And then it's going to take you at least a month to learn all the intricacies, usually more.

[00:29:50] I think they're very aware. And if they're not, maybe it's not that great of a company, you know, Great Resignation and all.

[00:30:00] Candost Dagdeviren: Yeah, okay, I have to ask: what would be your one single piece of advice for people who are in this switching time, or who are looking into getting into data science from academia?

[00:30:15] Jesper Dramsch: Advice pertaining to what, exactly? For making that mindset switch?

[00:30:20] Candost Dagdeviren: One useful piece of advice that you think will be extremely beneficial. Of course, every case is individual, and we cannot identify one piece of general advice that fits everyone.

[00:30:32] Jesper Dramsch: Look at the things you've done, your experience, and make them digestible in a business sense. When I write my CV, I've started writing each line with an action word, like "delivered" or "produced," something that is active and nice for recruiters. They want to read that sentence because you're active.

[00:30:53] And then you want the thing that you did, with the tool that you used. So "delivered a mobile app with a React backend," and then some kind of number. This number can be fairly made up. Don't lie, but estimate. I know this is really hard for PhDs. I myself feel like nothing I've done has real-world impact, but

[00:31:19] "delivered this, this, and this, and talked at three conferences" is a number that you can give, which is an academic output, but this gives some kind of anchor for what kind of impact your work has had. If you can ever put a dollar amount on anything you did, you should, because business loves dollars. But if you can name any kind of number pertaining to what you've done, it's already great, because it gives people a sense of what you've done. Be very clear about translating your experience, every single detail you can. You can have your PhD position,

[00:32:01] and then you're like, "I worked on this, this, and this," and be very, yeah,

[00:32:06] like I said, use this formula of action word, the thing, with the tool, and try to match those tools to what the company is looking for. Then you're golden. Then you have written this thing. And it also helps you actually talk about the thing, because you have to think about it.

[00:32:24] Right. That's why I write so much. It clarifies my thoughts. Yeah, I think that's the advice.

[00:32:30] Candost Dagdeviren: Yeah, that companies are looking for numbers and for the impact that you had on the business or on people directly is a great perspective that many people also miss. But I want to roll back a little bit to the processes that you were talking about, the machine learning and the data science process. Many companies have data scientists inside and use data science, machine learning, AI, and all the other buzzwords around; we can count them as well.

[00:33:05] But anyway, I often see that tech companies, usually successful tech companies, fail when they are working with data. This is so common. And for me, it's still a bit, I don't like the word weird, but I'm going to use it here, because I don't see these things matching: software engineering is different from the data science process.

[00:33:29] And yet when I take a look at the companies, I see that on the organization side as well, they are behaving, or acting, as if those things are the same. So why do these things happen? Why is it so difficult for companies to have successful data analysis and reporting processes?

[00:33:51] Jesper Dramsch: I think there are two things that play here. So one is a management issue. The risk profile for a data science project is completely different to a risk profile of a software project. You know, software project, you know, the thing you want to do, and you just have to find the way to do the thing. In data science, you know the thing you want to do, but it's not even clear this as possible. So it's, it's a very different management style. So if you have product product or project managers that are used to software projects, they might not know how to properly manage risk in data science and how to assign work packages to the individual things.

[00:34:39] And I think one of the main issues there is that you have to build failure into your risk tolerance, which is of course the catastrophe, but it's much more likely than in a software project, because if you want to build your mobile app or something in a certain way, you can certainly do that.

[00:35:03] It just takes time. And that is the bigger risk. And on the other hand, so, I don't have that much business experience, but to me it feels like a lot of the time people expect results much faster than is possible. Data science is a fairly long process. And especially if you're used to agile or scrum with three-week sprints, you can't do a data science project in that time.

[00:35:31] And you can't produce results in that time because you might need to go back. Going back to the comment about data science being iterative, you might realize that you misprocessed your data sets. So in my experience, it usually takes months longer than people estimate the process to take.

[00:35:55] And for a lot of companies that work agile, that is a failure already. So those kinds of things are at play like proper risk management and then not giving it the time that it actually needs because real-world data is really hard to work with and very messy. And usually you have at least two pitfalls you haven't thought of.

[00:36:15] And yeah, oftentimes it already starts at the ingestion of the data. Oftentimes it starts with the hardware. Like, I have definitely worked with people that wanted a fancy neural network and didn't have a GPU cluster, and that's kinda like... Or I had multiple conversations with product managers that were like, okay, we'll do this, this, this, this.

[00:36:42] And then I sat there and I'm like, so when do we have the time to actually label the data? Like, when are we doing that, and what kind of budget do we have for this? And I caught them completely off guard because they thought the labels would just magically appear. So, yeah.

[00:36:58] As for us practitioners, it's really important to set expectations there and set them really well and give yourself more time than you think you need.

[00:37:10] You need to build slack in because there are going to be pitfalls, and things that you take for granted aren't going to be there, because companies often don't know yet how to do the thing that you want to do. So they don't have the infrastructure. They don't have the data available. I worked at someplace where you could only get processed data.

[00:37:33] The raw data was on tape storages in a salt mine in Texas.

[00:37:38] Candost Dagdeviren: Wow.

[00:37:39] Jesper Dramsch: So things like that, right? You will encounter things that you have never even thought of. So I think those are two failure modes. Do you see others there?

[00:37:52] Candost Dagdeviren: I'm just stuck on having data records on tapes in a salt mine. Yeah. I usually see the organizational side. Like, as you said, if people are working with agile on the software and having these scrum sprints, et cetera, this is a part where I see the usual failures and mistakes happen.

[00:38:18] But I have also seen some companies that use longer rhythms or cycles, not just two-week sprints but longer sprints, let's say it this way, and still they have a lot of problems. Even if you put in, I don't know, a three-month sprint, or two months, there are still a lot of problems. And I think there are not so many data science managers in the field.

[00:38:45] I mean, many companies start having a data science division, and it is led by some software engineer who is maybe good at infrastructure, just because they can create infrastructure for the data scientists, et cetera, or, I don't know, some unrelated manager or leader, often maybe product managers.

[00:39:11] I'm not sure, but these are the parts where I most often see companies struggling to achieve better results. And even though they get some results, I don't think they are learning a lot from the mistakes they made, because the problem is, as you said, in data science there's a feedback loop, which I learned from you today, where you need to go back to the beginning and maybe look at your data and how you interpret it.

[00:39:42] Maybe use another model and do everything from scratch, basically. And if you think about it in business terms, this is a full failure, like literally a full failure, because you are redoing everything. And so, I'm a team lead right now. And if my team is working in that way, well, I'm a software engineer, and I have this mindset where I think this is a failure.

[00:40:06] And if I approach it the same way, then for me, it's already a failed project and I can blame anyone. Of course, I don't like blaming people. I like to blame processes and how everything works together, but still, it will be very difficult for me as a leader to explain this, even to executives, because they have no idea about data science.

[00:40:32] They want to get some reports. They want to see the results, like, hey, what's the expected revenue? For example, even with the most common revenue projections or modeling, et cetera: what are we expecting in six months? And if this is wrong, then this is also a big problem for the business as well.

[00:40:56] Like, how can you do the financial modeling? I mean, of course there are a lot of companies, and right now there are a lot of really good models that you can use for financial modeling, so right now it should be fairly easy. But at the same time you use user data to create models, et cetera, to really project how it's gonna happen.

[00:41:17] This financial modeling, it's still a bit troubling. I would say these are the parts that I see the most mistakes or problems happen.

[00:41:26] Jesper Dramsch: I think it's also important to see that there's a little bit of selection bias. People really love to talk about the failures, especially if they expected things to fail. And at other companies, it just works. Like, what are you going to talk about? Like, oh yeah, we made a model and now it's in production. I feel like you hear more about the failures as well. Companies that have working deep learning models are just quietly working with those models, whereas companies that tried it once and then had this catastrophic failure with it, they love to go on talks and everything and sit at the table and talk about how badly it works.

[00:42:21] And I've seen a lot of companies where they have working data infrastructure and working machine learning models. They're data-first companies, essentially, so it's maybe a different mindset, maybe a different problem. But yeah, I get what you're saying. I think it's really hard to see these big failures and not be like, ooh, maybe we shouldn't try it.

[00:42:46] But I think most companies that have any kind of dynamic system in there are going to eventually be surpassed by companies that manage to harness the data. Because if you figure out how to, you can make it work faster and more efficiently and really get everything out that is within the data and within the reality of your data set.

[00:43:16] Candost Dagdeviren: I want to switch gears a little bit here and go to the part where we talked about people getting into the field. And I know you have two online courses in which you teach data science and business analytics to more than 2000 students overall, which is pretty impressive. I'm really curious about why you developed this course and how it helps people.

[00:43:42] Jesper Dramsch: The why is super easy. Skillshare approached me and asked me if I want to do a course. And I was like, yes, I do. I do like money. That's how that course happened. Then I sat down and I had a look at what other people do in their courses. What do I think? And then I synthesized everything, because Skillshare is quite limited and Skillshare is a video course platform.

[00:44:09] So it's essentially like a big YouTube playlist. And then at the end, you can do a project if you want. So you have to teach everything on video, which makes it very difficult, because I find it very boring to just watch people code, and this would essentially be it: just a code screen, and you just see, this is a function to clean your data.

[00:44:36] This is how you load your data, blah, blah, blah, blah. And I would find that incredibly tedious. So what I did instead is I bookended it with talking head segments and tried to make it more engaging. And I think in part, it worked for what it is, because the reality of these kinds of online courses is that they have a horrendous completion rate.

[00:45:01] So most online courses have a completion rate of, I think, around five to 10%. The one trick that gets people to complete them is certificates, but even the Coursera courses have a horrendous completion rate. And cohort courses, so live courses, they really work at the moment. Maybe people will burn out on them as well, but they're also much more involved.

[00:45:24] Right. So, yeah. I sat down and I had a look at some of the teaching materials from Harvard, I think. They were talking about this data science flow. Then I did the IBM data science certification. I did that before, I think. Yeah. And so I had a grasp of what the other experts think.

[00:45:49] And then I created a curriculum around it and made it very applied. So my idea was, I don't want to teach people Python. I expect people to already know Python and really just dive into it, give commentary on how I do things and why I do things, and have the juiciest information in there from someone that has touched data before.

[00:46:13] And it doesn't have a lot of theory, because you can learn that theory from people that are much better than me. I have never taken a statistics course in my life. Geophysics doesn't have any statistics curriculum. It's all deterministic. So everything there is self-taught, so clearly other people are going to be better at it.

[00:46:33] Also, all my statistics intuition is in a different language. Like, you learn about normal distributions as a kid, kinda, you learn about coin flips as a kid, kinda. And I learned all that in German. So all my intuition about, like, I mentioned it before, the central limit theorem, I had to read that up because someone on LinkedIn was being condescending about that, and I had to know what it is.

[00:47:00] I know it in German, I don't know it in English. So with all that stuff, I'm like, I'm going to just make it from a practitioner's perspective, so people learn how to actually do it. And it's a lot of coding, a lot of watching me code things, and how you use Jupyter. And yeah, I think I missed the second question you asked.

[00:47:21] Candost Dagdeviren: How does it help people? But I think you already explained that part. I think you already explained both parts: why you developed this course and how it helps people. So for me, that's fair.

[00:47:33] Jesper Dramsch: And the second course is quite different. Actually, the second course has no coding at all. It's a no-code data science master class. And I just talk about how to think about data science and data science projects, because these were positioned in the Skillshare for Teams section, where it's essentially the business side of Skillshare. They have shifted a little bit and go more for creative classes again, but these two still exist in there.

[00:48:03] And it's essentially also for managers to understand how it works, and for executives to understand what exactly a data science project entails. And of course, for beginners who don't know Python yet, to just get a feel for it. So really, the people that don't want to go hands-on. Also, it's much shorter, which really helps. The coding class is, I think, four hours.

[00:48:29] And that one is one hour and much more digestible. I have a lot of visual aids and stuff in there. So yeah, two different audiences. And interestingly, the coding one is much more popular despite being longer.

[00:48:46] Candost Dagdeviren: I have one other question regarding your Kaggle achievement: you ranked 81st among more than 100,000 people. How did it happen and why was that important?

[00:49:02] Jesper Dramsch: So I want to make a very important caveat, and I put that everywhere, but it's a little bit obscure to people that aren't on Kaggle, which probably benefits me. But yeah, I was top 81 in the notebook division. So Kaggle has different divisions: they have the competitions, but they also have datasets, discussions, and all that stuff.

[00:49:23] And on the teaching part, essentially, you can get upvotes on your notebook and gain ranks in there. And yeah, basically it started with a geophysics competition that was organized with a seismic company that worked with that data and wanted to predict something on that. And I did my master's thesis on that topic.

[00:49:46] And I was like, ooh, interesting. And usually on Kaggle, competitions come with a starter notebook. And for some reason, the company didn't do that. They kind of had the plan to maybe do it sometime, but I couldn't sleep. I was on LinkedIn. I saw the announcement and I was like, ooh, interesting. I had a look at it.

[00:50:09] I spent the night writing up, essentially, an intro to geophysics for machine learning practitioners and to machine learning for geophysicists and geoscientists, and kind of gave the caveats of working with the data, and that notebook took off. I think that it has been viewed over 70,000 times at this point. It's extremely highly rated.

[00:50:34] I think it has over 800 upvotes, something like that, which doesn't sound like a lot, but only upvotes by people that have already ranked on Kaggle are counted. And yeah, it's really up there. And then I was like, oh, this is really cool, because I like writing. And normally I like to think that I'm relatively good at getting information across fairly accessibly. So I did the same thing then on a medical problem, which was almost applying the same model. I was cheap in that I copied it over, and I'm like, oh, this works, awesome. And I talked about what the actual problem is. And this kind of getting people started from both sides worked a couple of times for me.

[00:51:22] And it worked so well that once I ranked, which was really, really late, I immediately shot up to 81, because I didn't have a lot of notebooks, but the notebooks that I had were so highly regarded that they put me right on top. Of course, now it's degrading. I haven't done it in a while, but yeah, generally that was kind of my thing.

[00:51:47] And now of course I can say that I ranked top 81 in the Kaggle notebook division, which is fairly impressive, at least at that point, especially since I didn't do it with these, there are voting rings and everything now that upvote each other, and then it goes viral and then it goes up. And some people use Kaggle as their blog.

[00:52:09] So when you aggregate other information in there, you get a lot of upvotes. So now it's a different time. But I'm still very proud of that achievement, to be honest. And I think it was a really good learning experience for me as well, to make this information digestible in this kind of way, in one single notebook.

[00:52:31] So that's kind of how that happened. I was sleepless at night. I found it, I thought, okay, I can write something cool here, and kind of got my start with that.

[00:52:41] Candost Dagdeviren: Okay. I've got to scratch my itch and ask you, how do you do all this stuff? When I take a look at your portfolio, there are around 20 publications, you wrote a book, which we didn't even mention here, you have online courses, you are active in creating YouTube videos. You have a newsletter, you said you like writing, and you already wrote around 200 articles.

[00:53:05] You contribute to open source projects like scikit-learn and TensorFlow, which are two of the most favorite ones. And you have many other things around. Like, what is your secret to having this many successful projects?

[00:53:19] Jesper Dramsch: I probably have ADHD. I don't have a diagnosis, but TikTok says I have ADHD and it's probably true. No, so one of the things is, all of these are very tangible projects. So once you have a contribution to scikit-learn, you have contributed to scikit-learn. You can go on, you can become a core contributor, but that's much more work.

[00:53:43] So I did a couple of entry-level contributions there. I improved the documentation on several things and I was like, okay, this is cool. And then you know how to do it, so you can do it on other projects as well. With writing, writing is a habit. If you start writing, you will get better at writing.

[00:54:02] And it's self-fulfilling. Right now, I'm doing an online course where you write an article a day, and it gets so much easier. I wrote one right before our podcast episode here. And it gets so much easier because you're like, okay, I'm gonna just take this idea and run with it. And I didn't realize either that it was so easy to just keep going. Then with all the other things, I kind of just do the things I find interesting. So I'm very happy with abandoning projects when I have no interest in them anymore. So I've done a lot of things, but I've also abandoned a lot of things. There are apps I have running somewhere on the internet that are still there. People could still find them. And that's also the important thing with this.

[00:54:52] If you want to be a person that can be found everywhere, you have to leave things out there, and they don't have to be perfect either. Like, I didn't become top one on Kaggle. I am not a core contributor to TensorFlow. I don't have a viral app. And I only wrote book chapters and a super small ebook that I'm giving away with my newsletter.

[00:55:16] So that's also it: I didn't write a book. Books are very hard to write. They take tens of thousands of words and so much care, and writing a book chapter in an existing publication is much easier as well. Yet you have written a book chapter, which I know is impressive, and then I was also able to use the book chapter as an introduction to my PhD.

[00:55:43] So I was always able to kind of use things in different contexts as well. Like, the Kaggle thing was awesome and I was really happy with it, but I also have no interest in pushing that higher, because pushing those numbers higher means I'm competing with people who are paid to work on Kaggle at this point.

[00:56:06] And so really, you have to choose your battles and you have to be happy to publish things that are not perfect. My Skillshare course is probably not perfect and it's not for everyone, but it is there, and people seem to enjoy it for what it is. It's a course, it gets you through, it has all the material with it.

[00:56:28] And now I know how to make a video course. I'm actually in the process of recording my third one right now, which is more applied and a little bit shorter. But yeah.

[00:56:37] I do so many things because in part I'm very easily distracted and just tend to do those things. And then I'm also happy with publishing them when they're not in the super sparkly final state, because I realize people really don't care that much about the super sparkly final state.

[00:56:57] They're okay with an 80% solution, a 90% solution. Like, I deployed a couple of machine learning apps, and none of them work on mobile, which is terrible. Most of the world is mobile these days. And if you were in a business, you would have to make them work on mobile. But because it's just my side projects, I don't have to. I can just put a banner on top on mobile, like, this doesn't work on mobile, and be like, yeah, this is fine.

[00:57:25] Like, fixing it would take a week's worth of work. I think that's how

[00:57:31] Candost Dagdeviren: You're saying, like, creating one thing and then just spreading over into different fields, changing a little bit. I like the idea. I really like it. I also think that sometimes staying an amateur and not being fully professional in one thing is also good, because I was reading, I think, Steven Pressfield on that one.

[00:57:54] Jesper Dramsch: The War of Art?

[00:57:55] Candost Dagdeviren: Yeah, exactly, exactly. It's an amazing book. But yeah, all the things that we are doing: I'm writing as well, which is really, really shaping my thoughts. And I think I started thinking better, not only thinking, but also speaking better. That helps me a lot. And with all those things, sometimes I think I also personally have to accept that I'm okay with staying an amateur on some levels.

[00:58:28] I don't have to be a professional in everything, which helps my mental health a lot, because I don't know if I have ADHD or not, but I definitely have a thing that pushes me to take one more step every time to improve it, make it better. Like, hey, I have to make this perfect. I contemplate a lot on different things, but at the same time, I try to have a bias for action.

[00:58:54] And then just say, just leave it as it is. You don't need to make it perfect, because many people already think that this is better than you think. Like, I have some articles, which I found a little bit weird, that many people shared and thousands of people read. And I was like, I mean, this article is far from perfect.

[00:59:18] How do you like this so much? Like, I really don't understand it, but it still happens. I think we have to leave some amateur spirit.

[00:59:31] Jesper Dramsch: This is such a good point that you made with a bias for action. In the beginning of my PhD, I had a lot of ideas, but I was also very aware, because I love self-development stuff. I read a lot of stuff. I listened to podcasts on entrepreneurship and that, and I believe ideas are cheap.

[00:59:52] Everyone had the idea for an iPod I'm sure, but there's exactly one company and one like designer team that made the iPod and was successful with it and ran with it. And the same thing is true for every idea that you see online, where you say, oh, I have that idea as well. Yes you did. And that's awesome.

[01:00:16] But they, whoever it is, whichever idea it is, they delivered on that idea. And that's also the hard part. Bringing an idea to fruition and making it work is so much work. During my PhD, I realized I had this idea for three years, and the interesting part is I wasn't quiet about that idea. I was talking about this idea for three years and no one was able to make it work.

[01:00:47] I was, in the end, and it was very hard. But people could have stolen that idea. It was a good idea, but no one has the time, no one has the knowledge, no one has the context for it. So yeah.

[01:01:01] Like, if you have an idea, share it, it'll come back. And if someone else actually steals it, that's a one in a million chance, and also then the execution probably wasn't that hard.

[01:01:14] And you should have probably worked on that earlier. So that's kind of, I don't know. That's also why I started YouTube, for example. When you write, it's very easy for people to steal your writing. I've seen it a lot online, especially on LinkedIn, where people like to copy your posts. With blogs, not that much, but people go in and, yeah.

[01:01:36] People steal a lot, especially from influencers on social media. With video, that's much harder because it has my face on it.

[01:01:45] So I started doing talking head videos and stuff like that, just because I think it's not that much of a step up if you're already writing a blog post that is essentially a script for a YouTube video. Yeah.

[01:01:59] So I started doing that a bit.

[01:02:02] Candost Dagdeviren: I've seen people stealing YouTube videos and making presentations, or using literally the same content on their channels, recorded by themselves. So I've seen this. Believe me, when people are really willing to steal some things, they find ways to do it.

[01:02:22] Jesper Dramsch: That's also very true. Yeah.

[01:02:26] Candost Dagdeviren: Thanks a lot. I think we can slowly close it here. I've really, really enjoyed our conversation and I learned a ton about different things like creating videos or being a creator, and at the same time about data science, machine learning, many things. Thanks a lot, Jesper. I really enjoyed it.

[01:02:48] Jesper Dramsch: Thank you so much for having me. I had a lot of fun.

[01:02:50] Candost Dagdeviren: Well, I hope you enjoyed the season finale. I will take a break in the podcast and come back with a new season in a better format and exciting topics.

[01:03:01] Before you go, don't forget to share the episode with one of your friends or colleagues. To get updates about the new season, don't forget to subscribe to the newsletter at candost.blog/podcast. And for everything we talked about here, you can find all the related links at candost.blog/podcast. Until next time, take care!

Podcast

Nov 23, 2021