The Production-First Mindset

Learn how the top brands wow customers through production-first engineering. On this podcast you will find the tactics, methodologies, and metrics used to drive customer value by the engineering leaders actually doing it. Join Rookout CTO Liran Haimovitch as he explores how customer-centric brands approach engineering to create a competitive advantage, with interviews covering topics such as automation, issue resolution, team structure, DevOps, and more.

XM Cyber's Tamar Stern - A Whole New World Of The SDLC

Sun, 12 Jun 2022 23:00

Rookout CTO Liran Haimovitch sits down with Tamar Stern, Backend Group Leader at XM Cyber. They discuss how she got into the world of conference public speaking, how performance in Node.js differs from other languages, how to go about understanding performance throughout the software development lifecycle, and why she doesn't recommend profiling your production environment.

Painless Cloud-Native Debugging
Rookout is a disruptive developer solution for Cloud-Native debugging and live data collection.

Copyright © 2022 The Production-First Mindset

Read Episode Transcript

Liran Haimovitch: Welcome to The Production-First Mindset, a podcast where we discuss the world of building code, from the lab all the way to production. We explore the tactics, methodologies, and metrics used to drive real customer value by the engineering leaders actually doing it. I'm your host, Liran Haimovitch, CTO and co-founder of Rookout. Today we're going to be discussing JavaScript and production performance. With us is Tamar Stern, an engineering group leader at XM Cyber. She is a Node.js expert and a public speaker. Thank you for joining us, and welcome to the show.

Tamar Stern: Hi, it's really nice to be here. I'm Tamar Stern, 38, married with three kids. I've managed the back-end development at XM Cyber for the last few years. I've been diving a lot into Node.js and its internals. I'm an expert Node.js developer, I love to code, and I'm also a public speaker.

Liran Haimovitch: As you mentioned, you've given a lot of talks about JavaScript at various conferences. Can you share with us how that came to be?

Tamar Stern: I had a friend who was doing that, who was lecturing a lot in the C area, and I saw that lecturing was helping her study a lot of things. If you decide that you would like to lecture, that you would like to develop a career as a public speaker, then this is something you need to aim for. You need to study. You need to prepare yourself, look at conferences, and understand how to get accepted, what topics the conferences are looking for, and what the main talks were last year. But really, you have to decide that this interests you and that it is something you want to do, because you have to invest your time in it. You cannot do it without investing your time, and you cannot do it without preparing and studying the subject. So I had a friend who was a C expert, and I saw how much knowledge she gained during that process.
At that point I was working a lot with Node.js, I was very deep inside the architecture, and I knew the internals of the language. From that point I started building it in very small steps. I started with small meetups in Israel. Then I gave several talks at JavaScript Israel, one of the leading JavaScript meetups in Israel. And then I started to think about how to get accepted to conferences. I tried to get accepted to several conferences in Israel, and one of those attempts succeeded: Geektime Code. After that, I tried to get accepted to the International JavaScript Conference, and I was lucky and got accepted with two talks. I worked on them a lot. I rehearsed them for, I don't know, two months in advance. One of them was about performance and one was about security. I built a lot of cool security demos. I have one demo where you can break into a server without knowing even one username and password, just with NoSQL injection. For the performance talk, I did a lot of live demos and showed code that works more efficiently.

I went there as a total newbie. I didn't even have Twitter. Then my talks just exploded, and at that point I think a lot of doors opened for me. I've done JS Congress and multiple talks at the International JavaScript Conference. And I've done JSConf, which I think is the most prestigious JavaScript conference in the world. One of the most exciting moments I had is actually not from a conference. The London Node User Group approached me and wanted me to lecture at their meetup, which is the main Node.js meetup in London, and in the crowd were several developers
from the Node.js core team. That was amazing, because they told me they really wanted to hear the lecture I was planning to give there. That was like, wow, Node.js core team developers are coming to hear me. Oh my God, that's amazing. I've done a lot of talks. Some of them have a few thousand views on YouTube, which is a lot for technical talks. After all of that, I also started to be a moderator at international conferences. This is how everything developed, actually. It really helps you when you structure something. First of all, your knowledge becomes very, very deep in the subject. Then you organize everything and you really master the knowledge in a new way. So I think it's cool, it's really helpful, and it opened a lot of doors for me.

Liran Haimovitch: What do you like speaking about in Node.js?

Tamar Stern: I have several talks about performance, about the architecture of the engine, how the internals of the engine work, and how to perform operations. For example, Node.js is a language that is very much built for non-blocking operations, for a lot of I/O operations. But if you would like to run CPU-intensive algorithms in Node.js, let's take the area of machine learning algorithms, then it's not the ideal language for that, mainly because CPU-intensive operations don't really perform well in Node.js. I have a lecture about that, and about how to handle this problem using an internal module of Node.js called worker threads. That's the name of the module. I also show there how to improve the performance of a server that has a lot of in-memory algorithms in it using that module. I have a lot of talks about serverless, about how to build a good microservices architecture in Node.js, and about the best ways for microservices to communicate with one another.
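As an illustration of the worker threads module mentioned above, here is a minimal sketch of moving a CPU-bound loop off the main thread with Node's built-in `worker_threads` module. The function name `sumInWorker` and the summing loop are hypothetical stand-ins for a real CPU-intensive algorithm, and the inline worker source (`eval: true`) is used only to keep the sketch in one file. It is not taken from the talk.

```javascript
const { Worker } = require('node:worker_threads');

// Source of the worker, kept inline so the sketch fits in one file.
// The loop is a stand-in for a real CPU-intensive algorithm.
const workerSource = `
  const { parentPort, workerData } = require('node:worker_threads');
  let sum = 0;
  for (let i = 1; i <= workerData.n; i++) sum += i;
  parentPort.postMessage(sum);
`;

// Hypothetical helper: runs the loop in a separate thread and
// resolves with the result, so the main thread stays free meanwhile.
function sumInWorker(n) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { n } });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`Worker exited with code ${code}`));
    });
  });
}
```

A caller would use it like any other promise-returning API, for example `const total = await sumInWorker(1_000_000);`. In a real service the worker code would more likely live in its own file, and a pool of workers would be reused rather than starting one per call.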
Everything actually comes from the field, let's say, from my working experience, because I have had the chance to work on a lot of cloud systems with complex architecture, and I get a lot of inspiration for my lectures from the technical problems I have at work.

Liran Haimovitch: Performance seems very close to your heart, and you seem to be giving a lot of talks about it. Why is that?

Tamar Stern: Yeah, that's a good question, actually. Performance has, let's say, accompanied me. I started to get interested in it even before I became a Node.js developer. Before that I worked for several years in C#, and there I also developed a lot of new features, but I also did a lot of performance optimizations. I find it interesting because of the challenge in it. You look at a problem, you have to understand how to solve it, and you have to get really deep into the technology and understand what the correct solution would be, what the best solution would be. You also have to understand how many resources you have. Well, currently, in cloud environments, the resource question is less relevant because we can have a lot of resources. So a lot of performance improvements right now are about building a good microservice pipeline, understanding how to apply, let's say, single responsibility, which activity should be done where, and being able to build services that you can replicate with no problem. But yeah, I started mastering this subject before getting into Node.js.

Liran Haimovitch: How is performance in Node.js different from other languages? How do you approach the unique nature of the interpreter and the language differently?

Tamar Stern: Take languages like Java, Python, C, or Ruby, languages that are multithreaded. Those languages work in a pattern called blocking I/O. What does that mean? Take an I/O operation.
By I/O operation I mean things like an HTTP request, a database query, reading from a file, et cetera. If you're working in a blocking I/O approach, let's say I'm working with Python and I would like to run a database query. I would open a new thread in Python, then run that database query, and my thread is going to wait until the database query comes back. During that time it won't do anything. Usually, what's going on in those languages is that for every request there is a new thread handling it. There are also optimizations. For the database queries, for example, I've seen mechanisms with a thread pool that you send the queries to. The thread pool processes each query, and when it's ready, the responses come back. But that sums up to a lot of threads being opened.

In Node.js it's different, because the approach is non-blocking I/O. At a high level, the architecture of Node.js is an event loop. Once a request comes to the server, a callback is registered to the event loop, and the event loop starts to execute the flow of that request. Of course, the event loop has several phases. I mention them in some of my lectures and get really deep into the phases of the event loop. But once an I/O operation happens, or to be more specific, not an I/O operation but an asynchronous operation, then the operation, let's assume it's an I/O operation such as an HTTP or DB query, is offloaded to another component called the worker threads. That component processes the DB query, and meanwhile the event loop can do other things, like executing flows that come from other requests that have arrived at the server and that the server has to serve. So you have to keep in mind that you have a constant number of threads.
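To make the single-threaded event loop concrete, the following sketch shows what happens to other pending work when synchronous code holds the loop. The helper names `blockFor` and `measureTimerDelay` are hypothetical, and the busy-wait is a deliberately artificial stand-in for a synchronous in-memory algorithm or a synchronous API call.

```javascript
// Busy-waits for roughly `ms` milliseconds without yielding to the event loop.
function blockFor(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // Intentionally empty: the event loop cannot run anything else here.
  }
}

// Schedules a 10 ms timer, then blocks the loop for `blockMs`.
// Resolves with how long the timer callback actually had to wait.
function measureTimerDelay(blockMs) {
  return new Promise((resolve) => {
    const scheduledAt = Date.now();
    setTimeout(() => resolve(Date.now() - scheduledAt), 10);
    blockFor(blockMs);
  });
}
```

Calling `measureTimerDelay(200)` resolves with a delay of at least 200 ms even though the timer asked for 10 ms, because the callback cannot run until the synchronous loop returns. Every other request on the same process waits in the same way.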
The event loop runs in one thread, and the worker threads are a thread pool with a constant size, and if you block one of them it would be very bad. You have to think about how to work asynchronously all the time. If you're working with synchronous APIs, you can block the event loop, which I think is the worst case that can happen. You block the event loop if, for example, you run complex in-memory algorithms synchronously and they're not offloaded to the worker threads, or if you perform I/O operations through synchronous APIs. In those cases the event loop executes the operation itself and cannot serve other requests. I think the power of the event loop comes from the ability to serve multiple things in parallel. The event loop can offload the heavy activities to the worker threads and serve other things.

Liran Haimovitch: So how do you go about that? How do you offload computations or activities to worker threads?

Tamar Stern: If you look at Node.js code, you have asynchronous APIs. In old Node.js code they worked with callbacks, and in newer versions we work with async/await. Async/await is actually syntactic sugar for promises. So by convention in the language, you can assume that when you have an API that returns a promise, or is implemented with async/await, it has asynchronous operations inside it, and every one of those asynchronous operations will be offloaded to the worker threads. But this is only a convention. Actually, in order to offload an operation to the worker threads, you have to write code against the library that actually does it, which is called libuv. That code is written in C++. But by convention, all of the libraries use a set of basic libraries, and in those libraries
there is code written in C at the libuv level that offloads the operations to the worker threads. And by convention, every operation that is offloaded to the worker threads is wrapped with an asynchronous API. When we have an asynchronous API, we can assume that behind the scenes somebody wrote C code that sends that operation to libuv. Of course, I can create a new promise or a new asynchronous API and write totally synchronous code inside it. For example, I can add numbers or something like that. But this is not something you do. I mean, if you are writing an asynchronous API, then you have to work with asynchronous operations. As I said, usually you send your heavy operations to the worker threads. Those are I/O operations and also, in advanced libraries, CPU-intensive operations. For example, you can look at the crypto library. Several APIs in the crypto library are asynchronous. You can look at TensorFlow, which is a library for machine learning algorithms. If I remember correctly, all of that library is asynchronous. Things are done efficiently there and are offloaded to the worker threads. So I/O operations and CPU-intensive operations that are wrapped with an asynchronous API are executed inside the worker threads.

Liran Haimovitch: So how do you go about monitoring that? How do you go about understanding performance throughout the software development lifecycle?

Tamar Stern: All right, so performance during the software development lifecycle is a whole world. It doesn't end with the abilities of the Node.js language. You have the database, you have the resources of the machine. It's a whole world. Let's take a problem that we're facing now, for example. It's a legacy API in our software that was written a long time ago. What's going on there is that I have a UI, and for the UI I get a page of entities. Let's say the page size is 100 entities.
This is something I've been dealing with in the last few days. The problem is that every time you call the API, a very complex computation happens on every record. When you're working with an API, for example with an HTTP request, what you want to do is run one query on your database, take all the data from it, do zero processing, and send everything to the front end. Really zero processing. Here we have very complex processing on every record. And the DB queries even go one by one, meaning that for each record we run one query and then we also do that processing.

What we can do instead, which would be much more efficient, is compute the complex computation for all the records in a different microservice and save the result to the database. Let's say it's not real time, so we can compute it every 10 or 15 minutes in a different microservice. Then the HTTP request just does one DB query, like get the top 50 or top 100 with all the data, throws it to the front end, and that's it.

So how did we discover that? As I said, that problem combines the limitations of Node with the limitations of the database. Another aspect we need to look at with performance problems is the database itself. We have to see which queries we run on the database. For that you have to do database profiling. You have to look at the database query log and see which queries are running. In this case, we started to see that for every record we had several queries fetching the data.
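The difference between the legacy endpoint and the proposed fix can be sketched as follows. Everything here is hypothetical: `createFakeDb` is an in-memory stand-in that only counts queries, `computeScore` stands in for the complex per-record computation, and the calls are synchronous purely to keep the sketch short (a real database driver would be asynchronous).

```javascript
// Hypothetical in-memory stand-in for a database; it only counts queries.
function createFakeDb(recordCount) {
  const entities = Array.from({ length: recordCount }, (_, i) => ({ id: i, value: i + 1 }));
  const details = new Map(entities.map((e) => [e.id, { weight: 2 }]));
  const precomputed = new Map();
  let queryCount = 0;
  return {
    get queryCount() { return queryCount; },
    findEntities(limit) { queryCount++; return entities.slice(0, limit); },
    findDetails(id) { queryCount++; return details.get(id); },
    savePrecomputed(rows) { queryCount++; rows.forEach((r) => precomputed.set(r.id, r)); },
    findPrecomputed(limit) { queryCount++; return [...precomputed.values()].slice(0, limit); },
  };
}

// Stand-in for the complex computation done on every record.
function computeScore(entity, detail) {
  return entity.value * detail.weight;
}

// Legacy shape: one query for the page, then one more query
// and one computation per record, all inside the request.
function getPageLegacy(db, pageSize) {
  return db.findEntities(pageSize).map((entity) => ({
    id: entity.id,
    score: computeScore(entity, db.findDetails(entity.id)),
  }));
}

// Background job, run periodically in a separate service:
// does the heavy work once and stores the results.
function precomputeScores(db, total) {
  const rows = db.findEntities(total).map((entity) => ({
    id: entity.id,
    score: computeScore(entity, db.findDetails(entity.id)),
  }));
  db.savePrecomputed(rows);
}

// Request path after the change: a single query, zero processing.
function getPagePrecomputed(db, pageSize) {
  return db.findPrecomputed(pageSize);
}
```

With a page of 100 entities, the legacy path issues 101 queries per request, while the precomputed path issues one. The cost has not disappeared. It has moved to a background job that runs on its own schedule, which is acceptable only because the data does not need to be real time.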
After examining the DB queries, we profiled the application. In Node.js you can do that with tools like Node Inspector. There are other profilers, like Clinic, for example, but I'm kind of a fan of Node Inspector. I love it, it really helps me find a lot of problems, and I'm very used to it. We saw in the CPU profiling of the Node application that after fetching a lot of data from the DB, for every record we do some kind of in-memory computation, adding things to one another, computing stuff, and then returning it. So it's really inefficient. So the process was to run the CPU profiling for the application and the DB profiling for the database in parallel. In the DB profiling we saw all of those queries running, and in the application we saw all the in-memory computation. And as I said, if we do in-memory computation synchronously, as we did there, we're blocking the event loop. The power of Node.js is in working asynchronously and doing everything efficiently by offloading things to the worker threads. So after profiling in both ways, we saw those problems.

There are other things I can say about that. You also asked about monitoring, but I would like to say that I don't recommend profiling your production environment. I don't recommend doing so because profiling a process adds overhead to that process. Well, one of the experts in my group debates me on that and says that CPU profiling doesn't put so much load on the process, but even he doesn't do it for more than 15 or 20 minutes on production. What we do have in our systems
is a duplicate of the production environment. It's called a shadow environment. The version is installed there as well, and since it imitates the production environment exactly, we can do the profiling there and see the problems. Of course, on the production environment itself we don't do that. The shadow environment we created is a very useful thing. It's a really good replica, and we can see there all the real problems that happen in production.

Regarding monitoring: as I said, profiling the production environment is not really good, and we only profile the shadow environment. If you don't have the ability to bring up a shadow environment, what you can do is use performance logs to measure how much time every operation takes. And then there are really good tools like Prometheus and Grafana. Prometheus is a tool for sending metrics from your server, important metrics that you choose. Grafana is a tool that sits, let's say, above Prometheus and helps you build a lot of dashboards. What's going on is that Prometheus reports live data into Grafana, and the Grafana dashboards are updated. This is something we also use. You can define metrics for responses that come back very slowly. You can say, all right, if a response from my server is slower than three seconds, please report it to Prometheus. Prometheus reports it, and Grafana takes it from there. If you don't have the possibility of bringing up a shadow environment for your production environment, then performance logs and monitoring are the best option. Another thing we do for monitoring is use Elasticsearch and Kibana.
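The performance logs and the three-second threshold described above could look roughly like this. It is a sketch under stated assumptions: `createSlowResponseMonitor` is a hypothetical helper, and the in-memory counter stands in for whatever metrics client actually exposes the value to Prometheus. No real Prometheus client API is used here.

```javascript
const { performance } = require('node:perf_hooks');

// Hypothetical helper: logs how long each operation takes and counts
// the ones slower than `thresholdMs`. The `now` parameter exists so the
// clock can be replaced in tests.
function createSlowResponseMonitor(thresholdMs, now = () => performance.now()) {
  const slowCounts = new Map();
  return {
    // Call at the start of an operation; call the returned function when it ends.
    start(route) {
      const startedAt = now();
      return function end() {
        const elapsedMs = now() - startedAt;
        console.log(`[perf] ${route} took ${elapsedMs.toFixed(1)} ms`);
        if (elapsedMs > thresholdMs) {
          // Stand-in for incrementing a counter in a real metrics client.
          slowCounts.set(route, (slowCounts.get(route) ?? 0) + 1);
        }
        return elapsedMs;
      };
    },
    slowCount(route) {
      return slowCounts.get(route) ?? 0;
    },
  };
}
```

A request handler would call `const end = monitor.start('/entities')` on entry and `end()` once the response has been sent. The log line is the performance log, and the counter is what a dashboard or alert would be built on.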
Actually, we use them to monitor logs, but we also monitor other important metrics there. Elasticsearch and Kibana also help you in other ways. For example, if you have a fatal exception and something is collapsing, you can create an alert from there, which is also very helpful.

Liran Haimovitch: Awesome. So, Tamar, thank you very much for joining us.

Tamar Stern: Thank you very much. It was very nice to be here.

Liran Haimovitch: So that's a wrap on another episode of The Production-First Mindset. Please remember to like, subscribe, and share this podcast. Let us know what you think of the show, and reach out to me on LinkedIn or Twitter at production first. Thanks again for joining us.