Most recent/important talks I gave at conferences/meetups

Over the last 5 years I have given about 50 talks at various data science and machine learning conferences and meetups. Some of these talks were video recorded and are publicly available online. Many of them covered the same topic, with the content gradually updated as new findings came in (so older talks have been superseded by newer ones). This repo aims to keep a pointer to the most up-to-date recorded talk on each of the main topics I have addressed.

Gradient Boosting Machines (GBM/GBDT) / Machine learning benchmarks

With all the hype about deep learning and “AI”, it is not well publicized that in prediction tasks with structured/tabular data (as in most business applications) gradient boosting machines (GBM) usually achieve better accuracy than neural networks. In this talk at the LA West R Meetup (May 2019) I reviewed the most commonly used GBM implementations (xgboost, h2o, lightgbm, catboost, Spark MLlib) and discussed their main features and characteristics, such as training speed, memory footprint, scalability to multiple CPU cores and in a distributed setting, and performance when running on GPUs, video here. Essentially the same talk was delivered 2 weeks later at the Berlin Buzzwords conference, where it was recorded more professionally, video here. Previous versions of this talk were given at several conferences and meetups (2018-2019, for example PAW, Crunch, Dataworks Summit etc), and the talk also builds on earlier talks about benchmarking open source machine learning libraries in general (2015-2017, including my keynote at the R Finance conference in Chicago, May 2017, as well as talks at H2O World, PAW, EARL, Domino Data Science Popup, useR! etc).

As a spinoff of the above, I created a separate introductory GBM talk in which I show how easy it is to get started with GBMs, with demo code in both R and Python. I had given this talk twice (most recently at the LA Data Science Meetup in February 2020) without it being recorded. Update: I gave the same talk at the Datacon LA conference in October 2020 (online), video here.
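For readers who have not seen a GBM before, the core idea can be sketched in a few lines of plain Python. This is not the demo code from the talk and it does not use any of the libraries mentioned above; it is a minimal illustration, assuming squared-error loss, a single numeric feature, and depth-1 trees (stumps). All function names (`fit_stump`, `fit_gbm`, `predict_gbm`, `mse`) and the toy data are hypothetical, chosen only for this example.

```python
# Minimal gradient boosting for regression with squared loss,
# using depth-1 trees (stumps) on one numeric feature.
# Illustrative sketch only; real GBM libraries are far more capable.

def fit_stump(xs, residuals):
    """Pick the threshold on xs that minimizes squared error of the residuals."""
    best = None
    # Candidate thresholds: every distinct x except the largest,
    # so both sides of the split are always non-empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1], best[2], best[3]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_gbm(xs, ys, n_trees=50, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        # For squared loss, the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
    return base, learning_rate, stumps


def predict_gbm(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mse(model, xs, ys):
    return sum((y - predict_gbm(model, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)


# Toy data (made up for this sketch): y = x^2 on a small grid.
xs = [i / 10 for i in range(50)]
ys = [x ** 2 for x in xs]

baseline = fit_gbm(xs, ys, n_trees=0)   # predicts the mean everywhere
model = fit_gbm(xs, ys, n_trees=50)

baseline_mse = mse(baseline, xs, ys)
final_mse = mse(model, xs, ys)
print("training MSE, mean only:", baseline_mse)
print("training MSE, 50 stumps:", final_mse)
```

In practice one would call one of the libraries listed above rather than write this by hand; the sketch only shows why adding many small trees, each fitted to the remaining error, improves the fit step by step.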

As another spinoff, this time in the other direction, I made a separate, more advanced talk that goes into more depth on performance topics. I gave that talk at the LA Data Science meetup in November 2020 (online), video here.

Machine learning in production / in business applications / best practices

I discussed best practices for using machine learning in businesses in my keynote at the Budapest BI & Analytics Forum conference in Nov 2018. That talk was not recorded, but I gave a "replay" of it in the Use cases seminar for the MS in Business Analytics at CEU (University) in May 2019, video here. (I also gave a slightly updated version of this talk at the Los Angeles Data Science meetup in August 2019, but unfortunately that was not recorded.) Update: I gave a further updated version of this talk as the inaugural event of the Albuquerque Machine Learning Meetup in August 2020 (online due to COVID), video here.

In addition, I walked through the whole workflow for developing machine learning models and deploying them in production in a talk at the Los Angeles Data Science/Machine Learning meetup in May 2017, video here.

Using R for data science

The 2 tools most widely used for data science in the last few years are R and Python. I have been using R since 2006, and I gave a talk about my journey and some best practices in using R for data science at the Budapest Data Science meetup in Aug 2015, video here.

Size of datasets for analytics

In the last decade we have seen a huge amount of hype around "big data" and distributed systems supposedly able to cope with such data (Hadoop, Spark). At the useR! conference at Stanford University (June 2016) I talked about the size of datasets typically used for analytics and the low productivity of using "big data" tools, video here.

Machine learning with H2O

If you deploy machine learning in production (especially real-time scoring), H2O is one of the best tools to use. In this meetup at AT&T in Los Angeles (Jan 2017), I gave an overview of H2O and how to use it in business applications, video here.

Physicists in data science / Intro to data science / Career tips

In this talk in the "From the atoms to the stars" (Atomcsill) series at Eotvos University (where I obtained my PhD a long time ago), I introduced college-bound high school students to data science and gave them some career advice. The talk was in Hungarian, video here.

Bonus: KDD conference invited talk

This is probably my most prestigious conference invitation so far, so I have included my KDD 2017 (Halifax, Canada, Aug 2017) talk here. It was a mix of the topics discussed in the talks above, on the theme of the current state of ML in practice and where we are headed ("Machine Learning Software in Practice: Quo Vadis"), video here.
