Hacker News | dafrdman's comments

You can now find a PDF version of the book at https://github.com/dafriedman97/mlbook/blob/master/book.pdf. JupyterBook is still working on PDF creation, so unfortunately this version doesn't have any of the images. That said, most of them aren't too important (until the neural net chapter, where they matter a little more).


Thanks so much! I would say there are two major differences: 1, as you mention, it codes each method up from scratch in Python so readers can really see each step the method uses. 2, it is focused on the derivations of these methods, rather than their intuition, applications, etc.


It's definitely not deep learning focused. I wanted to start by introducing the models machine learning practitioners should all know. But collaborative filtering and stuff along those lines would be a good addition! Thanks for the feedback.


Agreed. That's #1 on my list right now.


I agree, though I saw that as outside the scope of this book. I tried to be clear in the introduction that the book is a "user manual" of sorts that simply shows how to construct models, rather than how to decide between them, what the benefits of each are, etc. That information is certainly important, but I felt it had been covered more than adequately by books like Elements of Statistical Learning.


Perhaps I should have been clearer, but the "code" section within each chapter is not "from scratch". The "construction" section is "from scratch" in that it only uses numpy (not scikit-learn). The scikit-learn part is just there so new users can see how these models could be fit in practice.


I went straight to the "code" section. Didn't know there was stuff in the "construction" section too. I would definitely consider not using sklearn for anything other than data sets. You already defined them, why not use them? Or rename the "code" section. I expected that to be the final code as you build it. Maybe show the usage of both side by side, as a way to ease people into sklearn. But the "code" section should totally be focused on what you made.


That's sensible. Maybe change construction to code and code to application? Or keep construction but rename code? I'll have to brainstorm. I definitely don't want people missing the construction section so this is great feedback. Thanks!


I like the sound of Application at least. Or 'In practice'? And Construction does make sense when I think about it more. Not sure I can think of a better name at least.


My hesitance with "Application" is that it sounds like I'm going to use some interesting dataset or do some cool project (and this is essentially using iris to build basic models). How about "code" becomes "implementation"?


Good question. I definitely prefer downloadable books myself. I made it in JupyterBook because that was easiest with the executable ipynb files. I'll look into whether I can make it downloadable and update you if so.


Thanks so much for your feedback. Definitely open to comments!

I agree 100% that any use of packages can be intimidating for newbies. I experimented at first with creating the models without using numpy and I thought that it actually made it less clear rather than more clear. It's obviously a tradeoff--you see where everything comes from (rather than np.mysterious_function()) but you take 5 lines of code to do the same thing that a single numpy command could accomplish. I felt in the end that it distracted from the real purpose of the code, which is to demonstrate how the model works.
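To make that tradeoff concrete, here is a minimal sketch (not code from the book; the function names and toy numbers are made up for illustration) of the same linear prediction written once with plain Python loops and once with a single numpy operation:

```python
import numpy as np

def predict_pure_python(X, beta):
    """Linear predictions written with plain lists and loops (no numpy)."""
    predictions = []
    for row in X:
        total = 0.0
        # Multiply each feature by its coefficient and accumulate the sum
        for x_ij, b_j in zip(row, beta):
            total += x_ij * b_j
        predictions.append(total)
    return predictions

def predict_numpy(X, beta):
    """The same predictions as one matrix-vector product."""
    return np.asarray(X) @ np.asarray(beta)

# Toy inputs (hypothetical): first column is the intercept term
X = [[1.0, 2.0], [1.0, 3.0], [1.0, 5.0]]
beta = [0.5, 2.0]

loop_result = predict_pure_python(X, beta)
numpy_result = predict_numpy(X, beta)
```

The loop version shows every multiplication and addition, while the numpy version keeps the focus on the model itself, which is the tradeoff described above.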

Do you think a compromise would be to add a section to the appendix introducing numpy? Introducing arrays, random instantiation, stuff like that? Otherwise I might consider adding a no-numpy version in the future.

Thanks so much for your feedback!


Perhaps you could reduce the set of numpy functions used in your code to a minimal set (exp, sum, max, min, etc.) and then build fancier functions up from there. This affords you the speed and conciseness of using numpy arrays while limiting the abstractions that could obfuscate the inner workings of some of the fancier functions you might use (e.g. softmax).
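As a rough sketch of what that could look like (an illustration, not the book's implementation), softmax can be built from just `np.exp`, `np.sum`, and `np.max`. Subtracting the maximum before exponentiating is a common stability trick added here as an assumption; it does not change the result.

```python
import numpy as np

def softmax(z):
    """Softmax of a 1-D array, built only from exp, sum, and max."""
    z = np.asarray(z, dtype=float)
    # Shift by the max so the largest exponent is exp(0) = 1 (avoids overflow)
    shifted = z - np.max(z)
    exps = np.exp(shifted)
    # Normalize so the outputs are positive and sum to 1
    return exps / np.sum(exps)

probs = softmax([1.0, 2.0, 3.0])
```

Every step is visible to the reader, yet the code still operates on whole arrays instead of looping over elements.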


I think there is a balance to be struck. You should totally use numpy for the arrays and basic math applications. But say on the first example you use `self.X.T` what does `.T` even do? Not asking you to go into all the details, just more comments saying this transposes the array, see numpy docs <link>. It will ease people into the library if they are unfamiliar with it. You do have some good ones like `column of ones` already, but more of those kinds of things.
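For instance, something along these lines (a hypothetical sketch with made-up toy data, not the book's actual code) shows the kind of comments that explain `.T` and the column of ones in a least-squares fit:

```python
import numpy as np

# Toy data (made up for illustration): y = 1 + 2x exactly
x = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

# Column of ones: lets the first coefficient act as the intercept
ones = np.ones((x.shape[0], 1))
X = np.hstack([ones, x])  # shape (4, 2): one row per observation

# .T transposes the array (swaps rows and columns), so X.T has shape (2, 4).
# See the numpy docs for ndarray.T.
XtX = X.T @ X  # shape (2, 2)
Xty = X.T @ y  # shape (2,)

# Solve the normal equations (X^T X) beta = X^T y for the coefficients
beta = np.linalg.solve(XtX, Xty)
```

Here `beta` comes out as approximately `[1, 2]`, the intercept and slope used to generate the toy data.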

I would also avoid using pandas if at all possible. It's just another thing people have to learn if they are unfamiliar with it.


I definitely agree. I should add more comments explaining what things like .T do--it's not that it's hard to grasp, but it might turn away newbies. Thanks for the suggestion!

Pandas is only used in the "code" sections, which use packages like scikit-learn anyway.


That's a good point. If you had to explain how everything works without the libraries, you'd probably end up writing a book on how pandas and numpy work, not how ML works.

An appendix is a great idea!


Good ideas. I think I'll try to add an appendix, minimize the number of numpy functions used, and explain any of the weird ones that are real time savers. Thanks for all your thought.


I'd love to include the book in our company's internal learning resources. Can you include an official license by any chance? Thank you.


I hadn't even considered licensing it. Want to email me and we can talk? My email is dafrdman@gmail.com. That said, you're welcome to use it (though my lawyer father suggests I say that this "verbal contract" is revocable and non-exclusive).


Thanks for the helpful feedback. I wanted to put emphasis on the graphs so I chose to hide the code but maybe it's not worth the cuteness of the "click to show". Changing that now.


Good call! I'll work on that ASAP

