OpenCL and various other solutions basically require that one write kernels in C/C++. This is an unfortunate limitation, and it can make it hard for less experienced users (researchers especially) to write correct and performant GPU code, since (in my opinion) neither language lends itself to writing many mathematical and scientific models in a clean, maintainable manner.
What oneAPI (the runtime), and also AMD's ROCm (specifically the ROCR runtime), do that is new is that they enable packages like oneAPI.jl [1] and AMDGPU.jl [2] to exist (both Julia packages), without having to go through OpenCL or C++ transpilation (which we've tried before, and it's quite painful). This is a great thing, because now users of an entirely different language can still utilize their GPUs effectively and with near-optimal performance (optimal w.r.t. what the device can reasonably attain).
No one will take over CUDA's dominance until they realize that one reason why most researchers flocked to it was its polyglot capabilities and its graphical debuggers.
It was helped along by Khronos's focus on never supporting anything other than C and letting the community come up with tools.
So for years, before they started taking a beating, OpenCL was all about a C99 dialect with printf debugging.
SPIR and C++ support came later, when they were already taking a beating and trying to get back up.
Apparently that is also the reason why Apple gave up on OpenCL: disagreements about where it should be going after they handed 1.0 over to Khronos.
Just compare Metal, an OOP API for GPUs with Objective-C/Swift bindings, a C++ dialect as its shading language, and a framework for data management, with Vulkan/OpenGL/OpenCL.
Well, yeah. All of the OpenCL implementations were (and probably still are) awful.

Other than requiring a lot of boilerplate, I found it adequate, essentially a clone of the CUDA C driver API.
OpenCL 2 went into the weeds by going full C++, but that all got rolled back in version 3.
Yes, and that's why Julia gained CUDA support first. My point was to respond to "Why would someone use this instead of plain old OpenCL (or CUDA) with C++?", and my answer was, "you can use something other than OpenCL C or C++". I'm not trying to say that CUDA is any lesser of a platform because of this; instead, other vendors' GPUs are now becoming easier to use and program.
At least for the GPU case, the ecosystem is slowly moving towards writing generic kernels that can be executed on both the CPU (multithreaded) and the GPU, without doing anything special in the kernel itself, via KernelAbstractions.jl. It's still got a little way to go, but already some larger codes are using it to great effect. Also, as a member of the JuliaGPU group, I know that AMD and Intel GPUs should be supported by KernelAbstractions within the next month or two, so a single generic kernel will be able to run unmodified on all major GPUs.
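As a hedged sketch (the macro names follow the KernelAbstractions.jl documentation; the exact launch API may shift as the package evolves), a generic kernel looks something like this, and the same definition runs multithreaded on the CPU or on a GPU depending on the device it's instantiated for:

```julia
using KernelAbstractions

# A generic axpy-style kernel: nothing CPU- or GPU-specific in the body.
@kernel function axpy!(y, a, @Const(x))
    i = @index(Global)
    @inbounds y[i] = a * x[i] + y[i]
end

x = rand(Float32, 1024); y = rand(Float32, 1024)
kernel! = axpy!(CPU(), 64)                      # instantiate for the CPU, workgroup size 64
event = kernel!(y, 2f0, x; ndrange = length(x))
wait(event)                                     # kernels launch asynchronously
# On an NVIDIA GPU, the instantiation would instead be axpy!(CUDADevice(), 64)
```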
JavaScript engines use tiered JIT compilation, so they can compile optimized code in the background while the interpreter (or less optimized compiled code) is actually running. In Julia, the compiler runs first, and then the compiled code is run. This will probably change eventually as Julia's compiler improves, but regardless, it's important to note this distinction.
It looks like they are asking for testers, but I cannot seem to figure out how to sign up. Do you know how to do that (or have a good place to point to for trying it)?
It also looks like it is for downloading MMS only; it doesn't look like it has support (yet) for sending them.
> Wifi works perfectly for me as well, including excellent hotspot (via nmcli)
Sorry, that looks to be a typo on my part; I meant to say it works just fine for me.
Tim has been working on making it easy to use CUDA.jl and AMDGPU.jl pretty interchangeably through GPUArrays.jl, and this approach seems to be pretty extensible to other accelerators like Intel's dGPUs. KernelAbstractions.jl will also be gaining AMDGPU.jl support soon, so it'll be easy to write generic kernels without buying into a single vendor's cards.
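As a rough illustration (assuming the array types the packages export, CuArray from CUDA.jl and ROCArray from AMDGPU.jl; treat it as a sketch rather than the definitive API), vendor-agnostic array code mostly means writing against generic operations and swapping the array type:

```julia
using CUDA          # or `using AMDGPU` on AMD hardware

# Generic code: broadcasting and reductions dispatch to whichever GPU
# array type is passed in, via the shared GPUArrays.jl machinery.
function normalize_columns!(A)
    A ./= sum(A; dims = 1)
    return A
end

A = CUDA.rand(Float32, 4, 4)       # with AMDGPU.jl: ROCArray(rand(Float32, 4, 4))
normalize_columns!(A)
```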
> This is a complete novice, ill informed, question. So forgive it in advanced, but why have an AMD specific backend at all? Couldn't you just use AMD's HIP/HIP-IFY tool on the CUDA backend and get an AMD friendly version out?
HIP and HIPify only work on C++ source code, via a Perl script. Since we start with plain Julia code, and we already have LLVM integrated into Julia's compiler, it's easiest to just change the LLVM "target" from Native to AMDGPU (or NVPTX in CUDA.jl's case) to get native machine code, while preserving Julia's semantics for the most part.
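Concretely (a hedged sketch using CUDA.jl's documented kernel intrinsics; AMDGPU.jl's analogues have different names, e.g. workitemIdx), the kernel body is ordinary Julia, and the `@cuda` macro drives Julia's compiler with the NVPTX LLVM target instead of the native one:

```julia
using CUDA

# A plain Julia function used as a GPU kernel.
function vadd!(c, a, b)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(c)
        @inbounds c[i] = a[i] + b[i]
    end
    return nothing
end

a = CUDA.ones(256); b = CUDA.ones(256); c = CUDA.zeros(256)
@cuda threads=256 vadd!(c, a, b)   # codegen goes through LLVM's NVPTX back-end
```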
Also, interfacing to ROCR (AMD's implementation of the Heterogeneous System Architecture or HSA runtime) was really easy when I first started on this, and codegen through Julia's compiler and LLVM is trivial when you have CUDAnative.jl (CUDA.jl's predecessor) to look at :)
I should also mention that not everything that CUDA does maps well to AMD GPUs; CUDA's streams are generally in-order (blocking), whereas AMD's queues are non-blocking unless barriers are scheduled. Also, things like hostcall (calling a CPU function from the GPU) don't have an obvious equivalent in CUDA.
Something that is hinted at, but not spelled out, in our posts is that AMD actively upstreams and maintains an LLVM back-end for their GPUs, so it really is just a matter of switching the binary target for the generated code, at least in theory :)
[1] https://github.com/JuliaGPU/oneAPI.jl [2] https://github.com/JuliaGPU/AMDGPU.jl