To start off, I've now realised that people are actually reading this, so I should probably write a bit more formally.
Now for the more interesting, technical parts. Last week I talked about some changes I made to AMD's LLVM fork, most notably that comgr used the Linux macro CLOCK_MONOTONIC_RAW. At the time I thought its equivalent on FreeBSD was CLOCK_MONOTONIC_FAST, but after consulting some people on IRC, I was told I should use CLOCK_MONOTONIC. So I did. Given that it was a roughly 3-line change to get the entire toolchain working on FreeBSD, I decided to try to upstream it. To some level of surprise, it was merged within about 2 days! Here is a link to the pull request. So far I haven't hit any other issues with the LLVM fork, so I don't intend to make any other changes.
To be honest, I have not yet looked deeply into what this macro is used for, but I'd assume it's something along the lines of freshness checks or queueing priorities.
On a related note, I also figured out that the reason the environ variable wasn't linking in Program.inc was that I was building things in the wrong order. I used the commands below to build it, in the order I'm listing them. These commands were run from inside of ~/dev/llvm-project/build
cmake -G Ninja \
-DLLVM_ENABLE_PROJECTS="clang;lld" \
-DLLVM_TARGETS_TO_BUILD="AMDGPU;X86" \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_INSTALL_PREFIX=/usr/local/rocm-llvm \
../llvm
ninja && ninja install
cd /root/dev/temp/llvm-project/amd/device-libs
mkdir build && cd build
cmake -G Ninja \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_INSTALL_PREFIX=/usr/local/rocm-llvm \
-DCMAKE_PREFIX_PATH=/usr/local/rocm-llvm \
..
ninja && ninja install
cd /root/dev/temp/llvm-project/amd/comgr
mkdir build && cd build
cmake -G Ninja \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_INSTALL_PREFIX=/usr/local/rocm-llvm \
-DCMAKE_PREFIX_PATH=/usr/local/rocm-llvm \
..
ninja && ninja install
Last week I also talked about rocr-patches and how that was going. I did some more investigation into the weird dxg behaviour and uncovered two things: I had misinterpreted the dxg (DirectX graphics) loading, and I needed to add a specific code block to handle loading on FreeBSD.
As for the misinterpretation: I had originally thought that it would always try to load dxg and "succeed", even if it did not (even on Linux). In actuality, it tries to "load" the /dev/dxg path and access it; if the result is NULL or otherwise invalid, it does not init or load the drivers. This path exists for WSL.
As for the changes needed for FreeBSD: just add *another* ifdef saying that if you are on FreeBSD, load libhsakmt.so, or something along those lines.
Those were basically all the changes. I will likely start cleaning up these patches, naming them correctly, etc., and hopefully submit 1-2 of the smaller changes upstream. If you want to use my fork, I would recommend using this hash, cdf9888723e31457e9e2915b47e916833611b8f4, from this repo. I will also add that this repo may start changing as I prepare to merge things in.
Now this is where I spent the bulk of my time, and confusion, over the past few days.
If we think back to post 1, I said that I was using the Foundation's Framework Desktop, which runs a Strix Halo chip, which is supported in a somewhat stable fashion as of Linux kernel version 6.18.4. drm-kmod is currently on kernel version 6.12... You might see a problem here, and so do I...
Before I can even start working on getting amdkfd working, I'll need to get drm-kmod up to 6.18.4. Or so I thought for about 3~4 days, until at some point during the porting process for kernel version 6.13 I realised that if I kept doing this, I was never going to finish. So I sat down for an entire day and did not write a single bit of code; I only read. I read quite a few articles and the documentation on how DMA works on Linux (did you know that once the PCI device has been enabled, you can start talking to it via DMA??? You don't even need to wait for the init to finish. Also, it defaults to a 32-bit address space because of old stuff), how amdkfd uses memory mapping, paging (I'm a math+physics major, this is the first time I've cared about paging in my life), and a couple of other things. Based on that, I figured out that I don't actually need to go through all 5~8k patches to get to kernel version 6.18.4. I can just strip out all the Intel code, the display+graphics+connector code, the AMD legacy graphics code, and the AMD code for other GPUs, and port over only the Strix Halo code. Which is quite a bit easier.
Here is my custom branch where I am updating exactly what I need, and nothing else. I would absolutely not recommend using this, as it is stripped to the bare minimum, and yeah... So far I've moved through about 50 patches out of about 1800. I hope to get through all ~1800 by Tuesday and be able to start plopping amdkfd into it. However, I did notice that a lot of the amdkfd code, especially in structs, is missing... So it may take another week before I can start on that. On a related note, I was asked to find all the LinuxKPI things we'll need to implement, so someone else can work on that in parallel. I will need to do that at some point... BUT YEAH
I will add that I thought 6.13 had merged in perfectly when I started, because I thought the drmapplypatches script had run without errors. Little did I know, I did not recognise the errors as errors... Otherwise, it is just insanely exhausting to apply all these patches, especially since the code has diverged in some significant ways, and because AMD engineers so far like to submit massive patches, at times over 300 lines, which hurt my brain to merge in manually when patch doesn't work.
This project is slowly taking shape, and I think I should be able to get simple vector addition working sooner rather than later. But I will admit that porting over drm-kmod is incredibly exhausting mentally, and I am amazed that people are able to do this without losing their minds.
I lowkey wish I had a server I could connect up to antigravity and use like an LLM with no token limit, because LLMs are SO GOOD at fuzzy-matching function searches. As good as I am at grepping, using find, and tracing, LLMs are just so much faster. Like, I asked it "find me all the functions which interact with the linux memory sub system", and it went off for about 20-30 minutes, and BOOM, perfectly did it. I will also admit that they are terrible at summarising subsystems: I spun up a small script to go over the whole LinuxKPI subsystem and summarise it for me. It outputted... something. It wasn't complete garbage, but it may as well have been.
Now going back a bit, you remember how I talked about DMA (direct memory access), memory mapping, and paging? Well, think of that mention as foreshadowing for the next few weeks... I'll likely need to talk to a lot of the senior FreeBSD contributors about how any of this stuff needs to change/work for ROCm to work. By default Linux gives every DMA device a 32-bit address space, which equates to 4 GiB (2^32 bytes). If you've kept up with anything in games, graphics, AI/LLMs, ML, etc., you'll know that 4 gigabytes is nothing, which is what I thought too. So I did some digging and found out that these graphics drivers will usually EXPLICITLY opt out of the Linux kernel's DMA implementation in favour of their own internal implementation. Which... is going to make porting over amdkfd a bit more annoying, but I have not gotten there yet... So we'll find out, hopefully soon.
I lowkey wish I could ask an LLM to apply the patches which don't apply via git or GNU patch. Like, some of these are so simple, just tedious AAAAAAAA
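The 32-bit figure above can be checked with a little arithmetic. The sketch below copies the shape of the Linux kernel's DMA_BIT_MASK(n) macro into userspace so it builds anywhere; addressable_gib is a hypothetical helper added only for this example.

```c
#include <stdint.h>

/*
 * Same shape as the Linux kernel's DMA_BIT_MASK(n), reproduced here so the
 * sketch compiles outside the kernel. In-kernel, a mask like this is what a
 * driver passes to dma_set_mask_and_coherent().
 */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical helper: GiB addressable with an n-bit mask (valid for n < 64). */
uint64_t addressable_gib(unsigned bits)
{
    return (DMA_BIT_MASK(bits) + 1) >> 30;
}
```

addressable_gib(32) comes out to 4, which is the 4 GiB ceiling mentioned above, while a 40-bit mask would already cover 1024 GiB.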
Here's a picture of the massive number of deletions from when I first tried to delete code. I had to backtrack afterwards; patching this was too painful.
Another funny commit I saw
I have this one tiny patch I should make to LinuxKPI, but have not gotten around to it yet:
diff --git a/sys/compat/linuxkpi/common/include/linux/pm_qos.h b/sys/compat/linuxkpi/common/include/linux/pm_qos.h
index 47c41a819..97d16369a 100644
--- a/sys/compat/linuxkpi/common/include/linux/pm_qos.h
+++ b/sys/compat/linuxkpi/common/include/linux/pm_qos.h
@@ -28,6 +28,8 @@
#ifndef _LINUXKPI_LINUX_PM_QOS_H
#define _LINUXKPI_LINUX_PM_QOS_H
+#include <linux/types.h>
+
#define PM_QOS_DEFAULT_VALUE (-1)
struct pm_qos_request {
If you made it this far, you're clearly committed... If you have some time, could you help me with a patch I need to make to my rocr-runtime? FreeBSD's and Linux's ioctl formats do not match, and are off by about 1 bit. All you need to do is correctly no-op that one bit and shift the format a bit. If so, could you email me? I don't want to work on this whilst I'm getting amdkfd working, so that I don't lose too much energy from context switching. You can find my email on my website's home page.
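For anyone curious what that mismatch might look like, here is a minimal sketch. It assumes the difference in question is the command-number encoding: Linux's generic layout uses 2 direction bits and 14 size bits, while FreeBSD's sys/ioccom.h uses 3 direction bits and 13 size bits. The function name and the LNX_/BSD_ prefixed constants are made up for this example, and the actual rocr-runtime fix may need something different.

```c
#include <stdint.h>

/* Linux generic layout: dir:2 | size:14 | type:8 | nr:8 */
#define LNX_DIR(c)   (((c) >> 30) & 0x3u)
#define LNX_SIZE(c)  (((c) >> 16) & 0x3fffu)
#define LNX_WRITE    1u   /* userspace -> kernel */
#define LNX_READ     2u   /* kernel -> userspace */

/* FreeBSD layout: dir:3 | size:13 | group:8 | num:8 */
#define BSD_IOC_VOID  0x20000000u   /* no parameters */
#define BSD_IOC_OUT   0x40000000u   /* copy parameters out */
#define BSD_IOC_IN    0x80000000u   /* copy parameters in */
#define BSD_SIZE_MASK 0x1fffu

/*
 * Hypothetical translation helper. Returns the FreeBSD-style command number,
 * or 0 if the payload size does not fit in FreeBSD's 13-bit length field.
 */
uint32_t linux_to_freebsd_ioctl(uint32_t cmd)
{
    uint32_t dir = LNX_DIR(cmd);
    uint32_t size = LNX_SIZE(cmd);
    uint32_t out = 0;

    if (size > BSD_SIZE_MASK)
        return 0;

    if (dir == 0) {
        out = BSD_IOC_VOID;          /* Linux "none" has no bit; FreeBSD has one */
    } else {
        if (dir & LNX_WRITE)
            out |= BSD_IOC_IN;
        if (dir & LNX_READ)
            out |= BSD_IOC_OUT;
    }
    /* group/type and number occupy the same low 16 bits in both layouts */
    return out | (size << 16) | (cmd & 0xffffu);
}
```

Under these assumptions a read/write command such as 0xC0184B01 maps to itself, while a write-only command 0x40184B01 becomes 0x80184B01, because the two direction bits sit in swapped positions.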