Upgrading a PC was more of a learning experience than I expected

Some people buy computing power for self-training or self-edification projects. Others rent it. I like owning the gear I work on. I purchased a desktop to serve as a gaming machine, a development system, a containerized workload machine, a Data Science machine, and a Machine Learning platform. I didn't anticipate that it would turn into a series of hardware upgrades bound by PC architecture constraints.


This was a great learning experience but not the best raw-dollar investment from a pure cost/capabilities point of view. The 3 years of upgrades cost $1,200. I saved some money by purchasing previous-generation hardware; buying current-generation upgrades would have cost $1,900. I could have stopped anywhere along the path.

Apple wasn't a player in the gaming, ML, or GPU market when I made my purchase. Apple has since caught up for most of my use cases with its large shared-memory architecture and performant CPUs. A MacBook might be the simplest approach for someone who wants to work from anywhere.

Purchasing the initial machine

I purchased a PowerSpec machine during the pandemic, when there were severe supply constraints. It was simpler than building from scratch, at roughly a 10% premium for that simplicity. It is a great machine that is all a normal person would ever need. The AMD Ryzen CPU feels quick and the NVidia RTX card is good enough for my level of gaming. Here are the specs.


This could have been the last machine most people would ever need. That didn't stop me from tinkering with it, or from being forced to learn something about PC architecture, slots, memory configuration, and video cards.

Solving big data problems with more RAM

I started writing Python code against some big data sets. The program wanted to bring them all into memory, so I needed to buy more RAM. That was pretty easy: I just bought another 32GB of memory and installed it. I bought RAM that was the same speed as the RAM that came with the machine. It seemed simple.

I didn't know about RAM Memory Ranks

It took me 3 years to understand that I should have selected the new memory to better match the existing RAM. The new memory I purchased was single-rank, which differed from the original memory. Mixing ranks works, but it degrades performance. Nowhere in the datasheet for the new memory could I find that it was single-rank.

I didn't know about XMP DDR settings for faster memory

It took me 3 years to understand that you can run memory at the default JEDEC DDR4 speed or at a higher XMP speed. The JEDEC standard is 2133 MT/s and the XMP configuration is 3200 MT/s. This is a pretty significant boost in speed. My machine can't go above 3000 MT/s. I don't know if that is because of RAM quality or rank mixing.
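To put those speeds in perspective, here is a rough back-of-envelope estimate of peak memory bandwidth. It assumes a dual-channel board with 64-bit (8-byte) channels, which is typical for this class of desktop; real-world throughput will be lower.

```python
def ddr4_bandwidth_gbs(mts: int, channels: int = 2) -> float:
    """Peak theoretical DDR4 bandwidth in GB/s.

    MT/s (mega-transfers per second) times 8 bytes per 64-bit channel,
    times the number of populated channels.
    """
    return mts * 1_000_000 * 8 * channels / 1e9

jedec = ddr4_bandwidth_gbs(2133)  # default JEDEC speed
xmp = ddr4_bandwidth_gbs(3200)    # XMP profile speed
print(f"JEDEC 2133: {jedec:.1f} GB/s, XMP 3200: {xmp:.1f} GB/s")
# -> JEDEC 2133: 34.1 GB/s, XMP 3200: 51.2 GB/s
```

That roughly 50% jump in peak bandwidth is why leaving the machine at JEDEC defaults feels like leaving performance on the table.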

This works great until the machine comes out of hibernation, at which point the system resets the memory configuration back to the JEDEC 2133 MT/s.

Crypto mining with two NVidia cards

For the next phase, I wanted to see how crypto mining worked. The machine has two physical PCIe x16 slots that look the same. I bought another GPU and put it in the 2nd slot. It worked great for crypto mining. I didn't realize that the two x16 slots were electrically different, because the difference didn't impact that type of workload.


Crypto mining does not move a lot of data and does not use a lot of PCIe bandwidth. Crypto miners optimized for the low data rates using motherboards with many narrow PCIe slots instead of a few wide ones.

How many PCIe lanes, and how are they connected?

I'll just say here that I don't want to know this stuff, and neither does anyone else. This is why we buy laptops. But I chose an upgradable desktop, so now I have to know it. This diagram was partially derived from images from AMD and a variety of blog posts.

Our primary graphics slot is PCIe x16 physical and x16 electrical: 16 lanes direct to the CPU. This is pretty much the fastest, best type of slot for gaming. For CUDA or LLMs, we'd rather have two graphics cards; two slots configurable as x8 would be ideal. This motherboard doesn't do that.

The diagram tells us that our 2nd graphics card slot is somewhat of a sham. It doesn't run at the same speed as the primary graphics slot even though it looks identical. The 2nd slot is physically x16 but only four lanes (x4) electrically. Even worse, it hangs off the chipset, which itself connects to the CPU across four shared PCIe lanes. The CPU has to push data across those four shared lanes, into the chipset, and then out to the graphics card. It is the contention for bandwidth on the four CPU-to-chipset lanes that would hurt gameplay or other high-bandwidth video uses. None of this mattered for crypto mining because so little data was transferred; even x1 worked for crypto mining.
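The gap between the two slots is easier to see with numbers. The sketch below uses commonly cited approximate per-lane PCIe throughput figures (after encoding overhead); treat them as ballpark values, not spec quotes.

```python
# Approximate usable one-direction bandwidth per PCIe lane, in GB/s.
# Gen3 uses 128b/130b encoding at 8 GT/s; Gen4 doubles the signaling rate.
GBS_PER_LANE = {3: 0.985, 4: 1.969}

def slot_bandwidth_gbs(gen: int, lanes: int) -> float:
    """Peak bandwidth for a slot with the given electrical lane count."""
    return GBS_PER_LANE[gen] * lanes

# On the original PCIe 3.0 setup:
print(f"x16 primary slot: {slot_bandwidth_gbs(3, 16):.1f} GB/s")
print(f"x4 chipset slot:  {slot_bandwidth_gbs(3, 4):.1f} GB/s")
print(f"x1 mining riser:  {slot_bandwidth_gbs(3, 1):.2f} GB/s")
```

The 2nd slot tops out at a quarter of the primary slot's bandwidth, and in practice less, since those four chipset lanes are shared with NVMe and peripherals. A mining workload barely notices; a gaming or model-loading workload does.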

Data Science parallel programming upgrade.

I wanted more computing power for Data Science programming. Some of my code doesn't get a boost with CUDA. It seemed like a CPU upgrade was in order. I upgraded the AMD CPU from the 8-core Ryzen 5700G to the 12-core Ryzen 5900XT. I picked the 5900XT because it was an AM4 CPU that fit the existing motherboard with a TDP that the existing CPU cooler could keep up with.

To date, the experience gained from the upgrade has been more valuable than the increase in computing power.

The AMD Ryzen upgrade increased the core count from 8c/16t to 12c/24t, doubled the L3 cache, and moved from PCIe 3.0 to PCIe 4.0, at the cost of the 5700G's integrated GPU. My existing mixed-rank memory had problems with the high-speed XMP settings, so I replaced the two sets of 32GB DDR4-3200 DIMMs with a single set of faster DDR4-3600 dual-rank DIMMs. This had the side benefit of running the memory at an even multiple of the AMD Infinity Fabric speed; I didn't even know that was a thing prior to the upgrade. The memory went from 2133 MT/s to 3600 MT/s with the correct rank and interleave, plus some other tuning beyond my knowledge.
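The "even multiple of the fabric speed" point can be sketched with a little arithmetic. DDR transfers twice per clock, so the real memory clock (MCLK) is half the MT/s rating, and my lay understanding is that Ryzen latency is best when the Infinity Fabric clock (FCLK) runs 1:1 with MCLK.

```python
def mclk_mhz(mts: int) -> int:
    """DDR is double data rate: the memory clock is half the MT/s rating."""
    return mts // 2

# DDR4-3600 -> MCLK 1800 MHz, a clean 1:1 match for an 1800 MHz FCLK.
for mts in (2133, 3200, 3600):
    print(f"DDR4-{mts}: MCLK {mclk_mhz(mts)} MHz, 1:1 FCLK target {mclk_mhz(mts)} MHz")
```

Above the speed where FCLK can keep up, the board drops to a 2:1 divider and latency suffers, which is why DDR4-3600 is often called the sweet spot for these CPUs.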


This did provide a measurable improvement in certain workloads, for about $400.

Large Language models and the need for VRAM

LLMs exploded onto the scene. A good portion of the models require 16GB or more of memory. Software development co-pilot configurations require 16GB or more. Training requires even more, often 40GB and up. 24GB is really the smallest sweet spot for running LLM inference engines, and some training is possible. I've played with adding images to diffusion models, and they were hard-pressed to run in 24GB.
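A rule-of-thumb sizing sketch shows why 24GB lands where it does. The weights dominate: parameter count times bytes per parameter, plus headroom for the KV cache and activations. The 20% overhead factor here is my own assumption, not a measured figure.

```python
# Bytes per parameter for common precisions / quantizations.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "q4": 0.5}

def vram_estimate_gb(params_billions: float, fmt: str = "fp16",
                     overhead: float = 1.2) -> float:
    """Rough VRAM needed: weights plus ~20% for KV cache and activations."""
    return params_billions * BYTES_PER_PARAM[fmt] * overhead

print(f"7B fp16:  ~{vram_estimate_gb(7):.1f} GB")         # ~16.8 GB, fits in 24GB
print(f"13B fp16: ~{vram_estimate_gb(13):.1f} GB")        # ~31.2 GB, does not fit
print(f"13B q4:   ~{vram_estimate_gb(13, 'q4'):.1f} GB")  # ~7.8 GB, fits easily
```

A 7B model at fp16 just fits a 24GB card; bigger models need quantization or more VRAM, which is exactly the wall I kept hitting.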

At this point, the cheapest thing to do is probably to rent time in the cloud, paying only for what I use. But I like to know how stuff works and to tinker, so I decided to have my own rig. My interest came early in the Apple ARM release cycle, when the Mac required too much care and feeding and was too unstable. All the best software targets NVidia CUDA and Tensor-capable devices. This drove another upgrade round.

I can run Ollama or NVidia AI Workbench with containerized environments. 24GB of VRAM lets me run a 16GB model alongside some other pieces, or a single 20GB model. I could have left a 2nd graphics card in the machine; instead I sold one of the existing cards and put the other in a different machine. The 2nd card would have been on the indirect x4 chipset connection, which would impact model load times but probably not token generation rates.

Lessons learned

  1. Upgrading a computer is a learning experience. The biggest benefit was probably just learning how stuff works.
  2. I incrementally upgraded as I collected money. I was impatient; I could have saved up for something better. The CPU and memory upgrades in particular were of questionable general benefit. I purchased them to solve specific problems.
  3. I had no idea about the limitations of the processor architecture I had purchased with respect to PCIe lanes and the way they could be split on my motherboard. I would have been better off with two PCIe 4.0 x8 slots in place of the single PCIe 4.0 x16 that I have; that would have given a second graphics card real bandwidth.
  4. I had no idea how different memory affected the system, other than the general notion that faster memory is better.
  5. A large-memory MacBook can do pretty much everything the upgraded machine can do, for my purposes, at about the same total cost.

References

  • https://www.tomshardware.com/news/amd-x570-x470-chipset-pcie-4.0,39651.html
  • https://imgur.com/a/amd-x570-block-diagram-b7tntpZ

Revision History

Created 2024/10
