Troubleshooting: How to Troubleshoot Lockups on your SBC
Not only do we support and sell a lot of different single board computers (SBCs), we also use them heavily in our day-to-day tasks at ameriDroid!
One of the benefits of this practice is that we get to experience real-world issues while doing real-world tasks.
This article will touch on lockups experienced while using Ubuntu MATE on an ODROID-N2 2GB as one of my desktop systems.
I generally use the N2 to monitor different web pages with Firefox: four separate Firefox windows with a total of 11 open tabs. Two of those tabs show pages that automatically update with information throughout the day. The other nine tabs are used to look up information and access specific sites that are needed throughout the day.
Unfortunately, every few days the N2 would become unresponsive and often would not recover, requiring a power cycle. The good news is that this never caused any corruption on the eMMC module that held the OS and data. The N2 also boots quickly, and Firefox always restored the tabs after the reboot, so it didn't cause too much disruption. But it was still annoying.
I am a reasonably good troubleshooter, seeing as how I've been troubleshooting professionally full time since at least 1994. There is a science to troubleshooting. I have never formally studied troubleshooting per se, but I have developed a method that works for me (apologies to anyone out there who has a PhD in troubleshooting!). Here are my steps:
- State the problem: It's important to know what you're trying to fix before you try to fix it.
- Identify the systems that could be responsible for the problem: This requires a deep understanding of the systems and their functions, but even a beginner is better served by attempting this vital step. Be sure to make a list of everything that could possibly be responsible for the problem.
- Rate each potentially faulty system: Give each potential system two ratings based on your best guess / judgement: How difficult is it to test the system, and how likely is it that the system is the one causing the problem?
- Develop and implement a test to rule out each potentially faulty system: Start with the least difficult tests for the most likely systems to be at fault, and work your way toward the more difficult and least likely systems. Or go with your gut - experience will give you a good shot at knowing what is the cause of the problem.
- Repair / replace the faulty system once it has been identified
A few things can make troubleshooting many times more difficult:
- When more than one system fails at the same time.
- When it is hard to repeat the fault, or when the problem is intermittent.
- A combination of the above.
2. Identify the systems that could be responsible for the problem: As the same issue happened with two systems, some causes, like hardware or power supply faults, could likely be ruled out. The main commonalities between the two systems were:
- Firefox
- Ubuntu MATE 64-bit
- 2GB RAM
Firefox: I had at one time tried using Chromium instead of Firefox but still encountered the lockup issues occasionally, so I ruled Firefox out as the likely cause.
Ubuntu MATE 64-bit: Years ago, I had used Arch Linux on a Raspberry Pi. I thought maybe it would be good to go "old school" and set up the OS from scratch. With Arch Linux, the user basically builds their machine from a base foundation. The benefit of this is that the system only contains what the user wants it to contain. The penalty is that it takes a lot of time and thought to set up, especially if you don't do it often.
Back then (I'm not sure if it still applies), Arch could perform all updates without rebooting afterward, so it was also a very long-running option.
An alternative to starting with the low-level Arch model is to use one of the Arch distros that already contain a lot of the common features most users want. I selected Manjaro KDE Plasma for the N2 as a good starting point.
I'm a fan of dark UIs as I think they are easier on the eyes and have a certain cyberpunk aesthetic, so I absolutely love Manjaro KDE Plasma's default skin.
2GB RAM: The easiest way to test this is to open system monitor and load up the system until the RAM is consumed. How does the system behave when all the memory is used up?
3. Rate each potentially faulty system:
Each rating runs from Low to High.
Firefox - Difficulty: Low, Likelihood: Medium
Ubuntu MATE 64-bit - Difficulty: Medium, Likelihood: Medium
2GB RAM - Difficulty: Medium-Low, Likelihood: Medium
4. Develop and implement a test to rule out each potentially faulty system:
Firefox: Try a different browser, like Chromium, and see if the problem still persists.
The problem continued with Chromium.
Ubuntu MATE 64-bit: Try a different OS, like Manjaro KDE Plasma, and see if the problem still persists.
The problem persisted with Manjaro KDE Plasma.
2GB RAM: Open system monitor and open programs until RAM approaches and/or hits the 2GB limit.
When RAM approached the 2GB limit, the system started acting extremely sluggish and less stable.
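If you would rather watch the numbers than a graphical system monitor, Linux exposes them in /proc/meminfo. The following is a minimal sketch, not the tool used for the test above: `parse_meminfo`, `memory_pressure` and the 10% threshold are hypothetical names and values, and `SAMPLE` is made-up data shaped like a 2GB board under load.

```python
def parse_meminfo(text):
    """Return the numeric fields of /proc/meminfo as a dict (values in kB)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def memory_pressure(info, warn_fraction=0.10):
    """True when available memory drops below warn_fraction of total RAM."""
    return info["MemAvailable"] < warn_fraction * info["MemTotal"]


# Made-up sample in the /proc/meminfo format, for illustration only.
SAMPLE = """MemTotal:        2000000 kB
MemFree:           50000 kB
MemAvailable:     120000 kB
SwapTotal:             0 kB
SwapFree:              0 kB
"""

info = parse_meminfo(SAMPLE)
print("Under memory pressure:", memory_pressure(info))
```

On a live system, replace `SAMPLE` with the contents of /proc/meminfo, for example `open("/proc/meminfo").read()`, and run the check periodically while opening programs.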
Virtual RAM (known by several names including "pagefile" and "swap") is a method to "trick" the system into treating some of its storage space as RAM, which gives it headroom when physical memory runs out. The downsides are that storage media:
- is generally much slower than physical RAM, so some actions will have a noticeable slowdown
- may have its lifespan shortened if it has limited write cycles, like microSD, eMMC and SSD
However, if the choice is between a stable system and a system with a minimally-impacted storage lifespan, I'd prefer a stable and usable system. In general, eMMC storage is quite robust, especially when compared to microSD media.
- Here's a great article for how to set up swap on Manjaro.
- Here's a great article for how to set up swap on Ubuntu.
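For a rough idea of what such a setup involves, here is a sketch of one common swap file approach on Linux, written as a dry run that only prints the commands. It is not taken from the linked articles: the `/swapfile` path, the 2GB size and the `swapfile_commands` helper are assumptions, and some filesystems (Btrfs, for example) need extra steps, so follow the guide for your distro before running anything.

```python
def swapfile_commands(path="/swapfile", size_mib=2048):
    """Build the usual command sequence for creating and enabling a swap file."""
    return [
        # Allocate the file by writing zeros (size_mib blocks of 1 MiB each).
        ["dd", "if=/dev/zero", f"of={path}", "bs=1M", f"count={size_mib}"],
        # Restrict access so only root can read or write the swap file.
        ["chmod", "600", path],
        # Format the file as swap space.
        ["mkswap", path],
        # Enable it for the current session.
        ["swapon", path],
    ]


# Dry run: print the commands instead of executing them.
for cmd in swapfile_commands():
    print("sudo " + " ".join(cmd))
```

Swap enabled this way lasts only until the next reboot. Making it permanent usually means adding a line such as `/swapfile none swap defaults 0 0` to /etc/fstab.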
5. Repair / replace the faulty system once it has been identified
After 11 days of heavy use, with many tabs opened and closed and other operations performed outside the browser, my ODROID-N2 with 2GB RAM and 2GB swap running Manjaro is using 1.2GB of RAM and 0.87GB of swap, with no signs of performance issues or problems of any sort!