wnd's weblog



New and improved?

24 September 2010 09:16:50 hardware, rant

I've been planning to upgrade my home computer for the past year, and this Monday I finally did just that. My old computer's motherboard and memory were from 2005 and the CPU was from 2007. The biggest issue I had with my old computer wasn't the CPU speed, but rather the amount of memory I had. Now, I could have just bought more memory, but nowadays DDR1 modules are rather expensive. Expensive RAM was good motivation to upgrade my system as a whole.

I was quite excited when I walked home with my shiny new CPU (and heatsink), motherboard and 8 GiB of memory. For those interested, the motherboard is an Asus M4A87TD EVO and the product code of the memory modules is OCZ3G1600LV4GK. I grounded myself with an antistatic wrist strap and removed my old motherboard from the case. After dusting everything with pressurised air I was ready to assemble my new toy. I took plenty of time to be extra careful with the installation of the CPU, the massive heatsink and the memory. When I was done, the thing happily started when I flipped the switch. I closed the case and thought everything was great. Then the problems began.

After experiencing strange problems ranging from illegal ops to applications suddenly refusing to start, I realised something was wrong. memtest86+ soon explained what was going on. After three days of testing, it looks like I get errors regardless of which combination of memory modules I put in which memory slots. Period. Now I spend my days running a userspace memory test (memtester) and my nights running memtest86+.

When I first ran memtest86+ with all four modules installed, I got a lot of errors. The first night memtest86+ managed to complete six full rounds and encountered 400 errors. I also noticed the memory was running at 1066 MHz, CAS 8-8-8-20 and 1.3 volts instead of the manufacturer-specified 1600 MHz, CAS 8-8-8-24, 1.65 V. After searching the internet I changed the memory settings from the BIOS default values to the figures provided by the manufacturer and explained in more detail by a member of their support staff. I reseated the sticks and ran two rounds of the failing test (and not the entire test suite) for each memory module individually. These runs passed fine.
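To put those timing figures in perspective, here is a rough sketch of how the first CAS number translates into nanoseconds. It assumes that "1066 MHz" and "1600 MHz" are DDR3 data rates, so the actual I/O clock is half of that; the function name cas_latency_ns is made up for this illustration.

```python
def cas_latency_ns(data_rate_mt_s: float, cas_cycles: int) -> float:
    """Return the CAS latency in nanoseconds.

    Assumption: the advertised figure is the DDR data rate, so the
    I/O clock in MHz is half of it (two transfers per clock cycle).
    """
    clock_mhz = data_rate_mt_s / 2.0
    # cycles / MHz gives microseconds; multiply by 1000 for nanoseconds.
    return cas_cycles / clock_mhz * 1000.0


if __name__ == "__main__":
    settings = [
        ("BIOS default", 1066, 8),
        ("manufacturer", 1600, 8),
    ]
    for label, rate, cas in settings:
        print(f"{label}: CAS {cas} at {rate} -> {cas_latency_ns(rate, cas):.1f} ns")
```

Under that assumption, the same CAS 8 means about 15 ns at 1066 and 10 ns at 1600, so the BIOS defaults were the more relaxed of the two settings in absolute terms.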

Using a matching pair of memory modules I populated the motherboard with two sticks of RAM. Running out of daylight, I only completed one round of memtest86+, and receiving no errors I booted back to Linux. I fired up my IRC client, video player, and memtester. After two thirds of my daily dose of digital entertainment mplayer quit and refused to restart. I tried to boot to memtest86+, but the thing would not shut down. Using physical methods I forced the system down and had memtest86+ run overnight.

The next morning I was greeted by a screenful of bright red error messages. However, the number of errors had dropped to 40. I replaced the memory modules with the other pair and booted to Linux so that I could still use my computer from work while watching the output of memtester. After passing a few rounds memtester started reporting errors. Oh great. At this point I started searching the internet for more information. It turned out that I was not the only one experiencing problems with this particular combination of motherboard and memory.

When I got home I moved the memory modules to another pair of memory slots to see if the memory slots were faulty. memtest86+ reported no errors, so after a couple of hours I booted to Linux and started memtester. Running nothing but X-Chat and links2, I had memtester use as much memory as possible. As soon as I started memtester it reported a stuck address, whatever that is. At this point I was ready to believe everything was lost. I moved the module to another slot and started memtest86+ once again for the night.
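About that stuck address: the general idea behind such a test, as far as I understand it, is to store each location's own address in it and then read everything back, which catches address lines that are stuck or shorted so that two addresses end up in the same cell. The toy model below is only a sketch of that idea and not memtester's actual code; FaultyMemory and its fault map are made up for the illustration.

```python
class FaultyMemory:
    """Toy word-addressable memory where some addresses alias another cell."""

    def __init__(self, size, stuck=None):
        self.cells = [0] * size
        # Hypothetical fault model: an access to a key address
        # actually lands on the cell given by the mapped value.
        self.stuck = stuck or {}

    def _resolve(self, addr):
        return self.stuck.get(addr, addr)

    def write(self, addr, value):
        self.cells[self._resolve(addr)] = value

    def read(self, addr):
        return self.cells[self._resolve(addr)]


def stuck_address_test(mem, size):
    """Write each address into its own cell, then verify every cell.

    Returns the addresses whose content no longer matches.
    """
    for addr in range(size):
        mem.write(addr, addr)
    return [addr for addr in range(size) if mem.read(addr) != addr]


if __name__ == "__main__":
    print(stuck_address_test(FaultyMemory(16), 16))
    print(stuck_address_test(FaultyMemory(16, stuck={5: 4}), 16))
```

Note that when address 5 aliases cell 4, the mismatch shows up at address 4, the cell that got overwritten. A reported address is therefore a hint that something is wrong rather than a pointer to the exact faulty spot.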

After another six hours of running, memtest86+ had located exactly one error. Again, this was an improvement over the previous results, but for some reason I was still not convinced. I moved the module to another slot, booted to Linux and went to work. A few hours later memtester reported yet another error.

Yesterday I was almost convinced that one of the memory slots on the motherboard had a bad connection, but after last night's test and today's runs of memtester I think that's very unlikely. It is also quite unlikely that both pairs have a faulty module. What else could be wrong? The memory controller? Memory settings? Why didn't the BIOS detect the right settings in the first place? If it's about the settings, why did the memory work even worse with the slower settings? Are the memory modules simply incompatible with the motherboard? How can standardised components like these be incompatible?

Right now I'm a little upset, to say the least. Memory issues are not new to me, but issues like this are. The last issue with memory I had was resolved by changing the latency of DDR1 memory from 2.5T to 3T. After that the system ran perfectly fine until my current system replaced it (or at least tried to). How can this be so difficult? I thought new hardware was faster, cheaper and more compatible. Clearly I was wrong.

Now what? My co-worker has kindly agreed to test the other pair of memory modules on his motherboard. If that test passes, I know it's not the memory as such. If the modules are good, then either they're not compatible with my motherboard, or worse, the motherboard is faulty. Also, a friendly person on IRC has agreed to lend me a pair of bulk Kingston DDR3 modules for the weekend. This should help in finding out whether the motherboard is fine or not.

Although the situation looks desperate now, I'm sure things will work out -- one way or the other. I hope.


Magic smoke

13 September 2010 21:24:37 hardware

My old new server from two years back let the magic smoke escape earlier today. :-( chikan.katei.fi was churning hundreds of thousands of lines into a PostgreSQL database when disaster struck. When the process stopped, at first I thought it was yet another hiccup with chikan's integrated network card, but when my automagic script didn't kick in and save the day, I got nervous.

After some checking I found out that chikan had powered down -- uncontrollably, as it later turned out. When I approached chikan I smelled the unmistakable smell of magic smoke. I unchained the poor thing and carried it to the operation room. I disconnected all non-critical components and switched on the PSU. The only reaction I got was a frantically blinking green LED on the motherboard. I soon came to realise that this was something that could not be solved just like that.

At first I dug out tomodachi, my old server last powered up in 2004, but after booting it up I came to the conclusion (thanks, zeska!) that it would be easier to throw the disks into dante. With a 2.2 GHz AMD64 chip it could be considered an upgrade over chikan's passively cooled Celeron 220. After some tweaking I managed to get dante to accept chikan's disks and boot up. Without getting into details, after two hours dante was ready to take over from chikan.

The switch to dante is a double-edged sword. On the one hand, dante is significantly faster than chikan, processing database queries at least four times as fast. On the other hand, it does have moving parts, including a fairly noisy chassis fan. Besides that, chikan was much more power efficient. Right now the question is what happens next. dante can deal with katei.fi and intra.katei.fi just fine long into the foreseeable future, but somehow I don't like the idea too much.

What kind of hardware should I get next? If the three factors to consider are reliability, price and performance, I opt for the first two. I thought I had this covered already with chikan, but obviously I was wrong. I'd love to hear your ideas on this.
