
VERSATILE SMTS SYSTEMS ENGINEER
Proven record of debugging complex issues and designing creative solutions. Goal-oriented and focused on exceptional service for customers’ unique requirements. Effective verbal and written communicator and skilled software developer. Achieve quality bottom-line results, moving customers into production rapidly and returning manufacturing lines to operation when problems are reported.
System troubleshooting ● Software development
Reverse engineering ● Platform architecture
Performance analysis, profiling ● Technical documentation
TECHNICAL PROFICIENCIES
Languages, Libraries: C, C++, x86 assembly, OpenCL, OpenCV, clFFT
Platforms: Windows, Linux, DOS
Tools: Visual Studio, CodeXL, Sage EDK/SmartProbe and AMD HDT JTAG debuggers, WinDbg, Excel/VBA, IDA Pro, git, oscilloscope, logic analyzer
Additional: DDR3, PCIe, PCI, ISA, SPB, SMBus, I2C, SPI, LPC, RS-232/UART, RAID, RAS, SMI, LAPIC, IOAPIC, coreboot, AMI Aptio, virtualization (AMD-V), real-time, cryptography, security, secure boot, TPM
PROFESSIONAL EXPERIENCE
Sage Electronic Engineering, Longmont, CO
SMTS Firmware Engineer
R&D team leader for coreboot-based SageBIOS products, specializing in Intel Firmware Support Package customizations and fast boot times.
ADVANCED MICRO DEVICES, Fort Collins, CO
SMTS Systems Design Engineer
Provided direct Applications Engineering support to customers/FAEs/Sales for all Embedded products through architecture reviews, application notes, and platform debug. Primary engineering representative in key AMD Embedded verticals and strategic partner relationships. Influenced feature set for future products and crafted methods for extending legacy products. Improved processes for information flow and measuring success. Led and coached junior team members.
National Semiconductor, Longmont, CO
Software Development Manager
National Semiconductor, Longmont, CO
BIOS Engineer
International Business Machines, Raleigh, NC
BIOS Engineer
International Business Machines, Lexington, KY
Systems Assurance Engineer
<br />
A Sage specialty was providing BIOS solutions that booted to operating systems very rapidly. For several projects, I was able to invest effort in profiling the slowest segments of POST and optimizing algorithms. This work spanned both the open-source coreboot and Intel’s proprietary FSP.<br />
<br />
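The ECC clean process initializes the DRAM system to effectively zeroed values, with check bits that match. Omitting this step quickly causes ECC errors as values are written to (or read from) memory, because uninitialized DRAM holds arbitrary data whose check bits do not agree with it.<br />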
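To make the failure mode concrete, here is a toy model, not Intel’s algorithm and not real SECDED ECC: each 64-bit word carries an 8-bit check value computed as an XOR of its bytes (an assumption standing in for the real code). Power-up contents are arbitrary, so check values generally disagree with the data; writing zeros together with matching check bits is what the clean step accomplishes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: XOR of the data bytes stands in for real SECDED ECC. */
typedef struct {
    uint64_t data;
    uint8_t  check;
} ecc_word;

static uint8_t toy_check(uint64_t data)
{
    uint8_t c = 0;
    for (int i = 0; i < 8; i++)
        c ^= (uint8_t)(data >> (8 * i));
    return c;
}

/* Count words whose stored check value disagrees with their data,
 * i.e. words that would raise an ECC error when read. */
static size_t count_ecc_errors(const ecc_word *mem, size_t n)
{
    size_t errors = 0;
    for (size_t i = 0; i < n; i++)
        if (toy_check(mem[i].data) != mem[i].check)
            errors++;
    return errors;
}

/* ECC clean: write zero data together with its matching check value. */
static void ecc_clean(ecc_word *mem, size_t n)
{
    assert(mem != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        mem[i].data = 0;
        mem[i].check = toy_check(0);
    }
}
```

In real firmware the interesting part is how fast the whole array can be written; this sketch only shows why the write must happen at all.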
<br />
Along with other CPU and core logic initialization, ECC cleaning is handled inside the FSP and is often the longest duration task. I was able to improve Intel’s algorithm and reduce the ECC clean time by 50%.<br />
<br />
Modern FSPs implement Updatable Product Data (UPD) to establish various defaults and allow the bootloader to override them. However, we noticed a lack of consistency across FSP projects, and we frequently needed to change a setting for which no UPD item existed.<br />
<br />
Starting with the idea of expanding the UPD interface, I created SagePkg. The package could be quickly added to any FSP source project and provided an additional interface for the bootloader to force unique settings in the FSP. SagePkg also doubled as a method for gathering profiling information inside the FSP, which proved invaluable for optimization and for achieving the fastest possible boot times.<br />
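As a rough illustration of the override idea, here is a generic sketch, not SagePkg’s actual interface: a defaults structure plus a table of offset/size/value records that the bootloader could supply. The structure, field names and record format are all hypothetical; real FSP UPD layouts are generated per platform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical UPD-like settings block; field names are illustrative only. */
typedef struct {
    uint8_t  EccEnable;
    uint8_t  FastBoot;
    uint16_t PcieMaxPayload;
    uint32_t TsegSize;
} upd_settings;

/* Hypothetical override record: write `size` bytes of `value` at `offset`. */
typedef struct {
    uint16_t offset;
    uint8_t  size;   /* 1, 2 or 4 */
    uint32_t value;
} upd_override;

/* Apply bootloader-supplied overrides on top of the defaults.
 * Returns the number applied, or -1 if any record is malformed
 * (in which case nothing is changed). */
static int apply_overrides(upd_settings *upd, const upd_override *ov, size_t count)
{
    assert(upd != NULL);
    /* Validate every record first so a bad table leaves defaults intact. */
    for (size_t i = 0; i < count; i++) {
        size_t end = (size_t)ov[i].offset + ov[i].size;
        if ((ov[i].size != 1 && ov[i].size != 2 && ov[i].size != 4) ||
            end > sizeof(*upd))
            return -1;
    }
    unsigned char *base = (unsigned char *)upd;
    for (size_t i = 0; i < count; i++) {
        /* Narrow the value to the field width, then copy its native bytes. */
        uint8_t  b = (uint8_t)ov[i].value;
        uint16_t h = (uint16_t)ov[i].value;
        uint32_t w = ov[i].value;
        const void *src = ov[i].size == 1 ? (const void *)&b
                        : ov[i].size == 2 ? (const void *)&h
                                          : (const void *)&w;
        memcpy(base + ov[i].offset, src, ov[i].size);
    }
    return (int)count;
}
```

Using `offsetof` to build the records keeps the table valid if the structure is rearranged, and validating before writing means a malformed table cannot leave the settings half-modified.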
A customer ran a proprietary hypervisor on a dual core processor, with Windows on one core and an RTOS on the other. They reported a driver in the Windows guest was holding off a 1ms heartbeat to the hypervisor during a guest reboot.<br />
<br />
This problem was particularly challenging for a number of reasons, and required some crafty tactics to debug it. Any straightforward methodology also delayed the heartbeat, creating a false positive. Placing hardware breakpoints in the hypervisor had no effect because #VMEXIT clears the Global Interrupt Flag. Oh yeah, no source code was available for any software component.<br />
<br />
After some reverse engineering, I used a JTAG debugger to hack the hypervisor’s Nested Page Tables so that the hypervisor itself was visible from within the Windows guest. This allowed me to further hack the hypervisor while keeping the cores running.<br />
<br />
I narrowed the problem to a specific location in the offending driver, but the vendor didn’t believe me, saying it was impossible! I traced 100% of the HyperTransport packets between the northbridge and the device, created an Excel spreadsheet to interpret the packets, and presented that to the vendor. They quickly located the bug in their source code and released a new driver.<br />
<br />
Customer happy — product shipped on time!
We were trying to win a socket from the competition, but our product’s hashing benchmark scores came in too low.<br />
<br />
Hashing is typically a one-time operation, for example on one particular file. However, this opportunity involved hashing a large number of files and therefore was a good candidate for running operations in parallel.<br />
<br />
I architected a method to communicate file contents to the GPU and wrote an OpenCL kernel to calculate hashes in parallel. Unfortunately, I left AMD before seeing this through to a product or learning of any design win. However, my unoptimized prototype was received with excitement and exceeded expectations.
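The following sketch shows the shape of the approach in plain C, under stated assumptions: file contents are packed into one buffer with an offsets array (a common way to hand many variable-length inputs to a GPU, assumed here rather than taken from the original design), and 32-bit FNV-1a stands in for whatever hash the benchmark used. `hash_kernel` is written the way an OpenCL kernel would be, with `gid` playing the role of `get_global_id(0)`; the host loop stands in for enqueuing an NDRange.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a, used here only because it is short. */
static uint32_t fnv1a32(const uint8_t *p, size_t len)
{
    uint32_t h = 2166136261u;          /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;                /* FNV prime */
    }
    return h;
}

/* Kernel-shaped function: work-item `gid` hashes file `gid` independently.
 * offsets has nfiles + 1 entries; file gid spans [offsets[gid], offsets[gid+1]). */
static void hash_kernel(const uint8_t *packed, const size_t *offsets,
                        uint32_t *digests, size_t gid)
{
    size_t start = offsets[gid];
    assert(offsets[gid + 1] >= start);
    digests[gid] = fnv1a32(packed + start, offsets[gid + 1] - start);
}

/* Host-side stand-in for launching one work-item per file. */
static void hash_all(const uint8_t *packed, const size_t *offsets,
                     uint32_t *digests, size_t nfiles)
{
    for (size_t gid = 0; gid < nfiles; gid++)
        hash_kernel(packed, offsets, digests, gid);
}
```

Because no iteration depends on another, each one can map onto its own GPU work-item; the cost moves to getting the packed buffer onto the device, which is why the transfer method mattered.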
Several years after introduction of the Geode LX processor, customers began to complain of DDR memory becoming prohibitively expensive. A small team of engineers devised a method to run DDR2 with the processor.<br />
<br />
A technical hurdle involved programming the DRAM’s Mode Registers, as this was automated in the LX memory controller. The solution involved a CPLD to assist, and a method to communicate with the device.<br />
<br />
I designed a way to talk to the CPLD on an SO-DIMM and wrote BIOS code to configure whichever technology was detected through the Serial Presence Detect. This was fully documented and rolled out to the BIOS vendors.<br />
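A minimal sketch of the detection step, assuming the standard JEDEC SPD byte 2 encodings (0x07 for DDR, 0x08 for DDR2). The function and enum names are hypothetical, and the real BIOS then went on to program the LX memory controller and, for DDR2, the CPLD, neither of which is modeled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* JEDEC SPD byte 2, "fundamental memory type". */
enum { SPD_MEMTYPE_BYTE = 2, SPD_TYPE_DDR = 0x07, SPD_TYPE_DDR2 = 0x08 };

typedef enum { MEM_UNKNOWN, MEM_DDR, MEM_DDR2 } mem_tech;

/* Decide which initialization path to take from the SPD contents. */
static mem_tech detect_mem_tech(const uint8_t *spd, size_t len)
{
    assert(spd != NULL);
    if (len <= SPD_MEMTYPE_BYTE)
        return MEM_UNKNOWN;
    switch (spd[SPD_MEMTYPE_BYTE]) {
    case SPD_TYPE_DDR:  return MEM_DDR;
    case SPD_TYPE_DDR2: return MEM_DDR2;   /* would take the CPLD-assisted path */
    default:            return MEM_UNKNOWN;
    }
}
```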
<br />
We partnered with a memory vendor to build a DDR2 module incorporating the custom CPLD and a DDR pinout. As a result, existing designs were convertible to DDR2 via resistor changes for DDR2’s supply power and VREF.<br />
<br />
The Geode LX is still a very successful product and continues to enjoy design wins!
Our business was targeting a high-profile customer, but during evaluation we nearly lost them due to poor system performance. The customer had written an application that leveraged clFFT.<br />
<br />
clFFT is a component of clMath, an open-sourced version of AMD’s APPML (Accelerated Parallel Processing Math Library), and uses OpenCL to calculate Fast Fourier Transforms.<br />
<br />
I dug into the customer’s source code, and used CodeXL to find opportunities for optimizing their program. I found ways to make their code more efficient and eliminate bubbles in the GPU’s workload. The customer was very happy with the performance of the AMD products, and rewarded us with the design win.<br />
I noticed a trend with a percentage of customers designing memory-down solutions.<br />
<br />
In memory-down designs, it’s common practice to replace the Serial Presence Detect device that a module would carry with a table in the BIOS. During POST, the BIOS configures the delay timings from the table’s values instead of searching across the SMBus for SPDs.<br />
<br />
The trend was a failure to bring up designs due to errors in the SPD table. For example, a delay might be too short or the CRC incorrect. Customers often attempted to copy SPD values directly from a production DIMM, but this was not a guarantee of success either.<br />
<br />
First I created a tool to decode the SPD values to help debug failing platforms. Its results could be easily compared with a device datasheet, and errors were easy to spot. This was followed up with an encoder so parameters could be entered from the datasheet, and the SPD values could be copy/pasted into the BIOS source.<br />
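Because an incorrect CRC was one of the recurring table errors, here is a sketch of the check an encoder or decoder needs. It assumes the DDR3 SPD layout and follows the CRC-16 defined in JEDEC’s SPD specification (polynomial 0x1021, initial value 0, coverage selected by byte 0 bit 7, result stored little-endian in bytes 126-127). The function names are hypothetical, and this is not the original Excel/VBA tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CRC-16 as used by DDR3 SPD: polynomial 0x1021, initial value 0, MSB first. */
static uint16_t spd_crc16(const uint8_t *data, size_t len)
{
    uint16_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)(data[i] << 8);
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}

/* DDR3 byte 0 bit 7 selects coverage: set = bytes 0-116, clear = bytes 0-125. */
static size_t ddr3_crc_len(const uint8_t spd[128])
{
    return (spd[0] & 0x80) ? 117 : 126;
}

/* Encoder side: store the CRC in bytes 126 (LSB) and 127 (MSB). */
static void ddr3_spd_set_crc(uint8_t spd[128])
{
    assert(spd != NULL);
    uint16_t crc = spd_crc16(spd, ddr3_crc_len(spd));
    spd[126] = (uint8_t)(crc & 0xFF);
    spd[127] = (uint8_t)(crc >> 8);
}

/* Decoder side: does the stored CRC match the covered bytes? */
static int ddr3_spd_crc_ok(const uint8_t spd[128])
{
    uint16_t stored = (uint16_t)(spd[126] | (spd[127] << 8));
    return spd_crc16(spd, ddr3_crc_len(spd)) == stored;
}
```

Note that a table copied from a DIMM and then edited only outside the covered range still passes the check, so a correct CRC is necessary but says nothing about whether the timings themselves are right.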
<br />
Excel’s support for bitwise and hex calculations is fairly weak, so I used VBA to do much of the heavy lifting.
One of our top customers had a great opportunity but needed to adapt their design for Power over Ethernet. PoE products may use no more than 12.95W of power, but are allowed to negotiate up to 25W.<br />
<br />
The problem was that the existing design consumed nearly 19W before the negotiation could take place. AGESA (AMD Generic Encapsulated Software Architecture) not only turned on features, but cranked the cores up to full speed to reduce boot time.<br />
<br />
Normally this would have been a quick fix: recompile AGESA with different build options. However, a requirement of this project was that the AGESA binary couldn’t be touched!<br />
<br />
I designed a method to prevent AGESA from raising the cores’ frequencies, and was able to keep power consumption at about 11W until 25W could be negotiated. After negotiation, the cores were allowed to run at full speed. I was happy to win an AMD Spotlight Award for this achievement.<br />
Certain Embedded customers experienced a painful transition to UEFI BIOS technology. They were subject to a requirement of maintaining a verifiably unmodified ROM image. UEFI’s specification for variables requires substantial storage, so the BIOS stores them in the flash device. (This is likely true of most UEFI implementations; it certainly was for our vendors.)<br />
<br />
Customers began adding secondary SPI storage to their boards, connected to a different chip select (CS#). However, the design of AMD controller hubs prevented this from being a simple addition.<br />
<br />
I delivered a complete solution through several steps. (1) I generated a reference schematic to be added to the reference system design. (2) I directed modifications in a BIOS partner’s codebase to allow reading and writing of variables on the secondary device during POST. (3) I documented the solution in an Application Note, which discussed its tradeoffs. The secondary device cannot be accessed until the BIOS is running from DRAM, requiring certain options to be fixed at build time. The application note also reflected my opinion that while this technically meets the regulatory requirements, it does not “secure” the system per the intent.<br />
The CS5536 companion chip included two 3-wire UARTs, which was adequate for most customers’ needs. However, a few, including a top-tier customer, needed full compatibility, including the modem control signals, in their systems. This could easily be achieved by connecting an LPC SuperIO device, but at an additional system cost.<br />
<br />
The Geode family of devices was highly configurable for SMIs. I designed and implemented a solution that mimicked the behavior of the control signals (DTR, RTS, DCD, DSR, CTS and RI) with GPIO pins. The software used SMI traps on the UART registers, synthesizing the values returned to software reads and acting upon the values written.<br />
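A minimal sketch of the Modem Control/Modem Status side of such a trap handler. Register offsets and bit assignments are the standard 16550 ones; the GPIO state (modeled as struct fields), the shadow structure and the `trap_read`/`trap_write` entry points are hypothetical, and the real VSA module’s interface to Geode hardware is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Standard 16550 register offsets and bit assignments. */
#define UART_MCR  4
#define UART_MSR  6
#define MCR_DTR   0x01
#define MCR_RTS   0x02
#define MSR_DCTS  0x01  /* delta CTS */
#define MSR_DDSR  0x02  /* delta DSR */
#define MSR_TERI  0x04  /* trailing edge of RI */
#define MSR_DDCD  0x08  /* delta DCD */
#define MSR_CTS   0x10
#define MSR_DSR   0x20
#define MSR_RI    0x40
#define MSR_DCD   0x80

/* Hypothetical GPIO state; a real handler would read/write GPIO pins. */
typedef struct {
    int cts, dsr, ri, dcd;  /* inputs, 1 = asserted */
    int dtr, rts;           /* outputs, 1 = asserted */
} gpio_lines;

/* Shadow state the handler keeps between traps. */
typedef struct {
    uint8_t status;  /* MSR bits 7:4 as last sampled */
    uint8_t delta;   /* pending MSR bits 3:0 */
    uint8_t mcr;     /* last value written to MCR */
} uart_shadow;

static uint8_t sample_status(const gpio_lines *g)
{
    return (uint8_t)((g->cts ? MSR_CTS : 0) | (g->dsr ? MSR_DSR : 0) |
                     (g->ri  ? MSR_RI  : 0) | (g->dcd ? MSR_DCD : 0));
}

static void uart_shadow_init(uart_shadow *s, const gpio_lines *g)
{
    s->status = sample_status(g);
    s->delta = 0;
    s->mcr = 0;
}

/* Latch delta bits for any input that changed since the last sample. */
static void update_deltas(uart_shadow *s, const gpio_lines *g)
{
    uint8_t now = sample_status(g);
    uint8_t changed = (uint8_t)(now ^ s->status);
    if (changed & MSR_CTS) s->delta |= MSR_DCTS;
    if (changed & MSR_DSR) s->delta |= MSR_DDSR;
    if (changed & MSR_DCD) s->delta |= MSR_DDCD;
    if ((s->status & MSR_RI) && !(now & MSR_RI)) s->delta |= MSR_TERI;
    s->status = now;
}

/* Trapped read: synthesize MSR from GPIO inputs; reading clears the deltas. */
static uint8_t trap_read(uart_shadow *s, const gpio_lines *g, unsigned reg)
{
    assert(reg < 8);
    if (reg == UART_MSR) {
        update_deltas(s, g);
        uint8_t v = (uint8_t)(s->status | s->delta);
        s->delta = 0;
        return v;
    }
    if (reg == UART_MCR)
        return s->mcr;
    return 0;  /* other registers: handled by the real UART, not modeled */
}

/* Trapped write: act on DTR/RTS by driving the corresponding GPIO outputs. */
static void trap_write(uart_shadow *s, gpio_lines *g, unsigned reg, uint8_t val)
{
    assert(reg < 8);
    if (reg == UART_MCR) {
        s->mcr = val;
        g->dtr = (val & MCR_DTR) != 0;
        g->rts = (val & MCR_RTS) != 0;
    }
}
```

The delta bits are the subtle part: a 16550 latches them between reads and clears them when MSR is read, so the emulation has to keep that state rather than simply mirroring the pins.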
<br />
The solution was documented and published as a Virtual System Architecture (VSA) module.<br />
In order to provide the highest level of service to Embedded customers, all technical support communication passed through a database. This avoided many pitfalls, such as trying to manage multiple email threads. The functionality was adequate, but it required training to use well.<br />
<br />
As field staff and sales reps were added at an increasing pace, it seemed the guidelines were often ignored, causing topics to get lost and quality to suffer. I worked with a manager and the database administrator to streamline the process.<br />
<br />
I also drove an effort to clear the HTML pages of large blocks of text and instructions, since they were clearly being ignored. The pages were rearranged to follow accepted eye-tracking patterns and to better accommodate the workflow.<br />
<br />
Finally, I had safeguards added to prevent the most frequent errors. Depending on a user’s group membership, items and options were included or omitted from the pages.<br />
One of our longtime customers reported their products were experiencing reboots in the field. Not having access to the platform, I needed to rely heavily on the customer for this debug. The failures occurred very infrequently, and their BIOS initially cleared the Machine Check Architecture registers during POST, erasing the error information left by each reboot.<br />
<br />
Once we were able to capture the MCA information, the pattern was undocumented! I pored over design documents to find how to interpret the failure mode. This issue was further complicated by the Linux kernel being much older than officially supported on this processor generation. The customer had backported patches to their kernel.<br />
<br />
I was able to find a solution to their problem that would allow them to patch the field as necessary.<br />
As designs became fan-less, we discovered there was a lot of confusion around designing and testing heat sinks. Thermal test kits (TTKs) are product samples from the high end of the thermal distribution. A customer merely needs to test with a TTK to have confidence in their thermal solution.<br />
<br />
Some product lines didn’t have TTKs available, however. In these situations, customers couldn’t know where their sample fell in the distribution. This was further complicated by the availability of an AMD application, engineered to be a thermal virus and capable of driving a product above its Thermal Design Power.<br />
<br />
Frustrated that existing documentation was inadequate for educating customers, I wrote a paper describing the important nuances of thermal design and testing. It discussed what ratings like TDP are (and are not), how best to estimate power consumption when measuring all voltage rails was impossible, and how to use AMD test tools to manipulate the system’s power consumption.<br />
<br />
It concluded with a description of how to interpret the AMD Tctl reading as well as throttling and self-preservation features in the products.<br />
This was a really interesting design, and I haven’t seen anything like it since. The customer’s platform would freeze due to SMBus traffic.<br />
<br />
SMBus (like I2C) is specified to allow for multiple masters on a bus: there is a procedure for detecting collisions and backing off if necessary. It is somewhat uncommon to see designs with more than one master, however. This design had an extra twist in that the bus traversed a bridge, with the two masters on opposite sides.<br />
<br />
The bridge introduced a tiny amount of propagation delay onto the bus which broke the collision detection in rare circumstances. Without access to the customer’s platform, I needed to assist them in understanding SMBus behavior and in capturing the anomaly. Once we had the right trace, I was able to tell the story of what had happened in the hardware. I went on to define a method for their proprietary driver to recognize the collision, and clear the condition from the controller.<br />
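A deliberately simplified model of the mechanism: on a wired-AND bus, a master that releases SDA (drives 1) but samples 0 knows another master is driving, and backs off. The `b_sees_a` flag is an illustrative assumption standing in for the bridge delay: when cleared, master B samples SDA before master A’s low level has propagated across the bridge, so B compares against its own level and never notices it has lost. This is not the customer’s driver or the recovery method described above.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified arbitration over one address byte, MSB first.
 * Returns 0 if master A wins, 1 if master B wins, and -1 if both
 * masters still believe they own the bus at the end of the byte. */
static int arbitrate(uint8_t addr_a, uint8_t addr_b, int b_sees_a)
{
    assert(b_sees_a == 0 || b_sees_a == 1);
    int a_active = 1, b_active = 1;
    for (int bit = 7; bit >= 0; bit--) {
        int drive_a = a_active ? (addr_a >> bit) & 1 : 1;  /* released = 1 */
        int drive_b = b_active ? (addr_b >> bit) & 1 : 1;
        int bus = drive_a & drive_b;                 /* open-drain wired-AND */
        int seen_by_a = bus;
        int seen_by_b = b_sees_a ? bus : drive_b;    /* stale view across bridge */

        /* A master that drove 1 but sampled 0 has lost arbitration. */
        if (a_active && drive_a == 1 && seen_by_a == 0) a_active = 0;
        if (b_active && drive_b == 1 && seen_by_b == 0) b_active = 0;
    }
    if (a_active && !b_active) return 0;
    if (b_active && !a_active) return 1;
    return -1;
}
```

With addresses 0xA0 and 0xB0 the masters first differ at bit 4. With `b_sees_a` set, B loses there and A wins; with it cleared, both finish the address byte believing they own the bus, the kind of undetected collision a controller may not recover from on its own.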
In true Embedded design fashion, some customers’ products omitted all USB ports except for the blue USB 3.0 connections. This created an ugly chicken-and-egg problem for installing Windows XP or Windows 7: the driver couldn’t be installed because the ports didn’t work, and the ports didn’t work because no driver was installed!<br />
<br />
Customers were threatening not to ship and to drop the designs.<br />
<br />
I devised a method for both XP and Windows 7 that would allow a user to get around the restriction, and successfully install the operating system from scratch.<br />
AMD created a 9W, 1 GHz Sempron processor from existing Family 0Fh technology, but we found that memory performance degraded significantly at lower core frequencies. The memory clock was driven from the core clock but the divider was subject to a minimum threshold. As a result, memory was clocked slower at lower core frequencies once the divider hit its lowest setting.<br />
<br />
Slower frequencies don’t need to be a huge penalty if you understand how DRAM works. Memory delay timings are mostly specified in units of time, and the BIOS converts these values to multiples of clock cycles. However, the existing setup algorithm assumed the DIMMs were clocked at their rated speed, so it did not choose the best clock counts for the slower memory clock.<br />
<br />
I wrote an application note and distributed source code to BIOS vendors to recalculate the optimal timings and determine a better number of clocks for various delays. The largest benefit came from adjusting the CAS# Latency, which is typically quoted as a number of clocks, to correctly match the actual cycle time.<br />
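The arithmetic behind the recalculation is a ceiling division, clocks = ⌈t / tCK⌉. A sketch under illustrative assumptions: a DDR2-800 CL5 part (tAA = 12.5 ns) run at a 5 ns clock, with supported CAS latencies 3 to 5. These are not values from the Sempron project, the function names are hypothetical, and integer picoseconds are used to avoid floating-point rounding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convert a minimum delay (ps) to whole clocks at cycle time tck_ps, rounding up. */
static uint32_t ps_to_clocks(uint32_t t_ps, uint32_t tck_ps)
{
    assert(tck_ps > 0);
    return (t_ps + tck_ps - 1) / tck_ps;
}

/* Choose the smallest supported CAS latency covering tAA at the actual clock.
 * `supported` must be sorted ascending; returns -1 if none is long enough. */
static int pick_cas_latency(uint32_t taa_ps, uint32_t tck_ps,
                            const int *supported, size_t n)
{
    uint32_t needed = ps_to_clocks(taa_ps, tck_ps);
    for (size_t i = 0; i < n; i++)
        if ((uint32_t)supported[i] >= needed)
            return supported[i];
    return -1;
}
```

At the 5 ns clock, keeping CL5 would spend 25 ns on a 12.5 ns requirement, while recalculating from tAA yields CL3 (15 ns).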
I led a group of nine engineers and a technician. We spec’ed and developed two new firmware products from scratch, and ported the MediaGX features into Embedded operating systems.<br />
<br />
The team delivered code, documentation and training materials.
The monolithic Virtual System Architecture was now gone and new features were easily added to VSA2 via modules. This technology is still in use fifteen years later in the latest generation, the Geode LX processor.<br />
<br />
In addition to the project management tasks, I created training materials for VSA integration and a how-to for creating modules, and helped BIOS vendors port their codebases to the newer technology.<br />
I was the sole BIOS engineer bringing up the “Renegade” project and taking it to production. At the time, this was considered to be the most successful Compaq notebook product.
The MediaGX family of products relied heavily on SMIs to replace certain traditional hardware. Our Virtual System Architecture was a monolithic SMI handler which replaced the traditional SMI handler. The BIOS vendors needed to adapt their existing architecture to support our products. I facilitated this by hosting the engineers and helping them work through issues.
I owned the Plug and Play features in the Aptiva product line. This was in the timeframe when system resources were severely constrained in PCs. I added as much configurability as the hardware would allow, not just the traditional settings. This gave IBM products more flexibility than the competition’s.
IBM’s reputation dictated that we maintain many legacy fixes and workarounds for poorly behaved hardware. We kept a selection of devices for performing unique testing. Additionally, I was responsible for being fully aware of Microsoft’s initiatives and WHQL requirements, and I used this knowledge to help steer the feature sets of products. I also frequented the Microsoft site during WHQL certification, correcting any problems found. Other travel included PnP plugfests and Windows Hardware Engineering Conferences.<br />
The Advanced Diagnostics diskette image was a proprietary set of utilities, and was updated for each product family. This was part of the RAS checklist for the Aptiva PC.
Since my job was to break stuff, this was a great opportunity to begin learning about PC architecture. I wrote many pieces of software to test the BIOS, as well as the hardware directly.
The Systems Assurance organization acted as both watchdog and gatekeeper for product development. It was intended to provide an additional layer of confidence in the quality of the products, and was staffed with engineers to oversee the details of the development process. This covered design methodologies, commodity selection, regulatory certifications and patent submissions.
One of my designated testing roles was setting up machines in a walk-in temperature/humidity (TH) chamber, running exercisers and collecting data.