Thought Leaders

Bill Worley, Chief Technology Officer, Secure64 Software Corporation

Stephen Northcutt - December 9th, 2008

At larger conferences, the SANS Institute has a vendor show, and I like to attend to find out about new companies and new technology. There was a vendor at our last show in Las Vegas, Secure64, that I had never heard of, so I wandered over and we had a great chat. They are a DNSSEC vendor whose product is built on HP's Itanium-based servers. The more they talked, the more I learned about an incredible guy, a security thought leader named Bill Worley, so please let me introduce you to Bill.
Bill started programming in 1959; he began as a math and physics student and ended up in computers. Today, Dr. William (Bill) Worley Jr. is the CTO of Secure64 Software Corporation. He is a retired HP Fellow (Chief Scientist and Distinguished Contributor), and served as a Commissioner of the Colorado Governor's Science and Technology Commission. He received an MS (Physics) and MS (Information Science) from the University of Chicago, and a PhD (Computer Science) from Cornell University. However, Bill is, at heart, a system architect.

That certainly counts for street credibility, Bill; what is your background? What led you to focus on the Itanium architecture?

As I said, computer architecture is my primary interest. In the mid 1960's, while a graduate student and consultant to IBM Research and SRA, I was a principal, together with two Stanford graduate students, in architecting a novel 16-bit system. The system was actually built by IBM, and we tried to convince IBM to market small business applications on this system. But IBM's market research then concluded that there was no market for such a "personal computer." I later worked for IBM for 13 years on mainframe, storage, and other advanced architectures. A particular privilege was working in the mid 1970's with Dr. John Cocke and the IBM Yorktown Heights Research team on the 801 architecture, the first of the RISC architectures.

OK, and according to Wikipedia, "the acronym RISC (pronounced risk), for reduced instruction set computing, represents a CPU design strategy emphasizing the insight that simplified instructions which 'do less' may still provide for higher performance if this simplicity can be utilized to make instructions execute very quickly." So, you got as far at IBM as you could with architecture and decided to move on; what happened next?

Just an added point. RISC architectures derive their advantage not only from enabling higher operating clock frequencies, but also from the fact that the most frequently executed machine instructions, even in more complex architectures, are the simplest ones. RISC architectures, therefore, made it possible to build a smaller, simpler processor entirely in hardware, to integrate the whole processor on a single silicon chip, and to deliver superior price and performance.

I left IBM and joined HP Labs at the end of 1981. My first job was to lead the development of HP’s RISC architecture, later christened PA-RISC. This architecture offered advances in RISC architecture, described in papers published at the time, and became an architecture of convergence for HP. We also discovered how to move applications written for the earlier HP-3000 architecture, a very successful 16-bit stack oriented architecture, to PA-RISC by translating the HP-3000 binary programs. DEC later employed this technique successfully to migrate applications from their VAX architecture to their Alpha RISC architecture. The earliest PA-RISC systems shipped in the mid 1980’s, and the PA-RISC product line continued to evolve.

By 1990, the question had become how HP could leapfrog competitive RISC systems. HP adopted two avenues of approach. The first was to develop advanced CMOS processors that made major advances in parallelism and clock frequencies. An effort led by Denny Georg, now Secure64's Chairman, developed the HP processor family internally known as the "Snakes" processors. These processors set new records for performance and price/performance. The second approach was to undertake a deep study in HP Labs to understand how RISC processors would continue to evolve, what fundamental barriers they would encounter, and to explore architecture innovations that could surpass RISC architectures in the long run.

At that point in time, RISC was clearly encountering limitations. RISC architectures are inherently sequential, and straightforward implementations of these architectures at best sustain an instruction execution rate of one instruction per cycle. It became axiomatic that further significant advances in performance would require a processor to be able to execute multiple instructions per cycle.

In the 1960's an IBM program known as the Advanced Computing System (ACS) effort had delved deeply into these questions. This was the first effort to prove that it is possible to design a processor that can issue multiple instructions per cycle, in an order of execution that differs from the sequential, in-order execution defined by the instruction set architecture. These results had been published by 1971 in a paper by Dr. Herb Schorr, who had been the architecture director of the ACS effort.

Early in the 1990's IBM introduced the RS/6000, a RISC processor capable of executing more than a single instruction per cycle. Similar multiple-issue capabilities were included in the HP Snakes processors, and the conventional wisdom in the early 1990's was that the future for RISC architectures would be implementations capable of executing multiple instructions per cycle, not in the prescribed order of the sequential RISC architecture definition. Such implementations came to be called "out-of-order superscalar" implementations.

Our research in HP Labs focused both upon the mathematics and the practicalities of out-of-order multiple issue processors. We also focused on alternative architectural approaches that could achieve levels of instruction parallelism higher than those attainable by out-of-order super-scalar RISC processors.

We reached several basic conclusions about out-of-order superscalar processors. First, such implementations are extremely complex, with very long execution pipelines, and the resulting concurrent execution levels are less than hoped for. Second, there are mathematical barriers that prevent such processors from ever reaching high levels of instruction parallelism. Thousands of comparators must execute and have their results correlated every instruction cycle for even modest levels of multiple instruction issue, and this hardware complexity grows quadratically. Third, the silicon area required to re-order and issue multiple instructions is huge. For the PA-8000, HP's first 64-bit out-of-order superscalar processor, for example, the number of gates required for the re-order and issue buffer was as large as that required for the entire previous generation processor chip. And lastly, after having built an out-of-order processor, it is still necessary for a compiler to schedule the instructions meticulously to realize even modest levels of instruction execution parallelism.
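
A rough calculation illustrates the quadratic growth Bill describes. The numbers below are illustrative assumptions only (an issue window in which every later instruction's two source registers must be checked against every earlier instruction's destination register), not measurements from any particular processor.

```python
# Back-of-the-envelope comparator count for dependence checking in one issue
# window. Assumes 2 source registers and 1 destination per instruction; every
# source of a later instruction is compared against every destination of an
# earlier instruction in the same window. Purely illustrative.

def comparators(window, sources=2, dests=1):
    pairs = window * (window - 1) // 2      # ordered instruction pairs
    return pairs * sources * dests

for width in (2, 4, 8, 16, 32):
    print(f"window of {width:2d} instructions: ~{comparators(width):4d} comparators")
# window of  2 instructions: ~   2 comparators
# window of  4 instructions: ~  12 comparators
# window of  8 instructions: ~  56 comparators
# window of 16 instructions: ~ 240 comparators
# window of 32 instructions: ~ 992 comparators
```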

Bottom line, even with a highly complex out-of-order-super-scalar processor, and significant silicon chip area devoted to out-of-order multiple instruction issue, the compiler still must work hard to schedule instructions to achieve sharply bounded levels of parallelism.

The alternative architecture approach we adopted in HP Labs was first to eliminate the bottleneck of sequential instruction encoding, and then to focus upon the problems of providing a richer set of computing resources, minimizing the effects of memory stalls and pipeline delays, reducing hardware complexity, and strengthening security capabilities.

For traditional RISC architectures, one has a compiler that can see all the opportunities for execution parallelism. However, the compiler is constrained to map the computation into a sequentially executed stream of instructions. The hardware, then, must dynamically re-examine a relatively small excerpt of the sequential stream of instructions to rediscover opportunities to execute instructions in parallel or in a better order. Clearly the sequential instruction schedule is a bottleneck. A more promising approach is to keep the hardware as simple as possible, use all of the silicon area for caches, buffers, execution units, or other hardware components that contribute directly to computational throughput, and permit the compiler to produce a parallel instruction schedule. In other words, encode concurrent instructions simply and make the instruction level parallelism explicit. This approach became known as EPIC, Explicitly Parallel Instruction Computing. The initial architecture defined by the HP Labs team was called PA-WW (PA Wide Word).
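
A toy sketch of the idea, in Python rather than hardware terms: the compiler, which can see register dependences across the whole program, packs independent operations into explicit groups instead of emitting one long sequential stream. The instruction format and grouping rule here are invented for illustration and are far simpler than real EPIC/Itanium bundles, templates, and stop bits.

```python
# Toy "explicitly parallel" scheduler: pack independent instructions into
# groups that the hardware could issue together without re-analysis.
# Instruction = (dest_register, (source_registers...)). Purely illustrative.

def bundle(instructions, width=3):
    bundles, current, written = [], [], set()
    for dest, srcs in instructions:
        # An instruction may join the current group only if none of its
        # sources (or its destination) is produced inside that group.
        independent = written.isdisjoint(srcs) and dest not in written
        if len(current) < width and independent:
            current.append((dest, srcs))
        else:
            bundles.append(current)
            current = [(dest, srcs)]
            written = set()
        written.add(dest)
    if current:
        bundles.append(current)
    return bundles

program = [
    ("r1", ("r2", "r3")),   # r1 = r2 + r3
    ("r4", ("r5", "r6")),   # r4 = r5 + r6   (independent of the first)
    ("r7", ("r1", "r4")),   # r7 = r1 + r4   (depends on both above)
]
for group in bundle(program):
    print("issue together:", [dest for dest, _ in group])
# issue together: ['r1', 'r4']
# issue together: ['r7']
```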

Even with a much richer set of registers and functional units, the silicon area for such a system will be smaller than that required for an out-of-order RISC implementation in the same silicon technology. In fact, the active computing core of the first Itanium 2 chip was about half the size of the active computing core of the comparable Intel x86 chip.

The difficult problems for an EPIC processor implementation were minimizing the effects of memory latencies and instruction execution pipeline disruptions. These problems were addressed by several innovations described in the published papers about the architecture. Basically, greater on-chip areas are available for caches and more execution units, more explicit control is provided for caching and initiating memory operations speculatively and earlier in the execution stream, execution pipelines are far shorter, and predication enables many conditional operations to be performed without using branch instructions - avoiding execution pipeline delays.

The architecture also was enriched by automatic means for managing the large register sets, compactly scheduling software pipelined computations by explicit register renaming, dynamically controlling fine-grained access to sets of pages in memory, and storing critical control information in areas of memory inaccessible to executing software. These latter capabilities significantly enhanced the architecture’s capability to support a highly secure system.

By late 1993, HP had decided to proceed with EPIC architecture and approached Intel to see if they would like to partner. It didn't make economic sense for HP to try to get into the silicon business or to launch a new proprietary architecture. Technical and business study groups from both corporations came together and soon saw the advantages to a partnership. That is how the Itanium effort began, and how Intel and HP began working together.

There were growing pains during the initial system roll-outs, but today Intel Itanium processors are the foundation of servers for the huge Enterprise mission-critical server market. According to a mid-2008 report by IDC, "Itanium is the fastest growing server platform in the world." On October 29, 2008, HP announced major successes in migrating European customers from legacy mainframe systems to HP Integrity (Itanium-based) servers, achieving data center cost savings of "up to 70 percent," and said that many similar migrations are expected in the "next 12 months." According to IDC, HP "tied for the top position in revenue for the Europe, Middle East, and Africa Enterprise server market, securing 32.7 percent market share as measured in the second quarter of 2008."[1] The article also cited "Rapid adoption of Intel® Itanium® processor-based platforms" and "HP Integrity Innovations continue to grow."

So Bill, as I understand it, when you retired, you headed into your workshop to create a new operating system. What was your goal?

One of the HP Labs goals was to build into the EPIC architecture capabilities that would make the system more secure. For example, one can store critical control information in a memory area that is protected by a "protection key." A protection key permits one to tag a page of memory with a 24-bit identifier and, unless a processor register contains a matching identifier, executing code cannot access that memory - no matter what its privilege level.
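
A minimal software model of that mechanism, with invented names; on Itanium the check is enforced in hardware by the protection-key registers, not by code like this.

```python
# Conceptual model of Itanium-style protection keys (names and details here
# are illustrative). Each memory page carries a 24-bit key tag; an access
# succeeds only if one of the processor's protection-key registers holds a
# matching key, regardless of the privilege level of the executing code.

KEY_MASK = (1 << 24) - 1          # 24-bit protection-key identifiers

class Page:
    def __init__(self, name, key):
        self.name = name
        self.key = key & KEY_MASK

class Processor:
    def __init__(self):
        self.key_registers = set()   # keys loaded by trusted code

    def load_key(self, key):
        self.key_registers.add(key & KEY_MASK)

    def access(self, page):
        if page.key not in self.key_registers:
            raise PermissionError(f"key fault accessing {page.name}")
        return f"access to {page.name} granted"

cpu = Processor()
secrets = Page("signing-key page", key=0x00BEEF)
try:
    cpu.access(secrets)               # denied: no matching key register
except PermissionError as e:
    print(e)                          # key fault accessing signing-key page
cpu.load_key(0x00BEEF)
print(cpu.access(secrets))            # access to signing-key page granted
```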

Even while at HP I advocated work to create operating systems that achieve a secure architecture by fully utilizing unique Itanium hardware capabilities. After retiring I met some folks here in Denver who also were interested in developing systems that had unprecedented security and performance. This is what we've done at Secure64. Our system objectives were to make the system and its applications immune to compromise from malware and rootkits, and resistant to network attacks. We call such a system "Genuinely Secure."

A genuinely secure system is quite unlike a hardened OS that is configured to minimize its attack surface and its exposure to vulnerabilities. To be honest with you, I believe that we are never going to be able to achieve the required security levels in complex general purpose operating systems such as Microsoft Windows, UNIX, and Linux. The fundamental problem with these operating systems is that they are monstrously complex, having added layers and layers of abstraction to a fundamentally weak foundation running on a weak hardware protection structure - these systems don't even use all of the protection capabilities offered in the underlying hardware architectures. Portability and backwards compatibility have been major constraints, and eventually the inertia of the total investment militates against evolving to a more solid foundation. In the end, after one creates so many layers of abstraction, no one understands anymore what actually is going on.

What we have today is a continuing cycle of vulnerability discovery, exploitation, and patching, both in operating systems and in applications. Most systems cannot survive when directly connected to the Internet. They must be surrounded by a bodyguard of protective devices such as firewalls, intrusion detection systems, flood protection systems, etc., each itself built on similar foundations and with its own vulnerabilities. This is a rich target of attack for vandals, criminals, and hostile nation states. I believe the infusion of virtualization into this situation will be found actually to increase, rather than reduce, system attack surfaces. Chris Whitener of HP, for example, has pointed out that snapshots of virtual machines, taken to suspend operation or migrate the virtual machine to another system, offer a rich harvest of sensitive material stored in the clear, and ripe for attack.

So you focus on the security capabilities of the Itanium architecture; what are some of the things you do?

For a long time I have believed it is possible to design and build a system with extremely strong security properties, particularly when one can design such a system to run on a hardware architecture that supports advanced capabilities for security. We often have been asked why we run on Itanium. The simple answer is that Itanium has unique properties that are essential for a genuinely secure system. We have white papers on our website explaining these matters.

At Secure64, we took a step back, and built the entire system, from the ground up, for security, simplicity, and minimal complexity. We were guided by solid design principles, and believe the level of security we have achieved is unmatched. One basic principle is that every piece of code executing in our system must be backed by a chain of trust. We know what code is running, that it has not been contaminated, and that it has been authenticated by cryptographic signature verification each time the system boots. Code being executed is set "execute only," a hardware protection setting provided in Itanium, so executable code can neither be read nor written by any other executing software. Further, no other executable code can be injected while the system is running. This clearly is not fully general, but it is sufficient for our product applications. From this minimal complexity, we gain the assurance that no malware or rootkits can be injected into the system.
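
A minimal sketch of the signature check in such a chain of trust, using the Python cryptography package; the choice of Ed25519 and the function names are assumptions for illustration, not Secure64's actual implementation.

```python
# Sketch of verifying a code image against a trusted public key before it is
# allowed to run. A real chain of trust covers every loaded module and anchors
# in platform firmware; this shows only the signature-verification step.
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

# Build side (normally done offline): sign the code image.
signing_key = Ed25519PrivateKey.generate()
code_image = b"\x90" * 4096                  # stand-in for a compiled module
signature = signing_key.sign(code_image)
trusted_pubkey = signing_key.public_key()    # shipped with the platform

# Boot side: refuse to run the image unless the signature checks out.
def load_module(image, sig, pubkey):
    try:
        pubkey.verify(sig, image)
    except InvalidSignature:
        raise RuntimeError("image rejected: signature verification failed")
    return "image accepted: may be mapped execute-only"

print(load_module(code_image, signature, trusted_pubkey))
try:
    load_module(code_image + b"\x00", signature, trusted_pubkey)
except RuntimeError as e:
    print(e)                                 # a tampered image is rejected
```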

To be sure, in every architecture there is always one highest privilege level where code can execute any of the machine instructions. Our guiding principle here is to minimize such code. Code at this level is limited to basic mechanisms that absolutely require privileged instructions to perform their task. We use all four privilege levels supported by the Itanium architecture, and keep the code executing at the highest privilege level to a minimum. Rather than having tens of thousands, or millions, of lines of source code that generate instructions that execute at the highest privilege level, we have only a few thousand source lines. That code is comprehensible by a single person, and we intend at some point to publish this source code for expert and peer review.

OK, I understand, secure and simple, so that is why you called it a "Micro Operating System"?

We call our SourceT® a micro operating system, since it does control the system and is so much smaller than anything else out there. We tried to explain it to people without using the term "operating system," but they did not understand it; we often were asked whether it "ran on Windows or Linux." When we began using the "micro operating system" term, folks got the idea, so it enabled more effective communication. A major design goal was to completely eliminate malware. The system also is designed to protect itself from all the types of attacks that currently are launched via the Internet - without using firewalls, IDS's, IPS's, or the like. The idea was to change the game so that none of the current remote attacks work anymore.

I did my homework and found that the security of your micro OS was evaluated by Matasano. Here is their executive summary; would you like to make any additional comments?

Matasano was tasked by Secure64 and Intel to evaluate critically the claim that, for remote attackers, SourceT is "immune to all forms of malware, including rootkits, Trojans, viruses and worms." To do this, Matasano evaluated the SourceT architecture against three areas of vulnerability: code injection, privilege level escalation, and alteration or subversion of the trusted boot process. These areas were selected because they comprise the strategies used by typical malware, such as worms, spyware, or Trojan horse applications, to introduce arbitrary code into a computing system. Their report is on our website.

According to an article I read in The Register:
"Despite HP's healthy rise in Itanium server sales, IBM think its major rival and Intel will have to give up on the processor in the near future due to basic economics. 'The end of life for Itanium will occur in the next five years,' IBM VP Scott Handy told us, during an interview here in Austin, Texas. '(HP) will have to announce some kind of transition.'" Will any of these security innovations matter if Itanium is not a success in the marketplace? I am told that Beta was superior to VHS, and that just did not appear to matter.

Presently no other hardware architecture can offer the security capabilities of Itanium. Many of the techniques we employ could not function on current mainframe, CISC, or RISC architectures. We believe that some of the present cyber security problems can only be met by employing security capabilities of this type, and have made such suggestions to various working groups on National cyber security.

I believe, though, that the IBM prognostication is wishful thinking on their part. I know The Register and other trade press have been predicting the demise of Itanium since the late 1990's. But Itanium systems are stronger than ever, have set numerous world records for performance and price/performance, and today continue to increase their penetration of the Enterprise mission-critical server market. It is hard to believe that Intel, HP, Microsoft, and other manufacturers of Itanium-based systems will decide to abandon this lucrative market. It is also of note that HP supports a very broad range of hardware and operating systems on Itanium - not only HP-UX, Windows, and Linux, but also VMS and the 99.99999% available NonStop system, together with virtualization offerings.

A further point is that because Itanium is so highly parallel it is unsurpassed as a micro-emulator of earlier mainframe and RISC architectures. Intel at one of its IDF forums demonstrated execution of SPARC binaries on Itanium at a performance exceeding that of then extant SPARC processors. X86 and PA-RISC binaries also execute at competitive speeds in this manner. Recently IBM bought a startup named "Platform Solutions" that had shown that IBM mainframe system and application software ran at dangerously competitive performance levels on Itanium. Perhaps your IBM spokesperson can explain why IBM chose not to compete with them in the open marketplace.

Thank you for that, Bill. Now that we have a secure operating system as a foundation, you need an application to put on it; why did you find DNSSEC interesting?

There is broad agreement that a secure DNS system is essential to the Nation's, and to the world's, cyber infrastructure. This year, the Black Hat conference paper by Dan Kaminsky, demonstrating a major vulnerability in DNS caching servers and elaborating the manifold types of cyber attacks that can result from this vulnerability, raised worldwide consciousness of how critical the security of the DNS system is. At this point in time there is general agreement that the DNS security extension, DNSSEC, constitutes the only complete solution to eradicating these problems. Ron Aitchison, a recognized DNS expert and author,[2] also points out another, perhaps not as widely publicized but important, advantage of DNSSEC - that SSL for the first time becomes totally trustworthy.

DNSSEC has been defined for a considerable time. Its deployment has been delayed because of high costs and complexity. NIST has published a 103-page deployment guide elaborating the procedures and best practices needed to implement it.[3] We have been working for some time with Professor Dan Massey of Colorado State University (CSU), one of the authors of the DNS security extensions. The basic problem with DNSSEC is that its authors knew the systems hosting DNS application software were so vulnerable that they could not be trusted to keep the signing keys online. They had to manage the keys and do the signing on systems that were offline. This is what led to the high costs and complexity. Offline signing is clumsy in all situations, and in some it simply does not work at all. There are firms, for example, that dynamically create tens of new DNS entries per second. There is no effective way to transport these dynamic DNS entries offline and back to get them signed in a timely manner.

However, since Secure64's SourceT micro operating system is secure from the ground up, we can keep the keys online and protect them with an Itanium memory protection key. Every plaintext key in use is kept in protected memory. Any key that is managed by the system but not in protected memory is encrypted. We also have a TPM chip as the platform root of trust, which we employ when transporting root keys into protected memory.
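
A simplified sketch of that online-signing model, again using the Python cryptography package; the record layout and function names are invented for illustration and do not follow the real DNSSEC RRSIG wire format (RFC 4034).

```python
# Conceptual online signing of DNS records: the private key stays resident
# (in Secure64's case, inside key-protected memory) so new records can be
# signed as they are created, instead of being carried to an offline signer.
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

zone_signing_key = Ed25519PrivateKey.generate()   # stays online, protected

def sign_record(name, rtype, value, key):
    data = f"{name} {rtype} {value}".encode()
    return {"record": data, "signature": key.sign(data)}

# A dynamically created entry can be signed the moment it appears.
signed = sign_record("host42.example.com.", "A", "192.0.2.42", zone_signing_key)
print(len(signed["signature"]), "byte signature attached to", signed["record"])
```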

Other than yourself, do you know of anyone making use of the Trusted Platform Module (TPM), the tamper-resistant chip for securely storing keys? It seems like it has been a long time coming. Most of us have TPM chips in our laptops and desktops, and yet they are not used for anything. Since MS Vista was not a commercial success, we never got to see if BitLocker would get industry traction. What can you tell us about TPM?

I am aware of a number of efforts in industry and government that use or are planning to use a TPM. The TPM chip is a fundamental component of Intel’s published security roadmap.[4] Many uses of the TPM and a software stack built on top of it were specified over several years without actually being implemented. There are lots of specifications, and efforts now are beginning to implement elements of these specifications. The hardware platform we presently use has limited firmware support for the TPM, and our use of the TPM is very basic. We use it only as a source of entropy and as the root of a platform-unique tree of keys. This involves only the two required TPM keys and a third key generated as the root of our key tree.
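
A rough sketch of the key-tree idea using HKDF from the Python cryptography package; the labels and tree shape are assumptions for illustration, not the TPM's actual key hierarchy or Secure64's key-management code.

```python
# Deriving per-purpose keys from one root secret, so only the root needs the
# strongest protection (here the root would come from, or be sealed by, a TPM).
# HKDF usage is standard; the labels and tree shape are invented for the sketch.
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

root_secret = os.urandom(32)    # stand-in for a TPM-rooted, platform-unique key

def derive(label, root=root_secret):
    return HKDF(
        algorithm=hashes.SHA256(),
        length=32,
        salt=None,
        info=label.encode(),    # distinct label => independent subkey
    ).derive(root)

zone_key_wrap  = derive("zone-key-wrapping")
config_signing = derive("configuration-signing")
print(zone_key_wrap.hex()[:16], config_signing.hex()[:16])   # distinct subkeys
```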

I understand that uses of the TPM are farther advanced in other platforms, but there seem to be persistent tactical and logistics problems that remain to be solved before the full use envisioned for the TPM becomes a reality.

What is the fundamental problem with DNS Security?

Well, Stephen, today when you:
  1. send a DNS request to look up an IP address, or set of IP addresses, for a URL, and
  2. get back a DNS response,
there is no way to know for certain that the returned IP address is the actual IP address of the system you wish to reach.

The security extensions to DNS build chains of trust, so you get back both an IP address, or set of IP addresses, and a digital signature that authenticates the IP address(es).

Once we can implement this globally, SSL actually becomes trustworthy. Currently, there is the risk of a man-in-the-middle attack. Once you have the security extensions, that can no longer happen.
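
To round out the picture, a resolver-side sketch to complement the signing example above: the returned addresses are accepted only if the accompanying signature verifies under the zone's public key, which is itself obtained through the DNSSEC chain of trust. As before, the encoding is illustrative rather than real DNSSEC wire format.

```python
# Resolver-side counterpart to the signing sketch: accept the answer only if
# the signature over the returned addresses verifies under the zone's public
# key. Illustrative only; real validators follow RFC 4035 in detail.
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

zone_key = Ed25519PrivateKey.generate()
zone_pubkey = zone_key.public_key()      # learned via the DNSSEC chain of trust

answer = b"www.example.com. A 192.0.2.10 192.0.2.11"
rrsig = zone_key.sign(answer)            # produced by the authoritative signer

def validate(answer_bytes, signature, pubkey):
    try:
        pubkey.verify(signature, answer_bytes)
        return "authenticated answer: safe to use these addresses"
    except InvalidSignature:
        return "validation failed: discard the answer"

print(validate(answer, rrsig, zone_pubkey))
print(validate(b"www.example.com. A 203.0.113.66", rrsig, zone_pubkey))  # forged
```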

OK, you built a better mousetrap, how is it selling?

We've received very high levels of interest in this product. The US government has made decisions to sign important zones by December 2009. There is also high interest in Europe. We have talked with some financial institutions whose auditors have concluded that they must implement DNSSEC, and we have a product that makes this practical and reasonable.

What do you think is the next great application for this technology?

The DNS system is a global, reliable, available, redundant, distributed database. Once you add security, you also have the component of trust. All of the information in the system can be supported by digital signatures. What other things might one keep in such a database? We view the Secure DNS structure as a building block for other applications, and for solving other problems.

What do you think about the micro OS and TPM as a security solution for UDDI for SOA?

Actually, back in the early 90's, at HP Labs we postulated various ideas of software as a service. HP chose not to pursue these ideas in order not to offend major partner companies. However, the basic ideas all postulated highly secure and trustworthy system elements as essential components. We concluded that pervasive, highly-distributed and highly-trusted networked systems would flourish if there were strongly secured architectures in core elements.

Sometimes one must rethink problems, and fresh starts at the foundation sometimes cannot be avoided. In my view, the US needs to rebuild much of our national and industrial cyber infrastructure from the very foundation - with security as the overarching objective. Only the functions needed, no more. I think this has got to happen eventually, and I honestly don’t think we can reach such a point by evolving today’s popular and widely deployed systems.

I see continuing concerns and fear in trusting too much to today’s networks and systems. There is mounting evidence for concern nearly every day. We see White House computers hacked, as well as cyber problems for both presidential campaigns. We need to undertake the research to rethink the foundations.

Thank you so much Bill; can we ask you to share just a bit about yourself, what do you like to do when you are not behind a computer?

I enjoy playing the piano. I have been studying and playing since I was seven years old, and I still take lessons from a fine musician in Colorado. I enjoy reading, studying new languages, and skiing (although my back is not in the best shape anymore). I also enjoy spending time in the mountains and time with my kids and grandkids.

[1] Wall Street Journal Digital Network, MarketWatch, October 29, 2008
[2] Ron Aitchison, "Pro DNS and BIND," Apress, 2005
[3] NIST Special Publication 800-81, "Secure Domain Name System (DNS) Deployment Guide"
[4] David Grawrock, "The Intel Safer Computing Initiative," Intel Press, 2006