Jan Kuper is co-founder of Qbaylogic and Joost Kauffman is a senior system engineer at Demcon Focal, both in Enschede.

24 September

Next-generation data communication using laser signals between ground stations and satellites will be at the terabit per second level. Given the high demands on data quality and processing speed, wavefront sensors and FPGAs are essential ingredients of the required communication terminals. Demcon and Qbaylogic demonstrate the potential of high-level functional FPGA programming.

Within the Tomcat project (Terabit Optical Communication Adaptive Terminal), part of the European Space Agency’s Artes Strategic Program Line Scylight, TNO is in charge of developing an optical ground station, including an optical ground terminal (Figure 1). From a satellite, a terminal will receive laser signals that are affected by atmospheric conditions such as temperature variations and turbulence, which induce deformations of the beam’s wavefront. Adaptive optics can counteract these deformations using a segmented deformable mirror in which each segment is individually actuated based on the input provided by a wavefront sensor.

Figure 1: Within the Tomcat project, an optical ground station, including an optical ground terminal, for laser satellite communication is being developed.

TNO and Demcon jointly upgraded a wavefront sensor to the high sample rate (5 kHz) required for this application. It’s one of the laser communication instruments developed and marketed by the Enschede-based company with the Dutch FSO instruments consortium, supported by the knowledge institute. This project involved dedicated optical hardware and data-processing software, implemented in FPGAs, as 5 kHz sampling couldn’t be achieved on a PC.

The wavefront sensor comprises an array of lenslets that each focus part of the incoming signal on a subregion of camera sensor pixels. To calibrate the mirror segments, 256 two-dimensional points of gravity, one for each subregion of an image, have to be calculated on an FPGA. The camera sends the image to the FPGA line by line, in packages of 8 pixels of 12 bits each, together with some control bits for validity and end-of-line, at a rate of 80 MHz, whereas the Tomcat specification requires an FPGA clock rate of 200 MHz. Hence, on average, one package of pixels arrives every 2.5 cycles of the FPGA clock. The transfer of a full image takes 157 μs and the calculations have to be completed within 3 μs after the last package has arrived.
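The timing figures above follow from simple arithmetic, which can be written down as a minimal Haskell sketch. The names (cameraRateHz, fpgaClockHz, cyclesPerPackage, cyclesIn, framePeriod) are illustrative assumptions and aren't taken from the project code.

```haskell
-- Timing budget derived from the figures quoted in the text.
-- All names are illustrative, not taken from the Tomcat code base.

cameraRateHz, fpgaClockHz :: Double
cameraRateHz = 80e6   -- packages per second sent by the camera
fpgaClockHz  = 200e6  -- FPGA clock frequency

-- FPGA clock cycles available, on average, per incoming package.
cyclesPerPackage :: Double
cyclesPerPackage = fpgaClockHz / cameraRateHz

-- Number of FPGA clock cycles that fit in a time span given in seconds.
cyclesIn :: Double -> Double
cyclesIn seconds = seconds * fpgaClockHz

-- Frame period at the required 5 kHz sample rate, in seconds.
framePeriod :: Double
framePeriod = 1 / 5e3
```

Loaded in GHCi, cyclesPerPackage evaluates to 2.5, and cyclesIn 3e-6 shows that the 3 μs budget amounts to about 600 clock cycles. The frame period of 200 μs leaves room for the 157 μs image transfer.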

Haskell

Demcon engaged University of Twente (UT) spinoff Qbaylogic because of its design methodology for high-level FPGA programming. One advantage is that dependencies between the various processes, for example regarding the order in which data are received or transmitted, can be dealt with adequately. Another is that exact timing, down to the nanosecond level, can be achieved. This high-level functional methodology offers a fast design process with full control over FPGA code efficiency.

A central element in the design methodology, based on the functional programming language Haskell, is the open-source compiler Clash, the result of over 10 years of (ongoing) UT research. It translates a functional specification written in Haskell into any of the standardized hardware description languages (HDLs, such as VHDL, Verilog and SystemVerilog). Haskell isn’t commonly used in industry, one of the reasons being that it offers less control over CPU performance than, for example, C. However, on an FPGA, the situation is the other way around: with Haskell, a designer has more control over performance than with C/C++. The reason is that a functional Haskell specification is structural, while a C/C++ program describes behavior rather than structure. Hence, Clash generates HDL code in a structure-preserving way, whereas a high-level synthesis tool starting from C/C++ needs transformations that are hard to grasp and may generate unintended hardware with unpredictable performance.

Worthwhile aspects of the methodology include its fundamentally model-based character and the availability of adequate abstraction mechanisms, among which are higher-order functions, embedded languages and typing mechanisms. Another advantage is that cycle-accurate simulation on a functional level is possible at all design stages.
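To give an impression of cycle-accurate simulation on a functional level, the sketch below models a clocked circuit as a state-transition function and simulates it in plain Haskell, with one list element per clock cycle. It’s an illustration under assumptions: plain lists stand in for hardware signals, and the names Circuit, simulate and runningSum are hypothetical rather than part of Clash or the Tomcat design.

```haskell
import Data.List (mapAccumL)

-- A clocked circuit described as a transition function:
-- current state and input give the next state and the output.
type Circuit s i o = s -> i -> (s, o)

-- Simulate a circuit cycle by cycle: one list element per clock cycle.
simulate :: Circuit s i o -> s -> [i] -> [o]
simulate step initial = snd . mapAccumL step initial

-- Hypothetical example circuit: a register that accumulates its inputs.
-- The output in each cycle is the register content before the update,
-- as it would be for a register read in that cycle.
runningSum :: Circuit Int Int Int
runningSum acc x = (acc + x, acc)
```

For example, simulate runningSum 0 [1, 2, 3, 4] yields [0, 1, 3, 6]: every output lags its input by one cycle, which is exactly the kind of timing detail such a simulation makes visible.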

Correctness

Haskell, being close to mathematics, is suitable for expressing the model of an application. In the Tomcat project, this was the basis for fast and effective communication about the precise functionality definition. In addition, such a model is executable, so that its correctness can be checked. The model is then the starting point for the design of an FPGA architecture that has the same functionality and meets the performance requirements. Note that all design steps are performed in the same language and hence are executable and testable. This increases the productivity (and the satisfaction) of the designer and greatly contributes to the correctness of the design.
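As an impression of what such an executable model might look like, here is a minimal sketch of a point-of-gravity calculation for a single subregion. It assumes the usual intensity-weighted mean of the pixel indices and models a subregion as a list of rows. The representation and the names Subregion and pointOfGravity are assumptions for illustration, not the model used in the project.

```haskell
-- Reference model of a point-of-gravity calculation for one subregion.
-- A subregion is modeled as a list of rows of pixel intensities;
-- this representation and all names are assumptions for illustration.

type Subregion = [[Int]]

-- Intensity-weighted mean of the column (x) and row (y) indices.
-- Returns Nothing for a subregion without any light, where the
-- point of gravity is undefined.
pointOfGravity :: Subregion -> Maybe (Double, Double)
pointOfGravity rows
  | total == 0 = Nothing
  | otherwise  = Just (ratio sumX, ratio sumY)
  where
    indexed = [ (x, y, p) | (y, row) <- zip [0 ..] rows, (x, p) <- zip [0 ..] row ]
    total   = sum [ p     | (_, _, p) <- indexed ]
    sumX    = sum [ x * p | (x, _, p) <- indexed ]
    sumY    = sum [ y * p | (_, y, p) <- indexed ]
    ratio s = fromIntegral s / fromIntegral total
```

A uniformly lit 2-by-2 subregion, pointOfGravity [[1, 1], [1, 1]], gives Just (0.5, 0.5), the center of the four pixels. A model like this can serve as a reference against which a later hardware-oriented version is tested.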

A first abstraction mechanism is higher-order functions (HOFs), called “higher-order” because they take another function as an argument. HOFs express architectural patterns in which a given function is used repetitively. For example, in Tomcat, the parallel pairwise multiplication of a vector of pixels, named ps, with a vector of indices, named is, can be formulated using the HOF zipWith as “zipWith (*) ps is” (see also Figure 2). The first argument of zipWith can be any binary function, in this case (*), for multiplication. In practice, vectors and matrices tend to be huge and the straightforward modeling with HOFs may lead to architectures that either don’t fit on the FPGA at hand or give rise to a slow clock. In such cases, modifications of HOFs are available with proven correctness by which a design can be pipelined or otherwise executed over time.

Figure 2: The architectural pattern of zipWith, together with another higher-order function, fold, which expresses an accumulative application of a binary operation on a sequence of values. The dot product of two vectors is a combination of zipWith and fold.
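
The combination described in the caption of Figure 2 can be written down in a few lines of plain Haskell. This is a minimal sketch in which ordinary lists stand in for the fixed-size vectors of a hardware design, and the name dotProduct is chosen here for illustration.

```haskell
-- Dot product as a combination of zipWith and a fold.
-- Plain Haskell lists stand in for the fixed-size vectors used in hardware.
dotProduct :: Num a => [a] -> [a] -> a
dotProduct ps is = foldl (+) 0 (zipWith (*) ps is)
```

With pixels [10, 20, 30] and indices [0, 1, 2], dotProduct returns 80. Read structurally, zipWith (*) corresponds to a row of parallel multipliers and the fold to the adders that combine their results.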

Embedded languages, the second abstraction mechanism, offer the possibility of hiding the underlying bit representation of certain constructs. They’re very practical for an instruction set of a processor or for the states of a state machine and are very helpful in avoiding errors. In Tomcat, a small embedded language is defined for packages of pixels arriving from the camera (Figure 3). Note that an embedded language is defined as a data type so that a function can be directly defined on constructs of such a language by using a technique called pattern matching (Figure 4).

Figure 3: The small embedded language defined in Tomcat for packages of pixels arriving from the camera. The first clause is for packages containing a Boolean for an end-of-line marker and a vector of 8 pixels (defined as 12-bit words). The second clause is needed at those clock cycles when no new package of pixels arrives. Clash has a default translation of values of such types into bit patterns, but the designer can also define a bit representation.

Figure 4: Using pattern matching, a function f can be directly defined on constructs of an embedded language. Parts of packages may be automatically extracted and given names (eol, pxls), which may then be used in the body of the function definition.
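
For readers without the figures at hand, the idea of Figures 3 and 4 can be approximated in plain Haskell. The sketch below is an assumption-laden stand-in rather than the project code: the constructor names Pixels and NoPixels are hypothetical, a list replaces the fixed-size vector of 8 pixels, Word16 replaces the 12-bit word, and the body of f is just an example.

```haskell
import Data.Word (Word16)

-- A pixel is a 12-bit word in the design; Word16 is used here as a
-- stand-in because plain Haskell has no 12-bit type (assumption).
type Pixel = Word16

-- Small embedded language for what arrives from the camera in a clock cycle.
-- The constructor names are hypothetical.
data Package
  = Pixels Bool [Pixel]  -- end-of-line marker and the pixels of one package
  | NoPixels             -- clock cycle without a new package
  deriving (Show, Eq)

-- Function defined by pattern matching on the constructs of the language:
-- the parts of a package are extracted and named eol and pxls.
-- As an example, f reports the end-of-line marker and the sum of the pixels.
f :: Package -> Maybe (Bool, Int)
f (Pixels eol pxls) = Just (eol, sum (map fromIntegral pxls))
f NoPixels          = Nothing
```

Because every case of the data type has its own clause, the compiler can warn when a case is forgotten, which is one way such an embedded language helps to avoid errors.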

The importance of Clash’s typing mechanism, the third abstraction mechanism, can hardly be overstated, because many, if not most, errors made in practice are type errors. A strong type-checking mechanism catches errors in an early design stage. As part of the typing mechanism, Clash can derive the type of a component, even if the designer doesn’t indicate the type explicitly. In Tomcat, this feature was often used: when the specification wasn’t accepted by the type system and its error message was somewhat cryptic, it would help to isolate a function and ask Clash what type it ‘thinks’ the function has.

Right the first time

The methodology yielded a fully pipelined architecture, in which incoming packages of pixels are first regrouped into vectors of the size needed for the application and then multiplied in parallel by index values, after which the results are added in a tree-shaped adding mechanism. Then follows a step of accumulating results per relevant region, after which a pipelined division operation is applied for the actual point-of-gravity computation. For each subregion, this computation is completed 70 clock cycles after its last pixel has arrived on the FPGA. At 200 MHz, this corresponds to 0.35 μs, whereas the requirement was 3 μs. The VHDL generated by Clash was adapted to the required interface and straightforwardly integrated with the VHDL for the surrounding architecture as developed in other project parts.
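
The tree-shaped adding mechanism can be illustrated with a small sketch in plain Haskell. It shows the general technique of pairwise, level-by-level addition and isn’t the project’s implementation: lists stand in for hardware vectors, and the names addPairs, treeSum and treeDepth are chosen here for illustration.

```haskell
-- Tree-shaped addition: values are added pairwise, level by level,
-- so n values need about log2 n levels instead of n - 1 sequential additions.
-- Lists stand in for hardware vectors; names are illustrative.

-- One level of the tree: add neighbouring pairs, pass on a leftover element.
addPairs :: Num a => [a] -> [a]
addPairs (a : b : rest) = a + b : addPairs rest
addPairs rest           = rest

-- Repeat the pairwise step until a single value remains.
treeSum :: Num a => [a] -> a
treeSum []  = 0
treeSum [x] = x
treeSum xs  = treeSum (addPairs xs)

-- Number of levels in the tree, i.e. the adder depth.
treeDepth :: [a] -> Int
treeDepth xs = length (takeWhile (> 1) (iterate halve (length xs)))
  where halve n = (n + 1) `div` 2
```

For 8 values, treeSum [1 .. 8] gives 36 and treeDepth reports 3 levels. In a pipelined design, each level is a natural place for a register stage, so the depth of the tree translates directly into clock cycles of latency.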

The functional level and abstraction mechanisms of the Clash methodology made the communication between Demcon and Qbaylogic smooth and effective, since the language used for the design process is close to the language in which the application was originally defined. The resulting architecture, which satisfied the requirements, was created within the required development time; in fact, it was right the first time.

Edited by Nieke Roos