Linux Support for AVR32 UC3A
Adaption of the Linux kernel and toolchain

Pål Driveklepp
Olav Morken
Gunnar Rangøy

Master of Science in Computer Science
Submission date: June 2009
Supervisor: Morten Hartmann, IDI
Co-supervisor: Håvard Skinnemoen, Atmel
Problem Description

The goal of this project is to adapt the Linux kernel and a toolchain to support the Atmel AVR32 UC3A0512 microcontroller. This involves adaptation of the GNU Compiler Collection (GCC) and associated tools, and the Linux kernel and drivers specific to the Atmel AVR32 UC3 CPU architecture. In addition, a set of useful applications should be selected, compiled and tested.

Assignment given: 15. January 2009
Supervisor: Morten Hartmann, IDI
Abstract

The use of Linux in embedded systems is steadily growing in popularity. The UC3A is a series of high performance, low power 32-bit microcontrollers aimed at several industrial and commercial applications including Programmable Logic Controllers (PLCs), instrumentation, phones, vending machines and more. The main goal of this project was to complete the adaptation of the Linux kernel, compiler and loader software, in order to enable the Linux kernel to load and run applications on this device. In addition, a set of useful applications should be picked, compiled and tested on the target platform to indicate a complete software solution.

This master’s thesis is a continuation, by the same three students, of a student project carried out during the fall of 2008. In this report we present in detail the findings, challenges, choices and solutions involved in the working process. During the course of this project, we have successfully adapted the Linux kernel and a toolchain for generating binaries loadable by Linux. A set of test applications has been compiled and tested on the resulting platform. This project has resulted in the submission of a revised patch series for the U-Boot boot loader, one patch series for Linux, and one for the toolchain. Requirements have been created, and tests for the requirements have been carried out.
Preface

This master’s thesis documents the work done by a group of three students working on their thesis assignment during the spring of 2009 at the Department of Computer and Information Science at the Norwegian University of Science and Technology.

We would like to thank Håvard Skinnemoen at Atmel Norway for his help and guidance, and Atmel for providing the required hardware. We would also like to thank our supervisor Morten Hartmann. A special thanks goes to Øyvind Rangøy, who took the time to read this report and comment on its errors.
Contents

Abstract
Preface
Contents

1 Introduction
  1.1 Assignment
  1.2 Project continuation
  1.3 Interpretation
    1.3.1 Requirements
  1.4 Structure of this report

2 Background
  2.1 Virtual memory
    2.1.1 Copy-on-write
  2.2 Memory Protection Unit (MPU)
  2.3 Unaligned memory copy
  2.4 Static Random Access Memory (SRAM)
  2.5 AVR32 Architecture
    2.5.1 Registers
    2.5.2 Instructions
    2.5.3 Sub-architectures
    2.5.4 Revisions
    2.5.5 Execution modes
    2.5.6 Exception and interrupt handling
  2.6 The AP7000 microcontroller
  2.7 The UC3A0512 microcontroller
    2.7.1 Logical layout
    2.7.2 Features
    2.7.3 Chip revisions
    2.7.4 AP7000 versus UC3A0512
  2.8 EVK1100

3 Implementation
  3.1 Methodology
    3.1.1 Setting goals and preliminary milestones
    3.1.2 Milestone identification and implementation
    3.7.4 elf2flt
    3.7.5 PIE support
  3.8 SRAM optimization
    3.8.1 Routing of signals
    3.8.2 Joystick pull-up conflict
    3.8.3 LED resistor conflict
  3.9 SPI chip enable
  3.10 BusyBox
  3.11 Obtaining and distributing source code
    3.11.1 Buildroot
    3.11.2 GCC
    3.11.3 GNU Binutils
    3.11.4 uClibc
    3.11.5 elf2flt
    3.11.6 U-Boot
    3.11.7 Linux
    3.11.8 BusyBox

4 Testing and results
  4.1 U-Boot
    4.1.1 SPI support, requirement 1
    4.1.2 Loading from DataFlash or SD card, requirement 2
    4.1.3 Patch cleanup, requirement 3
  4.2 Linux
    4.2.1 Booting Linux kernel, requirement 4
    4.2.2 Running user space binaries, requirement 5
    4.2.3 Hardware support, requirement 6
    4.2.4 Exceptions, requirement 7
    4.2.5 Code submission, requirement 8
  4.3 Toolchain
    4.3.1 Select binary format, requirement 9
    4.3.2 Produce binaries, requirement 10
    4.3.3 Produce libraries, requirement 11
    4.3.4 Code submission, requirement 12
  4.4 Linux user space
    4.4.1 BusyBox, requirement 13
  4.5 Patch submission feedback
    4.5.1 U-Boot
    4.5.2 Linux
    4.5.3 Toolchain

5 Conclusion
  D.13 copy_user.S for !CONFIG_NOUNALIGNED
  D.14 csum_partial: support for chips that cannot do unaligned accesses
  D.15 Avoid unaligned access in uaccess.h
  D.16 memcpy for !CONFIG_NOUNALIGNED
  D.17 Mark AVR32B code with subarch flag
  D.18 mm-dma-coherent.c: ifdef AVR32B code
  D.19 Disable ret_if_privileged macro
  D.20 AVR32A-support in Kconfig
  D.21 AVR32A address space support
  D.22 Change maximum task size for AVR32A
  D.23 Fix __range_ok for AVR32A in uaccess.h
  D.24 Support for AVR32A entry-avr32a.S
  D.25 Change HIMEM_START for AVR32A
  D.26 New pt_regs layout for AVR32A
  D.27 UC3A0512ES interrupt bug workaround
  D.28 UC3A0xxx support
  D.29 Board support for ATEVK1100

E PDCA, SPI and DataFlash support

F Toolchain patches
  F.1 Cover letter
  F.2 GCC changes
  F.3 GNU binutils changes
  F.4 uClibc changes
  F.5 Unsubmitted GCC change

G Patch for elf2flt

H EVK1100 SRAM expansion board

I Test source code
  I.1 Linux exception tests
    I.1.1 Unaligned read
    I.1.2 Unaligned write
    I.1.3 Invalid read
    I.1.4 Invalid write
    I.1.5 Invalid opcode (aligned)
    I.1.6 Invalid opcode (unaligned)
  I.2 Toolchain tests
    I.2.1 Simple program
    I.2.2 More complex program

J Digital appendices
  J.1 Linux patches
  J.2 U-Boot patches
  J.3 U-Boot unsubmitted changes
  J.4 Toolchain patches
  J.5 elf2flt changes
  J.6 SPI DMA changes
  J.7 Tests
Chapter 1

Introduction

1.1 Assignment

This master’s thesis is the continuation of an earlier project with unfinished goals. The project description was formulated by Atmel, and the requirements and goals were derived from it. In this master’s thesis, we resume the work where the previous project left off.

Atmel’s problem formulation, given to us as an assignment proposal via our supervisor, is quoted below. Figure 1.1 sums up the project concept in an informal illustration.

Linux kernel support for Atmel AVR32 UC3 processors

The project’s goal is to boot a Linux kernel on an Atmel AVR32 UC3A0512 microcontroller. In order to boot a Linux system, the following software requires specific adaption / porting:
• A boot loader. Das U-Boot is currently the only boot loader capable of loading AVR32 Linux.
• The Linux kernel. Obviously.
• A toolchain capable of generating "flat" binaries. On AP7, ELF binaries are used, but the Linux ELF loader does not support systems without an Memory Management Unit (MMU). The AP7 core includes an MMU, the UC3 core does not.
• Linux applications. This is the easy part, once all the other pieces are in place. But picking out a set of applications that is useful on specific UC3-based development boards is still a task that needs to be done.

The work will thus include adoption of the GNU Compiler Collection (GCC) and associated tools, boot loader, Linux kernel and driver-implementation specific to the Atmel AVR32 UC3 Central Processor Unit (CPU) architecture. All work will be completed using Atmel’s development boards and debugging tools, including ATEVK1100, JTAGICE mkII or AVRONE! All work will be covered by the GNU General Public License (GPL), as defined by the individual LICENSE and COPYRIGHTs of the projects and will be published in an open source context, through the AVR32 Community Website at http://www.avr32linux.org

A list of URLs to detailed descriptions of relevant projects was also given:
• Linux on UC3: http://avr32linux.org/twiki/bin/view/Main/LinuxOnUC3
• U-Boot bootloader: http://avr32linux.org/twiki/bin/view/Main/UBootOnUC3
• Linux kernel on UC3: http://avr32linux.org/twiki/bin/view/Main/LinuxKernelOnUC3

Håvard Skinnemoen was our contact at Atmel. He provided us with further specifications and guidance, and the necessary tools for the project.

1.2 Project continuation

This master’s thesis picks up the threads from the project done by the same three students during the fall of 2008. At the beginning of the work with this thesis, the status could be summed up as follows:

• U-Boot was able to successfully load the kernel via Ethernet or serial port.
• Patches for U-Boot had been submitted, but never revised.
• The hardware setup in the boot sequence of the Linux kernel was partly adapted.
• Linux booted and gave output to the serial console, but halted when trying to load the first user space program (init).
• No patches for the Linux kernel had been assembled or submitted.
• No changes had been made to the toolchain.

Because we had the same main objectives in the project during the fall of 2008, much of the background material from that report is still relevant. Applicable parts of the background chapter have been reused, and new sections have been added. Sections written for the previous project that are still used in this report are listed in table 1.1.

<table>
<thead>
<tr>
<th>Section</th>
<th>Re-use</th>
</tr>
</thead>
<tbody>
<tr>
<td>Section 2.3 Unaligned memory copying</td>
<td>Unchanged</td>
</tr>
<tr>
<td>Section 2.4 SRAM</td>
<td>Unchanged</td>
</tr>
<tr>
<td>Section 2.5 AVR32 Architecture</td>
<td>Revised</td>
</tr>
<tr>
<td>Section 2.5.3 Sub-architectures</td>
<td>Expanded</td>
</tr>
<tr>
<td>Section 2.6 AP7000</td>
<td>Unchanged</td>
</tr>
<tr>
<td>Section 2.7 The UC3A0512 microcontroller</td>
<td>Unchanged</td>
</tr>
<tr>
<td>Section 2.7.1 UC3A0512 logical layout</td>
<td>Revised</td>
</tr>
<tr>
<td>Section 2.7.2 Internal flash</td>
<td>Expanded</td>
</tr>
<tr>
<td>Section 2.7.2 EBI</td>
<td>Expanded</td>
</tr>
<tr>
<td>Section 2.7.2 SPI</td>
<td>Expanded</td>
</tr>
<tr>
<td>Section 2.7.4 AP7000 versus UC3A0512</td>
<td>Expanded</td>
</tr>
<tr>
<td>Section 2.9 JTAG</td>
<td>Unchanged</td>
</tr>
<tr>
<td>Section 2.11 Linux</td>
<td>Unchanged</td>
</tr>
<tr>
<td>Section 2.11.1 Configuration</td>
<td>New</td>
</tr>
<tr>
<td>Section 2.11.2 Tasks</td>
<td>New</td>
</tr>
<tr>
<td>Section 2.11.3 uClinux</td>
<td>Revised</td>
</tr>
<tr>
<td>Section 2.12 U-Boot</td>
<td>Reduced</td>
</tr>
<tr>
<td>Section 2.13 Toolchain</td>
<td>Heavily reworked</td>
</tr>
</tbody>
</table>

Table 1.1: Sections reused

1.3 Interpretation

The initial goals and guidelines were defined by Atmel for the preceding student project of fall 2008, but as an independent university group we were free to modify the assignment in any way we wanted, as long as our supervisor agreed that the educational goals were satisfied. However, there were no conflicts between our desired goals and the goals suggested by Atmel.

1.3.1 Requirements

This subsection groups and lists the requirements defined for this thesis. The requirements are based on the assignment previously formulated by Atmel, the unmet requirements and suggested future work from the previous project. All of the requirements assume the use of the EVK1100 evaluation kit with the UC3A0512 microcontroller. Requirements marked with \(^1\) are unsolved by this project, and requirements marked with \(^2\) are only partly fulfilled.

Software and hardware components involved are introduced in chapter 2.

U-Boot

1. SPI support (needed if the kernel is loaded from DataFlash or SD card, and for using the LCD display)\(^1\)
2. Load Linux from DataFlash or SD memory card\(^1\)
3. Clean up patches and commit a new version

The Linux kernel

The Linux kernel should have the following features:

4. The Linux kernel must be able to boot.
   (a) Output to serial console
   (b) Initialize networking
   (c) Receive network configuration using DHCP.
   (d) Mount necessary file systems:
       i. NFS root file system.
       ii. proc file system
       iii. sysfs file system
       iv. devpts file system
       v. devshm file system
   (e) Load and execute an init application.

5. The Linux kernel must be able to run user space binaries.

6. The Linux kernel must support the most central hardware located on the EVK1100. The following hardware components were identified as central and important:
   (a) Light Emitting Diodes (LEDs) to give status information (optional)
   (b) DataFlash (optional)\(^1\)
   (c) LCD display (optional)\(^1\)
   (d) SD Card (optional)\(^1\)
   (e) SPI (optional, needed for requirement 6b, 6c and 6d)\(^1\)
   (f) DMA (optional, suggested by requirement 6e)\(^1\)
   (g) Network adapter

7. Exceptions must be handled.\(^2\)
8. Resulting source code must be submitted to the appropriate source code maintainers.\(^2\)

**Toolchain**

The toolchain should be adapted to be capable of generating executables for the UC3A running Linux. This involves the following:

9. A suitable binary format must be selected. This could be either:
   (a) FDPIC ELF
   (b) Flat
10. GCC must be able to generate statically linked executables.
11. GCC must be able to generate dynamically linked executables and libraries.\(^1\)
12. Resulting source code must be submitted to the appropriate source code maintainers.\(^2\)

**Linux applications**

13. It should be possible to compile, load and run a shell and tools for basic file manipulation, user management and networking. This includes tools like ls, cp, cat, grep, find, mkdir, rm, rmdir, df, du, vi, diff, adduser, passwd, mount, less, ifconfig, telnet server, free, ps and a shell (ash/hush/msh).

1.4 Structure of this report

This chapter has introduced the assignment, continuation of the previous project, our interpretation, the scope of the project, and a formal requirements specification formulated from the assignment. Chapter 2 introduces the concepts, software and hardware components involved in the development process. The last section of chapter 2 also explores previous work relevant to this project.

Chapter 3 describes in detail the work carried out, the decisions made and the arguments for these. The requirements have been tested, and the tests and results are listed in chapter 4. This chapter also presents some of the feedback from our code submissions. In chapter 5 we conclude the project as a whole. Chapter 6 discusses further work that should be carried out in the future, either by us or others. The final chapter, chapter 7, lists our references.

Note that a list of acronyms is included as the first appendix, appendix A. Most of our patches are included in the appendices. The schematics for the expansion board and the source code for some of the tests are also included as appendices. A digital appendix with all submitted and unsubmitted patches for U-Boot, Linux, uClibc, GNU Compiler Collection (GCC) and GNU Binutils also accompanies this report.
Chapter 2

Background

This chapter gives an introduction to the devices, tools, hardware and software relevant to this project. It also describes the fundamental concepts necessary to understand the problems addressed.

The first four sections of this chapter introduce memory management concepts and alignment issues for memory access, and describe SRAM, the memory type of main focus in this report.

Sections 2.5 to 2.9 introduce the AVR32 architecture and relevant Atmel products, including JTAG, the UC3A0512 microcontroller and its sibling, the AP7000.

Section 2.10 introduces the file formats for executables and shared libraries that we have looked at. An introduction to Linux is given in section 2.11.

The generic introduction to Linux and uClinux is from the earlier project, but the technical details are written for this project. This section is followed by an introduction to the boot loader, Das U-Boot, in section 2.12.

The toolchain that is used for developing Linux for the chip is presented in section 2.13.

Section 2.14 introduces BusyBox, which was used during this project.

Section 2.15 gives a short introduction to the networking servers used to support the board during boot and runtime.

A short introduction to open-source collaboration software and principles is given in section 2.16. Git, the software system used for revision control of the source code, is briefly introduced in section 2.16.1.

Finally, previous related work is presented in section 2.17.

2.1 Virtual memory

The content of this section is based on [19]. Virtual memory is a method for abstracting memory addresses used in programs from their physical addresses. This allows each separate application to have its own private address space, which in turn can be used to enforce memory protection. This is typically implemented with a Memory Management Unit (MMU). With a virtual memory system, two separate address spaces need to be considered – the virtual address space and the physical address space. The physical address space refers to the physical memory, while the virtual address space is per-application.

The MMU’s task is to translate virtual addresses to physical addresses. It works by splitting the memory area into separate pages, where each page is a fixed size. A quick survey of the Linux source code shows that typical page sizes for various architectures are 4096 and 8192 bytes.

![Figure 2.1: Simplified operation of an MMU](image)

Figure 2.1 shows the operation the MMU performs when translating a virtual address. It splits the virtual address into two parts – the page number and the offset into the page. The page number is looked up in a page directory, which contains the mapping from virtual to physical addresses. The physical page address retrieved from the page directory is then combined with the offset into the page to form the physical address.

The page directory contains information about each page, such as whether it is present, and what types of access are allowed to this page. For example, an application can be allowed to read from a page, but not write to it. Invalid accesses to the page will trigger an exception that the operating system can handle.
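
The translation and permission check described above can be sketched in C. This is only an illustration and not code from the project: the structure and function names are hypothetical, the page size is fixed at 4096 bytes, and a flat, single-level page directory with 16 entries is assumed.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12u                      /* 4096-byte pages */
#define PAGE_SIZE  (1u << PAGE_SHIFT)
#define NUM_PAGES  16u                      /* tiny illustrative address space */

/* One page directory entry; the field names are hypothetical. */
struct page_entry {
    bool     present;   /* page is mapped to physical memory */
    bool     writable;  /* writes are allowed */
    uint32_t frame;     /* physical page (frame) number */
};

/*
 * Translate vaddr to a physical address. Returns true on success and
 * false in the cases where a real MMU would raise an exception.
 */
bool translate(const struct page_entry *dir, uint32_t vaddr,
               bool is_write, uint32_t *paddr)
{
    uint32_t page   = vaddr >> PAGE_SHIFT;        /* page number */
    uint32_t offset = vaddr & (PAGE_SIZE - 1u);   /* offset into the page */

    if (page >= NUM_PAGES || !dir[page].present)
        return false;                             /* page not present */
    if (is_write && !dir[page].writable)
        return false;                             /* write to read-only page */

    *paddr = (dir[page].frame << PAGE_SHIFT) | offset;
    return true;
}
```

With page 2 mapped read-only to frame 7, a read of virtual address 0x2ABC yields physical address 0x7ABC, while a write to the same address is rejected.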

2.1.1 Copy-on-write

Copy-on-write is a method for saving memory by sharing equal pages between different applications. When two applications load the same part of a file into memory, the resulting pages can be shared until one of the applications tries to modify them. The operating system implements this by marking the page as read-only when the sharing begins. When one of the applications writes to such a read-only page, the page is copied, and the data is written to the new copy. Since much of the memory contains code that is never written to, copy-on-write can save a significant amount of memory.
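
The copy-on-write behaviour can be illustrated with a small user space simulation in C. The sketch below is an assumption-laden model, not kernel code: the `struct page`, `cow_share` and `cow_write` names are hypothetical, a reference count stands in for the read-only marking, and the write function plays the role of the exception handler.

```c
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 4096u

/* A physical page plus a count of how many mappings share it. */
struct page {
    unsigned      refs;
    unsigned char data[PAGE_SIZE];
};

/* Share an existing page with another application. */
struct page *cow_share(struct page *p)
{
    p->refs++;
    return p;
}

/*
 * Write one byte through a mapping. If the page is shared, the writer
 * first receives a private copy; the other sharers keep the original.
 * Returns 0 on success, -1 if no memory is available for the copy.
 */
int cow_write(struct page **mapping, size_t offset, unsigned char value)
{
    struct page *p = *mapping;

    if (p->refs > 1) {                       /* the "write fault" path */
        struct page *copy = malloc(sizeof *copy);
        if (copy == NULL)
            return -1;
        memcpy(copy->data, p->data, PAGE_SIZE);
        copy->refs = 1;
        p->refs--;
        *mapping = copy;
        p = copy;
    }
    p->data[offset % PAGE_SIZE] = value;
    return 0;
}
```

Only the first write to a shared page pays for the copy. Pages that are never written, such as code, stay shared for their whole lifetime.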

2.2 Memory Protection Unit (MPU)

Without an MMU, all applications must share the same physical address space. If one application is flawed or malicious, the application may read from or write to any memory location. By doing this, the application could potentially sabotage or access any information about the kernel or any process. An MPU[5] provides a way of protecting the processes from each other by having dedicated hardware checking the address of every memory access. The MPU is usually configured by setting up a number of allowed memory areas, and an exception is generated if the application attempts to access memory outside these areas.

Usually, because MPUs are implemented in hardware, only a limited set of allowed/disallowed memory areas can be configured simultaneously. To work around this, it is possible to trap the exception, and replace an old memory area with a new memory area if a memory area not listed in the MPU is accessed. This allows an operating system to support a more or less unlimited set of memory areas.
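
The workaround can be sketched in C as follows. This is a simplified model under stated assumptions, not the programming interface of any particular MPU: the number of hardware slots (four), the round-robin replacement policy and all names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MPU_SLOTS 4u   /* hypothetical number of hardware regions */

struct region {
    uint32_t start;
    uint32_t size;     /* in bytes; 0 marks an unused slot */
};

struct mpu {
    struct region slot[MPU_SLOTS];
    unsigned      next_victim;   /* round-robin replacement index */
};

static bool contains(const struct region *r, uint32_t addr)
{
    return r->size != 0 && addr >= r->start && addr - r->start < r->size;
}

/* What the hardware does on every access: is addr inside a loaded region? */
bool mpu_allows(const struct mpu *m, uint32_t addr)
{
    for (size_t i = 0; i < MPU_SLOTS; i++)
        if (contains(&m->slot[i], addr))
            return true;
    return false;
}

/*
 * What the operating system does in the exception handler: search the
 * full list of regions the process may use. On a hit, load that region
 * into a hardware slot (evicting an older one) so the access can be
 * retried. On a miss, the access really is illegal.
 */
bool mpu_handle_fault(struct mpu *m, const struct region *allowed,
                      size_t n_allowed, uint32_t addr)
{
    for (size_t i = 0; i < n_allowed; i++) {
        if (contains(&allowed[i], addr)) {
            m->slot[m->next_victim] = allowed[i];
            m->next_victim = (m->next_victim + 1u) % MPU_SLOTS;
            return true;
        }
    }
    return false;
}
```

The list passed to `mpu_handle_fault` can be arbitrarily long, which is what lets the operating system support more memory areas than the hardware has slots, at the cost of an exception on each miss.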

2.3 Unaligned memory copy

The way processors copy blocks of data from one position in memory to another is vital for performance, and is handled differently depending on the architecture. Some processors can only read and write whole words (32 bits) if they are aligned on word boundaries. Others have optimized hardware instructions for unaligned accesses. Figure 2.2 shows an example of how 10 bytes can be copied between unaligned addresses. Processors that cannot perform unaligned accesses must copy these 10 bytes one at a time. If a processor supports halfword copying, the data can be copied one halfword at a time, provided that the source and destination addresses are either both even or both odd. When a processor has support for unaligned accesses, the usual approach for the software is to copy single bytes or halfwords until either the source or the destination is aligned.
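
The effect of the relative alignment can be made concrete with a small C sketch for a processor without unaligned access support. It does not copy any data; it only selects the widest usable transfer unit and counts the transfers needed. The function names are hypothetical, and as a simplification the bytes before and after the aligned body are assumed to be copied one at a time.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Widest unit (in bytes) that can be used for the bulk of a copy when
 * every access must be naturally aligned. Source and destination can
 * only reach the same alignment if their addresses agree in the low bits.
 */
size_t copy_unit(uintptr_t src, uintptr_t dst)
{
    uintptr_t diff = src ^ dst;

    if ((diff & 3u) == 0)
        return 4;   /* both become word aligned at the same time */
    if ((diff & 1u) == 0)
        return 2;   /* both even or both odd: halfwords are possible */
    return 1;       /* otherwise only byte copies are possible */
}

/* Number of transfers needed to copy n bytes from src to dst. */
size_t count_transfers(uintptr_t src, uintptr_t dst, size_t n)
{
    size_t unit  = copy_unit(src, dst);
    size_t count = 0;

    /* Leading bytes until the source (and thus the destination) is aligned. */
    while (n > 0 && (src % unit) != 0) {
        src++;
        n--;
        count++;
    }
    count += n / unit;   /* aligned body */
    count += n % unit;   /* trailing bytes */
    return count;
}
```

For a 10-byte copy, addresses 1 and 2 give 10 byte transfers, addresses 1 and 3 give 6 transfers using halfwords, and addresses 0 and 4 give 4 transfers using words.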

2.4 Static Random Access Memory (SRAM)

SRAM, often just called static memory, has a relatively simple memory interface. It consists of \( n_a \) address lines, \( n_d \) data lines and three control signals. The control lines are a chip enable signal, a read signal and a write signal.

The number of data lines is usually either 8, 16 or 32. It is possible to connect two SRAM chips in parallel to double the number of data bits. For example, by connecting two 8-bit SRAM chips so that they share all lines except the data lines, they will behave like a single 16-bit SRAM chip. See appendix H for an example of this setup.

When the chip enable signal is asserted, a read can be done by placing the address on the address bus, and then asserting the read signal. The SRAM chip will then place the requested data on the data lines. Similarly, a write can be done by placing the data on the data lines and the address on the address lines, and then asserting the write signal. A read operation is shown in figure 2.3.

Reads and writes with SRAM chips are not instantaneous, but require some time to complete. Each SRAM chip has its own specific timing requirements. The requirements define the relationship between the different signals to the SRAM chip. These requirements can for example say that the read signal must be set for at least 7 ns before data will be valid.
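
Two small calculations follow from this description and can be sketched in C: the capacity implied by the number of address and data lines, and the number of bus clock cycles a signal must be held to satisfy a timing requirement given in nanoseconds. The function names are hypothetical, and the clock frequency used in the example below is purely illustrative.

```c
#include <stdint.h>

/* Capacity in bytes of an SRAM with n_a address lines and n_d data lines. */
uint64_t sram_bytes(unsigned n_a, unsigned n_d)
{
    return ((uint64_t)1 << n_a) * n_d / 8u;
}

/*
 * Number of whole clock cycles needed to cover at least min_ns
 * nanoseconds at a bus clock of clock_hz, rounded up.
 */
uint32_t cycles_for_ns(uint32_t min_ns, uint32_t clock_hz)
{
    uint64_t product = (uint64_t)min_ns * clock_hz;

    return (uint32_t)((product + 999999999u) / 1000000000u);
}
```

For example, `sram_bytes(19, 8)` gives 524288 bytes, and two such chips in parallel (`sram_bytes(19, 16)`) give twice that. With an assumed 66 MHz bus clock, a 7 ns requirement is covered by one cycle, while a 55 ns requirement needs four.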
2.5 AVR32 Architecture

The content of this section regarding AVR32 is based on [5], unless otherwise stated.

The AVR32 architecture is a 32-bit load/store RISC architecture by Atmel, designed with emphasis on low power consumption. AVR32 is not binary compatible with the 8/16-bit AVR microcontrollers. It was first launched in 2006 with the AVR32 AP core.

The AVR32 architecture defines an optional Java extension module. This module is not available on the microcontroller used during this project, and will therefore not be discussed any further.

2.5.1 Registers

The AVR32 architecture has 16 registers, shown in figure 2.4, with 13 of these being purely general purpose. The remaining three are the program counter, the stack pointer and the link register. The link register is used to hold the return address of the current function. This reduces the number of stack accesses required for function calls, since simple function calls do not need to access the stack at all. Both the stack pointer and the link register can also be used as general purpose registers.

An interesting feature of the architecture is that all instructions that accept register operands can take any register. This includes the program counter, the link register and the stack pointer. This means that a jump can be implemented in the following way:

    lsl r10, 2
    add pc, pc, r10

This code computes \( pc = pc + r10 \times 4 \).
System registers

In addition to the normal registers there are a large number of system registers. Most of these are used for accessing the configuration and status of various features on the processor. Exception vectors, MMU and MPU are examples of features that can be configured with these registers.

One of the system registers is the status register. This register is shown in figure 2.5. It is split into two parts – the upper and lower halfword. User applications can only access the lower halfword.

The lower halfword contains several flags set by results of arithmetic and logical operations, such as a zero flag, an overflow flag, and several others. These flags are used by conditional branches and operations. The lock-bit is used to implement atomic operations, the scratch bit can be used for any purpose by applications, and the register remap flag is used by the Java extension module.

The upper halfword contains the status of the processor. Among other things this includes the current execution mode and whether interrupts and exceptions are enabled.

Register shadowing

Another feature of the AVR32 architecture is register shadowing. When the processor changes to an interrupt execution mode (see section 2.5.5), it may replace some part of the register file with one reserved for that mode. Also, whenever the CPU changes from application mode, the user mode stack pointer is replaced with a system stack pointer.

There are three levels of shadowing: small, half and full. In mode small, no general purpose registers are shadowed. In mode half, registers r8 to r12 and the link register are shadowed. With full, registers r0 to r12 and the link register are shadowed. This is illustrated in figure 2.6.

![Register shadowing in the AVR32 architecture](image)

Figure 2.5: The AVR32 status register

Figure 2.6: Register shadowing in the AVR32 architecture

This feature makes it possible to handle some interrupts without having to access memory. If the registers are not shadowed, they must be saved to the stack before being used. If this is not done, the interrupt handler may overwrite or change registers in use by a running application. This may then lead to incorrect execution of the application.
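
The three shadowing levels can be summarised in a short C sketch. It encodes only what is stated above (small shadows no general purpose registers, half shadows r8 to r12 and the link register, full shadows r0 to r12 and the link register). The names are hypothetical, the register numbering assumes sp = 13, lr = 14 and pc = 15, and the stack pointer is left out of the count because it is handled separately.

```c
#include <stdbool.h>

enum shadow_mode { SHADOW_SMALL, SHADOW_HALF, SHADOW_FULL };

/* Assumed register numbering: r0..r12, then sp, lr and pc. */
enum { REG_SP = 13, REG_LR = 14, REG_PC = 15 };

/* Is register reg replaced by a shadow copy in the given mode? */
bool is_shadowed(enum shadow_mode mode, unsigned reg)
{
    switch (mode) {
    case SHADOW_HALF:
        return (reg >= 8 && reg <= 12) || reg == REG_LR;
    case SHADOW_FULL:
        return reg <= 12 || reg == REG_LR;
    case SHADOW_SMALL:
    default:
        return false;
    }
}

/*
 * Number of registers among r0..r12 and lr that an interrupt handler
 * would have to save to the stack before it could use all of them.
 */
unsigned registers_to_save(enum shadow_mode mode)
{
    unsigned count = 0;

    for (unsigned reg = 0; reg <= REG_LR; reg++)
        if (reg != REG_SP && !is_shadowed(mode, reg))
            count++;
    return count;
}
```

In this model a handler that wants to use every register has 14 registers to save with small shadowing, 8 with half and none with full, which is the memory traffic that shadowing avoids.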

2.5.2 Instructions

The AVR32 architecture specification defines 214 instructions. Each instruction in the AVR32 architecture is either two or four bytes wide. Many instructions have multiple different encodings. For example, some instructions have a two-byte encoding for small immediate values, and a four-byte encoding for larger immediate values. Also, some instructions may take two or three operands. For example, the ADD instruction has a two-byte variant with \( \text{Rd} = \text{Rd} + \text{Rs} \), and a four-byte variant with \( \text{Rd} = \text{Ra} + (\text{Rb} \ll \text{shift}) \).

2.5.3 Sub-architectures

There are two different sub-architectures of the AVR32 architecture: AVR32A and AVR32B. Figure 2.7 shows the structure of the AVR32 sub-architecture hierarchy, with the UC3A0512 and AP7000 microcontrollers shown as examples of implementations. AVR32A targets cost-sensitive, lower-end applications and AVR32B targets applications where low interrupt latency is important. Since the AVR32A architecture is simpler than the AVR32B architecture, the hardware implementation of it is simpler, and lower power consumption is achievable.

![Figure 2.7: Relationships between the AVR32 architectures and implementations](image-url)

The main difference between AVR32A and AVR32B is the method of interrupt and exception handling. The difference is in the way the state is saved and restored when control is transferred to and from the exception and interrupt handlers. This topic will be discussed in more detail in section 2.5.6. Because the AVR32B architecture is focused on interrupt latency, dedicated registers are implemented for holding the status register and return address for interrupts, exceptions and supervisor calls. In addition, the AVR32A architecture does not implement register shadowing of any registers except for the stack pointer.

Three properties that are not defined by the sub-architecture are the presence of a cache, the presence of an MMU, and the ability to perform unaligned memory accesses. These properties are defined by the implementation of the CPU core. Figure 2.8 shows how these three properties can vary and potentially be implemented in eight different combinations. As depicted in the figure, AP7 implements all of these features, and the UC3 none. This difference may be significant in the adaptation of the Linux kernel. For a comparison of the AP7 and UC3 implementations AP7000 and UC3A0512, see section 2.7.4.
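
The three implementation-defined properties can be modelled as independent feature bits, which makes the eight combinations explicit. The C sketch below is only an illustration: the macro and function names are hypothetical and do not correspond to kernel configuration symbols.

```c
#include <stdbool.h>

/* One bit per implementation-defined property (hypothetical names). */
#define FEAT_CACHE        (1u << 0)
#define FEAT_MMU          (1u << 1)
#define FEAT_UNALIGNED    (1u << 2)
#define FEAT_COMBINATIONS 8u        /* 2^3 possible feature sets */

/* The two cores discussed in this report. */
static const unsigned CORE_AP7 = FEAT_CACHE | FEAT_MMU | FEAT_UNALIGNED;
static const unsigned CORE_UC3 = 0u;

/* Without an MMU, the operating system must run without virtual memory. */
bool needs_nommu_support(unsigned features)
{
    return (features & FEAT_MMU) == 0;
}

/* Without unaligned access support, copy routines must align accesses. */
bool needs_aligned_copy(unsigned features)
{
    return (features & FEAT_UNALIGNED) == 0;
}
```

Both predicates are false for `CORE_AP7` and true for `CORE_UC3`, which hints at why software written for the AP7 cannot be reused unchanged on the UC3.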

2.5.4 Revisions

There are two revisions of the AVR32 instruction set, revision 1 and revision 2. Revision 2 introduces 15 new instructions. One of these instructions loads an immediate value into the upper halfword of a register. The others are conditional operations, such as conditional loads and stores, and conditional add, sub and similar instructions.

2.5.5 Execution modes

The AVR32 architecture defines eight different execution modes. These are:

- Application mode
- Supervisor mode
- Four interrupt levels
- Exception mode
- Non-maskable interrupt

The mode can be changed by executing certain instructions, or as a result of signals from events occurring outside the CPU core (interrupts, exceptions etc). Each mode has a designated priority that determines whether execution can be interrupted to switch to another mode. In other words, execution in a mode with a certain priority will be interrupted if an event occurs that is handled in a mode with higher priority.
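
The preemption rule described above can be sketched as a simple comparison. This is only an illustration: the text does not state the priority order of the modes, so the ordering in `Mode` below is an assumption made for the sketch, and `preempts` is a hypothetical helper name.

```python
from enum import IntEnum


class Mode(IntEnum):
    # A larger value means a higher priority. The ordering is an
    # assumption made for this sketch; the text only says that each
    # mode has a designated priority.
    APPLICATION = 0
    SUPERVISOR = 1
    INT0 = 2
    INT1 = 3
    INT2 = 4
    INT3 = 5
    EXCEPTION = 6
    NMI = 7


def preempts(event_mode: Mode, current_mode: Mode) -> bool:
    """Return True if an event handled in event_mode interrupts current_mode."""
    return event_mode > current_mode
```

Under this model, an event only interrupts execution when its handling mode has strictly higher priority than the current mode, so a handler is never interrupted by an event of its own level.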

When running in application mode, the processor restricts access to various system registers, and the top half of the status register. This makes it possible to prevent applications from tampering with the CPU state and the execution of the kernel.

2.5.6 Exception and interrupt handling

There are two sources of “breaks” in the instruction flow. These are interrupts and exceptions. Interrupts are typically external events, while exceptions are internal events.

The AVR32 architecture implements exception handling by jumping to specific addresses when an exception occurs. The base address of the exception vectors is configurable through a system register. There are four bytes between most exception entry points, which is enough room for a jump instruction. Some of the more performance-critical exceptions have more space between them. These are the exceptions that deal with updating the Translation Lookaside Buffer (TLB) on systems with an MMU.

Interrupts in the AVR32 architecture are handled mostly in the same way as exceptions. However, the jump offset is configurable. The jump offset for each interrupt group can be set to an offset based on the exception vector.
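
As a rough illustration of the address arithmetic described above, the sketch below computes entry points relative to the exception vector base. The function names, the slot numbering and the example base address are hypothetical, and the performance-critical exceptions with larger spacing are not modelled.

```python
EXCEPTION_SLOT_BYTES = 4  # spacing between most exception entry points


def exception_entry(vector_base: int, slot: int) -> int:
    """Entry address of the exception occupying the given 4-byte slot.

    The slot numbering is hypothetical; it only illustrates the spacing.
    """
    return vector_base + slot * EXCEPTION_SLOT_BYTES


def interrupt_entry(vector_base: int, group_offset: int) -> int:
    """Entry address of an interrupt group handler.

    group_offset is the configurable offset for the group, counted from
    the exception vector base.
    """
    return vector_base + group_offset


VECTOR_BASE = 0x80004000  # hypothetical value of the base-address register
third_slot = exception_entry(VECTOR_BASE, 3)
group_handler = interrupt_entry(VECTOR_BASE, 0x200)
```

With only four bytes per slot, each entry point can hold little more than a jump to the real handler, which is why the handlers for interrupts are reached through a configurable offset instead.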

The process for handling interrupts or exceptions varies between the AVR32A and AVR32B architecture. The AVR32B architecture has extra system registers for saving the return address and status register for each execution mode. The AVR32A architecture on the other hand does not have those extra registers. Instead, they are pushed onto the stack.

2.6 The AP7000 microcontroller

The AP7000, released in 2006, was the first microcontroller that implemented the AVR32 architecture. It was based on a new CPU core, named AP7. “AP” stands for Application Processor, and the microcontroller was meant for network and multimedia applications.

The AP7000 microcontroller runs at up to 150 MHz, and can execute 210 Dhrystone MIPS at that speed. The AP7 core implements a 7-stage pipeline with three subpipes – the multiply, the execute and the data pipe. Instructions are issued in-order, but can be completed out-of-order.

The AP7 core implements the more complex of the two AVR32 sub-architectures, the AVR32B architecture. It has separate instruction and data caches, each of them 16 KB. It also has support for an MMU, with a 32-entry TLB.

In addition to the CPU core, this microcontroller has a number of on-chip peripherals, including serial ports, Ethernet Media Access Controllers (MACs), USB, and several others. U-Boot and Linux have previously been ported to this microcontroller; see section 2.17 for more on this.

2.7 The UC3A0512 microcontroller

This section is based on [8]. The AT32UC3A0512 microcontroller (figure 2.9) is part of a new product line released by Atmel in 2007. These microcontrollers are based on a new core, the UC3 core. The UC3 core is the first CPU core based on the AVR32A architecture. It is used in two series of microcontrollers: the UC3A series and the UC3B series. The main difference between the two series is which on-chip peripherals are available. The UC3A series is the more feature-rich of the two[6], described as the “Communication Family”. The UC3B series lacks three features that are found in the UC3A series: the external memory interface, the Ethernet interface, and the Audio Bitstream DAC.

The UC3A series focuses on high performance, low power 32-bit microcontrollers and is aimed at several industrial and commercial applications including PLCs, instrumentation, phones, vending machines and more.[6]

2.7.1 Logical layout

Figure 2.10 shows the logical layout of the UC3A0512 microcontroller. There are four internal buses in the microcontroller, including the CPU local bus. Because of a bug in the chip, the CPU local bus was never used during this project, and it has been excluded from the figure.

The High Speed Bus Matrix (HSB) is the main bus of the chip. It is implemented as a many-to-many connector with a number of bus masters and slaves. Each bus master can read or write data to any of the slaves. Each slave is responsible for a subset of the address space. With one exception, all memory access has to go through the HSB. The exception is that there is a shortcut for the CPU core to the internal SRAM, which permits single-cycle access to the internal SRAM.

Figure 2.10: UC3A0512 microcontroller (based on several figures in [8])

The following devices are connected to the HSB:

- Ethernet MAC (master): Reads and writes packets to memory.
- USB (master): Reads and writes packets to memory.
- USB (slave): Allows access to the packet buffers in the USB interface.
- EBI (slave): Allows access to memory connected to the EBI bus.
- Flash (slave): Allows access to the internal flash memory.
- Peripheral DMA controller (master): Reads and writes data from/to peripheral devices.
- OCD (master): Allows debugger reads and writes to different peripherals and memory banks.
- CPU instruction (master): Reads CPU instructions.
- CPU data (master): Reads and writes data to different memory banks.
- Internal SRAM (slave): Allows access to the internal SRAM on the chip.

There are two peripheral buses: Peripheral Bus A (PBA) and Peripheral Bus B (PBB). The different peripherals are connected to the peripheral buses, which in turn are connected to the High Speed Bus (HSB) through bridges.

Note that the bridges act as slave devices on the HSB. The bridge itself is the sole master device on the peripheral bus. This means that it is impossible for devices only connected to the peripheral buses to access main memory.

The different devices expose data and configuration registers on the peripheral bus. These can be read and written by the CPU, or any other device able to act as a master on the HSB bus.

The Peripheral DMA Controller (PDC) is a device which can copy data between the data registers of different devices and main memory. This allows the CPU to offload the work required to receive and send data via the data registers of the devices. The different devices signal the PDC when they are able to receive or send more data. The PDC will then either copy data from main memory to the data register, or copy data from the data register to main memory. The PDC is connected to the HSB, and is able to access both main memory and the devices' data registers through this bus.

2.7.2 Features

There are a number of features on the UC3A0512 microcontroller which may be of interest to us. In this section we will introduce them, and discuss why they may be of interest, and how we can use them.

Internal flash

The UC3A0512 microcontroller has 512 KB of internal flash (hence the “512” in its name). The microcontroller starts execution at address 0x80000000, which is the starting address of the internal flash. Therefore, the internal flash has to be programmed to be able to use this microcontroller. Different memories are mapped to separate areas in the physical address space available. Figure 2.11 shows the layout of the physical address space in the UC3A0512. Note that the memory areas may be missing or have different sizes in other UC3A microcontrollers (UC3Axxxx).

<table>
<thead>
<tr>
<th>Address</th>
<th>Memory Area</th>
<th>Size</th>
</tr>
</thead>
<tbody>
<tr>
<td>0x00000000</td>
<td>Embedded SRAM</td>
<td>64 kB</td>
</tr>
<tr>
<td>0x80000000</td>
<td>Embedded Flash</td>
<td>512 kB</td>
</tr>
<tr>
<td>0xC0000000</td>
<td>EBI SRAM CS0</td>
<td>16 MB</td>
</tr>
<tr>
<td>0xC8000000</td>
<td>EBI SRAM CS2</td>
<td>16 MB</td>
</tr>
<tr>
<td>0xCC000000</td>
<td>EBI SRAM CS3</td>
<td>16 MB</td>
</tr>
<tr>
<td>0xD0000000</td>
<td>EBI SRAM CS1/SDRAM</td>
<td>128 MB</td>
</tr>
<tr>
<td>0xE0000000</td>
<td>USB Configuration</td>
<td>64 kB</td>
</tr>
<tr>
<td>0xFFFE0000</td>
<td>HSB-PB Bridge B</td>
<td>64 kB</td>
</tr>
<tr>
<td>0xFFFF0000</td>
<td>HSB-PB Bridge A</td>
<td>64 kB</td>
</tr>
</tbody>
</table>

Figure 2.11: UC3A0512 physical memory map
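
To make the idea of separately mapped memory areas concrete, the sketch below looks up which area a physical address belongs to. It covers only a subset of the map. The flash, SDRAM and USB entries follow the text and figure 2.11; the internal SRAM base of 0x00000000 is an assumption that should be checked against the datasheet, and `region_of` is a hypothetical helper name.

```python
KB = 1024
MB = 1024 * KB

# (base address, size in bytes, name); a subset of the physical memory map.
# The SRAM base address is an assumption, see the text above.
REGIONS = [
    (0x00000000, 64 * KB, "Embedded SRAM"),
    (0x80000000, 512 * KB, "Embedded Flash"),
    (0xD0000000, 128 * MB, "EBI SRAM CS1/SDRAM"),
    (0xE0000000, 64 * KB, "USB Configuration"),
]


def region_of(address: int):
    """Return the name of the memory area containing address, or None."""
    for base, size, name in REGIONS:
        if base <= address < base + size:
            return name
    return None
```

For example, `region_of(0x80000000)` yields the embedded flash, which is where execution starts after reset, while an address just past the end of an area falls into unmapped space and yields `None`.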

Since the internal flash is a non-volatile internal memory of convenient size, it is a good place to store the boot loader. It could also potentially be used to store the Linux kernel, if we could get the kernel small enough to fit in the available space. The U-Boot boot loader (see section 2.12) uses the internal flash to save configuration options and user settings.

Flash is written in whole pages, and a page must be erased before a new one can be written. On the UC3A0512, each flash page is 512 bytes. To write a flash page, data is added to a page buffer, and a write command is issued.
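
The page-oriented write procedure can be modelled in a few lines. This is a simplified sketch that operates on a `bytearray` standing in for the flash, not on the real flash controller. The function names are hypothetical, and padding with 0xFF assumes that erased flash reads as all ones.

```python
PAGE_SIZE = 512           # bytes per flash page on the UC3A0512
FLASH_SIZE = 512 * 1024   # total internal flash
ERASED = 0xFF             # assumption: erased flash reads as all ones


def split_into_pages(data: bytes, page_size: int = PAGE_SIZE) -> list:
    """Split data into whole pages, padding the last page with ERASED bytes."""
    pages = []
    for start in range(0, len(data), page_size):
        chunk = data[start:start + page_size]
        pages.append(chunk + bytes([ERASED]) * (page_size - len(chunk)))
    return pages


def program(flash: bytearray, first_page: int, data: bytes) -> None:
    """Model of erase-then-write, one whole page at a time."""
    for index, page in enumerate(split_into_pages(data)):
        offset = (first_page + index) * PAGE_SIZE
        # Erase the page before writing new contents to it.
        flash[offset:offset + PAGE_SIZE] = bytes([ERASED]) * PAGE_SIZE
        # Fill the page buffer, then "issue the write command".
        page_buffer = bytearray(page)
        flash[offset:offset + PAGE_SIZE] = page_buffer
```

With these sizes the internal flash holds `FLASH_SIZE // PAGE_SIZE` = 1024 pages. Note that even a two-byte update rewrites a full 512-byte page, which matters when the flash is used for frequently changing settings.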

Internal SRAM

There are 64 KB of internal SRAM available on the UC3A0512. This is the only Random Access Memory (RAM) guaranteed to be available at start-up. Accesses to the internal SRAM take a single cycle to complete, and each access is word-sized (32 bits).

Since it is the only RAM guaranteed to be available at start-up, using it for the boot loader is a natural choice. The datasheet recommends using it for the system stack, since it is the fastest memory available on the chip. Due to the multi-threaded nature of the Linux kernel, this may be difficult, if not impossible, to accomplish. Therefore we consider this to have a low priority.

External Bus Interface (EBI)

The EBI is a coordinator for a collection of Input/Output (IO) lines, and ensures successful data transfer between external devices and the microcontroller. The EBI has one Synchronous Dynamic Random Access Memory (SDRAM) controller and one SRAM controller muxed on a common output. These are together capable of handling several types of external memory and devices, such as SRAM, Programmable Read-Only Memory (PROM), Erasable Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Flash, and SDRAM. The EBI is capable of simultaneously handling transfers with up to four external devices with Static Memory Controller (SMC) interface\(^1\). One of these four “channels” can be set to SDRAM mode. Each device is memory mapped in its own address space.

Technical data:

- 16 bit data bus
- 24 bit address bus
- Four chip select lines
- Several control pins

The data and address lines are shared between the different memory controllers, and some of the control lines are also shared.

External SDRAM

The SDRAM controller in the EBI supports 2 or 4 banks, with up to 8192 rows and up to 2048 columns per bank. Each access can be for either 16 or 32 bits. The total amount of SDRAM is limited to 128 MB by the size of the memory segment reserved in the physical memory map of the microcontroller.
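
The relationship between the SDRAM geometry and the 128 MB limit can be checked with a little arithmetic. The sketch below assumes that capacity is simply banks × rows × columns × bytes per access; the function names and the smaller example geometry are hypothetical.

```python
MB = 1024 * 1024
SDRAM_WINDOW = 128 * MB  # size of the segment reserved in the memory map


def sdram_bytes(banks: int, rows: int, columns: int, width_bits: int) -> int:
    """Raw capacity of an SDRAM geometry, in bytes."""
    return banks * rows * columns * (width_bits // 8)


def usable_sdram_bytes(banks: int, rows: int, columns: int, width_bits: int) -> int:
    """Capacity that is actually addressable through the reserved window."""
    return min(sdram_bytes(banks, rows, columns, width_bits), SDRAM_WINDOW)


largest_16bit = sdram_bytes(4, 8192, 2048, 16)       # maximum geometry, 16-bit
largest_32bit = sdram_bytes(4, 8192, 2048, 32)       # maximum geometry, 32-bit
small_example = sdram_bytes(4, 8192, 512, 16)        # hypothetical smaller device
```

Under this assumption the maximum geometry with 16-bit accesses fills the 128 MB window exactly, while the same geometry with 32-bit accesses would exceed it, so the window is the limiting factor in that case.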

There is a bug in the SDRAM controller in all current revisions of the UC3A0512 (see section 2.7.3). This bug makes running code from SDRAM unreliable. There is currently no workaround for this problem, which leaves SRAM and internal or external flash as the only memories from which these chips can execute code. The properties and capabilities of SDRAM will therefore not be described in detail.

\(^1\)According to the datasheet it should be five. Based on the memory map, we believe this is wrong.

External SRAM

External SRAM is interesting to us because of the SDRAM bug mentioned in section 2.7.2. Because the SDRAM controller is faulty, the only way to get enough memory to run U-Boot and Linux is to use RAM compatible with the SMC in the EBI. For this reason, Atmel provided us with a memory expansion card with SRAM and flash during previous work. This is why SRAM is used as the main memory throughout this project. The expansion card is described in section 2.17.4.

External SRAM can be connected to the UC3A0512 by using the SMC interface in the EBI. In the SRAM interface on the UC3A0512, the control lines are active low. See section 2.4 for a general introduction to SRAM. There are four chip-selects available, each of which allows for up to 16 MB of SRAM to be connected. The total possible amount of SRAM is therefore 64 MB.
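
The capacity figures above follow directly from the width of the EBI address bus. The short calculation below assumes that each of the 24 address bits selects one byte, which is our reading of the technical data listed earlier rather than a statement from the datasheet.

```python
MB = 1024 * 1024

ADDRESS_BITS = 24   # width of the EBI address bus
CHIP_SELECTS = 4    # number of chip select lines

# Assumption: every address selects a single byte.
bytes_per_chip_select = 1 << ADDRESS_BITS
total_sram_bytes = CHIP_SELECTS * bytes_per_chip_select
```

This gives 16 MB per chip select and 64 MB in total, matching the numbers in the text.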

On the UC3A0512, the SRAM timings are configured by specifying the waveforms for the different control signals. Read and write cycles have separate configurations, and can have entirely different timings, including the total length of the read or write cycle. For each read or write cycle there are five configuration values, and together they describe the total cycle. These are shown in figure 2.12.

![Figure 2.12: UC3A0512 SRAM timing configuration](image)

Network Media Access Controller (MAC)

Both the AP7000 and the UC3A0512 microcontrollers have on-chip Ethernet controllers called MACs. Atmel has named the specific MAC implementation in the UC3A0512 MACB. In this report we will use the abbreviation MACB when referring to Atmel’s implementation. The MACB can be used in conjunction with an external chip to provide Ethernet connectivity. The external chip that handles the physical layer of the Ethernet connection is called a PHY. Different PHY chips provide different physical layers, for example Ethernet over copper wires and Ethernet over fiber.

The PHY is connected to the Ethernet controller through either a Media Independent Interface (MII) or a Reduced Media Independent Interface (RMII) bus. The MII bus requires 17 wires, while the RMII bus requires 10 wires. With an MII bus, 4 wires are used in each direction for data transfers, while 2 wires are used with an RMII bus. To transmit 100 Mbit per second, an MII bus requires a clock rate of 25 MHz, while an RMII bus requires a 50 MHz clock.
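
The clock rates quoted above follow from the number of data wires per direction. The check below assumes that each data wire carries one bit per clock cycle; `bus_throughput_mbit` is a hypothetical helper name.

```python
def bus_throughput_mbit(data_wires_per_direction: int, clock_mhz: int) -> int:
    """Throughput in Mbit/s, assuming one bit per wire per clock cycle."""
    return data_wires_per_direction * clock_mhz


mii_throughput = bus_throughput_mbit(4, 25)    # MII: 4 data wires at 25 MHz
rmii_throughput = bus_throughput_mbit(2, 50)   # RMII: 2 data wires at 50 MHz
```

Both buses reach the same 100 Mbit per second; RMII trades a higher clock rate for fewer wires.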

In both the MII bus and the RMII bus, two lines are used for management of the PHY. These two lines can be shared between multiple PHYs, though this feature is not used by the AP7000 or UC3A0512 microcontrollers. To allow for sharing of the management bus, each PHY has its own address. There are 32 different addresses, allowing for up to 32 PHYs to share a management bus. Figure 2.13 shows the communication lines between the microcontroller and the PHY.

Serial Peripheral Interface (SPI)

The microcontroller has several SPI interfaces. SPI is a full duplex synchronous serial data link for communicating with external peripherals or devices. SPI is a de facto standard that typically uses four wires[8]: one for each direction of data, one clock line and one chip selection line for every slave. Figure 2.14 shows how multiple devices on the EVK1100 are connected to the microcontroller. The EVK1100 will be introduced further in section 2.8. If all the slaves support “daisy chaining”, the slaves can be connected in a loop and share the chip selection line. Daisy chaining is not relevant for this project and will not be discussed any further.

SPI support and utilization are listed as requirements 1 and 6e in section 1.3.1.

General Purpose Input/Output (GPIO)

There is a GPIO controller on the UC3A0512 that controls the output of most of the pins on the chip. Different functions can be selected for each physical pin. In addition to the GPIO function of the pins, up to three other functions for different peripherals can be selected. The GPIO function of a pin is set by enabling the pin as a GPIO pin. The peripheral function can be set by disabling the GPIO function of the pin, and then selecting a peripheral function.

We are mainly interested in the peripheral function of the pins, and selecting which peripheral function each pin should employ. Typical peripheral functions we are interested in are the EBI bus and the serial port. There are also some LEDs, buttons and a joystick connected to the chip, which can be used through the GPIO controller. The LEDs may be useful, for example as indicators of the system state, or status lights during boot.
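
The pin selection procedure described above can be summarised as a small state model. This is a sketch of the logic only, not of the controller's register interface. The class name, the function labels "A", "B" and "C", and the assumption that a pin starts out as a GPIO pin are all hypothetical.

```python
class Pin:
    """Model of one physical pin multiplexed between GPIO and peripherals."""

    PERIPHERAL_FUNCTIONS = ("A", "B", "C")  # hypothetical labels, up to three

    def __init__(self):
        self.gpio_enabled = True            # assumption: pins start as GPIO
        self.peripheral_function = None

    def select_gpio(self) -> None:
        """Hand control of the pin to the GPIO controller."""
        self.gpio_enabled = True

    def select_peripheral(self, function: str) -> None:
        """Disable the GPIO function, then select a peripheral function."""
        if function not in self.PERIPHERAL_FUNCTIONS:
            raise ValueError("unknown peripheral function: " + function)
        self.gpio_enabled = False
        self.peripheral_function = function


serial_pin = Pin()
serial_pin.select_peripheral("A")   # e.g. route the pin to a serial port
led_pin = Pin()
led_pin.select_gpio()               # an LED is driven directly as GPIO
```

The order of operations in `select_peripheral` mirrors the text: the GPIO function is disabled first, and only then does the selected peripheral function take effect.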

Universal Serial Bus (USB)

The UC3A0512 has a built-in USB interface that is able to act both as a device and as a host (not simultaneously). It has a built-in PHY that takes care of the transmission on the physical medium, so the only external components required for USB to work are a few resistors, and optionally a connector. The USB unit on the chip has been extended with USB On-The-Go support since the AP7000 was made. USB On-The-Go is a relatively new standard that combines lower power requirements and a small form factor for connectors and cables with host capability and dynamic switching between host and peripheral mode[24].

Support for USB under Linux would definitely be a useful feature. Linux already supports the internal USB interface in the AP7000[1], but the driver needs to be tested on the UC3A0512 microcontroller, and adapted if necessary. Because the UC3A series also has the On-The-Go capability, the driver may need significant extensions and changes in order to work.

2.7.3 Chip revisions

The microcontroller used in this project is an engineering sample. The part numbers printed on the packages of engineering samples of UC3 microcontrollers are suffixed with ES, which makes the part number UC3A0512ES. There are several highly significant design errors in this particular revision of the chip. Design errors that may be relevant to this project are listed here. These can be found in the errata list in [8]. Note that the errata numbering is based on the numbers found in revision F (08/08) of the datasheet. The numbers may change between revisions of the datasheet, as more errata are added.

- SPI interface bugs (erratum 41.4.1)
- Two NOPs needed after instructions masking interrupts (erratum 41.4.5.5)
- Processor reports wrong processor ID (erratum 41.4.5.1)
- Bus error during debug mode causes processor to stop responding to debug commands (erratum 41.4.5.2)
- CPU cannot operate on a divided slow clock (erratum 41.4.5.12)
- Code execution from external SDRAM does not work (erratum 41.4.6.1)
- Memory Protection Unit is not functional (erratum 41.4.5.7)
- Peripheral Bus A maximum frequency is 33MHz instead of 66MHz (erratum 41.4.8.4)
- On some rare parts, the maximum HSB and CPU is 50MHz instead of 66MHz (erratum 41.4.8.6)
- Corrupted read in flash after FLASHC WP, EP, EA, WUP, EUP commands may happen (erratum 41.4.12.4)
- Stalled memory access instruction write-back fails if followed by a HW breakpoint (erratum 41.4.14.1)

2.7.4 AP7000 versus UC3A0512

Table 2.1 shows a rough comparison between the AP7000 and the UC3A0512. One very important difference is that the UC3A0512 does not have an MMU. It also has a shorter pipeline and does not have any cache.

<table>
<thead>
<tr>
<th>Feature</th>
<th>AP7000</th>
<th>UC3A0512</th>
</tr>
</thead>
<tbody>
<tr>
<td>Cache</td>
<td>YES</td>
<td>NO</td>
</tr>
<tr>
<td>Frequency</td>
<td>150MHz</td>
<td>66MHz</td>
</tr>
<tr>
<td>Pipeline</td>
<td>7-stage</td>
<td>3-stage</td>
</tr>
<tr>
<td>Dhrystone MIPS</td>
<td>210@150MHz</td>
<td>91@66MHz</td>
</tr>
<tr>
<td>Internal SRAM</td>
<td>32kB</td>
<td>64kB</td>
</tr>
<tr>
<td>Internal Flash</td>
<td>0kB</td>
<td>512kB</td>
</tr>
<tr>
<td>Unaligned memory access</td>
<td>YES</td>
<td>NO</td>
</tr>
<tr>
<td>Memory Management Unit</td>
<td>YES</td>
<td>NO</td>
</tr>
<tr>
<td>Memory Protection Unit</td>
<td>NO</td>
<td>YES</td>
</tr>
<tr>
<td>Ethernet MAC 10/100</td>
<td>YES</td>
<td>YES</td>
</tr>
<tr>
<td>Java Hardware Acceleration</td>
<td>YES</td>
<td>NO</td>
</tr>
<tr>
<td>Read-Modify-Write instructions</td>
<td>NO</td>
<td>YES</td>
</tr>
</tbody>
</table>

Table 2.1: AP7000 vs UC3A0512 comparison table[7, 8, 9]

A significant difference between the AP7000 and the UC3A0512 is that the UC3A0512 does not support unaligned memory reads or writes. In the example in figure 2.2, the UC3A0512 is capable of copying everything except the first and the last byte one halfword at a time. The UC3A0512 can do halfword copying like this whenever the source and destination addresses are either both odd or both even.
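
The parity rule for halfword copying can be illustrated with a small model of a copy routine. This is a sketch of the idea, not the routine used in the kernel or C library: `copy_bytes` is a hypothetical name, memory is modelled as a `bytearray`, overlapping regions are not handled, and the returned list records the width in bytes of each access.

```python
def copy_bytes(mem: bytearray, dst: int, src: int, n: int) -> list:
    """Copy n bytes inside mem using only aligned accesses.

    Returns the widths (1 or 2 bytes) of the accesses that were used.
    Halfword accesses are only issued at even addresses.
    """
    widths = []
    if dst % 2 != src % 2:
        # Different parity: a halfword access would be unaligned on one
        # side, so fall back to copying byte by byte.
        for i in range(n):
            mem[dst + i] = mem[src + i]
            widths.append(1)
        return widths

    i = 0
    if n > 0 and src % 2 == 1:
        # Both addresses are odd: copy one leading byte to reach alignment.
        mem[dst] = mem[src]
        widths.append(1)
        i = 1
    while n - i >= 2:
        # Both addresses are now even: copy one halfword at a time.
        mem[dst + i:dst + i + 2] = mem[src + i:src + i + 2]
        widths.append(2)
        i += 2
    if i < n:
        # One trailing byte may remain.
        mem[dst + i] = mem[src + i]
        widths.append(1)
    return widths
```

Copying six bytes between two odd addresses uses one byte access, two halfword accesses and one more byte access, while the same copy between an odd and an even address degrades to six byte accesses.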

There are also several other similarities and differences in the available peripherals. Without going into much detail, we can mention the following:

- The power manager, which is responsible for clock generation to various peripherals, is mostly the same.
- The interrupt controller is the same.
- The Ethernet MAC controller is the same.
- The Parallel Input/Output controller on the AP7000 has been replaced with a General Purpose Input/Output controller.
- Several peripherals are not included on the UC3A0512, such as the MultiMediaCard interface, the LCD controller and the PS/2 module.

2.8 EVK1100

The EVK1100 is an evaluation kit for the UC3A microcontroller series, and can be seen in figure 2.16. Like most Atmel products, the name is often prefixed with “AT” (AT-EVK1100), but the name EVK1100 will be used throughout this report. The EVK1100 and other development/evaluation kits mentioned in this report will often simply be referred to as “boards”. The EVK1100 has a UC3A0512 microcontroller and several devices, peripherals and connectors. The features of the EVK1100 that are most important with regard to this project are the clocks, the serial port and the Ethernet peripherals and connectors. Figure 2.15 shows a simplified block diagram of the organization of the most central components of the EVK1100. The figure also illustrates how some devices and peripherals share the same bus lines.

The three buses labeled in this figure are introduced in section 2.7.2. The network PHY is a DP83848I from National Semiconductor and is connected to the microcontroller via an RMII bus. This is the same PHY as on Atmel’s NGW100 network gateway kit, which is already supported by U-Boot and Linux. The RMII bus and the impact of using it are explained in detail in section 2.7.2.

The JTAG connector is essential for programming and debugging. Power connectors and regulators are obviously also a necessity.

Located on the underside of the EVK1100 is a 32MB SDRAM chip. This memory was never used during this project due to a bug in the microcontroller (described in section 2.7.3). In newer revisions of the microcontroller, once this bug is corrected, the SDRAM will be very useful. The evaluation kit also has many other interesting and useful features including a Liquid Crystal Display (LCD), a Secure Digital (SD) card slot, LEDs, microswitches, a potentiometer, a joystick and a light sensor.

2.9 JTAG

JTAG is a hardware interface that provides a “back door” into a system for testing, analyzing behavior and debugging. Atmel’s own JTAG device, called JTAGICE MKII, was at our disposal. The JTAGICE MKII supports on-chip debugging of all AVR and AVR32 microcontrollers with an IEEE 1149.1 compliant JTAG interface. It is connected to a computer using either a serial or USB cable and is supported by both Windows and Linux. Under Linux, a command line application named `avr32program` is used to program microcontrollers via the JTAGICE MKII, and `avr32gdbproxy` provides a proxy for debugging with the GNU Debugger (GDB). For more about GDB, see section 2.13.7.

2.10 Binary formats

This section will introduce the file formats for executables and shared libraries we have looked at. The Linux kernel has support for the following binary formats:

- Executable and Linkable Format (ELF)
- Function Descriptor Position Independent Code (FDPIC) ELF
- Flat
- A.out
- SOM

We focused on the ELF format already supported by the AVR32B architecture, and its derivative, the FDPIC ELF format. The FDPIC ELF format is a variant of the ELF format, and is designed to run on MMU-less systems.

We also looked into the Flat format, which is a simple binary format for MMU-less systems.

2.10.1 Terminology

In this section we will introduce the terminology we use to describe binary formats. An overview of the terminology is shown in figure 2.17.

![Figure 2.17: Terminology for binary formats](image)

**Process**

A running instance of an executable. This is typically an application which is loaded into memory together with all shared libraries used by it.

**Executable**

An executable refers to a program file, which can be loaded into memory and executed.

**Shared library**

A shared library is a file with code and data which can be shared between several processes. A typical example of a shared library is the C library, which provides programs with the standard C functions, such as `printf`, `exit` and `sleep`. A program can link to several shared libraries, and each shared library can link to other shared libraries.

**Module**

A module refers to a single loadable file with code and data. It can either be a shared library or an executable.

**Segment**

This is a loadable part of a module. It refers to a block of code and/or data. This block is loaded at a specific address, and will have specific access rights. Typical examples are code segments, which are readable and executable, and data segments, which are readable and writable.

On some architectures there are advantages to a more fine-grained separation, with separation into three segments: code, read-only data and read-write data. If the processor supports it, these segments can be given different permissions, so that data can never be executed. However, on systems without MMU or MPU, such separation does not increase security, since there is no memory protection.

2.10.2 ELF

The ELF[16] format was developed by UNIX System Laboratories, and in 1999 it became the standard format for Unix-like systems, possibly due to the 86open project group’s effort. The 86open project group was formed in 1997 to discuss the need for a standard binary executable format for x86-based Unix systems. The group was dissolved when the vendors that originally formed it had chosen the Linux ELF format.

The ELF format is described in the System V ABI specification[21]. ELF files are usually created by the assembler and linker during the build process. ELF files can provide either a link view (figure 2.18(a)), an executable view (figure 2.18(b)) or both. The link view is used when the linker combines several ELF files into one ELF file, while the executable view is used when loading an ELF file for execution. The executable view is used in both shared libraries (called shared objects in ELF), and executables.

There are three main types of ELF object files:

- **Executable file.** These are the program files that are loaded by the operating system.

- **Shared object file.** These are the ELF shared libraries. These files are usually loaded by the dynamic linker when a program is executed, but can also be linked into an executable file while compiling the executable.

- **Relocatable file.** These files are intermediary files used when compiling programs. It is typically the compiler which generates these files, and then the linker will combine several of them into the final executable.

Figure 2.19 is an example of an ELF file. This example is based on a statically linked version of BusyBox. It shows the relationship between sections and segments. In this example there is an additional segment defined in the program header which is not shown, namely the stack. The stack is given with a filesize set to zero, but a memsize set to the wanted stack size. This is used to tell the loader to allocate memory for the stack when the program is loaded.

Figure 2.18: ELF views. (a) Link view: ELF header, program header table (optional), section 1, ..., section header table. (b) Executable view: ELF header, program header table, ..., section header table (optional).

Figure 2.19: Object example (BusyBox)

Sections and segments

The sections are generated by the compilers, and form the individual parts of the relocatable file. Code, read-write data, read-only data, relocation info, debugging info and various other information are stored in individual sections. The linker will take sections of the same type from all of its input files, and combine them into one larger section. For example, it will combine all the sections with code into one big section. Later, the sections with equal access permissions (i.e., read-only, executable, read-write) will be combined into segments for the executable view.

Program header

The program header is a list of the segments in the file. For each segment, it contains a description of the segment, which will be used to load the segment into memory. The elements in this description are listed below:

- **type** tells what kind of segment it is.
- **offset** is the start address of the segment within the file.
- **vaddr** is the virtual address where the segment should be located within memory.
- **paddr** is reserved for the segment's physical address, for systems where it is needed.
- **filesz** is the size used by the segment in the file (may be zero).
- **memsz** is the size used by the segment in memory (may be zero).
- **flags** contains the permissions (read, write, execute) for the segment.
- **align** contains the alignment necessary for this segment.

The memory size of a segment may be larger than its file size. The remaining bytes in the segment are then filled with zero bytes. This is used by data segments where variables are initialized to zero, and thus provides a simple way of saving disk space.
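
The program header fields and the zero-fill rule can be illustrated with a short parsing sketch. It assumes the 32-bit ELF layout, in which the eight fields listed above are stored in that order as 32-bit words, and big-endian byte order; both are assumptions about the target rather than statements from the text. The helper names are hypothetical, and no validation is performed.

```python
import struct

# Assumption: ELF32 program header entry, big-endian, eight 32-bit fields.
PHDR = struct.Struct(">8I")
FIELDS = ("type", "offset", "vaddr", "paddr", "filesz", "memsz", "flags", "align")


def parse_phdr(raw: bytes) -> dict:
    """Decode one program header entry into a dict keyed by field name."""
    return dict(zip(FIELDS, PHDR.unpack(raw[:PHDR.size])))


def segment_image(file_data: bytes, phdr: dict) -> bytes:
    """Return the segment as it appears in memory.

    The first filesz bytes come from the file; the remaining
    memsz - filesz bytes are filled with zeros.
    """
    start = phdr["offset"]
    body = file_data[start:start + phdr["filesz"]]
    return body + bytes(phdr["memsz"] - phdr["filesz"])
```

A segment with a file size of 4 and a memory size of 10 thus occupies only four bytes on disk, and the loader supplies the six zero bytes that follow it in memory.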

Loading and execution

The Linux kernel will identify an ELF file based on the first four bytes of the file. These bytes are `{0x7f, 'E', 'L', 'F'}`. Once the file is identified as an ELF file, the kernel will find the program header, and iterate over the segments listed there. Each segment will be loaded according to the descriptions in the headers. When all the segments are loaded, the kernel will transfer control to the new program.
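
The identification step amounts to a four-byte comparison. The sketch below shows the check in isolation; `is_elf` is a hypothetical helper name, and a real loader goes on to validate the rest of the header before trusting the file.

```python
ELF_MAGIC = b"\x7fELF"  # the bytes {0x7f, 'E', 'L', 'F'}


def is_elf(data: bytes) -> bool:
    """Return True if data starts with the four ELF identification bytes."""
    return data[:4] == ELF_MAGIC
```

Files shorter than four bytes, or files in another format, simply fail the comparison and can be offered to the next binary format handler.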

Dynamic linking

An ELF binary using dynamic linking has a special program header that indicates which dynamic linker should be used. The dynamic linker is a program that knows how to load shared libraries, and link the executable with them at runtime. The Linux kernel will load both the original program, and the dynamic linker. Instead of passing control to the original program, the dynamic linker will be executed first. The dynamic linker will then do the actual loading and linking of the shared libraries.

Shared libraries and programs which use dynamic linking contain a segment with information for the dynamic linker. This is known as the DYNAMIC section. The DYNAMIC section contains relocation information, information about shared functions, and information about libraries used.

The dynamic linker will use this information to locate the libraries the program should use. It will then load those libraries. Sometimes the dynamic linker is unable to load the libraries at exactly the address they have requested in the program headers. In those cases it will use the relocation information stored in the DYNAMIC section of those libraries to relocate the library to a different address.

The program needs to be able to access functions and data in the shared libraries. Information about which functions and data are used is stored in the DYNAMIC section. The dynamic linker will find the parts of the program that need to be updated to access the functions and data, and insert the correct references.

2.10.3 FDPIC ELF

The FDPIC ELF format is an adaption of the ELF format. Its purpose is to be able to execute ELF files on platforms without MMU support. Our main source of information about the FDPIC ELF format was [13].

Memory layout

FDPIC ELF files can be loaded into memory in two different ways. If the file has a constant-displacement flag set, all the segments in the file will be loaded into one contiguous block of memory. If the constant-displacement flag is unset, each segment will be loaded separately.
Figure 2.20 shows an example where three processes are running, and the constant-displacement flag is unset. Two of the processes are instances of Program A, and one process is an instance of Program B. All processes share a common library.

As shown in the figure, we have only one address space which is shared by all the processes. Only read-only segments can be shared between different processes. The code segments, which are read-only, are shared between the processes, while each process has its own copy of the data segments.

The challenge is that processes cannot make any assumptions about where each segment will be loaded. The typical situation is that each module has two segments – one segment for code and read-only data, and one segment for read-write data. The code which is running from the code segment needs to be able to locate its variables stored in the data segment. It also needs to be able to locate the address of functions in shared libraries.

The solution to this is to have a table in each module known as the “Global Offset Table”, or GOT for short. This is a table with offsets to various functions and variables. The table is stored in the data segment, and will be updated with the current addresses of functions and variables during the startup of the program. The address of this table is stored in a dedicated register. Whenever the application needs to access its data segment, it will look up the address of the variable in the GOT.

Calls between different modules need special handling. Because each module has its own GOT, the register which contains the current address of the GOT needs to be updated with the address of the GOT from the new module. To accomplish this, the GOT address for the module containing the function is loaded into the register before the function is called. The old value is restored when the function returns.

In addition to the addresses in the GOT, there may be other addresses in the data segment which need to be updated. Example:
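
A minimal sketch of such a case, assuming an array named `names` (the identifier and the string contents are assumptions made for illustration):

```c
/*
 * Each element holds the address of a string constant, so the initialized
 * data of the program contains three addresses that were fixed at link time.
 */
const char *names[] = { "first", "second", "third" };
```
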
This will create an array with addresses to three strings. The addresses will be invalid when the program is loaded, and will therefore need to be updated. To update these addresses, there is a `rofixup` list in the program file. This list contains the location of all addresses that need to be updated. The `rofixup` list is stored in the code segment of the file, and can therefore be reached by using a relative reference once the program has been started.

**Stack**

In addition to the memory layout differences, there is another difference between normal ELF files and FDPIC ELF files. If the processor running the application has an MMU, the operating system can grow the stack dynamically as the program uses it. This is infeasible without an MMU, so the stack has to be allocated before the program is started, and it has to be big enough to fit the requirements of the program.

In an FDPIC ELF file, there must be a program header which indicates how big the stack must be. The operating system will then allocate a stack with the required size for the program. The program header with the stack size has the type `PT_GNU_STACKSIZE`.

**Loading and execution**

When the Linux kernel detects an FDPIC ELF file, it will start by loading the program header. It will check whether the file has the constant-displacement flag set. If the flag is set, it will iterate over all the segments in the file, and determine how big a memory area is needed for all the segments. The memory area will be allocated, and then all segments will be loaded with the offsets and sizes specified in the segment list.
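
The size calculation for the constant-displacement case can be sketched as follows. This is a simplified illustration under stated assumptions: `struct segment` and `contiguous_size` are hypothetical names, not the kernel's own types or functions, and alignment is ignored.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of a loadable segment. */
struct segment {
    uint32_t vaddr;  /* address requested in the file */
    uint32_t memsz;  /* size of the segment in memory */
};

/*
 * Size of the single block needed to hold all segments while keeping their
 * relative displacement: highest end address minus lowest start address.
 */
static uint32_t contiguous_size(const struct segment *segs, size_t n)
{
    uint32_t lo = UINT32_MAX;
    uint32_t hi = 0;

    for (size_t i = 0; i < n; i++) {
        if (segs[i].vaddr < lo)
            lo = segs[i].vaddr;
        if (segs[i].vaddr + segs[i].memsz > hi)
            hi = segs[i].vaddr + segs[i].memsz;
    }
    return n ? hi - lo : 0;
}
```

Any gap between two segments is included in the result, which is the price of keeping the displacement between the segments constant.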

If the constant-displacement flag is unset, each segment will be loaded independently of all others. Some of the segments may then be shared with other processes.

After the program is loaded, the kernel will transfer control to the program. To allow the program to relocate itself, a loadmap is included as a parameter to the program. The loadmap describes where the various segments are located in the memory.

```c
/* segment mappings for ELF FDPIC libraries/executables/interpreters */
struct elf32_fdpic_loadseg {
    Elf32_Addr addr;  /* core address to which mapped */
    Elf32_Addr p_vaddr;  /* VMA recorded in file */
    Elf32_Word p_memsz;  /* allocation size recorded in file */
};

struct elf32_fdpic_loadmap {
    Elf32_Half version;  /* version of these structures, just in case... */
    Elf32_Half nsegs;  /* number of segments */
    struct elf32_fdpic_loadseg segs[];
};
```

The program will first locate the `rofixup` list. This can be done by using relative addressing: the `rofixup` list is stored in the code segment, and will have a constant displacement from the initialization code. The program will iterate over the `rofixup` list, and update all the locations listed in that list with new addresses. Once this is done, the program is ready to begin execution.
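
The address translation at the heart of this step can be sketched as follows. The sketch assumes stand-in definitions for the ELF types and passes the segment array and its length directly; the function name `reloc_addr` is hypothetical and is not taken from the kernel or the C library.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the ELF types used in the loadmap structures. */
typedef uint32_t Elf32_Addr;
typedef uint32_t Elf32_Word;

struct elf32_fdpic_loadseg {
    Elf32_Addr addr;     /* address the segment was loaded at */
    Elf32_Addr p_vaddr;  /* address recorded in the file */
    Elf32_Word p_memsz;  /* size of the segment in memory */
};

/*
 * Translate an address recorded in the file into the address it has after
 * loading. Returns 0 if no segment in the loadmap contains the address.
 */
static Elf32_Addr reloc_addr(const struct elf32_fdpic_loadseg *segs,
                             size_t nsegs, Elf32_Addr vaddr)
{
    for (size_t i = 0; i < nsegs; i++) {
        if (vaddr >= segs[i].p_vaddr &&
            vaddr - segs[i].p_vaddr < segs[i].p_memsz)
            return segs[i].addr + (vaddr - segs[i].p_vaddr);
    }
    return 0;
}
```

Each location named in the `rofixup` list holds an address of this kind, and the start-up code would replace it with the translated value.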

**Dynamic linking**

The dynamic linking of FDPIC ELF binaries is done in mostly the same way as the dynamic linking of normal ELF binaries. The Linux kernel will load the dynamic linker in addition to the normal program, and pass control to the dynamic linker. The dynamic linker will receive a reference to both its own loadmap and the loadmap for the program which is executed.

It will load shared libraries, relocate them as needed, do run-time linking, and pass control to the executed program.

### 2.10.4 Flat

The bFLT format is a simple flat binary format based on the a.out format, and is the de facto format for uClinux. This section is based on [15] and [20].

It was designed to simplify loading and executing applications, to provide a small and memory-efficient file format, and to support MMU-less systems and storage of a GOT. A bFLT binary is either fully relocatable or Position Independent Code (PIC). With PIC, it is possible to use execute-in-place, and to share the text segment between multiple instances. PIC needs support for relative addressing in the architecture (this is present in AVR32).

Figure 2.21 shows a conceptual view of the organization of the file. The header contains information about the file format version, where each section of the file is located, and how big the stack should be. A flat binary has one (and only one) text section (code), data section and bss section (relocations).

Usually, Flat binaries are generated by adding an additional tool to the toolchain and by employing a special linker script. `elf2flt` is such a utility, and is used during the linking process. `coff2flt` is another example of such a utility.

### 2.10.5 Comparison of binary formats

<table>
<thead>
<tr>
<th>Feature</th>
<th>ELF</th>
<th>FDPIC ELF</th>
<th>FLAT</th>
</tr>
</thead>
<tbody>
<tr>
<td>Support for MMU-less systems</td>
<td>No</td>
<td>Yes</td>
<td>Yes</td>
</tr>
<tr>
<td>Support for shared libraries</td>
<td>Yes</td>
<td>Yes</td>
<td></td>
</tr>
<tr>
<td>Support for arbitrary number of segments</td>
<td>Yes</td>
<td>Yes</td>
<td>No</td>
</tr>
<tr>
<td>ELF Compatible</td>
<td>Yes</td>
<td>Yes</td>
<td>No</td>
</tr>
<tr>
<td>Need extra step during linking</td>
<td>No</td>
<td>No</td>
<td>Yes</td>
</tr>
</tbody>
</table>

### 2.11 Linux

Linux is an open source operating system initially written by Linus Torvalds with help from programmers around the world. It is a clone of the operating system Unix, and aims towards POSIX and SUS compliance.

According to Kernel.org, Linux is easily portable to most general-purpose 32- or 64-bit architectures as long as they have a paged MMU and a port of the GNU C compiler (gcc). Linux has also been ported to a number of architectures without a paged MMU, although functionality is then obviously somewhat limited[14].

In a white paper on Linux in the embedded market, researchers from the VDC Research Group state the following reasons for Linux’ growing popularity[25]:

- Licensing cost advantages
- Flexibility of source code access
- General familiarity
- Maturing ecosystem of applications and tools
- Growing developer experience with Linux as an embedded OS

Kernel.org claims that Linux has all the features you would expect in a modern fully-fledged Unix, including true multitasking, virtual memory, shared libraries, demand loading, shared copy-on-write executables, proper memory management, and multi-stack networking including IPv4 and IPv6.

Linux was originally made for 32-bit x86, but has later been ported to a wide range of architectures, including:

- Alpha AXP
- Sun SPARC
- Motorola 68000
- PowerPC
- ARM
- Hitachi SuperH
- IBM S/390
- MIPS
- HP PA-RISC
- Intel IA-64
- AMD x86-64
- AXIS CRIS
- Renesas M32R
- H8/300
- NEC V850
- Tensilica Xtensa
- Analog Devices Blackfin architectures
- Atmel AVR32 (AVR32b)

### 2.11.1 Configuration

The build process for the Linux kernel can be configured through a framework named kbuild. This system consists of a top level makefile, one makefile for each architecture, a set of kbuild Makefiles and a set of common rules for all kbuild makefiles (scripts/Makefile.*). Some documentation of this infrastructure can be found in the kernel documentation [17, kbuild/modules.txt and kbuild/kconfig-language.txt].

The configuration defines which subdirectories should be visited during the build process. Each of these subdirectories has a makefile for kbuild, and these use information from the (top level) file .config during the build process.

When started, the configuration utility uses information from the Kconfig file in the subdirectory for the currently selected architecture. The Kconfig file may also include other Kconfig files. The configuration utility presents the user with the available compile time options defined in the Kconfig files. Invoking 'make menuconfig' (or equivalent) will read these files and construct a file named .config, located in the root folder of the kernel source tree. The .config file is read when the kernel is built. There are also targets defined in the makefiles that set all, none, random or certain groups of compilation options (allyesconfig, allnoconfig, etc).

2.11.2 Tasks

Internally in the Linux kernel, all threads of execution are known as “tasks”, and information about them is stored in a structure named task_struct. Each task contains references to the current virtual memory area of the task, the open files, the user the task is running as, and several other pieces of information. Much of that information can be shared with other tasks. For example, the virtual memory area of a task can be shared with other tasks.

By varying what information is shared between tasks, it is possible to accomplish different degrees of separation. Two threads in the same process will share almost everything in the task structure. Two separate processes will share much less, but they will still share some information. The information that is still shared includes the current file system name-space and some other name-spaces.

It is also possible to create two tasks with no shared name-spaces. This can be used to create virtual servers, and is a field under active development in Linux.

Kernel stack

On Linux, each task has a kernel stack. The kernel stack is used as long as the task is executed in kernel mode. If the task also executes in user mode, it will have a separate stack for that part. As soon as the task enters the kernel, for example on a system call or on an interrupt, it will switch to using the kernel stack.

The first thing done upon entering kernel mode is always to save the user space registers. This means that the bottom of the stack will always contain the user space registers, which makes it easy to retrieve the user space registers of a running thread.

The kernel stack is 8192 bytes on the AVR32 architecture. Most of this area is occupied by the stack itself, which grows from the top and downwards. The lowest part of the area contains a structure named thread_info. This structure contains references to the task this stack belongs to, and also some low-level information about the task. A simple overview of the kernel stack is shown in figure 2.22.

Storing the thread_info structure in the lowest part of the kernel stack makes it easy for the kernel to locate the currently executing task. It only needs to retrieve the current stack pointer and round it down to an 8192-byte boundary. This makes retrieving the current task a very low-cost operation.
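
The rounding can be sketched in C as a single mask operation. The helper name `thread_info_base` is an assumption made for this example; the sketch works on a plain integer value instead of reading the real stack pointer register.

```c
#include <stdint.h>

#define THREAD_SIZE 8192u  /* size of the kernel stack in bytes */

/*
 * Round a stack pointer value down to the start of its kernel stack area,
 * which is where the thread_info structure is stored.
 */
static uintptr_t thread_info_base(uintptr_t sp)
{
    return sp & ~((uintptr_t)THREAD_SIZE - 1);
}
```

Because 8192 is a power of two, the rounding needs no division: clearing the 13 lowest bits of the stack pointer is sufficient.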

There are two methods for accessing the information on the kernel stack. To retrieve the user space registers of a task, we have the task_pt_regs function. Given a task pointer, that function will locate the bottom of the kernel stack of that task, and retrieve the registers stored there. There is also the current_thread_info function, which retrieves the thread_info structure of the current task.

![Kernel stack diagram](image)

Figure 2.22: Kernel stack

### 2.11.3 uClinux

Originally, uClinux was a fork of the Linux 2.0 kernel, intended for microcontrollers without MMU support. However, the uClinux project has grown both in brand recognition and coverage of processor architectures, and the uClinux code has been integrated into the main line of Linux development since 2.5.46[22][18]. Since uClinux is already integrated in the official releases from kernel.org, no special uClinux kernel or patches are considered in this report. Note that the uClinux name is still used in several places in the Linux kernel and the toolchain.
2.12 U-Boot

U-Boot is a boot loader for embedded systems. It is developed and maintained by Wolfgang Denk at DENX Software in Germany, and is mainly used to boot Linux. It also has support for several other operating systems, such as NetBSD and QNX. Several architectures are supported, including PPC, ARM, AVR32B, MIPS, x86, 68k, Nios, MicroBlaze. For each architecture, multiple boards with different CPUs can be supported. U-Boot is open source free software released under the GNU GPL.

U-Boot already supports the AVR32B architecture on Atmel’s STK1000 and NGW100 development/evaluation boards. Support for the UC3A was implemented in our previous project during the fall of 2008.

2.12.1 Contributions

To contribute to the development of U-Boot, the code changes should be divided into logical chunks called patches. Patches are submitted to the official mailing list and should conform with its rules. The rules and conventions for the mailing list and U-Boot patches can be found on the DENX Software website\(^2\).

2.13  Toolchain

In this context, a toolchain is a set of software tools capable of creating and debugging executables for a specific platform. It normally includes tools for working with binaries for the target machine, compilers and the C library.

2.13.1 Terminology

In this section, we will introduce some terms used when describing the toolchain:

- **Assembler**: A program for turning a textual representation of machine code into binary code.
- **Object file**: A file with binary code meant to be combined with other files with binary code into a program or library.
- **Linker**: A program for combining several object files (including libraries) into a program or library.
- **Compiler**: A program for turning a high-level language, such as C, C++ or Java into lower level code, such as assembler input, or directly into binary code. The output of the compiler can be a finished program, or an object file that must be linked with other files to form the program or library.
- **Library**: A collection of binary code that can be reused by other programs.

\(^2\)http://www.denx.de/wiki/U-Boot/Patches

- **Shared library**: A library where the linking is done when the program is executed.
- **Static library**: A library that is linked into the program when the program is compiled.
- **C library**: A library implementing all the standard C-functions, such as `printf`, `malloc` and `atoi`.

2.13.2 Linux toolchain

A toolchain on Linux typically contains at least:

- **GNU Binutils** – handles linking of executables, and transformation of assembler files into machine code.
- **glibc** – the C library.
- **GCC** – the C compiler.


Figure 2.23: Elements of a toolchain

Figure 2.23 shows how the various pieces of a toolchain interact. The figure shows how a program is created from three source files and a statically linked library. Two of the source files are C-files (`file1.c` and `file2.c`), and one of the source files is an assembler-file (`file3.s`). A statically linked library (e.g. the C library) is also included.

The compiler transforms the C-code into assembler files, which in turn are transformed into machine-code by the assembler. This step is usually invisible to the user, as GCC automatically invokes the assembler. The linker takes the object-files with machine code and the static library, and combines them into a single program.
2.13.3 GCC

The GCC project is a collection of compilers for various languages, such as C, C++ and Fortran. It supports several target architectures, including x86, ARM, MIPS and many more.

The GCC mission states: “GCC development is a part of the GNU Project, aiming to improve the compiler used in the GNU system including the GNU/Linux variant. The GCC development effort uses an open development environment and supports many other platforms in order to foster a world-class optimizing compiler, to attract a larger team of developers, to ensure that GCC and the GNU system work on multiple architectures and diverse environments, and to more thoroughly test and extend the features of GCC.” The first official beta of GCC was released in 1987, and new versions have since then been released on a regular basis.[11]

The official version of GCC from the Free Software Foundation (FSF) does not currently support the AVR32 architecture. However, Atmel is providing support through patches that can be applied to the official version. Both the patches and pre-compiled binaries can be downloaded from Atmel’s website. The patched version of GCC supports the AVR32 architecture, including the UC3 series, but is unable to produce relocatable programs for Linux.[4]

Extending GCC

GCC consists of language-dependent frontends for handling various languages, optimizers, and machine-dependent backends. The machine-dependent backends handle the various architecture-dependent parts of the compilation process.

gcc/config.gcc contains a definition of all the targets GCC can be configured for. When building for AVR32 and uClinux, the following target definition will be used:

```
avr32*-*-uclinux*)
  tm_file="dbxelf.h elfos.h linux.h avr32/linux-elf.h avr32/uclinux-elf.h
  avr32/avr32.h"
  tmake_file="t-linux avr32/t-avr32 avr32/t-elf"
  extra_modes=avr32/avr32-modes.def
  gnu_ld=yes
```

This tells us what files GCC will use. The tm_file-line lists files that define the target machine. The files will be evaluated in left-to-right order, so files later in the line can override earlier files. Three AVR32-specific files are on that line – linux-elf.h, uclinux-elf.h and avr32.h. All of these are located in the directory gcc/config/avr32/. These files configure most of the information about the target – everything from how the linker and assembler should be invoked to how many bits the registers are on that target.

There are also some other files of interest in the gcc/config/avr32/-directory:

- avr32.opt: The file defining the command line arguments that can be passed to GCC.
- crti.asm and crtn.asm: Start and end of `_init` and `_fini` sections. The linker combines the sections in these files with sections in all other files to build two functions which should be called at program startup and program exit.

### 2.13.4 GNU Binutils

GNU Binutils is a collection of tools for working with binary files. We worked with version 2.18 of GNU Binutils since that version was the one Atmel’s patches were created for. Amongst the operations which can be done with GNU Binutils are:

- Building binary files from assembler files, with the `as` tool.
- Linking files with binary code together to form executable programs, with the `ld` tool.
- Examining binary files, with the `readelf` and `objdump` tools.
- Trimming unnecessary parts from a program, with the `strip` tool.

GNU Binutils contains an abstraction layer for working with various types of binary formats[12]. This abstraction layer is known as the Binary File Descriptor (BFD) library. Since many different platforms and architectures use the ELF binary format, with some variations, a base library of ELF functions has been defined. This library defines a basic implementation of the ELF format, and exposes a set of hooks where the target architecture can insert its own architecture-specific code.

### 2.13.5 elf2flt

**elf2flt** is a utility used during the link process to transform an ELF file into the flat binary format. The Flat binary format is described in section 2.10.4. **elf2flt** is developed as a part of the uClinux project.

### 2.13.6 Libraries

A C library, often called libc or similar, is a collection of header files and library routines that implement common operations. The GNU project provides a library named the GNU C Library (abbreviated glibc), which is used in the GNU system and most GNU/Linux desktop distributions. uClibc is a C library for embedded Linux systems. Compared to glibc, uClibc is much smaller and supports MMU-less CPUs. Nearly all applications supported by glibc also work perfectly with uClibc[23].

uClibc currently supports the AP7 family of microcontrollers, but may need some significant modifications to work on the UC3A family. Dynamic linking of uClibc on MMU-less systems is currently not supported.
2.13.7 GDB

GDB is a feature-rich open source debugger that supports a wide range of platforms and hardware. Atmel maintains its own version of GDB, and as features from this branch mature they are merged into the official version of GDB. The AVR32 version of GDB (avr32gdb) is currently not in the official releases of GDB, but can easily be obtained from Atmel’s official web page\(^3\) by downloading the AVR32 GNU Toolchain. GDB enables the user to control and analyze in detail the program execution and the states of hardware registers and memory. Instruction data can be disassembled, and breakpoints can be added at specific instructions or at specific line numbers.

2.14 BusyBox

BusyBox is an open-source software application that provides light-weight versions of many common UNIX utilities, and is called “The Swiss Army Knife of Embedded Linux” by its maintainers. It is written with size-optimization and limited resources in mind, and compiles to a single small executable.[2] Because it is open-source and extremely modular, it is very customizable and suitable for embedded systems. BusyBox also aims to achieve fast execution, and minimize run-time memory usage. This makes BusyBox a suitable set of tools for Linux running on the platform concerned in this thesis.

BusyBox is equipped with a simple menu configuration system, based on the configuration system in the Linux kernel. A screenshot of the main menu can be seen in Figure 2.24. By altering the configuration options, BusyBox can be customized to fit the needs of a wide range of projects. It can be adjusted to find a balance between functionality, file size and memory usage requirements.

BusyBox contains a wide range of utilities, categorized by the build configuration system as depicted in figure 2.24. Each “application” of BusyBox is called an applet, and most of these aim to be a replacement for the utilities normally found in a GNU system. The applets contain the most important features of the applications they imitate, but generally have fewer options.

\(^3\)http://www.atmel.com/dyn/products/tools_card.asp?tool_id=4118
The list of applets is fairly long and is not given in full here. The applets can be explored by using menuconfig. Here is a short list of some of the applets contained in BusyBox:

- ls
- cp
- cat
- grep
- find
- mkdir
- rm
- rmdir
- df
- du
- vi
- diff
- adduser
- passwd
- fsck
- mount
- less
- ifconfig
- free
- ps
- ash/hush/msh (shells)
- tar
- gunzip

BusyBox can be used as init, and thereby start necessary services and applications, e.g. a shell for the terminal and/or telnet server.

According to the official web page, BusyBox will build on any architecture supported by GCC, and is tested with both uClibc and glibc.

### 2.15 Server protocols

An embedded system can either contain all necessary software and configuration in the firmware, or it can rely on downloading parts from another system or server during start-up. This section introduces concepts and software often used to serve the software and configuration to such a system.
2.15.1 DHCP

A Dynamic Host Configuration Protocol (DHCP) server can be used to distribute configuration options to network devices. The DHCP server usually assigns an IP address to the device, and can also provide information about where the root file system and kernel can be located.

2.15.2 TFTP

Trivial File Transfer Protocol (TFTP) is a simple protocol for transmission of files over an Internet Protocol (IP) network. It uses the User Datagram Protocol (UDP) for IP, and this enables it to be very lightweight compared to protocols that use the Transmission Control Protocol (TCP) for IP.
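
The simplicity of the protocol can be illustrated by the request a client sends to start a download. The following is a minimal sketch of building a TFTP read request (opcode 1, followed by the file name and the transfer mode, each terminated by a zero-byte); the function name `tftp_build_rrq` and the file name used in the note below are assumptions made for this example, and sending the packet over UDP is left out.

```c
#include <stddef.h>
#include <string.h>

/*
 * Build a TFTP read request in buf:
 *   2-byte opcode (1, big-endian) | file name | 0 | mode | 0
 * Returns the packet length, or 0 if the buffer is too small.
 */
static size_t tftp_build_rrq(unsigned char *buf, size_t cap, const char *filename)
{
    static const char mode[] = "octet";      /* binary transfer mode */
    size_t name_len = strlen(filename);
    size_t total = 2 + name_len + 1 + sizeof(mode);  /* sizeof counts the final 0 */

    if (total > cap)
        return 0;
    buf[0] = 0;
    buf[1] = 1;                              /* opcode 1: read request */
    memcpy(buf + 2, filename, name_len + 1); /* file name including its 0 byte */
    memcpy(buf + 2 + name_len + 1, mode, sizeof(mode));
    return total;
}
```

For a file named `uImage`, the resulting request is 15 bytes long.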

U-Boot can use the response from the DHCP server to locate the TFTP server and download the kernel image from this TFTP server.

2.15.3 NFS

The Network File System (NFS) protocol is, as the name suggests, a protocol for accessing files over a network in the same manner as files are accessed locally. The file system that is mounted in the topmost directory of the file system hierarchy is commonly called the root file system, and in Unix-like systems like Linux it is denoted “/”. NFS can be used to set up a root file system for a disk-less system, e.g. an embedded system.

2.16 Open-source collaboration

This section gives an introduction to the typical tools and norms for collaboration in open-source projects. A text file named SubmittingPatches is included in the Linux kernel documentation[17]. This file describes the general guidelines and rules to follow when submitting patches for Linux.

2.16.1 Git

Git is a revision control system, and was used for maintaining the source code for all the software units changed during this project. Git was initially developed by Linus Torvalds for use with the Linux kernel. Git is a de-centralized version control system with strong focus on performance and many advantageous features for very large distributed development projects.

2.16.2 Merging with current versions

To make sure that any changes done to the source are compatible with the maintainer’s current version, development branches should regularly be merged with the branch they are based on. Another reason for merging is to avoid development based on obsolete structures or frameworks. Rebasing can also be used as a way to extract the current changes and apply them to a newer version. This should ideally give the same result as merging, but with a different revision history structure.

### 2.16.3 Splitting up patches

The rules for submitting patches for the Linux kernel, and many other projects, state that the patches must be split into logical units of change before submission. For example, if you are going to submit both a bug fix and performance enhancements for a single driver, these should be separated into two separate patches.

### 2.16.4 Patch submission format

The patches should usually be sent as an email to the appropriate subsystem maintainer for review. This maintainer will, if the patch is approved, ask the main maintainer to pull this patch into the main branch.

It is important that other developers are able to comment and quote patches, and therefore all patches should be submitted inline in the mail. For submitting patches to the Linux kernel maintainers, the formatting rules listed below apply. Many other software project maintainers have adopted these same rules.

- No MIME
- No links
- No compression
- No attachments
- Max 40 kB mails to the mailing list (for larger patches, a URL should be provided instead)

Git provides functionality for formatting and sending patches based on the Git revision history. By specifying the format-patch command (`git format-patch`), Git can be instructed to generate patch files from a given revision interval. Usually, a series of patches should be accompanied by a descriptive cover letter. The patches can be sent by invoking `git send-email` with the cover letter and patch files specified as parameters. Many other parameters can be specified (like sender, receiver, SMTP-server etc), but Git will ask if any obligatory parameters are missing.

### 2.16.5 Signing your work

Linux in particular, but also other open source projects, use a sign-off procedure for patches that are being emailed around. The sign-off line is added at the end of the patch description, and is used to certify that you either wrote the patch or otherwise have the right to pass it on as an open-source patch. This tag indicates that the signer was involved in the development, or that he/she was in the patch’s delivery path.
There is also a less formal tag used, namely "Acked-by", which is used by developers who have reviewed the patch and indicated acceptance.

2.16.6 Upstream
Sending patches upstream means sending them in the direction of the original author or maintainer of the project. The patches can then be included in the next version if they are approved.

2.17 Previous work
In this section we will briefly present previous relevant work done by ourselves and others.

2.17.1 AP7 series
The AP7 series microcontrollers from Atmel are already supported by U-Boot, Linux, GCC, uClibc and GNU Binutils. Because the AP7 and UC3A microcontrollers are implementations of the same architecture, they have many similarities. This is of great significance, since much of the existing code can be reused on the UC3A with few or no changes.

2.17.2 Linux support for MMU-less systems
The uClinux project was started with the aim to run Linux on processors without MMU support, and most of this is now included in the main Linux kernel tree. This means that the main Linux kernel tree already contains the basic code for MMU-less systems, and we can base further development on this code. We have examined the Linux source code, and have found several architectures that support devices both with and without MMU. The ARM architecture and the MIPS architecture are examples of such architectures. The implementations for these architectures can be used as examples on how this can be implemented for the AVR32 architecture.

2.17.3 Implementations for other architectures
Linux and the necessary toolchain components have been ported to some architectures without an MMU. Some were ported in the uClinux project and most of these use the flat binary format, but there are also some implementations which use the FDPIC format, which appears to be a more modern format. This section mentions implementations done for other systems without an MMU, first for the kernel and thereafter the toolchain.

Linux kernel
All work described in this thesis is based on the latest stable Linux kernel version available at the beginning of this project (2.6.28.1). This kernel version supports the FDPIC format in three architectures, namely FR-V, Blackfin and SuperH. Each of these architectures has at least one MMU-less variant, and the existing code for these may be useful as inspiration when implementing support for a new architecture. Other implementations for MMU-less processors use the Flat binary format.

Binutils

The GNU Binutils version we based our work on is capable of producing FDPIC binaries for FR-V and Blackfin. Some code for dealing with FDPIC is almost architecture independent, and can in some cases be copied to a new architecture with minor changes.

elf2flt

Architectures supported by the current versions of elf2flt include m68k/ColdFire, ARM, Sparc, NEC v850, MicroBlaze, h8300 and SuperH. If the Flat format is going to be used, the existing implementation for these architectures may be used as examples.

2.17.4 SRAM expansion board

On the EVK1100, a 32 MB SDRAM chip is connected to the microcontroller’s EBI. Due to the SDRAM bug (see 2.7.2), this memory cannot be used to run code. The end result is that Linux cannot be run on current versions of the EVK1100 evaluation kit without hardware modifications. Using the internal SRAM for Linux is infeasible, since only 64 KB of internal SRAM is available, and a running Linux kernel requires far more memory. To work around the SDRAM bug, an expansion board for the EVK1100 with SRAM and flash memory was developed by Atmel for our previous project in 2008.

The expansion board connected to the EVK1100 can be seen in figure 2.16. The board is just a circuit board with a connector for the EVK1100, footprints for two 2 MB SRAM chips and one 8 MB flash chip, and some resistors and decoupling capacitors. See appendix H for the expansion board schematics.

Note that several pull-ups and capacitors have been removed from the EVK1100 to avoid conflicts with the expansion board.
Chapter 3

Implementation

In this chapter, we present our approach to porting Linux to the UC3A0512 microcontroller and the EVK1100 evaluation kit. We also show the changes we made to GCC and GNU Binutils to support Linux on this platform.

In the first section we present the organization of our work flow, by explaining the approach we used to implement our requirements. The next section lists what we expected to be necessary in order to achieve our goals. Our development setup is presented in section 3.3. Section 3.4 presents the changes (mostly cleanups) made to U-Boot in this project.

Section 3.5 presents the decision on which binary format to use. The Linux kernel modifications and the toolchain adaptation are presented in sections 3.6 and 3.7 respectively.

Some adaptations and workarounds had to be applied to the development board and the initialization code. These are presented in the sections SRAM optimization (3.8) and SPI chip select (3.9).

Our assignment text indicates that a list of applicable Linux programs should be compiled. In section 3.10 our use of BusyBox, the Swiss Army knife of embedded Linux, is presented.

The last section, section 3.11, presents how we acquired the necessary code, and how we distributed our changes.

3.1 Methodology

An iteration based approach was used during this project. This approach is inspired by the “Incremental Process model”[10].

Each iteration in the process consists of the steps listed below. Figure 3.1 shows the steps involved in the process flow.

1. Identifying the next goal needed to fulfill our requirements.

In this context we use the word goal when we refer to the coarse-grained steps required to reach the requirements we defined in section 1.3.1. A goal can be, for example, making the kernel boot or getting GCC to build a proper binary.

2. Sketching preliminary milestones for what we think is the most reasonable way to reach the goal.

In this context we use the word milestone for the more fine grained steps required to reach a goal. A milestone can be e.g. adding the new linker target in GCC, updating the BFD, etc.

3. Run short iterations consisting of the steps listed below. If we found during these iterations that our milestones were still too coarse, we refined them into smaller steps.

   (a) Define or refine the milestone. We usually had a general idea of what the milestone should be, and we jumped directly to the next step.

   (b) Identify the necessary steps to reach the milestone. More on this in section 3.1.2.

   (c) Implement what we identified as necessary.

   (d) Test whether we reached the milestone or not.

       • If the goal is reached we go to step 1, and enter a new iteration with the next goal.

       • If the milestone is reached, we clean up our code by removing debug output and try to format it to conform with coding guidelines. We then enter a new iteration for the next milestone.

       • If the milestone is not reached, we begin a new iteration for the same milestone.

We kept these iterations short by keeping our milestones fine grained.

Figure 3.1: Development process

3.1.1 Setting goals and preliminary milestones

At the beginning of this project, we had only a superficial idea of what had to be done to the code. To identify the necessary changes in complete detail, we would have had to analyze all the relevant code. Within the time frame set for this project, this was infeasible due to the size and complexity of the kernel and toolchain. Instead, the parts of the code that we needed to understand were gradually uncovered during implementation. The following figures support this decision (they were counted with a simple script, using the 2.6.28.1 kernel as source):

- The core kernel code (counting the kernel/, mm/, init/ and fs/ subdirectories) contains almost one million lines of code.

- The AVR32 architecture implementation of the kernel contained about 23000 lines of code when we started our development.

- The FR-V architecture, which we often used as a reference, had about 19000 lines of code.

3.1.2 Milestone identification and implementation

When a milestone was identified, attempts were made to identify the necessary changes. Techniques varied a lot between the milestones, but typically consisted of one or more of the following steps:

- Reading documentation.

- Analyzing code, including code for other architectures.

- Tracepoints in the code (i.e. printf/printk). This enabled us to get an overview of the execution flow and study the internal state during execution.

- Single stepping with GDB was used when we did not get the whole picture from the tracepoints we used. This technique was vital when an error caused a stack corruption. Stack corruption makes it difficult to locate the problem, because the stack is useful for backtracking the execution.

- Analyzing binaries with readelf/objdump (this is relevant only to the toolchain adaption).
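
The tracepoint technique listed above can be illustrated with a small C macro. This is a minimal user-space sketch, not the code used in the project: the names `TRACE`, `trace_count` and `load_segment` are hypothetical, and in kernel code `printk` would take the place of `fprintf`.

```c
#include <stdio.h>

/* Hypothetical counter: number of tracepoints hit so far. */
static unsigned int trace_count;

/*
 * Hypothetical tracepoint macro: prints the source location followed by
 * a printf-style message on stderr. In kernel code, printk() would be
 * used instead of fprintf().
 */
#define TRACE(...)                                                    \
    do {                                                              \
        trace_count++;                                                \
        fprintf(stderr, "[%s:%d %s] ", __FILE__, __LINE__, __func__); \
        fprintf(stderr, __VA_ARGS__);                                 \
        fputc('\n', stderr);                                          \
    } while (0)

/* Example: trace the arguments and the decisions of a function under study. */
static int load_segment(unsigned long addr, unsigned long size)
{
    TRACE("addr=0x%lx size=%lu", addr, size);
    if (size == 0) {
        TRACE("empty segment, nothing to do");
        return -1;
    }
    return 0;
}
```

Because every message carries the file, line and function name, the output gives the overview of the execution flow described above, and the macro can be removed again in one place during cleanup.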

During this survey phase, we found that we had to add other goals and milestones. Often we saw that changes had to be made in another part of the system. For example, when we worked our way through the compilation process, we often realized that work had to be done in the Linux kernel’s executable file loader, and vice versa.

Some of the milestone identification and implementation attempts revealed that the milestone we had set was not the right way to go. In those cases we jumped straight to refining the milestone, or even to defining a completely different goal, without completing the iteration.
3.1.3 Review

One important element of open-source development collaboration is the public review process, where other developers can read, test and comment on the submitted code. All patches should be reviewed thoroughly and approved before they are applied to the maintainer’s development branch. There were two main reasons for us to submit our work to the appropriate mailing lists. First of all, the lists can provide valuable feedback on our work. Secondly, we wanted to make the code available to anyone who could make use of it. Publishing our work was also indicated in both the original assignment and in our communication with Atmel. The received feedback is presented in section 4.5.

3.2 Expected changes

In this section we list the changes we identified early on as necessary to reach our goals.

3.2.1 U-Boot

U-Boot was already working when the development started this spring. We wanted to incorporate the feedback we received on the patches we had sent out during our previous project into a new version of U-Boot. Some changes could be valuable both for us during development and for others who can benefit from our work.

We also wanted to make another effort to improve the speed at which the SRAM operated, since the memory speed severely limited our execution speed.

To implement support for SPI and the SD card reader, we would investigate the possibility of employing existing drivers.

3.2.2 Select binary format

A binary format has to be selected in order to be able to fulfill the other requirements. This should be done by investigating both FDPIC ELF and Flat, and selecting the more suitable of the two.

3.2.3 Linux

At the start of this project the Linux port was incomplete, and several things remained to be done.

The configuration files have to be updated to support both the new CPU and the new board.

Some hardware drivers have to be verified and updated. In U-Boot, the network speed had to be limited to 10 Mbps when the clock is slow, and the same change has to be made in the Linux kernel. The GPIO subsystem is quite different from the PIO system found in the AP7, and this requires some changes. Some similarities exist, so parts of the code can be reused.

The executable loader has to be adapted to support the FDPIC ELF format. This includes adding the platform-independent loader to the configuration, and implementing platform-dependent helper functions for this loader.

Exception and interrupt handling needs to be updated. Most of the changes here consist of revising entry-avr32a.s so that it suits this processor.

Some differences in the memory system have to be taken into account. The address space layout has several differences, and this has to be handled. The memory copying routine could also be optimized for this processor, which cannot do unaligned accesses. We found that it could be faster to copy halfwords (or similar) when the alignment allows it, instead of falling back to byte copying in every case other than the trivial aligned one.
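
The halfword idea can be sketched in portable C as follows. This is only an illustration of the principle under our own assumptions, not the assembly routine in the kernel: the function names are hypothetical, and fixed-size `memcpy` calls stand in for single word and halfword load/store instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Widest access size (in bytes) that is naturally aligned for BOTH
 * pointers: 4 (word), 2 (halfword) or 1 (byte).
 */
static size_t common_access_size(const void *dst, const void *src)
{
    uintptr_t bits = (uintptr_t)dst | (uintptr_t)src;

    if ((bits & 3u) == 0)
        return 4;
    if ((bits & 1u) == 0)
        return 2;
    return 1;
}

/*
 * Hypothetical copy routine: moves data in the widest unit that never
 * causes an unaligned access, then finishes the tail byte by byte.
 */
static void *copy_no_unaligned(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t unit;

    assert(dst != NULL && src != NULL);
    unit = common_access_size(dst, src);

    if (unit == 4) {
        for (; n >= 4; d += 4, s += 4, n -= 4)
            memcpy(d, s, 4);    /* stands in for one word access */
    } else if (unit == 2) {
        for (; n >= 2; d += 2, s += 2, n -= 2)
            memcpy(d, s, 2);    /* stands in for one halfword access */
    }
    for (; n > 0; n--)          /* remaining bytes, or the byte-only case */
        *d++ = *s++;
    return dst;
}
```

With this selection, two buffers that are both halfword-aligned but not word-aligned need roughly half as many accesses as a pure byte copy.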

3.2.4 Toolchain

We must create a toolchain that is capable of producing binaries which can be executed on our platform. Since this platform doesn’t have an MMU, the executables must be relocatable. We have two choices when it comes to executable formats – FDPIC ELF or Flat, and must decide on one of these.

If we decide to add Flat support, we must modify the elf2flt tool. It might also be necessary to change some of the toolchain, so that it can generate relocatable ELF executables. These executables should then be processed by the elf2flt tool to produce Flat binaries.

If we decide to add FDPIC ELF support, we will have to change the linker. The linker must be able to generate valid FDPIC ELF executables, which requires the executables to contain relocation information. We must also change GCC to support the `-mfdpic` flag, and pass it to the linker.

3.2.5 User space

Some sort of user space programs are necessary for this project to be of any use. We must therefore compile some useful programs for the platform and verify that they work.

3.3 Development setup

In this section we will introduce the setup of hardware and software used for development during this project. The setup consists of a computer running Ubuntu Linux 8.04, the EVK1100 development board, and a JTAG debugger. These components are connected as shown in figure 3.2.
3.3.1 JTAG

A JTAG connection is used for uploading the U-Boot boot loader to the board, and for debugging. The JTAG connection enables us to single step in both the running kernel, and in programs started by the kernel. By adding breakpoint instructions, we are also able to halt the execution at specific places, and inspect the data currently in memory, and the state of the CPU.

A programming and control utility called `avr32program` was used to program the microcontroller. The following command was used to upload U-Boot to the internal flash:

```bash
avr32program -pjtagicemkii -part UC3A0512ES program -finternal@0x80000000 -cint -F bin -O 2147483648 -e -R -r u-boot.bin
```

The parameters and arguments specify the following:

- `-pjtagicemkii`: Which programmer is connected to the development board.
- `-part UC3A0512ES`: The device to be programmed.
- `program`: The action `avr32program` should perform. In this case it is “program memory”.
- `-finternal@0x80000000`: Tells `avr32program` that the programming should be done to the internal flash memory located at offset `0x80000000`.
- `-cint`: Which clock source the CPU should use during programming. `int` selects the internal RC oscillator.
- `-F bin`: The input format. `bin` means a binary file.
- `-O 2147483648`: The offset that should be programmed. `2147483648` is the decimal form of `0x80000000`, which is the start of the internal flash memory.
- `-e`: Erase the flash before programming.
- `-R`: Reset the chip after the programming is complete.
- `-r`: Start execution after the reset.
- `u-boot.bin`: Filename of the binary file to write to the flash.
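
The relation between the two notations of the flash offset can be checked with a few lines of C. The helper `parse_hex` and the constant name `INTERNAL_FLASH_BASE` are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Start of the internal flash, as given in the text. */
#define INTERNAL_FLASH_BASE 0x80000000UL

/* Parse a hexadecimal string (with or without a 0x prefix) into a number. */
static unsigned long parse_hex(const char *text)
{
    assert(text != NULL);
    return strtoul(text, NULL, 16);
}
```

Calling `parse_hex("0x80000000")` yields the same value as the decimal argument passed to `-O`.
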
3.3.2 Serial cable

A serial converter was used to access the console of U-Boot and Linux. It was a generic USB-to-serial converter, and a baud rate of 115200 was used.

3.3.3 Networking setup

A DHCP server was used to distribute configuration options to the development board. The DHCP server assigns an IP address to the board, and also provides information about where the root file system and kernel can be located (lines 8 and 9 in listing 3.1).

The TFTP server is installed on the development computer, and configured to respond to requests for files located in a designated folder. It was used to serve the boot image for the Linux kernel.

Listing 3.1 shows the configuration file used by our DHCP server. We used udhcpd\(^1\) as DHCP server.

<table>
<thead>
<tr>
<th>Option</th>
<th>Value</th>
<th>Default or comment</th>
</tr>
</thead>
<tbody>
<tr>
<td>start</td>
<td>192.168.0.20</td>
<td>192.168.0.20</td>
</tr>
<tr>
<td>end</td>
<td>192.168.0.254</td>
<td>192.168.0.254</td>
</tr>
<tr>
<td>interface</td>
<td>eth0</td>
<td>eth0</td>
</tr>
<tr>
<td>maxleases</td>
<td>234</td>
<td>254</td>
</tr>
<tr>
<td>opt dns</td>
<td>192.168.0.1</td>
<td></td>
</tr>
<tr>
<td>option lease</td>
<td>864000</td>
<td>10 days in seconds</td>
</tr>
<tr>
<td>siaddr</td>
<td>192.168.0.2</td>
<td>0.0.0.0</td>
</tr>
<tr>
<td>boot_file</td>
<td>/srv/tftp/uImage</td>
<td>(none)</td>
</tr>
<tr>
<td>opt rootpath</td>
<td>/tftpboot/evk1100</td>
<td></td>
</tr>
</tbody>
</table>

Listing 3.1: DHCP configuration

The last three options give clients information about the TFTP server and NFS server.

- **siaddr** is the IP address of the server which hosts the TFTP server and NFS server.
- **boot_file** is the location of the kernel image file.
- **opt rootpath** sets the NFS root directory.

In our setup, U-Boot uses the response from the DHCP server to locate the TFTP server. It downloads the kernel image from this TFTP server, and executes it. The Linux kernel also receives a DHCP response, and uses it to locate the NFS server. This server is then used for the root file system.

3.4 U-Boot

At the end of our previous project, an updated set of patches for U-Boot was submitted to both the official U-Boot mailing list and the avr32linux.org U-Boot mailing list. Some constructive criticism of these patches was posted on the mailing list, and we decided to clean up some of the issues raised. This section describes every change done to U-Boot during this project, grouped in logical subsections. The actual changes can be seen in appendix B. Note that appendix B only lists the changes to the patches, not the complete revised patch series. For the complete patch series, see the official U-Boot mailing list\(^2\) or the digital appendix of this report.

\(^1\)http://packages.ubuntu.com/hardy/udhcpd

Also note that some changes to the U-Boot source code were done after the submission of the revised patch series. These changes are described in section 3.4.5 and 3.4.6, and listed in appendix C.

### 3.4.1 Network speed limiting

During our previous project, the MACB driver in U-Boot was modified to limit the operating speed of the PHY. That patch was replaced by a version that changes less of the MACB driver, and that only limits the network speed if this is explicitly requested by setting a board configuration flag named `CONFIG_MACB_FORCE10M`. The original version of the patch can be found in the U-Boot mailing list archive\(^3\), and the modifications to it are listed in B.1. The new version of the patch received some criticism and triggered a debate with suggested solutions on the U-Boot mailing list, but we did not implement a follow-up solution.

### 3.4.2 Adding the EVK1100 board to lists

We added the EVK1100 board to the files `MAKEALL` and `MAINTAINERS`. The `MAKEALL` file lists all boards for the AVR32 architecture supported by U-Boot, and the `MAINTAINERS` file lists the people that maintain different parts of U-Boot.

### 3.4.3 Precedence safety fix

When preprocessor macros are used to define simple mathematical expressions, the expression substituted by the preprocessor may become part of a larger expression. If the macro body is not enclosed in parentheses, operator precedence can then cause the larger expression to produce the wrong result. To make sure that this never happens, we added parentheses around the mathematical expressions. This change is shown in appendix B.3. Listing 3.2 shows an example of how careless use of the previous version of the macro can produce the wrong result.

<table>
<thead>
<tr>
<th>Old macro version:</th>
<th>Evaluates to:</th>
<th>Result:</th>
</tr>
</thead>
<tbody>
<tr>
<td>SMC_CYCLE(42)*2</td>
<td><code>0x0008+(42)*0x10*2</code></td>
<td>1352</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>New macro version:</th>
<th>Evaluates to:</th>
<th>Result:</th>
</tr>
</thead>
<tbody>
<tr>
<td>SMC_CYCLE(42)*2</td>
<td><code>(0x0008+(42)*0x10)*2</code></td>
<td>1360</td>
</tr>
</tbody>
</table>

Listing 3.2: Macro precedence error example
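
The results 1352 and 1360 in listing 3.2 are consistent with a macro of the form `0x0008 + (x) * 0x10`. Assuming that form, the following sketch reproduces both results. The names `SMC_CYCLE_OLD` and `SMC_CYCLE_NEW` are hypothetical and only serve to show the two versions side by side.

```c
/* Old form (assumed): the expansion is not protected by outer parentheses. */
#define SMC_CYCLE_OLD(x) 0x0008 + (x) * 0x10

/* New form (assumed): the whole expression is parenthesized. */
#define SMC_CYCLE_NEW(x) (0x0008 + (x) * 0x10)

/* Expands to 0x0008 + (42) * 0x10 * 2, so only the second term is doubled. */
static int doubled_old(void)
{
    return SMC_CYCLE_OLD(42) * 2;
}

/* Expands to (0x0008 + (42) * 0x10) * 2, so the whole value is doubled. */
static int doubled_new(void)
{
    return SMC_CYCLE_NEW(42) * 2;
}
```

Here `doubled_old()` evaluates to 8 + 42 · 16 · 2 = 1352, while `doubled_new()` evaluates to (8 + 42 · 16) · 2 = 1360.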

---

\(^2\) [http://lists.denx.de/pipermail/u-boot/](http://lists.denx.de/pipermail/u-boot/)

\(^3\) [http://lists.denx.de/pipermail/u-boot/2008-October/041568.html](http://lists.denx.de/pipermail/u-boot/2008-October/041568.html)
3.4.4 Esthetical and other minor changes

The previously submitted U-Boot patches were updated to make the code conform with the coding style specified by the maintainer. The changes listed in appendix B.4 to B.7 are merely esthetical changes, removal of unused variables, and correction of comments. The only exceptions are the introduction of the network speed limiting flag, and the baud rate adjustment in the board configuration.

3.4.5 Auto detection of PHY address

The last patches submitted in the fall of 2008 included a routine for auto-detecting the address of the external PHY. This routine would be invoked if and only if U-Boot was compiled with the PHY address set to 0xff. This was changed so that a flag in the board configuration file determines whether or not the routine will be compiled and used. The board configuration file is a more appropriate place for this option, since the PHY address is determined by the board. A flag also makes it possible for the preprocessor to remove the auto-detect routine, which in turn results in a slightly smaller binary output file. As can be seen in appendix C, the files changed to achieve this were atevk1100.c, atevk1100.h and macb.c.
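
A flag-controlled auto-detect routine of this kind could look roughly like the sketch below. It is not the code from appendix C: the flag name `CONFIG_PHY_AUTODETECT`, the callback type and the simulated buses are hypothetical, and the sketch assumes that an absent PHY reads back as all ones or all zeros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MII_PHYSID1   0x02   /* standard MII register: PHY identifier 1 */
#define PHY_ADDR_MAX  32     /* MDIO addresses range from 0 to 31 */

/* Hypothetical board flag; without it the routine is not compiled at all. */
#define CONFIG_PHY_AUTODETECT

/* Hypothetical MDIO read callback: returns the value of a PHY register. */
typedef uint16_t (*mdio_read_fn)(uint8_t phy_addr, uint8_t reg);

#ifdef CONFIG_PHY_AUTODETECT
/*
 * Probe every MDIO address and return the first one that answers with a
 * plausible identifier. Returns -1 if no PHY responds.
 */
static int phy_autodetect(mdio_read_fn mdio_read)
{
    assert(mdio_read != NULL);
    for (int addr = 0; addr < PHY_ADDR_MAX; addr++) {
        uint16_t id = mdio_read((uint8_t)addr, MII_PHYSID1);

        if (id != 0xffff && id != 0x0000)
            return addr;
    }
    return -1;
}
#endif

/* Simulated bus for demonstration: a single PHY answers at address 7. */
static uint16_t fake_mdio_read(uint8_t phy_addr, uint8_t reg)
{
    return (phy_addr == 7 && reg == MII_PHYSID1) ? 0x0022 : 0xffff;
}

/* Simulated bus with no PHY connected. */
static uint16_t empty_mdio_read(uint8_t phy_addr, uint8_t reg)
{
    (void)phy_addr;
    (void)reg;
    return 0xffff;
}
```

Since the routine sits inside the `#ifdef`, a board that defines a fixed PHY address and leaves the flag undefined does not carry the probing code in its binary.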

3.4.6 Removal of bug workaround

In an early stage of development, a structure describing the layout of the GPIO registers was shared between the implementations for the UC3 and AP7 families. The layout of the GPIO registers is not the same in these two families, and the structure defined in software was incompatible with the UC3A. When writing to the GPIO registers through the incompatible structure, the wrong memory locations were accessed. This error surfaced as an interrupt that occurred when initializing the USART, and before the bug was found, a workaround was implemented. The bug was eventually fixed, but the workaround remained. The removal of this obsolete workaround can be seen in line 65 in appendix C.

3.5 Binary format selection

The current Linux support for the AP7000 microcontroller is based on the ELF binary format for programs and shared libraries. This format requires that the architecture has an MMU, and is therefore unsuitable for the UC3A0512 microcontroller. We had to select another binary format suitable for MMU-less architectures, and implement support for this format in the toolchain. The Linux kernel currently has support for two such formats, and we found it most practical to choose one of them.

The original assignment text asks for Flat binary support in the toolchain, but on the web page given with the assignment FDPIC ELF is proposed. In discussions with our supervisor at Atmel, FDPIC ELF was suggested as an equal, if not better, alternative.

In section 2.10 both FDPIC ELF and the Flat binary format are presented. During development we tried both formats, and ended up using the FDPIC ELF format. Even though the Flat format is simpler and more widespread, FDPIC ELF was chosen because of several advantages:

- Simpler toolchain usage (does not need additional programs).
- More compatible with existing toolchain (objdump, readelf, gdb, etc).
- ELF support for AVR32 is already implemented.
- Closer to the standard format used in Linux.
- Flat is limited to four shared libraries in total in a program.

3.6 Linux kernel

This section describes the changes we did to the Linux kernel during this project. Processor-specific folders, files and code in the Linux source tree had already been added and modified during our previous project. The changes done to the Linux source tree during both projects have now been cleaned up and grouped into logical patches. These patches are listed in appendix D, and also summarized in section 3.11.7.

The first section is about the rebasing done in the start of this project and the second section regards the added and modified configuration files. Section 3.6.3 describes changes done to support the UC3 core, and section 3.6.4 is about changes done because the microcontroller used does not have a cache. In section 3.6.5, we describe some changes we had to do to the clock setup. Section 3.6.6 discusses the changes done to the driver for the network adapter. The next section, section 3.6.7, describes changes done to get the GPIO system to work. Section 3.6.8 mentions the configuration of the LED driver. The attempt to support SPI with Direct Memory Access (DMA) is discussed in section 3.6.9. A workaround for a bug in the CPU is described in section 3.6.10.

The next four sections present changes that had to be made regarding memory access, due to both the lack of support for unaligned memory access and the lack of an MMU.

Section 3.6.15 describes the changes necessary because of the differences in exception and interrupt handling between the processors. Section 3.6.16 discusses modifications made to support FDPIC ELF binaries. Sections 3.6.17 and 3.6.18 describe refactoring done to make other changes easier. The last section gives an overview of all the patches we created.

3.6.1 Rebasing

The 2.6.27-rc6 version we had started out with during the fall of 2008 was getting quite old. We therefore started development by rebasing our changes onto version 2.6.28.1 of the kernel. When rebasing, we take all of our changes and apply them to a newer version. This was done for the same reasons as given for merging in section 2.16.2. We selected version 2.6.28.1 because it was the most recent stable version of the kernel, and a kernel with few defects was desirable. A release candidate for version 2.6.29 was available, but this kernel was more likely to contain defects.

### 3.6.2 Configuration files and make files

During our previous project, some changes were made to the Kconfig and Makefile to include the EVK1100 development board and the UC3A0512 microcontroller. To enable compilation for the UC3A0512 via the configuration system, the EVK1100 was added as a selectable board in Kconfig and Makefile in the avr32 folder (see patch 29 in appendix D.29).

#### Board support

During our project in 2008, we had copied the atngw100 folder in arch/avr32/boards to a new folder named atevk1100. The file setup.c in this folder had to be rewritten to match the hardware on the EVK1100. These changes include setting up the oscillators, SPI configuration, LED configuration, and running initialization functions for applicable hardware. The clock configuration is discussed in section 3.6.5.

The patch adding board support is listed in appendix D.29. One part of this patch, which adds a file with a generated default configuration (defconfig) for the board, is omitted due to its length, but is included in the digital appendix. This file contains the kernel configuration used when the kernel was compiled, and can be generated by invoking make menuconfig and configuring for this system.

### 3.6.3 UC3A support

In our project, during the fall of 2008, we began the process of taking the code for the AP7 microcontroller family and adapting it for the UC3A family. The arch/avr32/mach-at32ap folder was copied to arch/avr32/mach-at32uc3a, and changes were made to the code. These changes were sufficient to almost complete the boot process, but still some necessary changes remained.

The patch that adds support for UC3A devices can be seen in appendix D.28.

The file at32uc3a0xxx.c defines the on-chip devices in the microcontroller, and their memory locations and layouts. Most of these addresses had to be updated, since the memory layout of the AP7 microcontrollers differs greatly from that of the UC3A series. Many features of the AP7 are not present in the UC3A series. Support for the following features was removed from at32uc3a0xxx.c (copied from at32ap700x.c) during the previous project:

- MultiMedia Card Interface (MCI)
- IDE/CompactFlash interface
- NAND Flash/SmartMedia
- AC97 Controller
3.6.4 Cache

Since the processor used in this project does not have the same caching facilities as the AP7000, the calls to some cache functions used in that implementation had to be disabled. A flag was added that indicates whether the chip has a cache or not, and when it does not, these functions are provided as empty stubs in an architecture-specific file. Moving the stubs to a header and declaring them inline would be more efficient, because the compiler would then be able to optimize the calls away. This was not done because optimization was not a high priority in this phase. Our changes can be seen in appendix D.12.
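
The inline alternative mentioned above could be sketched as follows. This is the variant that was not implemented, shown under our own assumptions: the flag name `CONFIG_CPU_HAS_CACHE`, the counter and the caller `prepare_dma_buffer` are hypothetical, and the cached variant is only a placeholder for real cache maintenance code.

```c
/* Hypothetical flag: defined only for chips that have a cache. */
/* #define CONFIG_CPU_HAS_CACHE */

/* Counts cache operations that were actually issued (for illustration). */
static unsigned int cache_ops_issued;

#ifdef CONFIG_CPU_HAS_CACHE
/* Placeholder for a real implementation issuing cache instructions. */
static inline void invalidate_dcache_region(void *start, unsigned long size)
{
    (void)start;
    (void)size;
    cache_ops_issued++;
}
#else
/* No cache: an empty inline stub that the compiler can optimize away. */
static inline void invalidate_dcache_region(void *start, unsigned long size)
{
    (void)start;
    (void)size;
}
#endif

/* Callers stay identical on chips with and without a cache. */
static void prepare_dma_buffer(void *buf, unsigned long len)
{
    invalidate_dcache_region(buf, len);
}
```

Because the stub is visible in the header at the call site, no function call remains in the generated code when the flag is undefined.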

3.6.5 Clocks

There were two tasks that needed to be done for the clock setup. The first one was relatively simple, and was to configure which clocks were available on the EVK1100 board. This was done in `arch/avr32/boards/atevk1100/setup.c`, where we updated an array named `at32_board_osc_rates`. Up to three external clocks can be connected to the UC3A, so this array has three elements: one for the 32 kHz slow clock, one for `osc0`, and one for `osc1`. On the EVK1100, a 12 MHz clock is connected to `osc0`, and `osc1` is not used.

The array thus became:

```c
unsigned long at32_board_osc_rates[3] = {
    [0] = 32768, /* 32.768 kHz on RTC osc */
    [1] = 12000000, /* 12 MHz on osc0 */
    [2] = 0, /* osc1 is not used */
};
```

The second task that needed to be done for clocks, was to update all the clock connections for the microcontroller. The AP7000 and the UC3A0512 share many of the same internal devices, but they are connected to different clock outputs. We therefore had to revise all the device definitions, and update the clock connections. For example, on the AP7000, the SDRAM controller is connected to clock output 14 (i.e. clock mask bit 14) on the peripheral bus B. On the UC3A0512, it is connected to output 5 on the same bus.

At runtime, the Linux kernel uses a list to keep track of which clocks are in use, and this list is used to assemble clock masks. The clock masks are used to disable clocks for inactive devices.
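
The assembly of a clock mask from such a list can be sketched as below. The structure and function names are hypothetical simplifications and do not reproduce the kernel's clock framework. Only the SDRAM controller index (5 on the UC3A0512) is taken from the text, and the other entries are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified clock descriptor. */
struct clk_entry {
    const char *name;
    unsigned int index;   /* bit number in the bus clock mask */
    unsigned int users;   /* greater than zero while the clock is in use */
};

/*
 * Build the clock mask for one bus: a bit is set for every clock that is
 * in use, so that the clocks of inactive devices end up disabled.
 */
static uint32_t assemble_clock_mask(const struct clk_entry *clks, size_t count)
{
    uint32_t mask = 0;

    assert(clks != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (clks[i].users > 0)
            mask |= (uint32_t)1 << clks[i].index;
    }
    return mask;
}

/* Example list for peripheral bus B. */
static const struct clk_entry pbb_clocks[] = {
    { "sdramc",     5, 1 },   /* output 5 on the UC3A0512 (14 on the AP7000) */
    { "idle_dev",   6, 0 },   /* made-up device that is not in use */
    { "active_dev", 2, 1 },   /* made-up device that is in use */
};
```

A device whose clock is missing from the list never gets its bit set, which matches the behavior described for the on-chip debug system below.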

We also had to add clocks for devices that are present in the UC3A, but not in the AP7000. The clock for the on-chip debug system is an example of such a clock. Before this clock was added to the list, the debug system was turned off during startup, which prevented us from accessing the device over JTAG.

3.6.6 Limiting network device speed

The MACB driver in both U-Boot and Linux was compatible with the MACB in the UC3A0512 microcontroller. However, because of the combination of low clock speed and RMII mode, we had to force the driver to initialize the MACB in 10 Mbit/s mode. This had been done in the MACB driver in U-Boot in previous work, and we had to make an equivalent and proper solution for the driver in Linux. The final solution can be seen in appendix D.1. This patch adds a few lines of code that check whether the mode is set to RMII, and disable support for 100 Mbit/s if the CPU speed is not high enough for this mode.
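
The kind of check described here can be sketched as a small function. This is not the patch from appendix D.1: the function name is hypothetical, and the threshold `RMII_100M_MIN_CPU_HZ` is an assumed value chosen only for illustration, since the text does not state the actual limit. The mode bits use the same bit positions as the `SUPPORTED_*` flags in the Linux ethtool header.

```c
#include <stdbool.h>
#include <stdint.h>

/* Link mode flags (same bit positions as in the Linux ethtool header). */
#define SUPPORTED_10baseT_Half   (1u << 0)
#define SUPPORTED_10baseT_Full   (1u << 1)
#define SUPPORTED_100baseT_Half  (1u << 2)
#define SUPPORTED_100baseT_Full  (1u << 3)

/* Assumed threshold, for illustration only. */
#define RMII_100M_MIN_CPU_HZ 50000000UL

/*
 * Hypothetical helper: removes the 100 Mbit/s modes from the set of
 * supported modes when the MAC runs in RMII mode on a slow CPU clock.
 */
static uint32_t limit_supported_modes(uint32_t supported, bool rmii,
                                      unsigned long cpu_hz)
{
    if (rmii && cpu_hz < RMII_100M_MIN_CPU_HZ)
        supported &= ~(SUPPORTED_100baseT_Half | SUPPORTED_100baseT_Full);
    return supported;
}
```

Masking the supported modes, rather than hard-coding the speed, leaves auto-negotiation in charge of choosing between the remaining half and full duplex 10 Mbit/s modes.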

3.6.7 GPIO

The GPIO controller on the UC3A0512 microcontroller is different from the PIO controller found on the AP7000. Therefore, the PIO controller code had to be modified to work on the UC3A0512. We started by copying the PIO controller files, and renaming all functions and variables from pio to gpio. We then updated the header file (mach/at32uc3a/gpio.h), which contains the register definitions for the GPIO controller.

We then went through the copied code and updated it to access the correct registers. Most registers were present, but under a different name. For example, to enable pull-ups, we had to write to the PUERS register instead of the PUER register.
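
The difference can be illustrated with a small software model of the pull-up registers. The model is hypothetical and heavily simplified: it assumes that PUER holds the current state and that writes to companion set and clear registers (PUERS, and a clear register we call PUERC here) change individual bits. No real register addresses are used.

```c
#include <stdint.h>

/* Hypothetical software model of the pull-up state of one GPIO port. */
struct gpio_pullup_model {
    uint32_t puer;   /* current pull-up enable state, one bit per pin */
};

/* Models a write to the "set" register (PUERS): sets the masked bits. */
static void write_puers(struct gpio_pullup_model *port, uint32_t mask)
{
    port->puer |= mask;
}

/* Models a write to the "clear" register (PUERC): clears the masked bits. */
static void write_puerc(struct gpio_pullup_model *port, uint32_t mask)
{
    port->puer &= ~mask;
}

/* Enable the pull-up on one pin without touching the other pins. */
static void gpio_enable_pullup(struct gpio_pullup_model *port, unsigned int pin)
{
    write_puers(port, (uint32_t)1 << pin);
}
```

With separate set and clear registers, the driver never needs a read-modify-write sequence to change a single pin.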

Some decisions were more difficult. For example, the AP7000 supports something called multi-drive capability. When examining the schematics for an output pin in the data sheets for the AP7000 and UC3A, it was not immediately apparent that this was the same as the UC3A’s open drain mode. In the end we concluded that it was.

3.6.8 LED device driver

The EVK1100 has 8 LEDs that can be controlled independently (four single and two double). The NGW100 board has 3 LEDs, and we could simply re-use and modify the definition of these in the code. Linux uses a generic driver to control the LEDs, and this driver utilizes the generic GPIO interface. Note that in the final code, LED3 is not enabled because it is connected to the EBI bus (see 3.8.3). Lines 104-117 in patch 29 (appendix D.29) add the necessary setup configuration for the LEDs in setup.c. The LEDs can be controlled in Linux by writing to trigger files in folders that appear in /sys/class/leds/. For example, the command `echo "heartbeat" > /sys/class/leds/led1/trigger` enables a heartbeat on LED1.
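
The same operation can be performed from a C program by writing the trigger name to the trigger file. The helper below is a minimal sketch with a hypothetical name, `led_set_trigger`. It takes the path as a parameter and makes no assumption about which LEDs exist on a given system.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write a trigger name to an LED trigger file.
 * Returns 0 on success and -1 if the file cannot be opened or written.
 */
static int led_set_trigger(const char *trigger_path, const char *trigger)
{
    FILE *f;

    assert(trigger_path != NULL && trigger != NULL);
    f = fopen(trigger_path, "w");
    if (f == NULL)
        return -1;
    if (fputs(trigger, f) == EOF) {
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? 0 : -1;
}
```

On the target, `led_set_trigger("/sys/class/leds/led1/trigger", "heartbeat")` would correspond to the shell command above.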

3.6.9 SPI with DMA support

In Linux, the most suitable and proper way to communicate with the SPI on AVR32 devices is to set up and use a DMA controller. Both the AP7 and UC3A series have a Peripheral DMA Controller that provides hardware support for DMA functionality. This controller is abbreviated as PDC in the AP7000 datasheet, and as PDCA in the UC3A datasheet. We will use the same convention here to distinguish between the two.

In an attempt to enable the SPI bus to communicate with the LCD display and DataFlash, we searched the Linux source for existing compatible or similar code. Support for the PDC in the AP7000 series microcontroller was found in the Linux source code, but it was incompatible with the PDCA implemented in the UC3A series. While the PDC configuration registers are located in a reserved memory area of each IO device, the PDCA has one central memory area for its configuration registers.

Because of these structural differences, we decided to write a generic interface to abstract the difference. The development of this interface was aborted when we were informed by Atmel that the existing PDC code had been restructured. The patch that changes the SPI driver and introduces the abstraction layer can be seen in appendix E.

3.6.10 Interrupt bug workaround

Because of a bug in the CPU, any instruction masking interrupts through the system register must be followed by two No-Operation (NOP) instructions to avoid abnormal behavior (see [8] section 41.4.5.5). This workaround had already been implemented in U-Boot, but also needed to be introduced in the Linux kernel. A separate patch was made for this specific workaround, and can be seen in appendix D.27. As can be seen in the patch, two NOP instructions are also added in the `mask_exceptions` macro. This may be superfluous since the bug should only affect masking of interrupts, not exceptions. The performance penalty of two NOP instructions is very low, so we chose to include them just in case.

3.6.11 Memory to memory copying

The Linux kernel includes architecture-specific implementations of memory-to-memory copying routines. The existing implementation for the AP7000 had to be modified because the UC3 is not capable of doing unaligned memory accesses. These routines are, as usual, optimized in assembly because they are used very often.

The patch in appendix D.16 adds a memory copy routine which is based on the version found in the AP7000 implementation. The changes were simple, with no attempts at optimization.

3.6.12 Memory copying with checksumming

In conjunction with TCP networking, when copying data from one place in memory to another, it is desirable to also checksum the data. It is most efficient to implement routines that perform these two operations at the same time. That way, one does not have to read the same data several times.

The `csum_partial_copy_generic` function implements this for the AP7000. Unfortunately, this function assumes that the architecture can do unaligned accesses, which makes the code incompatible with the UC3A. The patch that fixes this incompatibility is listed in appendix D.14. It changes the code that calls `csum_partial_copy_generic` to check that the buffers are aligned first. If the buffers are unaligned, it will first copy the data, and then checksum them.

The patch also updates a function named `csum_partial`. This function does the same checksumming, but without copying the data. We updated this function to handle unaligned accesses.
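The dispatch between the combined routine and the copy-then-checksum fallback can be sketched in portable C. This is a minimal illustration, not the code from appendix D.14: all function names are hypothetical, the checksum is a plain 16-bit one's-complement sum over big-endian byte pairs, and the single-pass routine works on bytes where the real assembler routine would use word accesses.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Add one 16-bit value to a one's-complement sum, folding the carry back in. */
static uint32_t csum_add16(uint32_t sum, uint32_t value)
{
	sum += value;
	return (sum & 0xffff) + (sum >> 16);
}

/* Checksum only. Works byte by byte, so any alignment is safe. */
static uint32_t csum_only(const unsigned char *buf, size_t len, uint32_t sum)
{
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum = csum_add16(sum, ((uint32_t)buf[i] << 8) | buf[i + 1]);
	if (i < len)	/* an odd trailing byte is padded with zero */
		sum = csum_add16(sum, (uint32_t)buf[i] << 8);
	return sum;
}

/* Single pass: every source byte is read once, then stored and summed. */
static uint32_t copy_and_csum_one_pass(unsigned char *dst,
				       const unsigned char *src,
				       size_t len, uint32_t sum)
{
	size_t i;

	for (i = 0; i + 1 < len; i += 2) {
		dst[i] = src[i];
		dst[i + 1] = src[i + 1];
		sum = csum_add16(sum, ((uint32_t)src[i] << 8) | src[i + 1]);
	}
	if (i < len) {
		dst[i] = src[i];
		sum = csum_add16(sum, (uint32_t)src[i] << 8);
	}
	return sum;
}

static int is_word_aligned(const void *p)
{
	return ((uintptr_t)p & 3) == 0;
}

/* Use the combined routine only when both buffers are word aligned.
 * Otherwise copy first, then checksum the copy in a second pass. */
static uint32_t copy_and_csum(void *dst, const void *src, size_t len,
			      uint32_t sum)
{
	if (is_word_aligned(dst) && is_word_aligned(src))
		return copy_and_csum_one_pass(dst, src, len, sum);

	memcpy(dst, src, len);
	return csum_only(dst, len, sum);
}
```

Both paths produce the same sum; the fallback simply pays for reading the data twice.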

### 3.6.13 User space memory access

The Linux kernel will often need to read or write memory belonging to a user space program, usually in response to a system call. There are a number of functions for performing these operations:

- `access_ok`: Check whether a range of memory is valid user space memory.
- `clear_user`: Fill a block of memory with zeros.
- `copy_from_user`: Copy a block of data from user space to kernel space.
- `copy_to_user`: Copy a block of data from kernel space to user space.
- `strncpy_from_user`: Copy a string from user space.
- `strnlen_user`: Get length of a user space string.
- `get_user`: Read an integer from user space.
- `put_user`: Write an integer to user space.

These functions provide a generic interface to the architecture-specific methods for accessing memory. They are also responsible for preventing user space processes from reading or writing data they shouldn’t have access to. This is done by making sure that the memory areas passed to the functions belong in the user space part of the memory.

Some of the functions also have versions with less checking. Those functions are named with a `__` prefix, e.g. `__get_user`. `access_ok` must be used to validate the block of memory before using the functions with less checking. Failure to do so may result in security vulnerabilities, where a program may access memory it isn’t allowed to access.

The existing functions for accessing user space memory utilize the built-in MMU in AP7 processors to handle access violations to memory. This is, for example, used to handle the case where a read-only segment of memory is passed as the destination of `copy_to_user`. To implement this, the function marks every address where an exception may occur with an operation which will be performed if an exception occurs at that point.

Support for unaligned accesses

The existing `copy_to_user` and `copy_from_user` in the kernel were originally written for the AP7000 microcontroller, and assume that unaligned accesses can be performed. We needed to change these functions so that they would work without performing any unaligned memory accesses. Without a functional MPU, memory protection is a lost cause, so we could simply have used the normal memory copy functions for the implementations. There are, however, some advantages to implementing these functions with error checking. Error checking enables us to catch errors when user-space programs pass invalid pointers to the kernel. Also, if Atmel creates a microcontroller that features an MMU, but doesn’t allow unaligned access, our implementation should be reusable. It might also be possible to use this code with later revisions of the microcontroller where the MPU is functional.

The final implementation can be seen in appendix D.13 (patch 13). This patch introduces a new file, `copy_user-nounaligned.S`, to the `arch/avr32/lib` folder. This new file is a copy of the existing `copy_user.S`, modified so that the alignment of the input addresses is checked. If both addresses are aligned, the CPU can perform per-word copying. If not, simple per-byte copying is performed.
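The alignment decision described above can be sketched in C as follows. This is an illustration only, not a translation of `copy_user-nounaligned.S`: the names `copy_checked` and `enum copy_path` are hypothetical, and the exception handling that the real routine performs for faulting user addresses is left out. The word loop uses pointer casts, which is acceptable in kernel-style code built with `-fno-strict-aliasing`.

```c
#include <stddef.h>
#include <stdint.h>

/* Which copy loop was used (returned only to make the choice observable). */
enum copy_path { COPY_PER_WORD, COPY_PER_BYTE };

static enum copy_path copy_checked(void *dst, const void *src, size_t len)
{
	unsigned char *d = dst;
	const unsigned char *s = src;
	size_t i = 0;

	/* Both addresses must be 4-byte aligned to use word accesses. */
	if ((((uintptr_t)dst | (uintptr_t)src) & 3) == 0) {
		uint32_t *dw = dst;
		const uint32_t *sw = src;

		for (; i + 4 <= len; i += 4)
			dw[i / 4] = sw[i / 4];
		for (; i < len; i++)	/* remaining tail bytes */
			d[i] = s[i];
		return COPY_PER_WORD;
	}

	/* At least one address is unaligned: fall back to byte accesses. */
	for (; i < len; i++)
		d[i] = s[i];
	return COPY_PER_BYTE;
}
```

Note that a single misaligned pointer forces the whole copy onto the byte loop.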

This could be optimized further, but we have not prioritized optimization. How this could be improved is discussed in section 6.2.5.

User space address ranges

On the AP7 implementation, all user space memory is located in the lower half of the virtual memory address space, while all kernel memory is located in the upper half. Functions which access user space memory validate that the memory they are accessing is located in the lower half of the address space.

On the UC3A implementation, there is no separation between address spaces because it lacks an MMU, and the kernel memory may be mixed with the user space memory. There is no fast way to determine whether a block belongs to user space or to the kernel. Without any memory protection, doing the check does not bring any extra security either. We therefore decided to disable these checks in our implementation.

There were three places we decided that we needed to update:

`ret_if_privileged` is an assembler macro that is called by several other assembler functions, such as `copy_from_user`. It checks the memory area defined by the input parameters, and determines whether it overlaps with the kernel’s memory. This check does not work without the layout of the virtual address space employed by an MMU, and we chose to simply disable this check at compile-time. Patch 19 listed in appendix D.19 shows how this was done.

`access_ok` does the same check as `ret_if_privileged` by invoking the `__range_ok` macro, but is accessible from C code instead of assembler. We decided to replace the `__range_ok` macro at compile-time by a dummy macro. This change is shown in Linux patch number 23 in appendix D.23.

`strnlen_user` checks the length of a string residing in user space. It contained some checks for the string length, to make sure it did not extend into kernel memory. The checks in this function and its helper function `adjust_length` were removed for systems without an MMU. This change is listed in appendix D.17.
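The effect of replacing the range check with a dummy can be sketched as two interchangeable functions. This is a simplified model with hypothetical names and a hypothetical return convention (non-zero means the range is acceptable); the kernel's own `__range_ok` macro is not reproduced here. The 2 GB limit follows from user space occupying the lower half of a 32-bit address space.

```c
#include <stdint.h>

/* Assumed boundary: user space is the lower 2 GB of the address space. */
#define USER_SPACE_LIMIT 0x80000000u

/* MMU variant: accept the block only if [addr, addr + size) lies entirely
 * below the boundary. Written so that addr + size cannot overflow. */
static int user_range_ok_mmu(uint32_t addr, uint32_t size)
{
	return size <= USER_SPACE_LIMIT && addr <= USER_SPACE_LIMIT - size;
}

/* No-MMU variant: kernel and user memory are mixed, so every block is
 * accepted and the check costs nothing. */
static int user_range_ok_nommu(uint32_t addr, uint32_t size)
{
	(void)addr;
	(void)size;
	return 1;
}
```

The dummy variant makes explicit that, without memory protection, the check would reject valid user buffers while adding no security.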

3.6.14 Address space layout

There are some major differences in the address space layout of the AP7000 and the UC3A microcontrollers. Most of the differences are due to the AP7000 having an MMU while the UC3A has an MPU. There are also some differences in the physical layout of the memory.

Physical layout

One of the differences is the physical layout of different memory blocks. There are several separate blocks of memory addresses designated for accessing different devices, and embedded or external memories. The base addresses of these memory blocks differ between the UC3A and AP7 microcontrollers. Some blocks of memory are only available on either the AP7 or the UC3A. For example, the embedded flash memory in the UC3A is not present in the AP7 microcontroller.

Virtual memory layout

On the AVR32 processors with an MMU the virtual memory area is split into five segments:

- P0/U0: 2GB of memory with caching and paging.
- P1: 512 MB of memory with caching but without paging.
- P2: 512 MB of memory without caching and paging.
- P3: 512 MB of memory with caching and paging.
- P4: 512 MB of memory mapped to device registers and memory. No paging or caching.

Only the P0/U0 segment is accessible in unprivileged mode. The various Px segments are used for various low-level code. Both the P1 and P2 segments map the same physical memory.

The Linux kernel was loaded into the P1 segment. To change the caching property when accessing memory, various places in the kernel convert an address from the P1 segment to the P2 segment. The P3 segment is used when the kernel needs to use page-translated memory for some purpose, for example when it needs to map device memory with specific caching properties.
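The P1 to P2 conversion can be sketched with a few helper functions. The base addresses below are assumptions derived from the segment sizes listed above, with the segments laid out consecutively (P0/U0 from 0, then P1, P2, P3 and P4 at 512 MB intervals). The sketch also assumes that P1 and P2 map physical memory at the same offset within the segment. The helper names are hypothetical.

```c
#include <stdint.h>

#define P1SEG_BASE	0x80000000u	/* assumed: cached, untranslated */
#define P2SEG_BASE	0xa0000000u	/* assumed: uncached, untranslated */
#define SEG_OFFSET_MASK	0x1fffffffu	/* offset within a 512 MB segment */

/* Offset of an address within its 512 MB segment. */
static uint32_t seg_offset(uint32_t vaddr)
{
	return vaddr & SEG_OFFSET_MASK;
}

/* Same memory location, accessed without caching. */
static uint32_t p1_to_p2(uint32_t p1_addr)
{
	return seg_offset(p1_addr) | P2SEG_BASE;
}

/* Same memory location, accessed with caching. */
static uint32_t p2_to_p1(uint32_t p2_addr)
{
	return seg_offset(p2_addr) | P1SEG_BASE;
}
```

Code written this way has no meaningful equivalent on the UC3A, which is why every such conversion had to be found and updated.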

When we updated the Linux source code, we had to update all the code which assumed that the microcontroller used this segmented memory model.

Null pointer debugging

Because the internal SRAM is located at address 0, this is a perfectly valid address. In processors with an MMU, reading or writing to address 0 is usually caused by an error, and causes an exception. In our case, the CPU can read and write to all the SRAM memory and even execute code from it. Whenever a software error caused data to be read from address 0, whatever data resided at this address in SRAM would be fetched. When a software error causes a jump to an address in the SRAM, the CPU will interpret whatever data is at that location as instructions and attempt to execute them. This may in turn make the CPU perform arbitrary operations. In our case, the contents of the SRAM would be any data left behind by the execution of U-Boot.

To ensure that the CPU halts when it tries to execute instructions from the SRAM, a short routine was temporarily introduced in the `sram_init` function in `arch/avr32/mach-at32uc3a/at32uc3a0xxx.c`. This routine writes the breakpoint instruction to every address in the SRAM, enabling us to detect the error earlier, with any potential backtrace information guaranteed intact. The routine is shown in listing 3.3.

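```c
unsigned long i;
unsigned short *p;

/* Fill the 64 KiB of SRAM, starting at address 0, with the
 * 16-bit breakpoint instruction (opcode 0xd673). */
p = 0;
for (i = 0; i < 64*1024; i += 2, p++) {
	*p = 0xd673;
}
```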

Listing 3.3: SRAM debug routine

3.6.15 Event handling entry points

This section describes the changes made to the code that handles interrupts, exceptions and system calls. A file for AVR32B (`arch/avr32/kernel/entry-avr32b.S`) was included in the Linux kernel, and we used this file as the basis for our code.

A significant part of the work was to get a clear understanding of all that happened in the assembler file. The file had a large number of labels without descriptive names, and many parts of the code needed commenting. We examined the file, analyzed the code flow, added a few comments, and changed many labels to more descriptive names.

Most of the actual code changes were due to the difference in the way events are handled on the AVR32 sub-architectures. The AVR32B sub-architecture will store return addresses and the status register in dedicated system registers, while the AVR32A sub-architecture saves them to the stack. We tried to optimize the stack layout based on this.

Figure 3.3: Example of entry point changes

Figure 3.3 shows the typical set of changes for an event handler. We can see that most of the `mfsr` (move from system register) and `mtsr` (move to system register) commands are gone. These were used to retrieve and set the return address and status register, which is unnecessary on the AVR32A architecture since they are already located on the stack. The stack layout changes can also be seen in the figure. The order of saves and restores from the stack is changed, and we no longer save the program counter and status register at all.

### Stack layout changes

The kernel expects to be able to access the register data from when the exception, interrupt or system call occurred.

These registers should be saved in a structure named `pt_regs`. Therefore, the first thing that is done in the entry points is to save all registers to the stack in an order that matches the `pt_regs` structure.

When handling an exception or an interrupt, the AVR32B sub-architecture uses dedicated system registers to save the program counter and status register. These are automatically restored on exit. The AVR32B code will assemble the `pt_regs` structure from the current registers, and the program counter and status register from the system registers.

[Figure 3.3: side-by-side assembly listing of an event handler entry point, "Original code" (left) and "Our code" (right).]

The AVR32A architecture pushes the program counter and the status register onto the stack. In addition, when handling interrupts, several extra registers are pushed onto the stack. The register layout is shown in figure 3.4.

We decided to reuse the program counter and status register which are already on the stack. Using the other registers that are automatically pushed during interrupts was also considered, but never implemented. The reason for this was that it would require pushing the program counter and status register on the stack when executing exceptions and system calls. This would add to the execution cost for all system calls and exceptions.

The entry points will save `r0`-`r12`, the stack pointer and the link register to the stack. This, together with the program counter and status register already saved to the stack, forms most of the `pt_regs` structure. There is an additional element in the `pt_regs` structure, named `r12_orig`, used for system calls. This element is used to hold the original value of `r12` during system calls, but is unused in all other entry points. The final stack layout is shown in figure 3.5.

This change also meant that we had to change the `pt_regs` structure, so that it would match the order the registers were saved in the entry points. The `pt_regs` structure is part of the ptrace infrastructure in the kernel. The ptrace infrastructure is used for debugging applications, and the `pt_regs` structure is used for accessing registers of debugged programs from user space.
The `pt_regs` structure is therefore part of the Application Binary Interface (ABI) exported to user space, and is located in `arch/avr32/include/asm/ptrace.h`. This file is installed by the kernel build infrastructure when `make headers_install` is executed.

This was a problem, because the kernel build infrastructure did not allow us to depend on configuration settings when installing header files. E.g., we could not install one set of header files for `CONFIG_SUBARCH_AVR32A` and one set of headers for `CONFIG_SUBARCH_AVR32B`.

Our original plan was to have the following layout of the `ptrace.h` file:

```c
#ifdef CONFIG_SUBARCH_AVR32A
  /* Our definition of pt_regs */
#else
  /* Original definition of pt_regs */
#endif
```

Unfortunately, this did not work, since the `CONFIG_SUBARCH_AVR32A` option is not available outside the kernel build. Next, we tried to split the header file (`ptrace.h`) into two files, one of which was architecture dependent. The plan was to install `ptrace.h` and an additional architecture specific file – `ptrace-subarch.h`. Which version of the latter file was installed would depend on the kernel configuration options. This did not work either, because the configuration options are unavailable during `make headers_install`.

The final solution can be seen in appendix D.26. In this implementation, we rely on options set by the C compiler during compilation to select the correct version of `pt_regs`.

This means that users of this code must select the correct architecture when compiling programs. This must be done in any case, since different chips support different instructions. To compile a program for the UC3A0512ES, one can run: `avr32-uclinux-uclibc-gcc -march=ucr1 -mfdpic program.c -o program`
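The idea of selecting the structure from a compiler-provided macro, rather than from a kernel configuration option, can be sketched as follows. Everything here is illustrative: the macro name `__AVR32_AVR32A__` is an assumption about what the compiler defines for the selected architecture, the struct is named `pt_regs_sketch` to avoid suggesting it is the real one, and both field orders are abbreviated and hypothetical rather than the layouts from appendix D.26.

```c
#include <stddef.h>

#if defined(__AVR32_AVR32A__)	/* assumed compiler-defined macro */
/* Hypothetical, abbreviated AVR32A variant: PC and SR are already on
 * the stack when the entry point starts saving registers. */
struct pt_regs_sketch {
	unsigned long r0;
	unsigned long sp;
	unsigned long lr;
	unsigned long pc;
	unsigned long sr;
};
#else
/* Hypothetical, abbreviated variant used for every other target. */
struct pt_regs_sketch {
	unsigned long sr;
	unsigned long pc;
	unsigned long lr;
	unsigned long sp;
	unsigned long r0;
};
#endif

/* User space code sees whichever layout matches its -march setting. */
static size_t pt_regs_pc_offset(void)
{
	return offsetof(struct pt_regs_sketch, pc);
}
```

Because the macro comes from the compiler, the same installed header works for both sub-architectures without any kernel configuration being visible to user space.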

### Debug entry point

The debug entry point is different from the others in that it has its return address and status register saved to a dedicated system register. As opposed to all other events on the AVR32A sub-architecture, nothing is saved to the stack automatically. This is similar to how all events are handled in the AVR32B sub-architecture.

We still need to have a complete `pt_regs` structure, and therefore need to save the return address and status register to the stack. Thus, the entry point for this event became slightly different. We leave a gap on the stack for the return address and status register, and push all other registers. We then retrieve the status register and return address, and insert them at the correct locations.

### 3.6.16 FDPIC ELF

We took several steps to add FDPIC ELF support to the Linux kernel. Since FDPIC ELF depends on architecture support, the configuration option contains a list of supported architectures. We added the AVR32 architecture to this list by appending `|| (AVR32 && !MMU)` to the end of it. This change is shown on lines 55-56 of appendix D.5.

Next, we needed to add some extra fields to a data structure named `mm_context_t`. This structure contains information about each process’ memory area. We added two variables to this structure – `exec_fdpic_loadmap` and `interp_fdpic_loadmap`. These are used to hold references to the load map for the executable and its interpreter. This change is contained outside of the FDPIC ELF patch because of the way we divided our patches, and can be seen on lines 39-40 of appendix D.11.

The FDPIC ELF loader code uses several functions and macros which the architecture is supposed to implement. We added these to `/arch/avr32/include/asm/elf.h`, and the changes can be seen on lines 15-46 of appendix D.5. The following was added to this file:

- **EF_AVR32_FDPIC**: A flag which we set in FDPIC ELF files to indicate that they are FDPIC ELF files.

- **elf_check_fdpic**: A macro which checks that the `EF_AVR32_FDPIC` flag is set in an ELF file.

- **elf_check_const_displacement**: A macro which returns whether the file needs to be loaded contiguously in memory. We always return 0, since none of the files we generate has that requirement.

- **ELF_FDPIC_PLAT_INIT**: A macro which does architecture specific initialization when loading a FDPIC ELF file. We use this to load register `r0` with the pointer to the load map for the file. This enables the program to relocate itself.
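The four additions listed above can be sketched as a small, self-contained model. This is not the code from appendix D.5: the flag value is a placeholder, the header and register structs are reduced stand-ins for the kernel's own types, and `ELF_FDPIC_PLAT_INIT` is given a simplified, hypothetical argument list that only carries the load map address of the executable.

```c
#include <stdint.h>

/* Placeholder value; the real flag value is defined in the patch. */
#define EF_AVR32_FDPIC 0x04u

/* Reduced stand-ins for the ELF header and the saved register set. */
struct elf_header_sketch { uint32_t e_flags; };
struct regs_sketch { uint32_t r0; };

/* Non-zero if the ELF header is marked as an FDPIC ELF file. */
#define elf_check_fdpic(hdr) (((hdr)->e_flags & EF_AVR32_FDPIC) != 0)

/* None of the generated files must be loaded contiguously. */
#define elf_check_const_displacement(hdr) (0)

/* Hand the load map address to the new process in register r0. */
#define ELF_FDPIC_PLAT_INIT(regs, exec_map_addr) \
	do { (regs)->r0 = (exec_map_addr); } while (0)
```

A process started this way finds its load map through `r0` and can use it to relocate itself.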

We originally planned to depend only on the AVR32 architecture and add support for FDPIC ELF for AVR32 systems both with and without MMU. Unfortunately, the `mm_context_t` in the original code for AVR32 was an unsigned long. It would be possible to change this to a structure, so that the `exec_fdpic_loadmap` and `interp_fdpic_loadmap` elements could be added to it. However, this would require many changes in various parts of the memory management code for the AVR32 systems with an MMU. We decided not to do this since we did not have the necessary time and hardware.

**Register resetting**

When the FDPIC ELF loader in Linux starts a new process, a reference to the load map is passed to the process via register `r0`. This reference passing was introduced with the patch for FDPIC ELF support listed in appendix D.5, inspired by the implementations for many other architectures. The existing code for AVR32 was not compatible with this convention, and set the value of every register to 0, overwriting the load map reference. The program then used this incorrect reference and tried to relocate itself based on information found there. Because the relocation routine used invalid data, it ended up reading or writing to invalid addresses, which in turn caused an exception. The cause of the problem was discovered by inserting breakpoints and analyzing the processor registers during loading of the FDPIC file. We located and removed the `memset` function call that cleared the registers, and made a separate patch for this. The patch is shown in appendix D.2. We were not sure whether this was a good solution, but it was not denounced by anyone on the mailing list. A quick survey of implementations for other architectures suggested that it was common not to clear the registers. The mailing list discussion about this patch is shown in section 4.5.2.

3.6.17 Splitting of paging_init

During boot, the architecture specific initialization routine `setup_arch`, which is located in `arch/avr32/kernel/setup.c`, is called. This routine invokes a memory initialization function in `arch/avr32/mm/init.c` named `paging_init`. This function basically does three things: initialization of the MMU, pages and exceptions. Splitting this function would be a simple way to isolate the MMU initialization from the other two. Initialization of exceptions has no direct relation to memory management, so the code performing this was moved to a new function in the previously mentioned `setup.c`. Our final solution was to extract code from `paging_init` and create two new functions named `exceptions_init` and `mmu_init`. The `mmu_init` function and the call to it could then be excluded whenever the CONFIG_MMU flag was unset. The patch for this modification can be seen in appendix D.3.
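The structure after the split can be sketched as follows. This is a simplified model and not the code from appendix D.3: the function bodies only record that they ran, the caller is a hypothetical `setup_arch_sketch`, and the exact place where the real `mmu_init` is called from is an assumption.

```c
/* Flags that record which initialization steps have run. */
static int mmu_initialized;
static int exceptions_initialized;
static int pages_initialized;

#ifdef CONFIG_MMU
/* Only built for systems with an MMU. */
static void mmu_init(void)
{
	mmu_initialized = 1;
}
#endif

/* Not related to memory management, so it lives next to setup_arch. */
static void exceptions_init(void)
{
	exceptions_initialized = 1;
}

/* What remains of the original function: page initialization. */
static void paging_init(void)
{
	pages_initialized = 1;
}

/* Hypothetical caller showing how the MMU step drops out at compile time. */
static void setup_arch_sketch(void)
{
	exceptions_init();
#ifdef CONFIG_MMU
	mmu_init();
#endif
	paging_init();
}
```

When `CONFIG_MMU` is not defined, neither `mmu_init` nor its call is compiled, while the other two steps are unaffected.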

### 3.6.18 Use of existing macro

The patch listed in appendix D.4 changes code to utilize an existing macro. This macro returns a pointer to the register file for a process. Using the same macro in many places improves the structure, makes modification simpler, and ensures that the casting and calculation is done in the same way every place it is used. Of all the Linux patches submitted, this is the only one that is solely a structural change.

### 3.6.19 Patch summary

This section lists all the patches and describes those that perform small or uncomplicated changes not explicitly described in the previous sections.

0. **Cover letter:** This is not really a patch. It describes the purpose and scope of the patches in the series.

1. **Network speed limiting:** Limits the network speed to 10 Mbit/s when the CPU is too slow for 100 Mbit/s. Described in section 3.6.6.

2. **Avoid register reset:** Disables zeroing of all registers in `start_thread`. Described in section 3.6.16, “Register resetting”.

3. **Split paging function:** Split `paging_init` into separate functions. Described in section 3.6.17.

4. **Use task_pt_regs macro:** Simplifies some code by using an existing macro. Described in section 3.6.18.

5. **FDPIC ELF support:** Enables FDPIC ELF for AVR32. Described in section 3.6.16.

6. **Introduce cache and aligned flags:** This patch simply adds flags to `Kconfig` and `Makefile` that inform the compiler about the architecture and its features.

7. **Disable mm-tlb.c:** This patch disables the compilation of a file containing code not applicable for the UC3A.

8. **fault.c for !CONFIG_MMU:** This patch adds a new file that is used instead of the file `fault.c` when compiling for MMU-less systems. The patch also adds the file to the appropriate makefile.

9. **ioremap and iounmap for !CONFIG_MMU:** This patch adds a new file with dummy functions that replace routines for mapping between physical and virtual memory.

10. **MMU dummy functions**: This patch introduces dummy functions to be used when an MMU is not available.

11. **mm_context_t for !CONFIG_MMU**: Described in section 3.6.16.

12. **Add cache function stubs**: This patch introduces dummy functions for CPUs without cache.


14. **csum_partial: support for chips that cannot do unaligned accesses**: Described in section 3.6.12.

15. **Avoid unaligned access in uaccess.h**: This patch avoids an error occurring when an opcode error is caused by an unaligned instruction. This patch was necessary because of a bug in the existing code, but became unnecessary after applying a patch from the mailing list posted by Håvard Skinnemoen. For the full discussion about this patch, see section 4.5.2.

16. **memcpy for !CONFIG_NOUNALIGNED**: Described in section 3.6.11.

17. **Mark AVR32B code with subarch flag**: Described in section 3.6.13, in “User space address ranges”.

18. **mm-dma-coherent.c: ifdef AVR32B code**: This patch introduces a flag check that removes code only appropriate for CPUs with cache.

19. **Disable ret_if_privileged macro**: Described in section 3.6.13, in “User space address ranges”.

20. **AVR32A-support in Kconfig**: This patch adds support for the AVR32A sub-architecture in the compilation configuration system.

21. **AVR32A address space support**: This patch introduces alternative versions of macros that were not compatible with the address space layout of the UC3A.

22. **Change maximum task size for AVR32A**: Defining an upper boundary for a user space application does not serve any purpose without an MMU. This patch disables the boundary when compiling for UC3A, by setting the defined task size to 0xffffffff.

23. **Fix __range_ok for AVR32A in uaccess.h**: Described in section 3.6.13, in “User space address ranges”.

25. **Change HIMEM\_START for AVR32A:** The HIMEM\_START address is used in relation with memory mapping. Without an MMU, mapping of physical memory is not possible. This patch therefore “disables” HIMEM\_START in the same manner as described above for patch 22.

26. **New pt\_regs layout for AVR32A:** Described in section 3.6.15, “Stack layout changes”.

27. **UC3A0512ES interrupt bug workaround:** Described in section 3.6.10.

28. **UC3A0xxx support:** Described in section 3.6.3.

29. **Board support for ATEVK1100:** Described in section 3.6.2.

### 3.7 Toolchain adaptation

Initially, the toolchain did not support generating any type of relocatable executables for the AVR32 architecture. Since our system did not have an MMU, we needed the executables to be relocatable.

When we started on this task we had not yet chosen which binary format we should use. In an effort to understand the formats better, we did some initial testing with both formats. The testing we did with the Flat format is described in 3.7.4.

We followed another path which turned out to be a dead end. We tried to add a section with relocation information to normal executables. This is described in 3.7.5.

The rest of the section describes changes we made to add FDPIC ELF support to the toolchain – GNU GCC, GNU Binutils and uClibc.

When considering how to proceed, we quickly decided that it would be simplest to focus on static binaries. Shared libraries would introduce additional complexity, and we wanted to start simple.

#### 3.7.1 GCC

The AVR32 specific GCC code is located under gcc/config/avr32, and all our changes are to files in that directory. We used the Blackfin and FR-V architectures as the base for our changes. These were located under gcc/config/bfin and gcc/config/frv.

**-mfdpic** flag

The first change we made to GCC was to add the `-mfdpic` flag. GCC has many target specific flags, and the convention is that the `-mfdpic` flag enables the FDPIC ELF target. For GCC to understand the `-mfdpic` flag, we had to add it to the avr32.opt file. This file combines option names, flags for options and the help text for options into a single file. Our changes to this file can be seen on lines 31-33 of appendix F.2.

Options to linker and assembler

The whole point of adding the `-mfdpic` flag is to be able to pass a different set of options to the assembler and linker when compiling FDPIC ELF files. The options passed to the linker and assembler are controlled by specifications in `linux-elf.h`. There were two options we changed – `ASM_SPEC` and `LINK_SPEC`.

For the assembler we only added the `-mfdpic` option when calling the assembler. This was done by adding `%{mfdpic}` to `ASM_SPEC`. The changes can be seen on lines 82-88 of F.2. For clarity, we also split the line into multiple lines.

We needed to use a different linker target when compiling FDPIC ELF files. To accomplish this, we added a single line to `LINK_SPEC`: `%{mfdpic:-mavr32linuxfdpic}`. This line will make GCC pass `-mavr32linuxfdpic` to the linker if `-mfdpic` is specified. The change can be seen on line 92 of F.2.

Options to self

Since FDPIC ELF files need to be relocatable, they should use position independent code. Normally, GCC creates code which isn’t position independent for executables. To enable position independent code, one can pass one of `-fpic`, `-fPIC`, `-fpie` or `-fPIE` to GCC. So that the user does not have to specify this option, we can make GCC add the option to itself when `-mfdpic` is specified. This is done by adding `DRIVER_SELF_SPECS`. We set it to a line which basically says “If no other option enables or disables position independent code, set the `-fpie` option”. This is done on lines 76-80 of F.2.

Later on we changed it so that the user would not have to specify the `-mno-init-got` option either. Normally GCC will create code which initializes the pointer to the GOT for each function call. Unfortunately, the code which initializes the pointer depends on the data segment being loaded at a constant offset from the code segment. This does not work when the FDPIC ELF file is fully relocatable, and it was therefore necessary to use this option. We added a line to `DRIVER_SELF_SPECS` which automatically sets the `-mno-init-got` option when `-mfdpic` is specified. This change can be seen in F.5, which is an unsubmitted patch for GCC.
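The spec strings discussed in this section can be sketched as C string macros, which is the form GCC target headers use. The two short fragments are the ones quoted in the text. The `DRIVER_SELF_SPECS_SKETCH` string is an assumption: it follows the nesting style used by other FDPIC targets to express "if no PIC option was given, add `-fpie`", and is not copied from appendices F.2 or F.5. The macro names with `_FRAGMENT` and `_SKETCH` suffixes and the two helper functions are hypothetical.

```c
#include <string.h>

/* Fragments quoted in the text: pass -mfdpic on to the assembler, and
 * select the FDPIC linker emulation when -mfdpic is given. */
#define ASM_SPEC_FDPIC_FRAGMENT  "%{mfdpic}"
#define LINK_SPEC_FDPIC_FRAGMENT "%{mfdpic:-mavr32linuxfdpic}"

/* Assumed shape: with -mfdpic and no explicit PIC choice, add -fpie;
 * with -mfdpic, always add -mno-init-got. */
#define DRIVER_SELF_SPECS_SKETCH \
	"%{mfdpic:%{!fpic:%{!fpie:%{!fPIC:%{!fPIE:" \
	"%{!fno-pic:%{!fno-pie:%{!fno-PIC:%{!fno-PIE:-fpie" \
	"}}}}}" "}}}}" \
	" %{mfdpic:-mno-init-got}"

/* Sanity check for hand-written specs: every '{' must be closed. */
static int spec_braces_balanced(const char *spec)
{
	int depth = 0;

	for (; *spec; spec++) {
		if (*spec == '{')
			depth++;
		else if (*spec == '}' && --depth < 0)
			return 0;
	}
	return depth == 0;
}

/* Non-zero if the spec string mentions the given option text. */
static int spec_mentions(const char *spec, const char *option)
{
	return strstr(spec, option) != NULL;
}
```

In a spec, `%{X:Y}` expands to `Y` only if option `-X` was given, and `%{!X:Y}` only if it was not, which is what makes the nested form act as a chain of conditions.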

__AVR32_FDPIC__ define

To allow conditional compilation depending on whether a normal executable or a FDPIC ELF executable is created, we needed to add a preprocessor define. To be consistent with the Blackfin and FR-V architectures, we named it `__AVR32_FDPIC__`. This makes it possible to write code like:

```c
#ifdef __AVR32_FDPIC__
/* Do something when creating FDPIC ELF files. */
#endif

#ifndef __AVR32_FDPIC__
/* Do something when not creating FDPIC ELF files. */
#endif
```
crti.asm GOT pointer

crti.asm is compiled during compilation of GCC, and the generated code is included in all compiled executables. This file initializes the GOT pointer unconditionally, including when the `-mno-init-got` option is specified. This overwrites the valid GOT pointer stored in the register. The code initializing the GOT pointer is the same as GCC uses elsewhere, and it will therefore fail in the same way when the program is not loaded contiguously into memory. Therefore we had to remove this code. This was done by adding an `#ifdef __AVR32_FDPIC__` around the code. The changes can be seen on lines 34-67 in F.2.

For this `#ifdef` to work, we had to make GCC specify the `-mfdpic` flag when compiling `crti.asm`. This was done by adding `CFLAGS_FOR_TARGET=-mfdpic` when compiling GCC.

3.7.2 Binutils

Most of the changes necessary to produce FDPIC ELF files were to the GNU Binutils package. The patch we created for GNU Binutils can be found in appendix F.3. The changes done in GNU Binutils are inspired by the implementation done for the Blackfin and FR-V architectures, which can be found in `bfd/elf32-bfin.c` and `bfd/elf32-frv.c`. When looking at those two files, it was clear that they had a lot of the implementation in common, with a lot of code copied between those two files. We copied some code from those files into `bfd/elf32-avr32.c`, and used some of the code for inspiration. Since we implemented FDPIC ELF support without shared library support, a lot of the code we created became simpler.

New binary format

We needed to add a new binary format to the binary format library, located under the bfd directory. This was done by adding the following code at the end of `bfd/elf32-avr32.c`:

```c
/* FDPIC target */
#define TARGET_BIG_SYM bfd_elf32_avr32fdpic_vec
#define TARGET_BIG_NAME "elf32-avr32fdpic"
#define elf32_bed elf32_avr32fdpic_bed
#include "elf32-target.h"
```

Later on we extended this code with some hooks to make it behave differently from the normal AVR32 target.

We also had to add this new target to the build files. To do this, we updated `bfd/config.bfd`, `bfd/configure`, `bfd/configure.in` and `targets.c`. These changes can be seen on lines 34-69 and 505-524 in appendix F.3.

New linker target

We needed to add a new linker target for generating FDPIC ELF binaries. The linker knows this as the “linker emulation”, and the various targets are configured by shell scripts located in `ld/emulparams/`. We added a target named avr32linuxfdpic by creating a shell script named `avr32linuxfdpic.sh` in that directory.

Listing 3.4 is the script we ended up with. The script is heavily based on the `elf32bfinfd.sh` script and the `elf32frvfd.sh` script. Because the avr32linuxfdpic target is based on the avr32linux target, we begin the script by including the original `avr32linux.sh` script. We then remove the `STACK_ADDR` option, since the stack is not mapped at a fixed address, and specify the output format by setting the `OUTPUT_FORMAT` option. The value is the internal name of the FDPIC ELF target, which is specified in the source code.

The `OTHER_READONLY_SECTIONS` option extends the linker script to understand the rofixup section. It specifies that the rofixup section should be included with the other read-only sections. It also creates two symbols which can be used in the program code: `__ROFIXUP_LIST__` and `__ROFIXUP_END__`. These are used in the assembler code which handles the relocation.

```
. ${srcdir}/emulparams/avr32linux.sh

unset STACK_ADDR
OUTPUT_FORMAT="elf32-avr32fdpic"

OTHER_READONLY_SECTIONS="
  .rofixup : {
    ${RELOCATING+__ROFIXUP_LIST__ = .;}
    *(.rofixup)
    ${RELOCATING+__ROFIXUP_END__ = .;}
  }
"
```

Listing 3.4: Linking configuration

We also had to add this new linker script to the various build files for the linker. These changes are located on lines 584-644 of appendix F.3. They add the new avr32linuxfdpic target to the various files.

Stack size

The Linux kernel requires FDPIC ELF binaries to include their required stack size in one of the program headers, and will refuse to load a binary without that stack size. To support the stack size, we needed to add three new hooks for the elf32-avr32fdpic target. These were the following hooks:

- *elf_backend_always_size_sections*, which is called after all input files have been read, but before the linker has decided on the final size of the sections. This hook is handled by the *avr32_fdpic_always_size_sections* function.

- *elf_backend_modify_program_headers*, which is called just before the program headers are written to the output file. The *avr32_fdpic_modify_program_headers* function handles this hook.

- *bfd_elf32_bfd_copy_private_bfd_data*, which is used by tools which create copies of binary files. The *avr32_fdpic_copy_private_bfd_data* function handles this hook.

The program flow becomes:

1. The input files are read.

2. *avr32_fdpic_always_size_sections* is executed. If none of the input files has set the *__stacksize* symbol, this function will initialize it to 65536 bytes, which is the default we have chosen. This function will also ensure that the *PT_GNU_STACK* segment is created and added to the program header.

3. *avr32_fdpic_modify_program_headers* is executed. This function will update the program header with the correct stack size from *__stacksize*.

The *avr32_fdpic_copy_private_bfd_data* function is only used by special tools, such as the *objcopy* command.
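
The flow above can be sketched in C as follows. This is a simplified model and not the code from appendix F.3: `struct phdr32` is a stand-in for the ELF32 program header, the helper names are hypothetical, and an unset *__stacksize* symbol is represented by the value 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PT_GNU_STACK       0x6474e551u  /* standard GNU program header type */
#define DEFAULT_STACK_SIZE 65536u       /* the default mentioned in the text */

/* Minimal stand-in for an ELF32 program header (same field order). */
struct phdr32 {
    uint32_t p_type, p_offset, p_vaddr, p_paddr;
    uint32_t p_filesz, p_memsz, p_flags, p_align;
};

/* Step 2: pick the stack size. In this sketch, 0 means that no input
   file defined the __stacksize symbol. */
uint32_t choose_stack_size(uint32_t stacksize_symbol)
{
    return stacksize_symbol != 0 ? stacksize_symbol : DEFAULT_STACK_SIZE;
}

/* Step 3: store the size in the PT_GNU_STACK header. Returns 0 on success
   and -1 if there is no such header. */
int set_stack_size(struct phdr32 *phdrs, size_t count, uint32_t stack_size)
{
    assert(phdrs != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (phdrs[i].p_type == PT_GNU_STACK) {
            phdrs[i].p_memsz = stack_size;  /* the field the kernel reads */
            return 0;
        }
    }
    return -1;
}
```

The stack size ends up in the `p_memsz` field, which is the value *readelf* prints in the MemSiz column for the GNU_STACK entry.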

**objcopy bug** We had some problems setting the stack size to the right size when we compiled projects like BusyBox. When we compiled simple programs the stack size was correct, but when we compiled BusyBox the stack size became zero.

Our first theory was that the error occurred during stripping of the binary, which is probably partly correct. As part of the debugging we tried to run *objcopy* standalone, and found that it did not retain the header correctly. After one pass through *objcopy* the stack size (*MemSiz*) was changed from 64KB to 44 bytes, and after a second pass the stack size was zero.

<table>
<thead>
<tr>
<th>Program Headers:</th>
<th>Offset</th>
<th>VirtAddr</th>
<th>PhysAddr</th>
<th>FileSiz</th>
<th>MemSiz</th>
<th>Flg</th>
<th>Align</th>
</tr>
</thead>
<tbody>
<tr>
<td>Type</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>GNU_STACK</td>
<td>0x000044</td>
<td>0x00000000</td>
<td>0x00000000</td>
<td>0x00000000</td>
<td>0x000000</td>
<td>0x10000</td>
<td>RWE 0x8</td>
</tr>
<tr>
<td>GNU_STACK</td>
<td>0x000000</td>
<td>0x00000000</td>
<td>0x00000000</td>
<td>0x00000000</td>
<td>0x000000</td>
<td>RWE 0x4</td>
<td></td>
</tr>
</tbody>
</table>

Listing 3.5: Stack size after one pass through objcopy

<table>
<thead>
<tr>
<th>Program Headers:</th>
<th>Offset</th>
<th>VirtAddr</th>
<th>PhysAddr</th>
<th>FileSiz</th>
<th>MemSiz</th>
<th>Flg</th>
<th>Align</th>
</tr>
</thead>
<tbody>
<tr>
<td>Type</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>GNU_STACK</td>
<td>0x000044</td>
<td>0x00000000</td>
<td>0x00000000</td>
<td>0x00000000</td>
<td>0x000000</td>
<td>0x10000</td>
<td>RWE 0x8</td>
</tr>
<tr>
<td>GNU_STACK</td>
<td>0x000000</td>
<td>0x00000000</td>
<td>0x00000000</td>
<td>0x00000000</td>
<td>0x000000</td>
<td>RWE 0x4</td>
<td></td>
</tr>
</tbody>
</table>

Listing 3.6: Stack size after two passes through objcopy

Listings 3.5 and 3.6 show the difference between the output from *readelf* after the first and after the second run, formatted like a unified diff.

We assume that something goes wrong while reading the headers from the original file. After some debugging we found that if we specify the input format (listing 3.7) to *objcopy*, the *PT_GNU_STACK* header is retained correctly.

To recreate the stack size if it was removed during the build process, a custom build script was used. This script inserted the stack size at a pre-calculated offset in the file. See section 3.10 for more information about this.

rofixup section

The FDPIC ELF binary format requires that statically linked executables are able to perform their own relocation. To accomplish this, they use a special section, named rofixup. This section is stored in the code segment, and contains a list of addresses that need to be updated. When the program is executed, it will look at the entries in that section to find addresses in the program which need to be updated.

The rofixup section is created by the function avr32_rofixup_create (lines 87-119 in the patch in appendix F.3). This creates an empty section where relocation information is stored. The section is aligned on a word boundary by a call to the bfd_set_section_alignment function.

The function avr32_check_relocs iterates over all relocations and counts any potential GOT and Procedure Linkage Table (PLT) references. We extended this function to also count the number of potential addresses that should be in the rofixup section.

In the function avr32_elf_size_dynamic_sections we added code for calculating the size of the rofixup section. The size is the number of addresses we found during avr32_check_relocs, plus the number of GOT entries, plus one for the terminator.

avr32_rofixup_add_entry is a helper function for adding entries to the rofixup section. avr32_rofixup_add_relocation uses the helper function to add relocations, and avr32_rofixup_add_got uses the helper function to add GOT entries to the section.

The code flow is as follows:

1. avr32_check_relocs calls avr32_rofixup_create to create the rofixup section.
2. avr32_check_relocs iterates over all relocations. It counts the number of relocations which should be included in the rofixup section, and saves the result in a counter.
3. avr32_elf_size_dynamic_sections calculates the final size of the rofixup section, and initializes it to that size.
4. avr32_elf_relocate_section iterates over all relocations, and adds relocations to the rofixup section by calling avr32_rofixup_add_relocation.
5. avr32_elf_finish_dynamic_sections adds all GOT entries to the rofixup section by calling avr32_rofixup_add_got. It then terminates the rofixup section by calling avr32_rofixup_terminate.
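
The size calculation and the role of the helper function can be sketched as follows. The names below are hypothetical, and the entry size of four bytes is an assumption based on the 32-bit addresses stored in the section. The real implementation is in appendix F.3.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ROFIXUP_ENTRY_SIZE 4u  /* one 32-bit address per entry (assumed) */

/* Size in bytes of the rofixup section: one entry per counted relocation,
   one per GOT entry, plus one terminating entry. */
size_t rofixup_section_size(size_t reloc_count, size_t got_entries)
{
    return (reloc_count + got_entries + 1) * ROFIXUP_ENTRY_SIZE;
}

/* Hypothetical bookkeeping used while filling the section (steps 4 and 5). */
struct rofixup {
    uint32_t *entries;  /* storage sized with rofixup_section_size() */
    size_t capacity;    /* number of entries that fit */
    size_t used;        /* number of entries written so far */
};

/* Append one address, in the spirit of avr32_rofixup_add_entry. */
void rofixup_add_entry(struct rofixup *r, uint32_t address)
{
    assert(r != NULL && r->used < r->capacity);  /* counting must not undercount */
    r->entries[r->used++] = address;
}
```

The assertion reflects why the counting pass in step 2 matters: the section size is fixed in step 3, so every entry added later must already have been counted.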
Assembler

The assembler is a part of the GNU Binutils package. There was only a minor change to the assembler – it had to handle the `-mfdpic` option correctly. When the assembler receives the `-mfdpic` option, it needs to set a flag in the output file, which indicates that the file is a FDPIC ELF file.

We added a new architecture specific ELF flag for AVR32 – the `EF_AVR32_FDPIC` flag. If this flag is set in an ELF file, the file is a FDPIC ELF file. The assembler was changed to set this flag when the `-mfdpic` option is set.

3.7.3 uClibc

The uClibc library needed several adaptions – some to support the UC3 family of microcontrollers, and some to support FDPIC ELF on the AVR32 architecture.

UC3 support

The first thing we did was to add a UC3 option to the build configuration, so that it was possible to select the UC3 family of processors in the configuration system. This option allows us to do conditional compilation of code depending on which processor is selected. It is also used to select the correct compiler flags. The -ES version of the UC3A0512 microcontroller only implements revision 1 of the AVR32 architecture, so in our case `-march=ucr1` must be specified to generate code which is compatible with the UC3A0512ES. The changes can be seen on lines 34 and 54-56 of appendix F.4.

Unaligned memory accesses

The `libc/string/avr32` directory contains optimized variants of several standard C functions dealing with strings and blocks of memory: `bcopy`, `bzero`, `memcmp`, `memcpy`, `memmove`, `memset`, `strcmp` and `strlen`.

We looked over these implementations, and identified three functions which perform unaligned accesses in some situations: `memcpy`, `memcmp` and `memmove`.

We added code to these functions to handle the unaligned case. This code is only activated when compiling for the UC3 family of microcontrollers.
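
The idea behind such a fix can be sketched in C. The real routines are written in AVR32 assembler, so this is only an illustration under the assumption that word accesses require 4-byte alignment. `copy_aligned_safe` is a hypothetical name and not a function from uClibc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy n bytes without ever issuing an unaligned word access. */
void *copy_aligned_safe(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(dst != NULL && src != NULL);

    /* Word copies are only used when both pointers are 4-byte aligned.
       Like a libc routine, this treats the buffers as raw memory. */
    if ((((uintptr_t)d | (uintptr_t)s) & 3u) == 0) {
        while (n >= 4) {
            *(uint32_t *)(void *)d = *(const uint32_t *)(const void *)s;
            d += 4;
            s += 4;
            n -= 4;
        }
    }

    /* Unaligned case and remaining tail: fall back to byte accesses. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dst;
}
```

An optimized version would also handle the case where source and destination share the same misalignment, by copying a few leading bytes first. The sketch leaves this out to keep the alignment check easy to see.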

FDPIC support

When adding FDPIC ELF support, we used the code from the Blackfin and FR-V architectures as inspiration. The first thing we added was support for doing relocation during program startup.

This code was added to `crt1.S` in the `libc/sysdeps/linux/avr32` directory. That file contains the entry point of the program, where execution starts when Linux passes control to the program. There were already two code paths in that file, depending on whether uClibc was compiled as a static or shared library.

We added a third code path to that file, which is enabled when the file is built with FDPIC ELF support. This code path can be seen on lines 176-226 of appendix F.4, and does the following:

1. Call the **__self_reloc** function with the following arguments:
   - The load map of the program – created by the Linux kernel.
   - The original (unrelocated) offset of the rofixup section, where relocation information is stored.
   - The original offset of the GOT.

2. The **__self_reloc** function uses the information in the rofixup section to update all pointers in the program.

3. The **__self_reloc** function returns the relocated pointer to the global offset table. This pointer is loaded into register r6, which is the register designated to hold a pointer to the GOT.

4. Control is passed to the **__uClibc_main** function, which is the main entry point for uClibc.

We also added a new C-file – **crtreloc.c**. This file can be seen on lines 258-348 of appendix F.4. It is this file that contains the **__self_reloc** function. It also contains a function named **__reloc_pointer**, which is used by the **__self_reloc** function. This function takes in a pointer, and returns the relocated pointer. The **__reloc_pointer** function is copied from the FR-V file **libc/sysdeps/linux/frv/bits/elf-fdpic.h**. An identical function can also be found in **libc/sysdeps/linux/bfin/bits/elf-fdpic.h**.
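
A minimal sketch of the address translation performed by a function like **__reloc_pointer** is shown below. It works on plain 32-bit integers so that it can be compiled and tried on any host. The structure and function names are illustrative, the field names are modelled on the FDPIC load map that Linux passes to the program, and the real load map embeds the segment array directly instead of pointing to it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One loaded segment: where it was linked and where the kernel placed it. */
struct loadseg {
    uint32_t addr;     /* actual load address */
    uint32_t p_vaddr;  /* link-time (unrelocated) address */
    uint32_t p_memsz;  /* size of the segment in memory */
};

struct loadmap {
    uint16_t version;
    uint16_t nsegs;
    const struct loadseg *segs;  /* simplification, see the text above */
};

#define RELOC_FAILED UINT32_MAX

/* Translate a link-time address to its run-time address. */
uint32_t reloc_address(uint32_t addr, const struct loadmap *map)
{
    assert(map != NULL);
    for (uint16_t i = 0; i < map->nsegs; i++) {
        /* Unsigned arithmetic: addresses below p_vaddr wrap around to a
           large offset and are rejected by the size check. */
        uint32_t offset = addr - map->segs[i].p_vaddr;
        if (offset < map->segs[i].p_memsz)
            return map->segs[i].addr + offset;
    }
    return RELOC_FAILED;  /* address is not inside any loaded segment */
}
```

A self-relocation routine would apply this translation first to each entry in the rofixup section, to find the location that needs updating, and then to the value stored at that location.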

### GOT pointer

Like GCC, uClibc also includes **crti.S**, which does the same thing as **crti.asm** in GCC (see section 3.7.1). We did the same change to this file as we did to the GCC file, and deactivated the initialization of the GOT pointer when FDPIC ELF is enabled. The same change also had to be done in two other assembler files – **syscalls.S** and **vfork.S**. The changes can be seen on lines 227-257 and 349-405 of appendix F.4.

### 3.7.4 elf2flt

During development we investigated the possibility of using the Flat binary format. **elf2flt** is the usual approach used to generate Flat binaries. An attempt was made to identify the necessary changes to this tool in order to be able to produce Flat binaries. Our resulting code, with which we were able to produce some unstable results, is listed in appendix G. This evaluation was done at a time when the development of FDPIC ELF was stuck in some problem that we did not figure out right away.

In the development of this patch, the first milestone was to add the AVR32 target skeleton, which would let **elf2flt** accept AVR32 as a target. When this was done, it was possible to add tracing information and to add code where it was needed. A few relocation definitions were added to the code which iterates over the symbols. A few `printf`s used to trace the iteration still remain. If this code is to be used, these will have to be removed.

Some changes were done to the Linux kernel as well, mostly in copying code from other architectures into the AVR32 architecture. These changes were reverted when we continued developing FDPIC ELF.

Even though we did not use the code developed in this exploration, the process gave us better knowledge about how the toolchain works and helped us to get further with FDPIC ELF.

### 3.7.5 PIE support

Another dead end that we investigated was to create position independent executables via the `-fpie` flag to GCC. This was an attempt to add the `DYNAMIC` section to normal executables, which we had assumed was required for FDPIC ELF support. What we discovered early on was that GNU Binutils for the AVR32 architecture did not include the linker script required for position independent executables. Our supervisor at Atmel, Håvard Skinnemoen, suggested that we add `GENERATE_PIE_SCRIPT=yes` to the `ld/emulparams/avr32linux.sh` script in GNU Binutils. This would make GNU Binutils generate the correct linker script.

With the correct linker script, we were able to generate position independent executables, which contained the `DYNAMIC` section. This section contained relocation information for the executable, so it could in theory be relocated by a loader. Unfortunately, the generated executable was basically a shared library, which was dependent on an external program for loading. This is unfortunate, as we could not generate static binaries with this method. We also discovered that FDPIC ELF executables did not require a `DYNAMIC` section when statically linked, and that they instead depended solely on the `rofixup` section for relocation.

### 3.8 SRAM optimization

The external SRAM severely limits the execution speed. We currently use three CPU clock cycles for each 16-bit read or write. According to the SRAM datasheet, it should be possible to accomplish this in a single clock cycle. The UC3A doesn’t have any cache, so the processor has to pay this cost for every access to external memory. When code is executed from external SRAM, the microcontroller will therefore need at least three cycles per instruction. When executing code from internal SRAM, it should need closer to one cycle per instruction.

We did a survey of the various SRAM signal lines, and identified three potential problems:

- Some of the signal lines are routed out of the UC3A0512 microcontroller in two locations.

- The joystick on the EVK1100 was connected to three of the address lines.

- One of the LEDs on the EVK1100 was connected to the chip-select line of the SRAM chips.

### 3.8.1 Routing of signals

All of the SRAM control signals and some of the address lines can be routed out of the UC3A0512 microcontroller on several of the pins. When we originally added support for external SRAM to U-Boot, we routed the signals out on all available locations. This was done for simplicity as a temporary implementation.

We wanted to check whether routing the signals to multiple pins in this way would degrade the output signals. To test this we made a simple change to U-Boot that disabled the control signals not in use. The change, shown in listing 3.8, comments out a part of one of the bitmasks that selects pin functionality. After this change, SRAM would still not work with increased speed settings.

```diff
diff --git a/cpu/at32uc/at32uc3a0xxx/portmux.c b/cpu/at32uc/at32uc3a0xxx/portmux.c
index a796f22..5b60e0 100644
--- a/cpu/at32uc/at32uc3a0xxx/portmux.c
+++ b/cpu/at32uc/at32uc3a0xxx/portmux.c
@@ -48,7 +48,7 @@ void portmux_enable_ebi(unsigned int bus_width, unsigned int addr_
     /*
     */
     portmux_select_peripheral(PORTMUX_PORT(0),
-        0x0003C000 |
+        /* 0x0003C000 | */
-        0xE000000, PORTMUX_FUNC_C, 0);
+        0xE000000, PORTMUX_FUNC_C, 0);
     portmux_select_peripheral(PORTMUX_PORT(1),
-        0x00000010 |
+        0x00000010 |
```

Listing 3.8: Disable SRAM signals

### 3.8.2 Joystick pull-up conflict

On the EVK1100, there are five signal lines between the joystick and the microcontroller. Three of these lines are also routed to the expansion header, and are used for EBI by the memory expansion card. The joystick is accompanied by pull-up resistors connected to these lines, and could therefore potentially interfere with the memory bus. In an attempt to test this hypothesis, those three resistors were removed. Unfortunately, we were still unable to increase the speed of the memory.

### 3.8.3 LED resistor conflict

Of the 8 individually controllable LEDs, one (LED3\(^4\)) shared a signal line with the memory expansion board. LEDs draw a significant amount of current (typically 20 mA [3]), and a LED could considerably affect the rise and fall times of the memory bus.

![Figure 3.7: LED conflicting with the EBI bus](image)

Figure 3.7 shows how the microcontroller, LED3 and the memory are interconnected. We tried to remove the LED, but this did not improve the performance of the memory.

### 3.9 SPI chip enable

While attempting to adapt and activate the SPI driver, unexpected crashes started to occur. Linux did not print a proper stack trace, and when single stepping, it was discovered that the processor jumped to the exception handler for illegal opcodes. After jumping to that handler, we could see that the executed code in the handler was not the same as the code that should be stored there.

We suspected that the memory had been corrupted by some software error. We tried to set memory breakpoints on the exception handler code, to check when it was written to. The breakpoint never triggered.

We then looked closer at the instructions which were executed right before the crash. These instructions dealt with activating a pin connected to a SPI chip enable line. Further inspection of the line showed that it was also connected to the chip enable pin for the flash chip on the memory expansion card.

---

\(^4\)This LED is numbered LED2 in the schematics

Figure 3.8: Chip enable pin conflict

The flash chip on the memory expansion card designed by Atmel (see section 2.15) has one pin, chip enable, that is used for activating the chip. According to the schematics, this pin is routed to both pin 18 and pin 27 on the header. On the EVK1100, pins 18 and 27 on the header are routed to the pins PA14 and PA25, which are two possible physical pins that a chip enable signal can be routed to. Since the DataFlash on the EVK1100 already occupies pin PA14, activating SPI for this device also enables the flash chip on the expansion board. This ultimately leads to the flash chip interfering with the memory bus. This problem was solved by physically removing a pin from the expansion board header so that the chip enable signal could not reach the flash (illustrated in figure 3.8). It should then be possible to use the other pin on the expansion board to control the chip enable line, but after visual inspection and testing with a multimeter, it was found that the chip enable pin from the flash chip is only connected to pin 18, and not pin 27.

Figure 3.9: Wire soldered from chip enable pin to nearby VCC

This missing connection is illustrated with a red line in figure 3.8. Since the path to pin 18 was cut, and pin 27 was unconnected, a short wire was soldered from VCC to pin 27 on the expansion board to avoid a floating chip enable input on the flash. This is shown schematically in figure 3.8, and figure 3.9 shows a picture of the soldered wire. The end result is that the flash chip is always deactivated.

### 3.10 BusyBox

One of the tasks (requirement 13) was to find a set of suitable applications for this platform. With a working toolchain it was possible to build BusyBox for this processor. BusyBox contains many useful tools and is described in more detail in section 2.14.

```bash
#!/bin/bash
INSTALLPATH=/tftpboot/evk1100
make V=1 CFLAGS="-mno-init-got -mfdpic" LDFLAGS="-static -mfdpic" install

# Write the stack size 0x00010000 (big endian) into MemSiz of GNU_STACK
echo -ne '\x00\x01\x00\x00' | dd of="$INSTALLPATH/bin/busybox" seek=136 bs=1 conv=notrunc
```

Listing 3.9: Custom shell script for building BusyBox

The `dd` command on the last line is a workaround for a bug which surfaced when a binary was stripped of debug information. This bug is described in section 3.7.2. The magic number in the script, 136, is the file offset of the stack size (MemSiz of GNU_STACK). The formula for finding this offset is simple, and the necessary information can be found in listing 3.10, which is the output from the program `readelf` when given the compiled version of BusyBox as input: Start of program headers (52 bytes) + Number of program headers before GNU_STACK (2) * Size of program headers (32 bytes) + Word size (4 bytes) * Number of fields in the program header before MemSiz (5) = 52 + 64 + 20 = 136. The numbers used here are marked in blue, and the changed number is colored red. The stack size was 0x0000 before the provisional fix was applied and 0x10000 (64KB) afterwards.

```
# avr32-uclinux-uclibc-readelf --file-header --program-headers busybox

ELF Header:
Magic: 7f 45 4c 46 01 02 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, big endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: Atmel AVR32
Version: 0x1
Entry point address: 0x109c
Start of program headers: 52 (bytes into file)
Start of section headers: 2188888 (bytes into file)
Flags: 0x6
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 3
Size of section headers: 40 (bytes)
```
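
The offset calculation used in the build script can be written down as a small C sketch. The function names are hypothetical. The field index relies on the standard layout of an ELF32 program header, where `p_memsz` is preceded by five 4-byte fields, and the byte order follows the big endian data encoding reported in the ELF header above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* File offset of a 32-bit field inside the index-th ELF32 program header.
   field_index counts 4-byte fields from the start of the header
   (p_type = 0, ..., p_memsz = 5). */
size_t phdr_field_offset(size_t e_phoff, size_t e_phentsize,
                         size_t index, size_t field_index)
{
    assert(field_index * 4 < e_phentsize);
    return e_phoff + index * e_phentsize + field_index * 4;
}

/* Encode a 32-bit value in big endian byte order. */
void put_be32(uint8_t out[4], uint32_t value)
{
    out[0] = (uint8_t)(value >> 24);
    out[1] = (uint8_t)(value >> 16);
    out[2] = (uint8_t)(value >> 8);
    out[3] = (uint8_t)value;
}
```

With the values from the ELF header, `phdr_field_offset(52, 32, 2, 5)` gives 136, and `put_be32` applied to 0x10000 produces the four bytes 00 01 00 00 that the script writes with `dd`.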
3.11 Obtaining and distributing source code

The five software units modified during this project were Linux, U-Boot, uClibc, GCC and GNU Binutils. We tracked all our modifications by using the Git revision control system. This section describes how the source code was obtained and distributed. Note that our current version of all the patches can be found in the digital appendices in this report.

3.11.1 Buildroot

Buildroot is a tool for generating a cross-compilation toolchain and root file system for embedded systems. Some of the patches we used were extracted from Atmel’s Buildroot release. The most current version at the time of download was version 2.3.0.

3.11.2 GCC

When GCC was downloaded, version 4.2.2 was the most current version for which Atmel provided patches. The most current patch from Atmel at the time was version 1.1.3. Atmel’s Buildroot package included their patch and several other patches that could be useful for us. We applied all patches from the Buildroot package to GCC 4.2.2, and used this as the base for our development.

Since there is no dedicated mailing list for the AVR32 toolchain, the patches against GCC were submitted to the AVR32 Buildroot list.

---

\(^5\) http://buildroot.uclibc.org/
\(^6\) http://www.atmel.no/buildroot/
\(^7\) http://avr32linux.org/archives/buildroot/
3.11.3 GNU Binutils

When GNU Binutils was downloaded, version 2.18 was the newest version for which Atmel provided patches. The most current patch from Atmel was at the time version 1.0.1. Atmel’s Buildroot package included their patch, which we applied to the official GNU Binutils source\(^8\).

Since there is no dedicated mailing list for the AVR32 toolchain, the patches against Binutils were submitted to the AVR32 Buildroot list\(^7\).

3.11.4 uClibc

When uClibc was downloaded, 0.9.30 was the most current version. uClibc was downloaded from uClibc’s download page\(^9\).

Since there is no dedicated mailing list for the AVR32 toolchain, the patches against uClibc were submitted to the AVR32 Buildroot list\(^7\).

3.11.5 elf2flt

elf2flt was downloaded from uClinux.org’s official CVS repository\(^{10}\) on the 6th of March 2009. We experimented with elf2flt, but no useful results were achieved, so no patches for elf2flt were submitted to the maintainers.

3.11.6 U-Boot

We had created a modified version of U-Boot during our project in the fall of 2008, so the U-Boot source code was already in our possession. Håvard Skinnemoen at Atmel Norway maintains a repository of U-Boot for development and testing of AVR32-specific code. Skinnemoen can decide whether to add patches to his repository, and whether they eventually should be merged into the official U-Boot Git repository.

During this project, one revised patch series for U-Boot was prepared and submitted to the official U-Boot mailing list on the 23rd of January. This patch series is based on the earlier submitted patches, and addresses the feedback and criticism received on the mailing list. Only the modifications of U-Boot described in sections 3.4.5 and 3.4.6 were done after the submission of this patch series. These were therefore never organized into patches or submitted to any mailing list, but are listed in appendix C.

Some of the patches have already found their way into the current release of U-Boot\(^{11}\), and some of them have been pulled into the next\(^{12}\) repository.

---

\(^8\) http://ftp.gnu.org/gnu/binutils/
\(^9\) http://www.uclibc.org/downloads/
\(^{10}\) http://cvs.uclinux.org/cgi-bin/cvsweb.cgi/elf2flt/
\(^{12}\) next refers to the branch currently under development that is going to lead to the next release
3.11.7 Linux

The Linux kernel source code was obtained from Linus Torvalds’ kernel tree on kernel.org during our project in the fall of 2008, so the source code was already in our possession. The specific version that was originally downloaded was v2.6.27-rc6-99-g45e9c0d. In our case, merging with newer kernel versions was not highly prioritized and was therefore only done once. We merged so that we used the most current stable version at the time (early January) as a basis, which was version 2.6.28.1\(^{13}\).

We separated our changes into logical units, and ended up with 29 patches. These could be categorized into the following four categories:

- 19 patches to prepare for AVR32A support.
- 7 patches which add AVR32A support.
- 2 patches which add UC3A support.
- A patch to add support for the board we used (EVK1100).

These patches were submitted to the AVR32 kernel list\(^{14}\) with a short description of the patch series. The patch series is listed as a whole with the cover letter in appendix D.

3.11.8 BusyBox

At the start of the work with this thesis, the BusyBox source code was downloaded from the official BusyBox website\(^{15}\). The latest version at the time was version 1.13.2. Since no changes were made to the BusyBox source code, there was no need to submit patches to the maintainers.

\(^{13}\) http://www.kernel.org/pub/linux/kernel/v2.6/linux-2.6.28.1.tar.bz2
\(^{14}\) http://avr32linux.org/archives/kernel/
\(^{15}\) http://www.busybox.net
Chapter 4

Testing and results

In this chapter, we present the tests used to determine which of the requirements defined in section 1.3.1 were met. The result of each test will also be given, and any abnormal results will be discussed. This chapter follows the structure of the requirements list. For each individual requirement, a corresponding test and result is listed.

All the tests assume that the hardware is connected as shown in figure 3.2. The U-Boot boot image is programmed to the internal flash of the microcontroller, and the computer is serving the Linux kernel image and the root file system to the board via the services shown in the figure.

Included in this chapter is also the feedback we received when submitting patches. This is organized as a list of each patch we received feedback on, and the feedback we received. Our responses to the feedback are also included.

4.1 U-Boot

4.1.1 SPI support, requirement 1

Result: Not implemented.

4.1.2 Loading from DataFlash or SD card, requirement 2

Result: Not implemented.

4.1.3 Patch cleanup, requirement 3

Result: Submitted.

A series of patches were submitted. Section 3.4 describes the changes done to U-Boot, and section 3.11.6 describes what has been submitted.
4.2 Linux

The Linux Test Project\(^1\) could be used to test the robustness of the system, but this requires that the toolchain has reached sufficient maturity. Since that is not yet the case here, we were not able to locate any automatic testing software which could be used within our time frame. Only small parts of the system are tested here, and more comprehensive testing should be performed before putting the code into a production environment.

4.2.1 Booting Linux kernel, requirement 4

Testing of this requirement is done by letting U-Boot load and start the kernel. An init program must be placed in the appropriate location on the network file system. The kernel should then:

a. Give reasonable output to serial console.
   
   Verified by starting a terminal program and watching the output.

b. Bring up networking.
   
   Verified by running *ifconfig* when the system has booted and reading the output. If necessary, *ifconfig* is used to set a static IP address. If that gives reasonable output, a ping to the server should be executed.

c. Receive network configuration using DHCP.
   
   Verified by running *ifconfig* after a reboot. The interface should now have received an IP address from the DHCP server.

d. Mount necessary file systems:
   
   (i) NFS root file system.
   (ii) proc file system.
   (iii) sysfs file system.
   (iv) devpts file system.
   (v) devshm file system.

   This is checked by executing the command *mount*. The output should be a list containing the file systems listed above. On each file system it should be verified that files could be accessed.

e. Load and execute init application.

Result: Passed. All tests were successfully executed.
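The check in step d was done by reading the output of *mount* manually. As a minimal sketch of how it could be automated, the function below scans text in the `/proc/mounts` format for a given file system type. The function name and the type names passed to it are assumptions for illustration; in particular, the type reported for the shared memory mount may differ from the name used in the list above.

```c
#include <stdio.h>
#include <string.h>

/*
 * Returns 1 if any line of `mounts` has the file system type `fstype`,
 * otherwise 0. Each line is expected to be well formed, in the
 * /proc/mounts layout: "device mountpoint fstype options ...".
 */
int has_fs_type(const char *mounts, const char *fstype)
{
    char dev[128], dir[128], type[64];
    const char *line = mounts;

    while (line && *line) {
        /* The third whitespace-separated field is the file system type. */
        if (sscanf(line, "%127s %127s %63s", dev, dir, type) == 3 &&
            strcmp(type, fstype) == 0)
            return 1;

        /* Advance to the start of the next line, if there is one. */
        line = strchr(line, '\n');
        if (line)
            line++;
    }
    return 0;
}
```

On the target, the contents of `/proc/mounts` could be read into a buffer and passed as `mounts`, with one call per required file system.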

\(^1\)http://ltp.sourceforge.net/
4.2.2 Running user space binaries, requirement 5

This can only be tested if test 4.2.1 passed.

A simple program should be compiled, copied to the root file system and executed. It should be verified that the program gives the correct output.

**Result:** Passed. The binaries execute and give correct output.

4.2.3 Hardware support, requirement 6

**LEDs, requirement 6a**

The LEDs can be tested by writing to their trigger files, in this case by enabling the heartbeat function. It must first be checked that the LED is not already blinking a heartbeat, so that the test result is valid.

The heartbeat of LED1 is then enabled by writing:

```bash
echo 'heartbeat' > /sys/class/leds/led1/trigger
```

Listing 4.1: Enabling LED1

After executing that command LED1 should give a blinking heartbeat.
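The same operation can be done from a user space program instead of the shell. The following is a minimal sketch, assuming the trigger file accepts the trigger name as plain text; the function name `set_led_trigger` is hypothetical.

```c
#include <stdio.h>

/*
 * Writes the trigger name `trigger` to the sysfs trigger file `path`.
 * Returns 0 on success and -1 if the file could not be opened or written.
 */
int set_led_trigger(const char *path, const char *trigger)
{
    FILE *f = fopen(path, "w");

    if (!f)
        return -1;
    if (fputs(trigger, f) == EOF) {
        fclose(f);
        return -1;
    }
    /* fclose flushes the buffer, so a failed write can show up here. */
    return fclose(f) == 0 ? 0 : -1;
}
```

For the test above, the call would be `set_led_trigger("/sys/class/leds/led1/trigger", "heartbeat")`.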

**Result:** Passed.

**DataFlash, requirement 6b**

This requirement was not implemented, and therefore no test was written.

**Result:** Not implemented.

**LCD, requirement 6c**

This requirement was not implemented, and therefore no test was written.

**Result:** Not implemented.

**SD Card, requirement 6d**

This requirement was not implemented, and therefore no test was written.

**Result:** Not implemented.

**SPI, requirement 6e**

This requirement was not implemented, and therefore no test was written.
Result: Not implemented.

DMA, requirement 6f
This requirement was not implemented, and therefore no test was written.

Result: Not implemented.

Network adapter, requirement 6g
Tested by assigning an IP address to the network adapter, and using ping to test connectivity to another computer connected to the same network.

Result: Passed.

4.2.4 Exceptions, requirement 7
We identified four exception entry points that should be possible to trigger from a user space application. These were:

- handle_address_fault, triggered by unaligned reads/writes.
- do_bus_error_write, triggered by writing to invalid addresses.
- do_bus_error_read, triggered by reading invalid addresses.
- do_illegal_opcode_ll, triggered by various invalid or illegal instructions.

Unaligned accesses
We created two tests for handle_address_fault. The first test, appendix I.1.1, tests unaligned reads, while the second test, appendix I.1.2, tests unaligned writes. Both tests install a handler for the SIGBUS exception, and attempt to access an unaligned pointer.

Result: Both tests for unaligned accesses passed. The application received a SIGBUS exception from the kernel when attempting an unaligned access.
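The general shape of such a test can be sketched as follows. This is not the code from appendix I.1, only an illustration under stated assumptions: all function names are hypothetical, and because many desktop CPUs tolerate unaligned accesses, `simulated_fault` raises SIGBUS directly so that the sketch can be tried on a host machine.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>

static sigjmp_buf sigbus_env;

/* Signal handler: jump back to the point saved by sigsetjmp. */
static void on_sigbus(int signo)
{
    (void)signo;
    siglongjmp(sigbus_env, 1);
}

/*
 * Runs fn() with a temporary SIGBUS handler installed. Returns 1 if
 * SIGBUS was delivered, 0 if fn() completed normally, and -1 if the
 * handler could not be installed.
 */
int caught_sigbus(void (*fn)(void))
{
    struct sigaction sa, old;
    volatile int caught = 0;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigbus;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGBUS, &sa, &old) != 0)
        return -1;

    /* Saving the signal mask lets SIGBUS be delivered again later. */
    if (sigsetjmp(sigbus_env, 1) == 0)
        fn();
    else
        caught = 1;

    sigaction(SIGBUS, &old, NULL);
    return caught;
}

/* Deliberately non-portable: a 32-bit read from an odd address. */
void unaligned_read(void)
{
    static uint32_t words[2];
    volatile uint32_t *p = (volatile uint32_t *)((char *)words + 1);

    (void)*p;
}

/* Host-side stand-in for a faulting access. */
void simulated_fault(void)
{
    raise(SIGBUS);
}

/* An access that does not fault. */
void no_fault(void)
{
}
```

On the target, `caught_sigbus(unaligned_read)` would be expected to return 1 if the kernel delivers the exception as described in the result above.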

Invalid addresses
Two tests were created, one for testing of do_bus_error_write and one for testing of do_bus_error_read. Both tests attempt to access a memory area that does not exist on the UC3A0512 microcontroller. The first test, appendix I.1.3, tests invalid reads, while the second test, appendix I.1.4, tests invalid writes.
**Result:** The tests for reads and writes triggered an “oops” from the kernel when attempted. This is because the handlers used were the same as for the implementations with MMU support. The only way to receive this error on a processor with an MMU would be if the kernel made an error, and assigned invalid memory to the application. One could argue that we should pass the error to the application, instead of handling it in the kernel, but we decided not to do this. If MPU support was added, the error should be handled in the same way as with an MMU, and until then the program is terminated.

Another problem we discovered was that the `do_bus_error_read` entry point was mislabeled in the original source code we based our work on. It turned out that `do_bus_error_read` was triggered by instruction reads, not data reads. The two exception handlers call the same function, with a parameter to show whether the access was a read or write. This is used to print a log line: *Bus error at physical address 0x00100000 (write access).* The mislabeling caused both of our tests to be logged as a write access.
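The effect of the shared function and its access-type parameter can be sketched as follows. The function names are hypothetical, and the wording used for the read case is an assumption, since only the write form of the log line is quoted above.

```c
#include <stdio.h>
#include <string.h>

/*
 * Formats the log line for a bus error. `write_access` is the flag
 * passed by the entry point; it only selects the wording.
 * The result lives in a static buffer and is overwritten per call.
 */
const char *bus_error_message(unsigned long addr, int write_access)
{
    static char buf[80];

    snprintf(buf, sizeof(buf),
             "Bus error at physical address 0x%08lx (%s access)",
             addr, write_access ? "write" : "read");
    return buf;
}

/* Returns 1 if a log line reports a write access, otherwise 0. */
int logged_as_write(const char *line)
{
    return strstr(line, "(write access)") != NULL;
}
```

If both entry points pass the flag for a write, both tests produce a line for which `logged_as_write` returns 1, which matches the mislabeling that was observed.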

**Invalid opcode**

This handler is used by many exceptions. It is used when the opcode is unknown to the microcontroller, when the opcode is known but unsupported and when the application has insufficient privileges to execute the instruction. The logic when handling the different types is the same, so we decided to test only when the opcode was unknown to the microcontroller.

To test this, we attempt to use the `rsubeq` instruction. This instruction requires revision 2 or higher of the AVR32 architecture, while the UC3A0512ES only supports revision 1 of the AVR32 architecture. When an illegal opcode is found, the Linux kernel should deliver a SIGILL exception to the program.

We created two tests, to test two different cases of invalid operations. Appendix I.1.5 tests the case when the instruction is aligned on a four byte boundary. The other test, appendix I.1.5, tests the case when the instruction is aligned on a two byte boundary, but not on a four byte boundary. Both alignments are valid for all instructions, but the handler for invalid opcodes makes an assumption which did not hold for the UC3A0512 microcontroller. The handler code, which is shared between AVR32A and AVR32B, assumes that it can read unaligned 4 byte words.

We had a patch which enabled reading of unaligned words (appendix D.15), but Håvard Skinnemoen commented that this patch should not be necessary. If that patch is dropped, then this code needs to be fixed. See the comments for patch 15 in section 4.5.2.

**Result:** Both tests pass with our patch applied, but if we remove our patch, the second test fails. When passing the test, the application receives a SIGILL exception from the kernel.
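One possible way to remove the handler's assumption, sketched here as an illustration and not taken from our patches, is to fetch the instruction as two halfwords, which is always aligned for an instruction on a two byte boundary. The function name is hypothetical.

```c
#include <stdint.h>

/*
 * Assembles a 32-bit instruction word from two consecutive halfwords.
 * AVR32 is big-endian, so the halfword at the lower address holds the
 * upper 16 bits of the word.
 */
uint32_t insn_from_halfwords(const uint16_t *pc)
{
    return ((uint32_t)pc[0] << 16) | pc[1];
}
```

In the kernel, each halfword would have to be fetched from user space with the usual checked accessors; that part is not shown here.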
4.2.5 Code submission, requirement 8

Result: Almost everything was submitted. We did not submit the SPI changes, since these were incomplete, and largely irrelevant with the restructuring of the peripheral DMA code done by Atmel (see 3.6.9).

Section 4.5.2 summarizes the received feedback.

4.3 Toolchain

4.3.1 Select binary format, requirement 9

Result: FDPIC ELF was selected (see 3.5).

4.3.2 Produce binaries, requirement 10

Two simple example programs written in C (listing 4.2 and listing 4.3) were compiled with GCC.

The first program is a simple program which does not use large parts of the C library. It only invokes the write system call, to put “Hello!” on standard output.

```
#include <unistd.h>

int main(int argc, char *argv[])
{
    write(1, "Hello!\n", 7);
    return 0;
}
```

Listing 4.2: hello.c

The second program uses larger parts of the C library. It invokes printf, which uses the standard input/output part of the C library. This part will not work without correct relocations, because there are several data pointers used. For example, printf uses the FILE *stdout pointer, which only works with correct data relocation.

The 42 parameter to printf is mainly included to prevent optimization. If no arguments are given, the compiler will optimize the printf call to a puts call.

```
#include <stdio.h>

int main(int argc, char *argv[])
{
    printf("Hello world! %d\n", 42);
    return 0;
}
```

Listing 4.3: helloworld.c

```
avr32-uclinux-uclibc-gcc -mfdpic hello.c -o hello
```

Listing 4.4: Compiling hello.c

The resulting binaries should be copied to the root file system and executed. The output should be “Hello!” for the first program and “Hello world! 42” for the second.
4.3.3 Produce libraries, requirement 11

This is not implemented and therefore not tested.

Result: Not implemented.

4.3.4 Code submission, requirement 12

Result: Code submitted to the avr32linux.org’s mailing list as discussed in section 3.11.2, 3.11.3 and 3.11.4.

4.4 Linux user space

4.4.1 BusyBox, requirement 13

BusyBox was compiled with the applets listed in requirement 13. Each applet was then in turn invoked from the Hush shell.

Result: Passed. Hush and all the other applets listed in the requirement executed, produced the expected output and terminated successfully. Because the tasks that these applets perform are well known, we chose not to list the output of every applet or how each was invoked. However, the applets occasionally caused the system to run out of free memory. When an application causes the system to run out of memory, the kernel fails to recover, and will crash if the system runs out of memory again.

We have looked at the error which occurs when out of memory, but have been unable to determine the cause. Due to our limited time frame, we have been unable to spend very much time on this bug.

4.5 Patch submission feedback

In this section we list the feedback to the patches we submitted, with one section for each patch series. For brevity, some of the feedback may be omitted, slightly shortened, rephrased or summarized.

4.5.1 U-Boot

The most relevant feedback to the submitted U-Boot patch series is presented in this section. All the e-mails discussing these patches can be found in the official U-Boot mailing list archive².

²http://lists.denx.de/pipermail/u-boot/2009-January/thread.html#45925
The IP alignment patch was applied to the network branch by Ben Warren.

Applied to evk1100-prep and merged into next. I’ll send it upstream as soon as it’s fine with Wolfgang.

Btw, I had to rebuild my next branch since it’s become a bit stale. Since all the non-merge commit IDs are the same, I hope it won’t cause any problems — please let me know if you see any weird merge issues.

Reply from Håvard Skinnemoen

Applied to evk1100-prep, thanks.

Reply from Håvard Skinnemoen

Applied to evk1100-prep, thanks.

Reply from Håvard Skinnemoen

A white space unfortunately found its way into our patch.

```
- adv = ADVERTISE_CSMA | ADVERTISE_ALL;
+ adv = ADVERTISE_CSMA | ADVERTISE_ALL ;
```

Reply from Ben Warren

This patch sparked a debate on the mailing list. The most relevant replies are listed below. The rest of the e-mails in this discussion can be found on the web page listed at the beginning of this section.

```c
> +#ifdef CONFIG_MACB_FORCE10M
> + printf("%s: 100Mbps is not supported on this board - forcing 10Mbps.\n",
> + netdev->name);
> +
> + adv &= ~ADVERTISE_100FULL;
> + adv &= ~ADVERTISE_100HALF;
> + adv &= ~ADVERTISE_100BASE4;
> +#endif
```

not a fan

could you be more specific about the problem?

Reply from Ben Warren

On the EVK1100 board, the CPU (UC3A0512) is connected to the PHY via an RMII bus. This requires the CPU clock to be at least 50 MHz. Unfortunately, the chip on current EVK1100 boards may be unable to run at more than 50 MHz, and with the oscillator on the board, the closest frequency we can generate is 48 MHz.

This patch makes it possible to limit the macb to 10 MBit for this case. We are open for suggestions for other solutions.

Our reply

How about using a PHY capability override CONFIG. Something like this:

```c
#if defined(CONFIG_MACB_PHY_CAPAB)  // insert better name here
adv = ADVERTISE_CSMA | CONFIG_MACB_PHY_CAPAB;
#else
adv = ADVERTISE_CSMA | ADVERTISE_ALL
#endif
```

Just an idea...

Reply from Ben Warren
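The effect of the mask operations discussed in this thread can be sketched as a small stand-alone function. The bit values follow the standard MII advertisement register layout. The function name, the `RMII_MIN_CLOCK_HZ` constant and the idea of deciding from the clock frequency instead of a configuration option are assumptions made for illustration.

```c
#include <stdint.h>

/* Bits of the MII auto-negotiation advertisement register. */
#define ADVERTISE_CSMA      0x0001
#define ADVERTISE_10HALF    0x0020
#define ADVERTISE_10FULL    0x0040
#define ADVERTISE_100HALF   0x0080
#define ADVERTISE_100FULL   0x0100
#define ADVERTISE_100BASE4  0x0200
#define ADVERTISE_ALL       (ADVERTISE_10HALF | ADVERTISE_10FULL | \
                             ADVERTISE_100HALF | ADVERTISE_100FULL)

/* Hypothetical name for the 50 MHz requirement mentioned above. */
#define RMII_MIN_CLOCK_HZ   50000000UL

/*
 * Returns the advertisement mask for a given CPU clock. Below the
 * required clock, all 100 Mbit/s modes are removed, so the PHY can
 * only negotiate a 10 Mbit/s link.
 */
uint16_t phy_advertise_mask(unsigned long clock_hz)
{
    uint16_t adv = ADVERTISE_CSMA | ADVERTISE_ALL;

    if (clock_hz < RMII_MIN_CLOCK_HZ)
        adv = (uint16_t)(adv & ~(ADVERTISE_100FULL | ADVERTISE_100HALF |
                                 ADVERTISE_100BASE4));
    return adv;
}
```

With the 48 MHz clock mentioned in our reply, only the CSMA bit and the two 10 Mbit/s modes remain set.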

[PATCH v2 6/9] AVR32: macb - Search for PHY id

This patch was added to the network branch of U-Boot by Ben Warren. We asked Ben if this code should be omitted in any newer versions of this patch series. Ben’s answer is quoted below.

Correct. You’ve done a good job of making them orthogonal to the rest of your code, so the net repo is where they belong. I’ll issue a pull request after doing a bit of testing.

Reply from Ben Warren
[PATCH v2 7/9] AVR32: Must add NOPs after disabling interrupts for AT32UC3A0512ES

Applied to evk1100-prep, thanks.

Reply from Håvard Skinnemoen

[PATCH v2 8/9] AVR32: CPU support for AT32UC3A0xxx CPUs

Regarding the reset procedure in the patch:

I read this as if you just reset the CPU "internal" stuff. Sorry for asking stupid questions, I don’t know this architecture at all, but: Will external chips be reset this way, too? Or how do you make sure that external peripherals get properly reset?

Reply from Wolfgang Denk

As most of the needed functionality is embedded in the microcontroller, there are very few external peripherals used by U-Boot. Apart from external memory, and oscillator, and level-shifters for the serial-port, there is only the Ethernet PHY, and that one shouldn’t need a reset.

Our reply

Famous last words. What if exactly the PHY is stuck and needs a reset?

Hmm... 'apart from external memory' ... does external memory also include NORFlash? Eventually the NOR flash you are booting from? Assume the NOR flash is in query mode when you reset the board — how does it get reset, then?

Reply from Wolfgang Denk

The only reset we can do on the PHY is a software reset, by sending a reset command over the (R)MII bus, and I don’t believe that the generic chip code is the place to do that. If it should be done, I believe it should be done by the macb-driver after the reset. This would allow it to recover even if the microcontroller wasn’t reset by the reset-command, but for example by a watchdog timer.

External memory in this case would be SRAM or SDRAM.

Our reply
On other chips, it also covers the NOR flash you’re booting from. So I suppose we should look into this...maybe we need some sort of "notifier chain" thing to give other drivers a chance to reset their peripherals...

Reply from Håvard Skinnemoen

Comment regarding use of “magic hardcoded constants”:

It would be nice if you used readable names instead of all these magic hardcoded constants.

Comment from Wolfgang Denk

Denk also pointed out that we should use structs instead of offsets. We used lists with offsets instead of C-structs because that is how it was done in the existing code we used as a basis for our work, and rewriting this code was not prioritized. The above comment about unreadable constants triggered a discussion about how this code should be written. Several e-mails are omitted here. For the full story, see the mailing list archive available at the URL listed at the beginning of this section. The discussion ended with the following comments from us and Skinnemoen.

But in this case, this is code which should never be changed without looking at the datasheet, and probably schematics for the board in question.

Our comment

Exactly. At some point, you need code which encapsulates the definitions in the data sheet, and that’s the whole purpose of these functions.

Comment from Håvard Skinnemoen

[PATCH v2 9/9] AVR32: Board support for ATEVK1100

Wolfgang Denk commented on the presence of some code that was commented out in this patch. He found the code useless and wanted us to remove it. The code in question was the optimized memory timings that we never managed to fully optimize (described in section 3.8), but left in the code.

He also commented that we had set a configuration option that deactivates certain scripting functionality in U-Boot when it is compiled for the EVK1100 board. That option came from the code for the NGW100 board that we had used as a basis for our own configuration. On Denk’s request, we chose to remove both the pieces of code mentioned above.
4.5.2 Linux

As mentioned in section 3.11.7, we submitted patches to the avr32linux.org’s kernel mailing list. We received some feedback on these patches, mostly from Håvard Skinnemoen. In this section, we summarize the feedback we got on the submitted patches. Comments are included where appropriate.

The patches can be found in appendix D.

General approval

Håvard Skinnemoen gave comments like “Looks reasonable” to the following patches:

- [PATCH 01/29] macb: limit to 10 Mbit/s if the clock is too slow to handle 100 Mbit/s
- [PATCH 04/29] AVR32: use task_pt_regs in copy_thread.
- [PATCH 05/29] AVR32: FDPIC ELF support.
- [PATCH 06/29] AVR32: Introduce AVR32_CACHE and AVR32_UNALIGNED Kconfig options
- [PATCH 07/29] AVR32: mm/tlb.c should only be enabled with CONFIG_MMU.
- [PATCH 08/29] AVR32: mm/fault for !CONFIG_MMU.
- [PATCH 10/29] AVR32: MMU dummy functions for chips without MMU.
- [PATCH 11/29] AVR32: mm_context_t for !CONFIG_MMU
- [PATCH 13/29] AVR32: copy_user for chips that cannot do unaligned memory access.
- [PATCH 14/29] AVR32: csum_partial: Support chips that cannot do unaligned memory access.
- [PATCH 16/29] AVR32: memcpy implementation for chips that cannot do unaligned memory accesses.

Håvard wanted signoffs for patches that can be sent upstream.

[PATCH 02/29] AVR32: Don’t clear registers when starting a new thread

From our patch description:

Not certain about this patch, but we can’t clear the registers here, since the FDPIC ELF loader stores a pointer to the process’ load map in a register before this function is called.

Our patch description
Right.

Do you know how other architectures do this?
I'm a bit concerned about leaking information from one process to another if we don’t zero out the registers...

Håvard's comment

As far as we understand does neither x86, frv, SuperH 32, blackfin and several other architectures do it in start_thread. A quick survey shows that ARM and PowerPC are the only architectures who clear the registers in start_thread.

X86 and several other architectures clears the registers from an architecture dependent hook in the elf loader, ELF_PLAT_INIT, which is called right before start_thread.

Our reply
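One way to address both concerns in this thread, clearing stale register contents while keeping the value placed by the loader, is sketched below. This is an illustration only and not what any of the mentioned architectures actually do; the structure, the register count and the function name are hypothetical.

```c
#include <string.h>

#define DEMO_NUM_REGS 16  /* hypothetical register count */

/* Simplified stand-in for a saved register set. */
struct demo_regs {
    unsigned long r[DEMO_NUM_REGS];
};

/*
 * Zeroes every register except r[keep], so that a value stored there
 * before the call (such as a load map pointer) survives while nothing
 * else is carried over from the previous process. Returns `regs`.
 */
struct demo_regs *clear_regs_except(struct demo_regs *regs, int keep)
{
    unsigned long saved = regs->r[keep];

    memset(regs, 0, sizeof(*regs));
    regs->r[keep] = saved;
    return regs;
}
```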

[PATCH 03/29] AVR32: split paging_init into mmu init, free memory init and exceptions init

You still export the zero page when !CONFIG_MMU, but you only initialize it when CONFIG_MMU is set. Is that a good idea?

Håvard's comment

When looking at this again, it turns out that the zero-page wasn’t used strictly for MMU-systems. We had assumed that it was only used when the kernel needed to map a page full of zeros somewhere.

It turns out that it is also used by fs/direct-io.c:760. Therefore, the zero page still needs to exist for MMU-less systems.

Our comment

But then it really should be initialized, no?

Håvard's reply
Yes, that was what we meant.

Our reply

[PATCH 09/29] AVR32: ioremap and iounmap for !CONFIG_MMU

Would probably be more efficient to do this inline. But I can’t see any serious problems with this code, so it’s fine with me.

Comment from Håvard

[PATCH 12/29] AVR32: Add cache-function stubs for chips without cache

Would be better to do this inline, I think. But let’s worry about optimization later.

Comment from Håvard

Regarding a copy_to_user_page stub:

Hum...don’t you need to do any copying at all here?

Comment from Håvard

Oops, seems we became a little carried away here, and missed the memcpy.

Regarding making these inline – most of them could be changed, and we would agree that this would make the code simpler.

Btw.: There are a lot of static inline functions in include/asm/cacheflush.h that are only called from mm/cache.c

Our comment
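The point about the missing copy can be illustrated with a minimal sketch. The function below is a simplified stand-in with a hypothetical name; the real kernel hook takes additional arguments describing the memory area and page, which are left out here.

```c
#include <stddef.h>
#include <string.h>

/*
 * Cacheless variant of a "copy to user page" helper. Without a cache
 * there is nothing to flush or invalidate, but the data itself must
 * still be copied. Returns `dst`.
 */
void *copy_to_user_page_nocache(void *dst, const void *src, size_t len)
{
    return memcpy(dst, src, len);
}
```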

[PATCH 15/29] AVR32: avoid unaligned access in uaccess.h

The patch fixes __get_user_check by calling copy_from_user if the pointer is unaligned. Note that there are three more macros that needs to be changed: get_user_nocheck, put_user_check and put_user_nocheck.

This patch really needs a better solution that doesn’t involve calling copy_from_user or copy_to_user.

Patch description
I'm sort of wondering if this is really needed. AP7000 doesn't support unaligned 16-bit access, and we don't do anything to avoid that. And the worst thing that can happen is that some system calls may return -EFAULT if user space passes a badly aligned pointer.

Håvard's comment

This patch was added because we hit an exception during the illegal opcode handler. That function executes the following code:

```c
pc = (void __user *)instruction_pointer(regs);
if (get_user(insn, (u32 __user *)pc))
    goto invalid_area;
```

If get_user isn't changed then this function should be changed.

Also: unaligned accesses in kernel mode doesn't cause an -EFAULT, but instead an Oops. If the kernel is going to cause unaligned exceptions, I assume that this should be changed.

Our comment

Right... I guess that function should be changed. But it _should_ be able to handle it gracefully in any case...

Håvard's reply - about the illegal opcode handler

Ah...that doesn't sound good. Looks like do_address_exception() doesn't walk the fixup tables before crashing...that should probably be fixed.

Could you give the (untested) patch below a try?

```diff
diff --git a/arch/avr32/kernel/traps.c b/arch/avr32/kernel/traps.c
index d547c8d..69e9218 100644
--- a/arch/avr32/kernel/traps.c
+++ b/arch/avr32/kernel/traps.c
@@ -75,8 +75,15 @@ void _exception(long signr, struct pt_regs *regs, int code,
 {
 	siginfo_t info;

+	if (fixup) {
+		regs->pc = fixup->fixup;
+		return;
+	}
+	die("Unhandled exception in kernel mode", regs, signr);
+
 	memset(&info, 0, sizeof(info));
 	info.si_signo = signr;
```

Håvard’s reply - about unaligned access not causing -EFAULT

Yes, it solves the problem.
Btw; we had to declare the fixup variable also.

Our reply

Ah yes... I did actually fix that, but I forgot to regenerate the diff.
The result should look something like the below.

Håvard’s reply
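The idea behind walking the fixup tables can be sketched in plain C. This is a user space model with hypothetical names and a linear search; the kernel's own lookup routine and table layout are not reproduced here.

```c
#include <stddef.h>

/* One entry: an instruction that may fault, and where to resume. */
struct demo_fixup_entry {
    unsigned long insn;   /* address of the instruction that may fault */
    unsigned long fixup;  /* address to continue at if it does fault */
};

/* Returns the entry for the faulting address `pc`, or NULL if none. */
const struct demo_fixup_entry *
find_fixup(const struct demo_fixup_entry *table, size_t n, unsigned long pc)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (table[i].insn == pc)
            return &table[i];
    return NULL;
}

/*
 * Returns the address execution should resume at, or 0 if there is no
 * fixup. In the kernel, the latter case is where die() would be called.
 */
unsigned long resume_pc(const struct demo_fixup_entry *table, size_t n,
                        unsigned long pc)
{
    const struct demo_fixup_entry *fixup = find_fixup(table, n, pc);

    return fixup ? fixup->fixup : 0;
}
```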

[PATCH 17/29] AVR32: Mark AVR32B specific assumptions with CONFIG_SUBARCH_AVR32B in strnlen

Please include a short description about which assumptions you’re talking about and why they’re specific to AVR32B.

Comment from Håvard

The problem is that this code assumes that the address space is split into two 2GB parts, with the lower half belonging to user space. This assumption does not hold for AVR32A, where almost all memory is located in the upper half of the address space, and there is no clear separation between kernel space and user space memory areas.

Our comment
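The assumption described in our comment amounts to a single comparison against the 2 GB boundary. The sketch below uses hypothetical names, and the upper-half address in the usage note is only an example value.

```c
#include <stdint.h>

#define LOWER_HALF_END 0x80000000u  /* 2 GB boundary */

/*
 * Returns 1 if `addr` lies in the lower half of the 32-bit address
 * space. Code that treats this as "is a user space address" relies on
 * the split layout described above.
 */
int in_lower_half(uint32_t addr)
{
    return addr < LOWER_HALF_END;
}
```

For an address such as `0xd0000000`, the function returns 0, so a check of this kind would wrongly reject memory in the upper half even when it belongs to a user space process.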

[PATCH 18/29] AVR32: mm/dma-coherent.c - ifdef AVR32B specific code

Actually, the whole thing should be a no-op on devices with no cache, since there’s no need to synchronize anything.

Comment from Håvard
That is true, but in this case we focused on the code that was AVR32B specific. One could conceivably have an AVR32A microcontroller with caches?

I assume that if the cache changes above were moved to inline functions in a header file, this function would compile down to a no-op.

Several patches got comments like:

I think this should depend on CONFIG_MMU, not AVR32B.

Comment from Håvard Skinnemoen

This applies for the patches listed below.

- [PATCH 18/29] AVR32: mm/dma-coherent.c - ifdef AVR32B specific code.
- [PATCH 19/29] AVR32: Disable ret_if_privileged macro for !CONFIG_SUBARCH_AVR32B.
- [PATCH 21/29] AVR32: AVR32A address space support.
- [PATCH 22/29] AVR32: Change maximum task size for AVR32A
- [PATCH 23/29] AVR32: Fix uaccess __range_ok macro for AVR32A.

The main criterion we used for deciding whether something should depend on AVR32B or on MMU was whether it depends on the AVR32B memory layout, or on an MMU being present.

But the virtual memory layout does not depend on the sub-architecture (apart from the entry point, which makes the two somewhat related), it depends on whether or not the chip has an MMU.

If the chip does not have an MMU, all virtual addresses are mapped 1:1 to physical addresses. If the mapping isn’t 1:1, there must be something in the chip doing the mapping, i.e. an MMU. This is confirmed by the fact that the segmented memory model is defined in the MMU chapter in the architecture manual.

So there’s really no such thing as an AVR32B memory layout — the
memory layout depends entirely on whether or not an MMU is present.

As for caches, I think adding caches without also adding an MMU would be problematic since the caching properties of a given address is determined by the MMU. So if you don’t have an MMU, you won’t be able to bypass the cache for certain parts of the memory, which makes it difficult to do DMA.

Sure, it might be possible to introduce some other mechanism for specifying caching properties, but the current architecture document does not specify any such mechanism apart from the MMU.

Håvard’s reply

After Håvard’s comment, we realized that some of our decisions on which configuration flags to use were based on a misunderstanding. Earlier we had the misconception that AVR32A implied that an MMU was not present, and AVR32B meant that an MMU was present.

The changes we made were still correct, but the build criteria were wrong in many places. As mentioned in Håvard’s comments, some code segments should be updated to depend on the `CONFIG_MMU` option and not the sub-architecture as our patches do.

[PATCH 20/29] AVR32: AVR32A support in Kconfig

Ok. I was thinking this could have been merged with some of the other AVR32A patches, but then again, this makes it easier to reorder the patches, so it’s fine.

Comment from Håvard Skinnemoen

When the implementation was split into patches, we tried to keep this in mind, and preferred to split it into too many patches rather than too few. By doing this, it should hopefully be easier for other developers to pick up our work and continue development.

[PATCH 21/29] AVR32: AVR32A address space support

Haven’t had a chance to have a good look over but:

On Fri, 2009-05-15 at 14:39 +0200, Gunnar Rangoy wrote:
> +#elif CONFIG_SUBARCH_AVR32B
#elif defined (CONFIG_SUBARCH_AVR32B)
??

Comment from Ben Nizette

Oops, it should indeed use `defined(...)`. It will still work as long as the only sub-architectures are AVR32A and AVR32B, which is why we missed it.

Our comment
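The difference between the two preprocessor forms can be illustrated with a small sketch. It assumes the usual convention that a selected configuration option is defined as 1 and an unselected option is not defined at all; the option and function names are hypothetical.

```c
/* Only option B is selected in this example. */
#define CONFIG_DEMO_SUBARCH_B 1

/* Robust form: tests only whether the macro exists. */
char subarch_with_defined(void)
{
#if defined(CONFIG_DEMO_SUBARCH_A)
    return 'A';
#elif defined(CONFIG_DEMO_SUBARCH_B)
    return 'B';
#else
    return '?';
#endif
}

/*
 * Bare form: evaluates the value of the macro, so it only gives the
 * intended result while the macro expands to a non-zero integer.
 */
char subarch_with_bare_elif(void)
{
#ifdef CONFIG_DEMO_SUBARCH_A
    return 'A';
#elif CONFIG_DEMO_SUBARCH_B
    return 'B';
#else
    return '?';
#endif
}
```

Both functions select the same branch here, which shows how the bare form can go unnoticed.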

[PATCH 24/29] AVR32: Support for AVR32A (entry-avr32a.c)

Ok, this part really does depend on AVR32A, so that part is fine. Unfortunately, I haven’t got the time to review this or the remaining patches today, so I’ll have to continue some other day (probably next week).

Håvard Skinnemoen

4.5.3 Toolchain

Binutils support for FDPIC ELF on AVR32 UC3

This patch adds support for statically linked FDPIC ELF targets on AVR32. It mostly works, but there is a lack of error checking on input file types, which means that if the linker is invoked incorrectly, it will fail in strange ways.

For example, if one fails to specify -l elf32-avr32fdpic to strip/objcopy, it will pretend that the file is a normal elf32-avr32 file, and 'ruin' the PT_GNU_STACK program header.

Some functions are (almost) direct copies from elf32-bfin.c and elf32-frv.c, which are two architectures with FDPIC support. The code for creating the .rofixup-section is however mostly new.

Patch description

Without this error checking, it will be difficult to accept the patch as-is. We can’t in good faith expect our users accept that the linker will fail ‘in strange ways’ because of an incorrect invocation. It needs to fail gracefully and in a known way.

Comment from Eric Weddington (Atmel)

uClibc: Some support for FDPIC ELF for AVR32

This patch enables uClibc to be linked statically into a FDPIC ELF binary on AVR32. It doesn’t update the parts necessary for dynamic linking.

There are also a few simple changes to memcmp, memcpy and memmove,
which makes them work on the UC3 (which cannot access unaligned memory.)

Patch description

If the change to the mem* functions have nothing to do with support for FDPIC, then it is preferrable if the patches are separated. The idea is that a patch file should have a single purpose only and not to mix together changes with different purposes.

Comment from Eric Weddington (Atmel)

I will commit the uClibc stuff, and it will be broken down into separate changes. Patches will go through review on the uClibc list + Paul before committing.

Hans-Christian Egtvedt (Atmel)
Chapter 5

Conclusion

During this project we have created a modified version of Linux, capable of running on a microcontroller of the UC3A family. A toolchain has been extended with the capability of generating executables suitable for this platform. Our version of Linux is capable of loading and running these executables. Previously submitted patches for the U-Boot loader have been improved, significantly revised, and re-submitted.

Because custom hardware had to be used, the usefulness of the product of our work is currently somewhat limited. However, with newer chips without the SDRAM bug, it should be possible to run Linux on the EVK1100 without hardware modifications. With some software modifications, it should also be possible to use our work with other closely related microcontrollers in the AVR32 family.

Patches for the majority of the software modifications done in this project have been submitted to the appropriate maintainers. By publishing our work, we have contributed significantly to increasing the assortment of useful software for the UC3A microcontroller family. If Atmel wants to, they can adopt the patches and finalize them to make sure that they ultimately become part of their respective official software distributions.

By undertaking this project, we have gained valuable knowledge about embedded development, the Linux kernel, the GNU Toolchain, and open source development in general. It has also been a valuable experience to communicate with other people in the open source communities.
Chapter 6

Future work

This chapter describes the tasks that currently remain undone. The first three sections in this chapter describe remaining work in U-Boot, Linux and the toolchain. The last section discusses the possibility of running programs created for the AVR32A on AVR32B chips and vice versa.

6.1 U-Boot

In many setups it would be more suitable to load the kernel from SD card or flash, because then it would not need to depend on external systems when booting.

There are a few changes that have still not been submitted to the U-Boot mailing list. These mostly concern the MACB driver, but they also include some cleanup of unnecessary code. These could be included in an updated patch series, but this was not highly prioritized.

6.2 Linux

Ideally, Linux should support all the hardware in the UC3A controller and on the EVK1100. In this section we outline the most essential features that we would have tried to implement if we had the time and hardware available.

6.2.1 PDCA support

As described in section 3.6.9, support for the PDCA was never completed. Support for the PDCA would be very useful since it is a great feature for communicating with peripherals. The restructured code for the PDC should be obtained from Atmel and adapted for the UC3A0512.
6.2.2 SPI support

The proper way to use the SPI is in combination with the PDCA. Since the PDCA support was never completed, neither was the SPI support. On the EVK1100, the microcontroller is connected to several SPI devices. Therefore, Linux support for the SPI controller would be very useful.

6.2.3 MPU support

Linux does not currently support any use of the MPU, and no attempt has been made to implement this. Without the MPU enabled, any process can read and write any memory location, and can therefore potentially read, modify or sabotage the kernel or any other process. The MPU is dysfunctional in our chip, so it would be very hard to implement and test the software for it.

6.2.4 Support for on-chip devices

Linux support would also be desirable for the following on-chip features of the UC3A:

- USB interface
- Audio Bitstream DAC
- Synchronous Serial Controller
- Analog-to-Digital Converter

None of these features have been considered in the implementation phase of this project.

6.2.5 Memory copy optimization

The memory copying routines in Linux are not optimized for unaligned or halfword copying. A good way to achieve this would be to extract the existing, optimized copying routines from newlib. Newlib is a standard C library implementation for embedded systems and, unlike uClibc, does not require an operating system. For more information about newlib, see the newlib website\(^1\).
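To illustrate what an alignment-aware copy routine does, the following is a minimal sketch written for this text. It is not the newlib or kernel implementation; the name `sketch_memcpy` and the alias typedefs are our own, and the `may_alias` attribute assumes GCC or a compatible compiler. The routine uses word copies when both pointers can reach word alignment together, halfword copies when only halfword alignment can be reached, and byte copies otherwise.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* GCC extension: these types may alias any object, like char does. */
typedef uint16_t __attribute__((may_alias)) u16_alias;
typedef uint32_t __attribute__((may_alias)) u32_alias;

/* Hypothetical routine for illustration; not newlib's or the kernel's memcpy. */
void *sketch_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    /* Address bits in which the two pointers differ. */
    uintptr_t diff = (uintptr_t)d ^ (uintptr_t)s;

    assert(n == 0 || (dst != NULL && src != NULL));

    if ((diff & 1) == 0) {
        /* Same parity: one byte copy makes both pointers halfword aligned. */
        if (n >= 1 && ((uintptr_t)d & 1)) {
            *d++ = *s++;
            n--;
        }
        if ((diff & 3) == 0) {
            /* Same offset within a word: one halfword copy reaches word alignment. */
            if (n >= 2 && ((uintptr_t)d & 2)) {
                *(u16_alias *)d = *(const u16_alias *)s;
                d += 2; s += 2; n -= 2;
            }
            while (n >= 4) {
                *(u32_alias *)d = *(const u32_alias *)s;
                d += 4; s += 4; n -= 4;
            }
        }
        /* Halfword copies: the tail after the word loop, or the bulk of the
         * data when only halfword alignment can be reached. */
        while (n >= 2) {
            *(u16_alias *)d = *(const u16_alias *)s;
            d += 2; s += 2; n -= 2;
        }
    }
    /* Remaining bytes, or the whole buffer when the parities differ. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dst;
}
```

A production routine would typically also handle pointers with different alignments by shifting and merging words, which this sketch deliberately leaves out.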

6.2.6 Debug support

Support for debugging applications with a software debugger under Linux is not complete. We have made changes to the entry point, so debugging events should be handled. However, there is some code in `arch/avr32/kernel/ptrace.c` that has not been adapted to the AVR32A architecture.

The relevant piece of code is:

```c
ti->rar_saved = sysreg_read(RAR_EX);
ti->rsr_saved = sysreg_read(RSR_EX);
sysreg_write(RAR_EX, trampoline_addr);
sysreg_write(RSR_EX, (MODE_EXCEPTION | SR_EM | SR_GM));
```

\(^1\)http://sourceware.org/newlib/

This code sets up a so-called “debug trampoline” to handle the case where a user space program single steps into an exception. In such cases, the exception should be executed at full speed and single stepping should resume after the exception. The code above changes the return address of the exception, so that it returns to a debug “trampoline” instead of the real return address. This trampoline then reconfigures the debug system, so that single stepping can be resumed.

The problem with the code is that it changes two system registers that are unavailable on the AVR32A architecture. Instead of saving the return address and status register in dedicated system registers, the AVR32A architecture saves them on the stack. The equivalent of changing the two system registers would be to change the two values as they are saved on the stack.

We made a design decision to try to reuse the registers saved automatically by the processor for the `pt_regs` structure. Thus, if we change the return address and status register on the stack, we also change the return address and status register that the exception handler sees. This would be troublesome, since the status register and return address are used to determine how the exception is handled.

A possible workaround would be to insert a new stack frame when single stepping into an exception. The first stack frame would contain the correct `pt_regs` structure, while the next stack frame would contain what is needed to return to the debug “trampoline”.

6.2.7 FDPIC ELF support for systems with an MMU

As mentioned in 3.6.16, we only added support for FDPIC ELF for systems without an MMU. It would be useful to also support the FDPIC ELF format on systems with an MMU. This would allow development of FDPIC ELF applications and the FDPIC ELF toolchain on platforms with an MMU.

To do this, one needs to turn the `mm_context_t` used by MMU systems into a structure, and add the fields required by the FDPIC ELF loader to this structure.
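As a rough illustration of the change to `mm_context_t` for MMU systems, the sketch below shows a context type that has become a structure carrying load map addresses. Everything here is an assumption made for illustration: the type and function names are hypothetical, the `asid` field stands in for whatever the MMU context holds today, and the field names `exec_fdpic_loadmap` and `interp_fdpic_loadmap` are our recollection of what the generic FDPIC loader expects and should be checked against the kernel source.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch only: a stand-in for the mm_context_t of an MMU system. */
typedef struct {
    unsigned long asid;                 /* hypothetical: the existing MMU context value */
    unsigned long exec_fdpic_loadmap;   /* user address of the executable's load map */
    unsigned long interp_fdpic_loadmap; /* same for the interpreter, 0 if there is none */
} sketch_mm_context_t;

/* Record the load map addresses without touching the MMU-related state. */
void sketch_set_loadmaps(sketch_mm_context_t *ctx,
                         unsigned long exec_map,
                         unsigned long interp_map)
{
    assert(ctx != NULL);
    ctx->exec_fdpic_loadmap = exec_map;
    ctx->interp_fdpic_loadmap = interp_map;
}
```

Existing code that treats the context as a plain integer would have to be updated to access the corresponding field instead.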
6.3 Toolchain

The toolchain is not yet completed – it lacks support for dynamic linking, and it needs some error checking. In this section we will outline the remaining tasks for the toolchain.

6.3.1 Dynamic linking

The toolchain is currently only able to produce statically linked binaries. It should be able to support creating shared libraries and dynamically linked executables. At the very least, this requires changes to the linker, but it might be advantageous to also change the assembler and GCC.

The linker needs to be changed to handle a new relocation type for function calls. As mentioned in section 2.10.3, a function call across two different modules needs to change the current GOT pointer to the new module's GOT pointer. The previous GOT pointer needs to be restored when the function call returns. The linker must therefore support this type of function call.
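The save, switch and restore sequence described above can be modelled in plain C. This is only a conceptual sketch: in the real toolchain the sequence would be emitted as machine code by the linker, and the names `func_desc`, `current_got`, `call_cross_module` and `module_b_add_base` are hypothetical. A global variable stands in for the register that holds the GOT pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the register that holds the current module's GOT pointer. */
static const int *current_got;

/* A function descriptor pairs an entry point with the callee's GOT. */
struct func_desc {
    int (*entry)(int);
    const int *got;
};

/* Example callee in "module B": reads the first entry of its own GOT. */
static int module_b_add_base(int x)
{
    return current_got[0] + x;
}

/* Cross-module call: save the caller's GOT pointer, switch to the
 * callee's, make the call, and restore the caller's afterwards. */
int call_cross_module(const struct func_desc *fd, int arg)
{
    const int *saved_got = current_got;
    int ret;

    assert(fd != NULL && fd->entry != NULL);
    current_got = fd->got;
    ret = fd->entry(arg);
    current_got = saved_got;
    return ret;
}
```

The local variable `saved_got` plays the role of the call-preserved register discussed in the next paragraph.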

When saving the previous GOT pointer, it can also be advantageous to save it to one of the registers that are preserved across function calls. That would require changes to GCC and to the assembler. The assembler would need a new pseudo-instruction for function calls. It currently has a pseudo-instruction for function calls, which the linker replaces with the correct method for calling the function. The new pseudo-instruction should take both the destination of the function call and a register which can be used to hold the previous GOT pointer.

After the new pseudo-instruction is added to the assembler, GCC would need to be changed to use it. GCC would need to select a suitable register and insert the new call instruction with that register as a parameter.

6.3.2 Error handling

The current changes to the linker do not properly check that all input files are FDPIC ELF files when it is executed with the `-mavr32linuxfdpic` flag. It is this flag that tells the linker that it is processing FDPIC ELF files. This leads to errors, since the linker will not initialize everything correctly in those cases. What needs to be done is to check that every input file is an FDPIC ELF file when the linker is executed with the `-mavr32linuxfdpic` flag. If it is not executed with the `-mavr32linuxfdpic` flag, it should check that none of the input files are FDPIC ELF files.
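The missing check amounts to requiring that every input file agrees with the selected mode. A minimal sketch of that logic follows. It does not use the linker's real data structures: the function name is hypothetical, and the array of flags stands in for whatever per-file information the linker actually has about each input.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns the index of the first input file whose format disagrees with
 * the selected mode, or -1 if all inputs are consistent.
 * fdpic_mode is true when the linker was run with -mavr32linuxfdpic. */
int first_mismatched_input(bool fdpic_mode,
                           const bool *input_is_fdpic,
                           size_t count)
{
    size_t i;

    assert(count == 0 || input_is_fdpic != NULL);
    for (i = 0; i < count; i++) {
        if (input_is_fdpic[i] != fdpic_mode)
            return (int)i;
    }
    return -1;
}
```

Returning the index of the offending file lets the caller report which input caused the error.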

6.4 AVR32B series compatibility

Since AVR32A and AVR32B are both implementations of the AVR32 architecture, it should, in theory, be possible to run the same binary Linux applications on both sub-architectures. A couple of requirements have to be fulfilled, though. The binary must only use instructions available in both sub-architectures, and cannot use unaligned memory accesses. Note that different revisions of the AVR32 architecture exist, and their instruction sets differ slightly. In the future, if support for dynamically linked libraries is implemented for both sub-architectures, unaligned access could be outsourced to the C library, thus eliminating the alignment issue in user space applications. To work on all AVR32 variations, the binary must be compiled as an FDPIC ELF file, and FDPIC ELF files must also be supported on AVR32 systems with an MMU. See also section 6.2.7.
Chapter 7

Bibliography


Appendix A

Acronyms

**ABI**  Application Binary Interface

**BFD**  Binary File Descriptor

**CPU**  Central Processing Unit

**CVS**  Concurrent Versions System

**DAC**  Digital to Analog Converter

**DHCP**  Dynamic Host Configuration Protocol

**DMA**  Direct Memory Access

**EBI**  External Bus Interface

**EEPROM**  Electrically Erasable Programmable Read-Only Memory

**ELF**  Executable and Linkable Format

**EPROM**  Erasable Programmable Read-Only Memory

**FDPIC**  Function Descriptor Position Independent Code

**FSF**  Free Software Foundation

**GCC**  GNU Compiler Collection

**GDB**  GNU Debugger

**GNU**  GNU’s Not Unix

**GPIO**  General Purpose Input/Output

**GPL**  General Public License

**GOT**  Global Offset Table

**HSB**  High Speed Bus

**IO**  Input/Output

**IP**  Internet Protocol

**IDE**  Integrated Drive Electronics

**JTAG**  Joint Test Action Group

**LCD**  Liquid Crystal Display

**LED**  Light Emitting Diode

**MAC**  Media Access Controller

**MCI**  MultiMedia Card Interface

**MII**  Media Independent Interface

**MMU**  Memory Management Unit

**MPU**  Memory Protection Unit

**NFS**  Network File System

**NOP**  No-Operation

**PDC**  Peripheral DMA Controller

**PDCA**  Peripheral DMA Controller

**PIC**  Position Independent Code

**PIO**  Parallel Input/Output

**PLC**  Programmable Logic Controller

**PLT**  Procedure Linkage Table

**POSIX**  Portable Operating System Interface for Unix

**PROM**  Programmable Read-Only Memory

**RAM**  Random Access Memory

**RMII**  Reduced Media Independent Interface

**SD**  Secure Digital

**SDRAM**  Synchronous Dynamic Random Access Memory

**SMC**  Static Memory Controller

**SPI**  Serial Peripheral Interface

**SRAM**  Static Random Access Memory

**SUS**  Single UNIX Specification

**TCP**  Transmission Control Protocol

**TFTP**  Trivial File Transfer Protocol

**TLB**  Translation Lookaside Buffer

**UDP**  User Datagram Protocol

**URL**  Uniform Resource Locator

**USB**  Universal Serial Bus

Appendix B

U-Boot patch cleanup

B.1 Network limiting reorganization

diff --git a/drivers/net/macb.c b/drivers/net/macb.c
index 561669b..31a4f1e 100644
--- a/drivers/net/macb.c
+++ b/drivers/net/macb.c
@@ -296,28 +296,17 @@ static void macb_phy_reset(struct macb_device *macb)
     struct eth_device *netdev = &macb->netdev;
     int i;
     u16 status, adv;
-   int rmii_mode;
-   unsigned min_hz;
-   #ifdef CONFIG_RMII
-      rmii_mode = 1;
-      min_hz = 50000000 ;
-   #else
-      rmii_mode = 0;
-      min_hz = 25000000 ;
-   #endif
-   adv = ADVERTISE_CSMA | ADVERTISE_ALL ;
-   if (get_hsb_clk_rate() < min_hz) {
-      printf("%s: HSB clock < %u MHz in %s mode - ",
-             netdev->name , min_hz / 1000000 ,
-             (rmii_mode ? "RMII " : "MII ");
+   adv &= ~ADVERTISE_100FULL ;
+   adv &= ~ADVERTISE_100HALF ;
+   adv &= ~ADVERTISE_100BASE4 ;
+   #endif
-     adv &= ~ADVERTISE_100FULL;  
-     adv &= ~ADVERTISE_100HALF;  
-     adv &= ~ADVERTISE_100BASE4;  
-   }
+     adv &= ~ADVERTISE_100FULL;
+     adv &= ~ADVERTISE_100HALF;
+     adv &= ~ADVERTISE_100BASE4;
+   }
+   macb_mdio_write(macb , MII_ADVERTISE , adv);
+   printf("%s: Starting autonegotiation...\n", netdev->name);
@@ -345,7 +334,7 @@ static int macb_phy_find(struct macb_device *macb)
+/* Search for PHY... */
+for (i = 0; i < 32; i++) {
   macb->phy_addr = i;
   phy_id = macb_mdio_read(macb , MII_PHYSID1);
   if (phy_id != 0xff) {
      printf("%s: PHY present at %d\n", macb->netdev.name , i);

B.2 Add board to lists

```diff
diff --git a/MAINTAINERS b/MAINTAINERS
index 9c0d6bf..d83b580 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -747,6 +747,7 @@ Haavard Skinnemoen <haavard.skinnemoen@atmel.com>
 ATSTK1004 AT32AP7002
 ATSTK1006 AT32AP7000
 AT89X100 AT32AP7004
+ATEVK1100 AT32UC3A0512

#########################################################################
# SuperH Systems:
```

B.3 Precedence safety fix

```diff
diff --git a/cpu/at32uc/smc.h b/cpu/at32uc/smc.h
index ea4d399..ae765ec 100644
--- a/cpu/at32uc/smc.h
+++ b/cpu/at32uc/smc.h
@@ -8,10 +8,10 @@
 #include <asm/io.h>

 /* SMC register offsets */
-#define SMC_SETUP(x) 0x0000+(x)*0x10
-#define SMC_PULSE(x) 0x0004+(x)*0x10
-#define SMC_CYCLE(x) 0x0008+(x)*0x10
-#define SMC_MODE(x) 0x000c+(x)*0x10
+#define SMC_SETUP(x) (0x0000+(x)*0x10)
+#define SMC_PULSE(x) (0x0004+(x)*0x10)
+#define SMC_CYCLE(x) (0x0008+(x)*0x10)
+#define SMC_MODE(x) (0x000c+(x)*0x10)

 /* Bitfields in SETUP0..3 */
 #define SMC_NWE_SETUP_OFFSET 0
```
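The reason for the added parentheses can be shown with a small example. The macro bodies below are copied from the patch above, but the `_OLD` and `_NEW` suffixes and the two helper functions are hypothetical and exist only for this illustration. Without parentheses, the macro expansion is spliced into the surrounding expression, so an operator outside the macro can bind to only part of it.

```c
#include <assert.h>

/* Unparenthesized versions, as before the patch. */
#define SMC_SETUP_OLD(x) 0x0000+(x)*0x10
#define SMC_MODE_OLD(x)  0x000c+(x)*0x10

/* Parenthesized versions, as after the patch. */
#define SMC_SETUP_NEW(x) (0x0000+(x)*0x10)
#define SMC_MODE_NEW(x)  (0x000c+(x)*0x10)

/* Distance between the MODE and SETUP registers of chip select cs.
 * Expands to 0x000c+(cs)*0x10 - 0x0000+(cs)*0x10, so the second
 * (cs)*0x10 term is added instead of subtracted. */
int mode_minus_setup_old(int cs)
{
    assert(cs >= 0 && cs <= 3);
    return SMC_MODE_OLD(cs) - SMC_SETUP_OLD(cs);
}

/* Expands to (0x000c+(cs)*0x10) - (0x0000+(cs)*0x10), which is always 0x0c. */
int mode_minus_setup_new(int cs)
{
    assert(cs >= 0 && cs <= 3);
    return SMC_MODE_NEW(cs) - SMC_SETUP_NEW(cs);
}
```

For chip select 1, the unparenthesized version evaluates to 12 + 16 - 0 + 16 = 44, while the parenthesized version gives the intended 28 - 16 = 12.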

B.4 Board configuration

```diff
diff --git a/include/configs/atevk1100.h b/include/configs/atevk1100.h
index 2a9d91b..ad134f8 100644
--- a/include/configs/atevk1100.h
+++ b/include/configs/atevk1100.h
@@ -72,7 +72,7 @@
  * PLLOPT[2]: Disable the Wide-Bandwidth mode (Wide-Bandwidth mode allows a faster startup time and out-of-lock time).
  * PLLOPT[3]: Delay the Wide-Bandwidth mode (Wide-Bandwidth mode allows a faster startup time and out-of-lock time).
  *
+* We want to run the CPU at 66 MHz, and the fVCO of the PLL at 132 MHz. */
@@ -93,7 +93,7 @@
 #define CONFIG_BOOTARGS
 "console=ttyS0 ip=dhcp root=/dev/nfs rootwait=1"

@@ -143,6 +144,11 @@
 /* Ethernet - RMII mode */
 #define CONFIG_MACB 1
 #define CONFIG_RMII 1
+/*
+ * 100Mbps requires a CPU clock of at least 50MHz for RMII mode, and 25MHz for
+ * MII mode. Set CONFIG_MACB_FORCE10M flag if clock is too slow for 100Mbit.
+ */
+#define CONFIG_MACB_FORCE10M 1

 #define CONFIG_ATMEL_USART 1
 #define CONFIG_ATMEL_SPI 1
@@ -156,7 +162,7 @@
# define CONFIG_NR_DRAM_BANKS 1
/* Internal flash on the microcontroller (TODO?) (512kB)*/
# define CFG_FLASH_BASE 0x80000000
# define CFG_FLASH_SIZE 0x80000
# define CFG_MAX_FLASH_BANKS 1
@@ -171,14 +177,15 @@
# define CONFIG_ENV_IS_IN_FLASH 1
# define CONFIG_ENV_SIZE 65536
# define CONFIG_ENV_ADDR (CFG_FLASH_BASE + CFG_FLASH_SIZE - CONFIG_ENV_SIZE)
# define CFG_INIT_SP_ADDR (CFG_INTRAM_BASE + CFG_INTRAM_SIZE)
# define CFG_MALLOC_LEN (256*1024)
# define CFG_DMA_ALLOC_LEN (16384)
/* Allow 3MB(TODO: update) for the kernel run-time image */
# define CFG_LOAD_ADDR (CFG_SDRAM_BASE + 0x00270000)
# define CFG_BOOTPARAMS_LEN (16 * 1024)

#define CONFIG_ENV_IS_IN_FLASH 1
#define CONFIG_ENV_SIZE 65536
#define CONFIG_ENV_ADDR (CFG_FLASH_BASE + CFG_FLASH_SIZE - CONFIG_ENV_SIZE)
#define CFG_INIT_SP_ADDR (CFG_INTRAM_BASE + CFG_INTRAM_SIZE)
#define CFG_MALLOC_LEN (256*1024)
#define CFG_DMA_ALLOC_LEN (16384)
/* Allow 3MB(TODO: update) for the kernel run-time image */
# define CFG_LOAD_ADDR (CFG_SDRAM_BASE + 0x00270000)
# define CFG_BOOTPARAMS_LEN (16 * 1024)
```

B.5 Keeping lists sorted

```diff
diff --git a/Makefile b/Makefile
index bfaa625..d9fbc6e 100644
--- a/Makefile
+++ b/Makefile
@@ -3047,6 +3047,9 @@ $(BFIN_BOARDS):
  # AVR32
  #******************************************************************************
  # SH3 (SuperH)
  #******************************************************************************
```

B.6 Removal of TODOs

```diff
diff --git a/cpu/at32uc/at32uc3a0xxx/sm.h b/cpu/at32uc/at32uc3a0xxx/sm.h
index d232f91..17bff39 100644
--- a/cpu/at32uc/at32uc3a0xxx/sm.h
+++ b/cpu/at32uc/at32uc3a0xxx/sm.h
#define SM_PM_VREGCR (SM_PM_REGS_OFFSET + 0x00c8)
#define SM_PM_BOD (SM_PM_REGS_OFFSET + 0x00d0)
#define SM_PM_RCAUSE (SM_PM_REGS_OFFSET + 0x140)
-#define SM_RC_RCAUSE (SM_PM_REGS_OFFSET + 0x00c8) /* TODO: remove */
/* RTC starts at 0xFFFF0D00 */
#define SM_RTC_REGS_OFFSET 0x0d00
#define SM_RTC_CTRL (SM_RTC_REGS_OFFSET + 0x0000)
/* WDT starts at offset 0xFFFF0D30 */
#define SM_WDT_REGS_OFFSET 0x0d30
#define SM_WDT_CTRL (SM_WDT_REGS_OFFSET + 0x0000)
/* EIC starts at offset 0xFFFF0D80 */
#define SM_EIC_REGS_OFFSET 0x0d80
#define SM_EIC_IER (SM_EIC_REGS_OFFSET + 0x0000)
#define SM_EIC_IDR (SM_EIC_REGS_OFFSET + 0x0004)
#define SM_EIC_IMR (SM_EIC_REGS_OFFSET + 0x0008)
#define SM_EIC_ICR (SM_EIC_REGS_OFFSET + 0x0010)
#define SM_EIC_MODE (SM_EIC_REGS_OFFSET + 0x0014)
#define SM_EIC_EDGE (SM_EIC_REGS_OFFSET + 0x0018)
#define SM_EIC_LEVEL (SM_EIC_REGS_OFFSET + 0x001c)
#define SM_EIC_FILTER (SM_EIC_REGS_OFFSET + 0x0020)
#define SM_EIC_TEST (SM_EIC_REGS_OFFSET + 0x0024)
#define SM_EIC_TO (SM_EIC_REGS_OFFSET + 0x0028)
#define SM_EIC_CTRL (SM_EIC_REGS_OFFSET + 0x0030)
/* Bitfields used in many registers */
#define SM_EN_OFFSET 0
#define SM_PLLMUX_SIZE 4
#define SM_PLLCOUNT_OFFSET 24
#define SM_PLLCOUNT_SIZE 6
-#define SM_PLLTEST_OFFSET 31 /* TODO: remove */
-#define SM_PLLTEST_SIZE 1 /* TODO: remove */
/* Bitfields in PM_OSCCTRL0,1 */
#define SM_MODE_OFFSET 0
#define SM_STARTUP_OFFSET 8
#define SM_STARTUP_SIZE 3
/* Bitfields in PM_VOLT */
#define SM_VOLT_OFFSET 0 /* TODO: remove */
#define SM_VOLT_SIZE 1 /* TODO: remove */
#define SM_PM_VOLT_VAL_OFFSET 8 /* TODO: remove */
#define SM_PM_VOLT_VAL_SIZE 7 /* TODO: remove */
/* Bitfields in PM_VMREF */
#define SM_REFSEL_OFFSET 0 /* TODO: remove */
#define SM_REFSEL_SIZE 4 /* TODO: remove */
#define SM_PM_VMREF_VAL_OFFSET 8 /* TODO: remove */
#define SM_PM_VMREF_VAL_SIZE 7 /* TODO: remove */
/* Bitfields in PM_WAKE */
#define SM_WAKE_OFFSET 0 /* TODO: remove */
#define SM_WAKE_SIZE 1 /* TODO: remove */
#define SM_VOK_OFFSET 3 /* TODO: remove */
#define SM_VOK_SIZE 1 /* TODO: remove */
#define SM_VMRYD_OFFSET 4 /* TODO: remove */
#define SM_VMRYD_SIZE 1 /* TODO: remove */
#define SM_CKRDY_OFFSET 5
#define SM_CKRDY_SIZE 1
#define SM_MSKRDY_OFFSET 6
#define SM_WDT_SIZE 1
#define SM_JTAG_OFFSET 4
#define SM_JTAG_SIZE 1
#define SM_SERP_OFFSET 5 /* TODO: remove */
#define SM_SERP_SIZE 1 /* TODO: remove */
#define SM_CPUERR_OFFSET 7
#define SM_CPUERR_SIZE 1
#define SM_OCDRST_OFFSET 8
```

B.7 Coding style fixes

```c
diff --git a/cpu/at32uc/at32uc3a0xxx/clk.c b/cpu/at32uc/at32uc3a0xxx/clk.c
index 766b94..7df8b3 100644
--- a/cpu/at32uc/at32uc3a0xxx/clk.c
+++ b/cpu/at32uc/at32uc3a0xxx/clk.c
@@ -43,7 +43,8 @@ void clk_init ( void )
     sm_writel(PM_MCCTRL, SM_BF(MCSEL, 1) | SM_BIT(OSC0EN));
   }
   /* wait for osc0 */
-  while (!(sm_readl(PM_POSCSR) & SM_BIT(OSC0RDY))) ;
+  while (!(sm_readl(PM_POSCSR) & SM_BIT(OSC0RDY)))
+    ;
  /* run from osc0 */
  sm_writel(PM_MCCTRL, SM_BF(OSC0SEL, 1) | SM_BIT(OSC0EN));
@@ -59,11 +60,13 @@ void clk_init ( void )
     sm_writel(PM_POSCSR, SM_BF(OSC0RDY)) | SM_BIT(ERRATA));
  } while (0);
  /* We cannot write the CKSEL register before the ready-signal is set. */
-  while (!(sm_readl(PM_POSCSR) & SM_BIT(CKRDY))) ;
+  while (!(sm_readl(PM_POSCSR) & SM_BIT(CKRDY)))
+    ;
  #endif
  /* Set up clocks for the CPU and all peripheral buses */
  ckSEL = 0;
  gclk_init = 0;
  diff --git a/cpu/at32uc/cpu.c b/cpu/at32uc/cpu.c
index 4a9542..d145e1 100644
--- a/cpu/at32uc/cpu.c
+++ b/cpu/at32uc/cpu.c
@@ -55,7 +55,7 @@ int cpu_init ( void )
     asm volatile("csrf %0": :"i"(SYSREG_EM_OFFSET));
  }
  if (gclk_init)
-   if (gclk_init)
+   gclk_init();
  return 0;
```

```c
diff --git a/cpu/at32uc/flashc.c b/cpu/at32uc/flashc.c
index e626e1f..2244b2e 100644
--- a/cpu/at32uc/flashc.c
+++ b/cpu/at32uc/flashc.c
@@ -56,7 +56,7 @@ unsigned long flash_init ( void )
     flash_info[0].sector_count = size / (128*4);
  }
-  for(i=0; i<flash_info[0].sector_count; i++){
+  for (i = 0; i < flash_info[0].sector_count; i++) {
     flash_info[0].start[i] = i*128*4 + CFG_FLASH_BASE;
   }
```

```c
diff --git a/cpu/at32uc/flashc.c b/cpu/at32uc/flashc.c
index e626e1f..2244b2e 100644
--- a/cpu/at32uc/flashc.c
+++ b/cpu/at32uc/flashc.c
@@ -73,19 +73,20 @@ void flash_print_info(flash_info_t *info)
  }
  static void flash_wait_ready(void)
{
```
int flash_erase(flash_info_t *info, int s_first, int s_last)
{
    int page;
    for (page = s_first; page < s_last; page++) {
        flash_wait_ready();
        flashc_writel(FCMD, FLASHC_BF(CMD, FLASHC_EP)
                    | FLASHC_BF(PAGEN, page)
                    | FLASHC_BF(KEY, 0xa5));
    }
    return ERR_OK;
}

static void write_flash_page(unsigned int pagen, const u32 *data)
{
    /* fill page buffer*/
    flash_wait_ready();
    for (i=0; i <128 ; i++) {
        dst[i]= data[i];
    }
    /* issue write command */
    flashc_writel(FCMD, 
                    FLASHC_BF(CMD, FLASHC_WP) |
                    FLASHC_BF(PAGEN, pagen) |
                    FLASHC_BF(KEY, 0xa5));
}

unsigned long sram_init(const struct sram_config *config)
{
    switch (config->data_bits) {
    case 8:
        dbw = 0;
        break;
    case 16:
        dbw = 1;
        break;
    case 32:
        dbw = 2;
        break;
    default:
        panic("Invalid number of databits for SRAM");
    }
    sram_size = ((1 << config->address_bits) * (config->data_bits/8));
    return sram_size;
}

int write_buff(flash_info_t *info, uchar *src, ulong addr, ulong count)
{
    for (i = 0; i < count ; i += 128*4) {
        unsigned int pagen;
        pagen = (addr - CFG_FLASH_BASE +i) / (128*4) ;
        write_flash_page(pagen, (u32*) (src +i));
    }
}

diff --git a/include/asm-avr32/arch-at32uc3a0xxx/portmux.h b/include/asm-avr32/arch-at32uc3a0xxx/portmux.h
index 2877206..c9b17a8 100644
--- a/include/asm-avr32/arch-at32uc3a0xxx/portmux.h
+++ b/include/asm-avr32/arch-at32uc3a0xxx/portmux.h
@@ -25 ,7 +25 ,7 @@
 # include <asm/arch-common/portmux-gpio.h>
 # include <asm/arch/memory-map.h>

-# define PORTMUX_PORT(x) ( (void *) ( GPIO_BASE + (x) * 0x100 ) )
+# define PORTMUX_PORT(x) ((void *) ( GPIO_BASE + (x) * 0x100 ))

 # define PORTMUX_PORT_A PORTMUX_PORT(0)
 # define PORTMUX_PORT_B PORTMUX_PORT(1)
 # define PORTMUX_PORT_C PORTMUX_PORT(2)

--- a/include/asm-avr32/arch-at32uc3a0xxx/clk.h
+++ b/include/asm-avr32/arch-at32uc3a0xxx/clk.h
@@ -37 ,7 +37 ,6 @@ static inline unsigned long get_cpu_clk_rate ( void )
 
 static inline unsigned long get_hsb_clk_rate ( void )
 {
-  // TODO HSB is always the same as cpu-rate
+  return MAIN_CLK_RATE >> CFG_CLKDIV_CPU;
 
 static inline unsigned long get_pba_clk_rate ( void )
{
--- a/include/asm-avr32/arch-at32uc3a0xxx/memory-map.h
+++ b/include/asm-avr32/arch-at32uc3a0xxx/memory-map.h
@@ -70 ,6 +70 ,6 @@
 # define TC_BASE 0xFFFF3800
 # define ADC_BASE 0xFFFF3C00

-# define GPIO_PORT (x) ( (void *) ( GPIO_BASE + (x) * 0x100 ) )
+# define GPIO_PORT (x) ((void *) ( GPIO_BASE + (x) * 0x100 ))

 # endif /* __AT32UC3A0512_MEMORY_MAP_H__ */

--- a/include/asm-avr32/arch-at32uc3a0xxx/addrspace.h
+++ b/include/asm-avr32/arch-at32uc3a0xxx/addrspace.h
@@ -33 ,7 +33 ,7 @@ static __ inline __ unsigned long virt_to_phys ( volatile void * address )
 
 static __ inline __ void * phys_to_virt ( unsigned long address )
 {
-  return ( void *) address;
+  return (void *) address;
 
 #define __inline__ void * phys_to_virt(unsigned long address)
 static __ inline__ void *phys_to_virt(unsigned long address)
{

-  return (void *) address;
  
 static __inline__ void * phys_to_virt(unsigned long address)
{

  return (void *) address;

--- a/cpu/at32uc/cache.c
+++ b/cpu/at32uc/cache.c
@@ -28 ,7 +28 ,7 @@
 /* No cache to clean in the at32uc3. */
 }

--- a/board/atmel/atevk1100/atevk1100.c
+++ b/board/atmel/atevk1100/atevk1100.c
@@ -88,7 +88,7 @@ phys_size_t initdram(int board_type)

-  sram_base = map_physmem(EBI_SRAM_CS2_BASE, EBI_SRAM_CS2_SIZE, MAP_NOCACHE);
+  sram_base = map_physmem(EBI_SRAM_CS2_BASE, EBI_SRAM_CS2_SIZE, MAP_NOCACHE);

  return (void *) sram_base;

Appendix C

Unsubmitted U-Boot changes

diff --git a/board/atmel/atvek1100/atvek1100.c b/board/atmel/atvek1100/atvek1100.c
1 index e9c5452..d2d7893 100644
2 +++ a/board/atmel/atvek1100/atvek1100.c
3 @@ -105,7 +105,10 @@ phys_size_t initdram(int board_type)
4 int board_early_init_r(void)
5 {
6     /* Physical address of phy (0xff = auto-detect) */
7     /* Physical address of phy. This is not used when the address is
8     * autodetected. See CONFIG_MACB_SEARCH_PHY.
9     */
10     gd->bd->bi_phy_id[0] = 0xff;
11     return 0;
12 }
13 diff --git a/drivers/net/macb.c b/drivers/net/macb.c
14 index 31a4ffbe..c8beb82 100644
15 --- a/drivers/net/macb.c
16 +++ b/drivers/net/macb.c
17 @@ -327,6 +327,7 @@ static void macb_phy_reset(struct macb_device *macb)
18     netdev->name, status);
19 }
20 
21 +#ifdef CONFIG_MACB_SEARCH_PHY
22 static int macb_phy_find(struct macb_device *macb)
23 {
24     int i;
25     @@ -347,6 +348,8 @@ static int macb_phy_find(struct macb_device *macb)
26     return 0;
27 }
28 +#endif /* CONFIG_MACB_SEARCH_PHY */
29 
30 static int macb_phy_init(struct macb_device *macb)
31 {
32     @@ -356,12 +359,12 @@ static int macb_phy_init(struct macb_device *macb)
33     int media, speed, duplex;
34     int i;
35     - if (macb->phy_addr == 0xff) {
36         - /* Auto-detect phy_addr */
37         - if (!macb_phy_find(macb)) {
38             - return 0;
39     - }
40     }
41 +#ifdef CONFIG_MACB_SEARCH_PHY
42     + /* Auto-detect phy_addr */
43     + if (!macb_phy_find(macb)) {
44             + return 0;
45     + }
46 +#endif /* CONFIG_MACB_SEARCH_PHY */
47     / * Check if the PHY is up to snuff... */
48     diff --git a/drivers/serial/atmel_usart.c b/drivers/serial/atmel_usart.c
49 index a358871..f3b146c 100644
50 --- a/drivers/serial/atmel_usart.c
51 +++ b/drivers/serial/atmel_usart.c
52 @@ -58,9 +58,6 @@ int serial_init(void)
\begin{verbatim}
{
  \begin{verbatim}
utart3_writel(CR, USART3_BIT(RSTRX) | USART3_BIT(RSTTX));

  /* Make sure that all interrupts are disabled during startup. */
  usart3_writel(IDR, 0xffffffff);

  /* Make sure that all interrupts are disabled during startup. */
  usart3_writel(CR, USART3_BIT(RXTEN) | USART3_BIT(TXEN));

  serial_setbrg();

  usart3_writel(CR, USART3_BIT(RXTEN) | USART3_BIT(TXEN));
\end{verbatim}
\end{verbatim}
\end{verbatim}

\begin{verbatim}
diff --git a/include/configs/atevk1100.h b/include/configs/atevk1100.h
index ad134f8..e6e4746 100644
--- a/include/configs/atevk1100.h
+++ b/include/configs/atevk1100.h
@@ -149,6 +149,10 @@
 * MII mode. Set CONFIG_MACB_FORCE10M flag if clock is too slow for 100Mbit.
 */

+#define CONFIG_MACB_FORCE10M 1

+/*
+ * On this board, the PHY can be found at different addresses (either 1 or 7).
+ */
+#define CONFIG_MACB_SEARCH_PHY 1

+#define CONFIG_ATMEL_USART 1
+#define CONFIG_ATMEL_SPI 1
\end{verbatim}
Appendix D

Linux kernel patches

D.0  Cover letter

From 6677f489f529d76a17c5ab6900f81dbcfbc8b5d1 Mon Sep 17 00:00:00 2001
Content-Type: text/plain; charset=UTF-8
Message-Id: <cover.1242388773.git.rangoy@mnops.(none)>
From: =?utf-8?q?Gunnar=20Rang=C3=B8y?= <rangoy@mnops.(none)>
Date: Fri, 15 May 2009 13:59:33 +0200
Subject: [PATCH 00/29] AVR32: Support for EVK1100

These patches are the changes we made to Linux to make it possible to run on the ATEVK1100 evaluation kit (with the UC3A0512ES microcontroller).

We run Linux from 4 MB of SRAM added to the EVK1100. SDRAM hasn’t been tested.

What works:
- Booting Linux
- Serial console
- Networking
- Root filesystem on NFS
- Loading FDPIC ELF files (only statically linked files)
- LEDs
- Booting busybox, running a telnet server, and more

What is known not to work:
- Debugging applications (ptrace)
- Shared libraries
- SPI, USB
- Has a tendency to crash when out of memory (which happens quite frequently with only 4 MB of RAM)

Patches in this series are in a somewhat random order, but the overall pattern is:
1-19 Some changes making it easier to add AVR32A support.
20-26 AVR32A support
27-28 UC3A support
29 EVK1100 support

The line between the patches which add AVR32A support and the patches which prepare for AVR32A support is somewhat fuzzy.

Note that these patches still need a lot of work before they are suited for inclusion in the Linux kernel.

The changes we made to GCC and binutils will be posted later.

This patch series is coauthored by:
− Olav Morken <olavmrk@gmail.com>
− Gunnar Rangoy <gunnar@rangoy.com>
− Paul Driveklepp <pauldriveklepp@gmail.com>

Gunnar Rangoy (29):
  macb: limit to 10 Mbit/s if the clock is too slow to handle 100 Mbit/s
  AVR32: Don't clear registers when starting a new thread.
  AVR32: split paging_init into mmu init, free memory init and exceptions init.
  AVR32: use task_pt_regs in copy_thread.
  AVR32: FDPIC ELF support.
  AVR32: Introduce AVR32_CACHE and AVR32_UNALIGNED Kconfig options
  AVR32: mm/tlb.c should only be enabled with CONFIG_MMU.
  AVR32: mm/fault for !CONFIG_MMU.
  AVR32: ioremap and iounmap for !CONFIG_MMU.
  AVR32: MMU dummy functions for chips without MMU.
  AVR32: mm_context_t for !CONFIG_MMU
  AVR32: Add cache-function stubs for chips without cache.
  AVR32: copy_user for chips that cannot do unaligned memory access.
  AVR32: csum_partial: Support chips that cannot do unaligned memory accesses.
  AVR32: avoid unaligned access in uaccess.h
  AVR32: memcpy implementation for chips that cannot do unaligned memory accesses.
  AVR32: Mark AVR32B specific assumptions with CONFIG_SUBARCH_AVR32B in strnlen.
  AVR32: mm/dma-coherent.c - ifdef AVR32B specific code.
  AVR32: Disable ret_if_privileged macro for !CONFIG_SUBARCH_AVR32B.
  AVR32: AVR32A support in Kconfig
  AVR32: AVR32A address space support.
  AVR32: Change maximum task size for AVR32A
  AVR32: Fix uaccess __range_ok macro for AVR32A.
  AVR32: Support for AVR32A (entry-avr32a.c)
  AVR32: Change HIGHMEM_START for AVR32A.
  AVR32: New pt_regs layout for AVR32A.
  AVR32: UC3A0512ES Interrupt bug workaround
  AVR32: UC3A0xxx-support
  AVR32: Board support for ATEVK1100

arch/avr32/Kconfig
<table>
<thead>
<tr>
<th>File Path</th>
<th>Change Type</th>
<th>Change Count</th>
</tr>
</thead>
<tbody>
<tr>
<td>arch/avr32/Makefile</td>
<td>+</td>
<td>17</td>
</tr>
<tr>
<td>arch/avr32/boards/atevk1100/Makefile</td>
<td>+</td>
<td>1</td>
</tr>
<tr>
<td>arch/avr32/boards/atevk1100/setup.c</td>
<td>++</td>
<td>121</td>
</tr>
<tr>
<td>arch/avr32/configs/atevk1100_defconfig</td>
<td>++</td>
<td>778</td>
</tr>
<tr>
<td>arch/avr32/include/asm/addrspace.h</td>
<td>-</td>
<td>12</td>
</tr>
<tr>
<td>arch/avr32/include/asm/asm.h</td>
<td>-</td>
<td>28</td>
</tr>
<tr>
<td>arch/avr32/include/asm/checksum.h</td>
<td>+</td>
<td>28</td>
</tr>
<tr>
<td>arch/avr32/include/asm/elf.h</td>
<td>+</td>
<td>10</td>
</tr>
<tr>
<td>arch/avr32/include/asm/io.h</td>
<td>+</td>
<td>29</td>
</tr>
<tr>
<td>arch/avr32/include/asm/irqflags.h</td>
<td>+</td>
<td>8</td>
</tr>
<tr>
<td>arch/avr32/include/asm/mmu.h</td>
<td>+</td>
<td>16</td>
</tr>
<tr>
<td>arch/avr32/include/asm/mmu_context.h</td>
<td>+</td>
<td>40</td>
</tr>
<tr>
<td>arch/avr32/include/asm/page.h</td>
<td>+</td>
<td>13</td>
</tr>
<tr>
<td>arch/avr32/include/asm/processor.h</td>
<td>+</td>
<td>5</td>
</tr>
<tr>
<td>arch/avr32/include/asm/trace.h</td>
<td>++</td>
<td>79</td>
</tr>
<tr>
<td>arch/avr32/include/asm/uaccess.h</td>
<td>+</td>
<td>31</td>
</tr>
<tr>
<td>arch/avr32/kernel/Makefile</td>
<td>+</td>
<td>1</td>
</tr>
<tr>
<td>arch/avr32/kernel/cpu.c</td>
<td>+</td>
<td>1</td>
</tr>
<tr>
<td>arch/avr32/kernel/entry-avr32a.S</td>
<td>+</td>
<td>705</td>
</tr>
<tr>
<td>arch/avr32/kernel/process.c</td>
<td>-</td>
<td>2</td>
</tr>
<tr>
<td>arch/avr32/kernel/setup.c</td>
<td>+</td>
<td>22</td>
</tr>
<tr>
<td>arch/avr32/lib/Makefile</td>
<td>-</td>
<td>7</td>
</tr>
<tr>
<td>arch/avr32/lib/copy_user-nounaligned.S</td>
<td>+</td>
<td>124</td>
</tr>
<tr>
<td>arch/avr32/lib/csum_partial.S</td>
<td>+</td>
<td>31</td>
</tr>
<tr>
<td>arch/avr32/lib/memcpy-nounaligned.S</td>
<td>+</td>
<td>86</td>
</tr>
<tr>
<td>arch/avr32/lib/strnlen_user.S</td>
<td>+</td>
<td>4</td>
</tr>
<tr>
<td>arch/avr32/mach-at32uc3a/Kconfig</td>
<td>+</td>
<td>28</td>
</tr>
<tr>
<td>arch/avr32/mach-at32uc3a/Makefile</td>
<td>+</td>
<td>9</td>
</tr>
<tr>
<td>arch/avr32/mach-at32uc3a/at32uc3a0xxx.c</td>
<td>++</td>
<td>1453</td>
</tr>
<tr>
<td>arch/avr32/mach-at32uc3a/clock.c</td>
<td>+</td>
<td>270</td>
</tr>
<tr>
<td>arch/avr32/mach-at32uc3a/clock.h</td>
<td>+</td>
<td>30</td>
</tr>
<tr>
<td>arch/avr32/mach-at32uc3a/cpufreq.c</td>
<td>+</td>
<td>111</td>
</tr>
<tr>
<td>arch/avr32/mach-at32uc3a/irqflags.h</td>
<td>+</td>
<td>279</td>
</tr>
<tr>
<td>arch/avr32/mach-at32uc3a/gpio.c</td>
<td>+</td>
<td>453</td>
</tr>
<tr>
<td>arch/avr32/mach-at32uc3a/gpio.h</td>
<td>+</td>
<td>77</td>
</tr>
<tr>
<td>arch/avr32/mach-at32uc3a/hmatrix.c</td>
<td>+</td>
<td>88</td>
</tr>
<tr>
<td>arch/avr32/mach-at32uc3a/smc.c</td>
<td>+</td>
<td>281</td>
</tr>
<tr>
<td>arch/avr32/mach-at32uc3a/smc.h</td>
<td>+</td>
<td>127</td>
</tr>
<tr>
<td>arch/avr32/mach-at32uc3a/sram.h</td>
<td>+</td>
<td>78</td>
</tr>
<tr>
<td>arch/avr32/mach-at32uc3a/include/mach/at32uc3a0xxx.h</td>
<td>+</td>
<td>121</td>
</tr>
<tr>
<td>arch/avr32/mach-at32uc3a/include/mach/board.h</td>
<td>+</td>
<td>21</td>
</tr>
<tr>
<td>arch/avr32/mach-at32uc3a/include/mach/chip.h</td>
<td>+</td>
<td>35</td>
</tr>
<tr>
<td>arch/avr32/mach-at32uc3a/include/mach/cpu.h</td>
<td>+</td>
<td>45</td>
</tr>
<tr>
<td>arch/avr32/mach-at32uc3a/include/mach/hmatrix.h</td>
<td>+</td>
<td>55</td>
</tr>
<tr>
<td>arch/avr32/mach-at32uc3a/include/mach/init.h</td>
<td>+</td>
<td>18</td>
</tr>
<tr>
<td>arch/avr32/mach-at32uc3a/include/mach/io.h</td>
<td>+</td>
<td>38</td>
</tr>
<tr>
<td>arch/avr32/mach-at32uc3a/include/mach/irq.h</td>
<td>+</td>
<td>14</td>
</tr>
<tr>
<td>arch/avr32/mach-at32uc3a/include/mach/pm.h</td>
<td>+</td>
<td>51</td>
</tr>
<tr>
<td>arch/avr32/mach-at32uc3a/include/mach/portmux.h</td>
<td>+</td>
<td>29</td>
</tr>
<tr>
<td>arch/avr32/mach-at32uc3a/include/mach/smc.h</td>
<td>+</td>
<td>113</td>
</tr>
<tr>
<td>arch/avr32/mach-at32uc3a/include/mach/sram.h</td>
<td>+</td>
<td>30</td>
</tr>
<tr>
<td>arch/avr32/mach-at32uc3a/intc.c</td>
<td>+</td>
<td>217</td>
</tr>
<tr>
<td>arch/avr32/mach-at32uc3a/intc.h</td>
<td>+</td>
<td>329</td>
</tr>
<tr>
<td>arch/avr32/mach-at32uc3a/pdca.c</td>
<td>+</td>
<td>48</td>
</tr>
<tr>
<td>arch/avr32/mach-at32uc3a/pm-at32uc3a0xxx.S</td>
<td>+++</td>
<td>174</td>
</tr>
<tr>
<td>arch/avr32/mach-at32uc3a/pm.c</td>
<td>++++</td>
<td>243</td>
</tr>
<tr>
<td>arch/avr32/mach-at32uc3a/pm.h</td>
<td>++</td>
<td>112</td>
</tr>
<tr>
<td>arch/avr32/mach-at32uc3a/sdramc.h</td>
<td>+</td>
<td>76</td>
</tr>
<tr>
<td>arch/avr32/mm/Makefile</td>
<td>-</td>
<td>3</td>
</tr>
<tr>
<td>arch/avr32/mm/cache-nocache.c</td>
<td>+</td>
<td>36</td>
</tr>
<tr>
<td>arch/avr32/mm/dma-coherent.c</td>
<td>+</td>
<td>2</td>
</tr>
<tr>
<td>arch/avr32/mm/fault-nommu.c</td>
<td>+</td>
<td>19</td>
</tr>
<tr>
<td>arch/avr32/mm/init.c</td>
<td>++++</td>
<td>243</td>
</tr>
<tr>
<td>arch/avr32/mm/ioremap-nommu.c</td>
<td>+</td>
<td>31</td>
</tr>
<tr>
<td>drivers/net/macb.c</td>
<td>+</td>
<td>7</td>
</tr>
<tr>
<td>fs/Kconfig.binfmt</td>
<td></td>
<td>2</td>
</tr>
</tbody>
</table>

67 files changed, 7400 insertions(+), 40 deletions(-)

create mode 100644 arch/avr32/mm/fault-nommu.c
create mode 100644 arch/avr32/mm/ioremap-nommu.c

D.1 Network speed limiting

From 7e770576ff3d38ccdb78c091229db93791aa54 Mon Sep 17 00:00:00 2001
Message-Id: <7e770576ff3d38ccdb78c091229db93791aa54.1242388774.git.rangoy@mnops.(none)>
In-Reply-To: <cover.1242388773.git.rangoy@mnops.(none)>
References: <cover.1242388773.git.rangoy@mnops.(none)>
Date: Mon, 27 Apr 2009 13:03:30 +0200
Subject: [PATCH 01/29] macb: limit to 10 Mbit/s if the clock is too slow to handle 100 Mbit/s

The macb requires a 50 MHz clock to handle 100 Mbit/s in RMII mode, and
a 25 MHz clock to handle 100 Mbit/s in MII mode. This patch checks the
clock speed, and limits the PHY to 10 Mbit/s if the clock is too slow.

---
diff --git a/drivers/net/macb.c b/drivers/net/macb.c
index 01f7a31..99000d4 100644
--- a/drivers/net/macb.c
+++ b/drivers/net/macb.c
@@ -192,6 +192,7 @@ static int macb_mii_probe(struct net_device *dev)
 struct phy_device *phydev = NULL;
 struct eth_platform_data *pdata;
 int phy_addr;
+unsigned long pclk_hz;

 /* find the first phy */
 for (phy_addr = 0; phy_addr < PHY_MAX_ADDR; phy_addr++) {
@@ -226,6 +227,12 @@ static int macb_mii_probe(struct net_device *dev)
+/* disable 100 Mbit if clock is too slow */
+if (pclk_hz < 25000000 ||
+    (pclk_hz < 50000000 && pdata && pdata->is_rmii))
+    phydev->supported &= ~SUPPORTED_100baseT_Half & ~SUPPORTED_100baseT_Full;
+phydev->advertising = phydev->supported;
 bp->link = 0;
 --

1.6.2.2
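
The clock check described in the commit message of patch 01/29 can be illustrated in isolation. The following is a minimal sketch and not the driver code: the `SKETCH_*` flags and the function `limit_link_modes` are hypothetical stand-ins for the kernel's `SUPPORTED_*` bits and the logic in `macb_mii_probe`. Only the 25 MHz (MII) and 50 MHz (RMII) thresholds are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's SUPPORTED_* link mode flags. */
#define SKETCH_10_HALF   (1u << 0)
#define SKETCH_10_FULL   (1u << 1)
#define SKETCH_100_HALF  (1u << 2)
#define SKETCH_100_FULL  (1u << 3)

/*
 * Returns the link modes that remain usable for a given peripheral clock.
 * 100 Mbit/s needs at least 25 MHz in MII mode and 50 MHz in RMII mode.
 */
static uint32_t limit_link_modes(uint32_t supported, unsigned long pclk_hz,
                                 bool is_rmii)
{
    if (pclk_hz < 25000000UL || (pclk_hz < 50000000UL && is_rmii))
        supported &= ~(SKETCH_100_HALF | SKETCH_100_FULL);
    return supported;
}
```

With a 30 MHz clock, for example, this sketch keeps the 100 Mbit/s modes in MII mode but drops them in RMII mode.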

D.2 Avoid register reset

From e3888c17d786dbf5b8c88be6fb5bdff90fd205c Mon Sep 17 00:00:00 2001
Message-Id: <e3888c17d786dbf5b8c88be6fb5bdff90fd205c.1242388774.git.rangoy@mnops.(none)>
In-Reply-To: <cover.1242388773.git.rangoy@mnops.(none)>
References: <cover.1242388773.git.rangoy@mnops.(none)>
Date: Fri, 24 Apr 2009 15:02:21 +0200
Subject: [PATCH 02/29] AVR32: Don't clear registers when starting a new thread.

Not certain about this patch, but we can't clear the registers here,
since the FDPIC ELF loader stores a pointer to the process' load map
in a register before this function is called.

---
diff --git a/arch/avr32/include/asm/processor.h b/arch/avr32/include/asm/processor.h
index 49a8825..3fb9664 100644
--- a/arch/avr32/include/asm/processor.h
+++ b/arch/avr32/include/asm/processor.h
@@ -132,7 +132,6 @@ struct thread_struct {
 #define start_thread(regs, new_pc, new_sp) \
 do { \
 set_fs(USER_DS); \
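
The reason given in the commit message of patch 02/29 can be shown with a small model. This is only a sketch under assumptions: `struct sketch_regs`, its `loadmap_reg` field and `sketch_start_thread` are hypothetical names, and the patch does not state which register the FDPIC ELF loader actually uses. The point is that a `memset` of the register block inside `start_thread` would wipe a value that the loader has already placed there.

```c
#include <assert.h>

/* Hypothetical, heavily reduced register block. */
struct sketch_regs {
    unsigned long pc;
    unsigned long sp;
    unsigned long sr;
    unsigned long loadmap_reg;  /* set by the loader before start_thread */
};

#define SKETCH_MODE_USER 0UL    /* hypothetical status register value */

/*
 * Model of start_thread after the patch: only pc, sp and sr are written.
 * A memset(regs, 0, sizeof(*regs)) at this point would clear loadmap_reg.
 */
static void sketch_start_thread(struct sketch_regs *regs,
                                unsigned long new_pc, unsigned long new_sp)
{
    assert(regs != 0);
    regs->sr = SKETCH_MODE_USER;
    regs->pc = new_pc;
    regs->sp = new_sp;
}
```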

D.3 Split paging function

This change is necessary to allow AVR32A to initialize free memory and exceptions without having an MMU.

---
 arch/avr32/kernel/setup.c |   22 ++++++++++++++++++++++
 arch/avr32/mm/init.c      |   48 +++++++++++++++++++++++-------------------------
 2 files changed, 45 insertions(+), 25 deletions(-)

diff --git a/arch/avr32/kernel/setup.c b/arch/avr32/kernel/setup.c
index 5c70853..f7e734a 100644
--- a/arch/avr32/kernel/setup.c
+++ b/arch/avr32/kernel/setup.c
@@ -536,6 +536,26 @@ static void __init setup_bootmem(void)
}
}

+exceptions_init();
+ paging_init();
+ resource_init();
+}

#endif

diff --git a/arch/avr32/mm/init.c b/arch/avr32/mm/init.c
index fa92f66..646f935 100644
--- a/arch/avr32/mm/init.c
+++ b/arch/avr32/mm/init.c
@@ -33,36 +33,21 @@ page_aligned;
 struct page *empty_zero_page;
 EXPORT_SYMBOL(empty_zero_page);
 #ifdef CONFIG_MMU
+/*
+ * Cache of MMU context last used.
+ */
+ unsigned long mmu_context_cache = NO_CONTEXT;
+/*
 * paging_init() sets up the page tables
 *
 * Initialize the MMU.
 * This routine also unmaps the page at virtual kernel address 0, so
 * that we can trap those pesky NULL-reference errors in the kernel.
 * This function also reserves the zero-page, so that we can trap
 * NULL-references
 */

#define __init paging_init(void)
#define mmu_init(void)
{
    extern unsigned long _evba;
    void *zero_page;
    int nid;
    /*
     * Make sure we can handle exceptions before enabling
     * paging. Not that we should ever _get_ any exceptions this
     * early, but you never know...
     */
    printk("Exception vectors start at %p\n", &_evba);
    sysreg_write(EVBA, (unsigned long)&_evba);
    /*
     * Since we are ready to handle exceptions now, we should let
     * the CPU generate them...
     */
    __asm__ __volatile__ ("csrf %0" : : "i"(SR_EM_BIT));
    /* Allocate the zero page. The allocator will panic if it
     * fails. */
    printkm ("CPU: Paging enabled\n");
    enable_mmu();
    printk ("CPU: Paging enabled\n");
    memset(zero_page, 0, PAGE_SIZE);
    empty_zero_page = virt_to_page(zero_page);
    flush_dcache_page(empty_zero_page);
}
#endif /* CONFIG_MMU */

/*
 * Initialize the MMU, and configures available memory.
 */
#define __init paging_init(void)
{
    int nid;
    /*
     * Define CONFIG_MMU if the kernel is compiled with the
     * MMU. */
    #ifdef CONFIG_MMU
        mmu_init();
    #endif /* CONFIG_MMU */
    for_each_online_node(nid) {
        pg_data_t *pgdat = NODE_DATA(nid);
        unsigned long zones_size[MAX_NR_ZONES];
        memset(zero_page, 0, PAGE_SIZE);
        empty_zero_page = virt_to_page(zero_page);
        flush_dcache_page(empty_zero_page);
    }
    void __init mem_init(void)
    {
        void __init paging_init(void)
        {
            intern
        }
    }

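
The effect of the split in patch 03/29 can be summarised in a small model. This is a sketch under assumptions, not the kernel code: all `sketch_*` names and the `SKETCH_HAVE_MMU` switch (standing in for `CONFIG_MMU`) are hypothetical. It only shows that exception setup and free-memory setup run on every configuration, while the MMU-specific step is compiled in solely when an MMU is present.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for CONFIG_MMU; 0 models a CPU without an MMU. */
#ifndef SKETCH_HAVE_MMU
#define SKETCH_HAVE_MMU 0
#endif

/* Records the order in which the init steps run. */
static char sketch_log[64];

static void sketch_step(const char *name)
{
    assert(strlen(sketch_log) + strlen(name) + 2 <= sizeof(sketch_log));
    strcat(sketch_log, name);
    strcat(sketch_log, ";");
}

static void sketch_exceptions_init(void)
{
    sketch_step("exceptions");
}

#if SKETCH_HAVE_MMU
static void sketch_mmu_init(void)
{
    sketch_step("mmu");
}
#endif

static void sketch_paging_init(void)
{
#if SKETCH_HAVE_MMU
    sketch_mmu_init();          /* only needed when an MMU exists */
#endif
    sketch_step("free-memory"); /* needed on every configuration */
}

static void sketch_setup(void)
{
    sketch_log[0] = '\0';
    sketch_exceptions_init();
    sketch_paging_init();
}
```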

D.4 Use task_pt_regs macro

From 12a2a1a6382f3278643b4d637d12266366dd858ff Mon Sep 17 00:00:00 2001
Message-Id: <12a2a1a6382f3278643b4d637d12266366dd858ff.1242388774.git.rangoy@mnops.(none)>
In-Reply-To: <cover.1242388773.git.rangoy@mnops.(none)>
References: <cover.1242388773.git.rangoy@mnops.(none)>
From: =?utf-8?q?Gunnar=20Rang=C3=B8y?= <gunnar@rangoy.com>
Date: Fri, 24 Apr 2009 15:13:37 +0200
Subject: [PATCH 04/29] AVR32: use task_pt_regs in copy_thread.

We already have the task_pt_regs macro, so we might as well use it.

---

```
arch/avr32/kernel/process.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
```

```
diff --git a/arch/avr32/kernel/process.c b/arch/avr32/kernel/process.c
index 134d6530..fd37fcf 100644
--- a/arch/avr32/kernel/process.c
+++ b/arch/avr32/kernel/process.c
@@ -337,7 +337,7 @@ int copy_thread(int nr, unsigned long clone_flags, unsigned long usp,
 { struct pt_regs * childregs;

- childregs = ((struct pt_regs *)(THREAD_SIZE + (unsigned long)task_stack_page(p))) -1;
+ childregs = task_pt_regs(p);
  *childregs = *regs;

 if (user_mode(regs))
```
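
The replacement in patch 04/29 is only safe if the macro computes the same address as the open-coded expression. The sketch below checks that for a simplified model: `SKETCH_THREAD_SIZE`, `struct sketch_pt_regs` and both functions are hypothetical, and the assumption that `task_pt_regs` places the register block at the very top of the kernel stack is inferred from the removed line, not from the macro definition itself.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_THREAD_SIZE 8192UL                  /* hypothetical stack size */

struct sketch_pt_regs { unsigned long r[17]; };    /* hypothetical layout */

/* The expression that the patch removes from copy_thread. */
static struct sketch_pt_regs *sketch_open_coded(void *stack_page)
{
    return ((struct sketch_pt_regs *)
            (SKETCH_THREAD_SIZE + (uintptr_t)stack_page)) - 1;
}

/* Assumed behaviour of task_pt_regs: last pt_regs slot below the stack top. */
static struct sketch_pt_regs *sketch_task_pt_regs(void *stack_page)
{
    unsigned char *top = (unsigned char *)stack_page + SKETCH_THREAD_SIZE;

    assert(stack_page != 0);
    return (struct sketch_pt_regs *)(top - sizeof(struct sketch_pt_regs));
}
```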
+ depends on (FRV || BLACKFIN || (SUPERH32 && !MMU) || (AVR32 && !MMU))
 help

 segments of a binary to be located in memory independently of each

1.6.2.2

D.6 Introduce cache and aligned flags

From 4761e27a785a2dcd94ebee6b6ff6f60f09ba66d703 Mon Sep 17 00:00:00 2001
Message-Id: <4761e27a785a2dcd94ebee6b6ff6f60f09ba66d703.1242388774.git.rangoy@mnops.(none)>
In-Reply-To: <cover.1242388773.git.rangoy@mnops.(none)>
References: <cover.1242388773.git.rangoy@mnops.(none)>
From: Gunnar Rangøy <gunnar@rangoy.com>
Date: Mon, 27 Apr 2009 12:25:41 +0200
Subject: [PATCH 06/29] AVR32: Introduce AVR32_CACHE and AVR32_UNALIGNED Kconfig options

Reorganizes Kconfig a bit, and adds AVR32_CACHE (for CPUs with cache),
and AVR32_UNALIGNED (for CPUs that can do unaligned accesses). Also adds
three Makefile variables: $(MMUEXT), $(CACHEEXT) and $(ALIGNEXT). These
are set to 'nommu' for !MMU, 'nocache' for !AVR32_CACHE and
'nounaligned' for !AVR32_UNALIGNED.

---
 arch/avr32/Kconfig  | 17 ++++++++++++++---
 arch/avr32/Makefile | 14 ++++++++++++++
 2 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/arch/avr32/Kconfig b/arch/avr32/Kconfig
index 26eca87..9e984b0 100644
--- a/arch/avr32/Kconfig
+++ b/arch/avr32/Kconfig
@@ -78,20 +78,31 @@ menu "System Type and features"

-config SUBARCH_AVR32B
- bool
 config MMU
  bool
+config SUBARCH_AVR32B
+ bool
+ select MMU
+
+config PERFORMANCE_COUNTERS
+
+config AVR32_CACHE
+
+config AVR32_UNALIGNED
+
+config PLATFORM_AT32AP
+ select SUBARCH_AVR32B
+ select MMU
+ select PERFORMANCE_COUNTERS
+ select ARCH_REQUIRES_GPIO_LIB
+ select GENERIC_ALLOCATOR
+ select AVR32_CACHE
+ select AVR32_UNALIGNED
+
 # CPU types
diff --git a/arch/avr32/Makefile b/arch/avr32/Makefile
index b088e10..4864cb1 100644
--- a/arch/avr32/Makefile
+++ b/arch/avr32/Makefile
@@ -9,6 +9,20 @@
 .PHONY: all
 all: uImage vmlinux.elf
+ifeq ($(CONFIG_MMU),)
+MMUEXT=-nommu
+endif
+ifeq ($(CONFIG_AVR32_CACHE),)
D.7 Disable mm-tlb.c

From 53cafff9ec5f546ebeb17130137ad46bd3d70dc781 Mon Sep 17 00:00:00 2001
Message-Id: <53caffe9ec5f546ebeb17130137ad46bd3d70dc781.1242388774.git.rangoy@mnops.(none)>
In-Reply-To: <cover.1242388773.git.rangoy@mnops.(none)>
References: <cover.1242388773.git.rangoy@mnops.(none)>
From: =?utf-8?q?Gunnar=20Rang=C3=B8y?= <gunnar@rangoy.com>
Date: Mon, 27 Apr 2009 12:58:23 +0200
Subject: [PATCH 07/29] AVR32: mm/tlb.c should only be enabled with CONFIG_MMU.

---
 arch/avr32/mm/Makefile | 3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/arch/avr32/mm/Makefile b/arch/avr32/mm/Makefile
index 0066491..7d61b2c 100644
--- a/arch/avr32/mm/Makefile
+++ b/arch/avr32/mm/Makefile
@@ -3,4 +3,5 @@
 #
-obj-y += ioremap.o cache.o fault.o tlb.o
+obj-y += ioremap.o cache.o fault.o
+obj-$(CONFIG_MMU) += tlb.o

D.8 fault.c for !CONFIG_MMU

From c3d37edc9dea393d59ca2186e41e8ec136cea69d Mon Sep 17 00:00:00 2001
Message-Id: <c3d37edc9dea393d59ca2186e41e8ec136cea69d.1242388774.git.rangoy@mnops.(none)>
In-Reply-To: <cover.1242388773.git.rangoy@mnops.(none)>
References: <cover.1242388773.git.rangoy@mnops.(none)>
From: =?utf-8?q?Gunnar=20Rang=C3=B8y?= <gunnar@rangoy.com>
Date: Mon, 27 Apr 2009 13:01:42 +0200
Subject: [PATCH 08/29] AVR32: mm/fault for !CONFIG_MMU.

---
 arch/avr32/mm/Makefile      |  2 +-
 arch/avr32/mm/fault-nommu.c | 19 +++++++++++++++++++
 2 files changed, 20 insertions(+), 1 deletions(-)
 create mode 100644 arch/avr32/mm/fault-nommu.c

diff --git a/arch/avr32/mm/Makefile b/arch/avr32/mm/Makefile
index 7d61b2c..7dbb5a6 100644
--- a/arch/avr32/mm/Makefile
+++ b/arch/avr32/mm/Makefile
@@ -3,5 +3,5 @@
 #
-obj-y += ioremap.o cache.o fault.o
+obj-y += ioremap.o cache.o fault$(MMUEXT).o
 obj-$(CONFIG_MMU) += tlb.o
diff --git a/arch/avr32/mm/fault-nommu.c b/arch/avr32/mm/fault-nommu.c
new file mode 100644
index 0000000..a3ebd4f
--- /dev/null
+++ b/arch/avr32/mm/fault-nommu.c
@@ -0,0 +1,19 @@
+#include <linux/vmalloc.h>
+#include <linux/mm.h>
+#include <linux/module.h>
+#include <linux/io.h>
+#include <asm/pgtable.h>
+#include <asm/addrspace.h>
+#include <asm/sysreg.h>
+asmlinkage void do_page_fault(unsigned long ecr, struct pt_regs *regs) {
+    /* As we don't enable the MPU, a page fault should never occur. */
+    panic("Impossible page fault");
+}
+
+asmlinkage void do_bus_error(unsigned long addr, int write_access,
+                             struct pt_regs *regs) {
+    printk(KERN_ALERT
+           "Bus error at physical address 0x%08lx (%s access)\n",
+           addr, write_access ? "write" : "read");
+    die("Bus Error", regs, SIGKILL);
+}

D.9 ioremap and iounmap for !CONFIG_MMU

--- /dev/null
+++ b/arch/avr32/mm/ioremap-nommu.c
@@ -0,0 +1,31 @@
+/*
+ * Copyright (C) 2004-2006 Atmel Corporation
+ */
+#include <linux/vmalloc.h>
+#include <linux/mm.h>
+#include <linux/module.h>
+#include <linux/io.h>
+#include <asm/pgtable.h>
+#include <asm/addrspace.h>

1.6.2.2
D.10 MMU dummy functions

---
 arch/avr32/include/asm/mmu_context.h | 40 ++++++++++++++++++++++++++++++++++
 1 files changed, 40 insertions(+), 0 deletions(-)

diff --git a/arch/avr32/include/asm/mmu_context.h b/arch/avr32/include/asm/mmu_context.h
index 27ff234..ed9f7bf 100644
--- a/arch/avr32/include/asm/mmu_context.h
+++ b/arch/avr32/include/asm/mmu_context.h
@@ -12,6 +12,8 @@ static inline void disable_mmu(void)
  sysreg_write(MMUCR, SYSREG_BIT(MMUCR_S));
 }

+static inline void disable_mmu(void)
+{ }
+
+static inline void enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk)
+{ }
+
+static inline void switch_mm(struct mm_struct *prev,
+                             struct mm_struct *next,
+                             struct task_struct *tsk)
+{ }
+
+/* Nothing to do when we don't have an MMU. */
+
+static inline int init_new_context(struct task_struct *tsk,
+                                   struct mm_struct *mm)
+{
+    return 0;
+}
+
+/* Initialize the context related info for a new mm_struct
+   instance. */
+
+static inline void init_mmu(void)
+{ }
+
+/* Destroy context related info for an mm_struct that is about
+   to be put to rest. */
+
+static inline void destroy_context(struct mm_struct *mm)
+{ }
+
+#define deactivate_mm(tsk,mm) do { } while(0)
+
+#define activate_mm(prev, next) switch_mm((prev), (next), NULL)
```c
+#endif /* CONFIG_MMU */
+
 #endif /* __ASM_AVR32_MMU_CONTEXT_H */
--
1.6.2.2
```

D.11 mm_context_t for !CONFIG_MMU

This patch adds a struct for mm_context_t on architectures without an MMU. It has to be a struct because it must hold several fields: the VM list, the end of the brk area and, when FDPIC ELF support is enabled, the load maps of the executable and its interpreter.

---

```c
 arch/avr32/include/asm/mmu.h | 16 ++++++++++++++++
 1 files changed, 16 insertions(+), 0 deletions(-)

+#ifdef CONFIG_MMU
+
 /* Default "unsigned long" context */
 typedef unsigned long mm_context_t;

+#else /* CONFIG_MMU */
+
+typedef struct {
+    struct vm_list_struct *vmlist;
+    unsigned long end_brk;
+#ifdef CONFIG_BINFMT_ELF_FDPIC
+    unsigned long exec_fdpic_loadmap;
+    unsigned long interp_fdpic_loadmap;
+#endif
+} mm_context_t;
+
+#endif /* CONFIG_MMU */
+
 #endif /* __ASM_AVR32_MMU_H */
--
1.6.2.2
```

D.12 Add cache function stubs

---

```c
 arch/avr32/mm/Makefile        |  2 +-
 arch/avr32/mm/cache-nocache.c | 36 ++++++++++++++++++++++++++++++++++++
 2 files changed, 37 insertions(+), 1 deletions(-)
 create mode 100644 arch/avr32/mm/cache-nocache.c
```
D.13 copy_user.S for !CONFIG_NOUNALIGNED
```plaintext
diff --git a/arch/avr32/lib/copy_user-nounaligned.S b/arch/avr32/lib/copy_user-nounaligned.S
new file mode 100644
index 0000000..1bf9d8d
--- /dev/null
+++ b/arch/avr32/lib/copy_user-nounaligned.S
@@ -0,0 +1,124 @@
+/*
+ * Copy to/from userspace with optional address space checking.
+ *
+ * Copyright 2004-2006 Atmel Corporation
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#include <asm/page.h>
+#include <asm/thread_info.h>
+#include <asm/asm.h>
+
+ /*
+  * __kernel_size_t
+  * __copy_user(void *to, const void *from, __kernel_size_t n)
+  *
+  * Returns the number of bytes not copied. Might be off by
+  * max 3 bytes if we get a fault in the main loop.
+  *
+  * The address-space checking functions simply fall through to
+  * the non-checking version.
+  */
+ .text
+ .align 1
+ .global copy_from_user
+ .type copy_from_user, @function
+copy_from_user:
+ branch_if_kernel r8, __copy_user
+ ret_if_privileged r8, r11, r10, r10
+ rjmp __copy_user
+ .size copy_from_user, . - copy_from_user
+
+ .global copy_to_user
+ .type copy_to_user, @function
+copy_to_user:
+ branch_if_kernel r8, __copy_user
+ ret_if_privileged r8, r12, r10, r10
+ .size copy_to_user, . - copy_to_user
+
+ .global __copy_user
+ .type __copy_user, @function
+__copy_user:
+ /* First we check whether from or to are unaligned */
+ mov r9, r11
+ andl r9, 3, COH
+ mov r8, r12
+ andl r8, 3, COH
+
+ /*
+  * Is it impossible to align both? Branch to single-byte copies
+  * if we can't align both.
+  */
+ cp.w r8, r9
+ brne 4f
+
+ /* Do they need alignment? */
+ cp.w r8, 0
+ brne 6f
+
+ /* At this point, both from and to are word-aligned */
+1: sub r10, 4
+2:
+ ld.w r8, r11++
+11: st.w r12++, r8
```
D.14 csum_partial: support for chips that cannot do unaligned accesses

From aa418bb9eeb14bf5c225b94bd0e3c3a2e5aeb18b Mon Sep 17 00:00:00 2001
Message-Id: <aa418bb9eeb14bf5c225b94bd0e3c3a2e5aeb18b.1242388774.git.rangoy@mnops.(none)>
In-Reply-To: <cover.1242388773.git.rangoy@mnops.(none)>
References: <cover.1242388773.git.rangoy@mnops.(none)>
From: =?utf-8?q?Gunnar=20Rang=C3=B8y?= <gunnar@rangoy.com>
Date: Thu, 23 Apr 2009 15:50:11 +0200
Subject: [PATCH 14/29] AVR32: csum_partial: Support chips that cannot do unaligned memory accesses.

---
arch/avr32/include/asm/checksum.h | 28 ++++++++++++++++++++++++++++---
arch/avr32/lib/csum_partial.S | 31 +++++++++++++++++++++++++--
2 files changed, 59 insertions(+), 0 deletions(-)
diff --git a/arch/avr32/include/asm/checksum.h b/arch/avr32/include/asm/checksum.h
index 4ddbf42..866147f 100644

--- a/arch/avr32/include/asm/checksum.h
+++ b/arch/avr32/include/asm/checksum.h
@@ -44,15 +44,43 @@ static inline

 __wsum csum_partial_copy_nocheck(const void *src, void *dst,
                                  int len, __wsum sum)
 {
+#ifdef CONFIG_AVR32_UNALIGNED
     return csum_partial_copy_generic(src, dst, len, sum, NULL, NULL);
+#else
+    if (((unsigned long)src & 3) == 0 && ((unsigned long)dst & 3) == 0) {
+        /* Both src & dst are aligned. Do it the fast way. */
+        return csum_partial_copy_generic(src, dst, len, sum, NULL, NULL);
+    }
+    memcpy(dst, src, len);
+    return csum_partial(dst, len, sum);
+#endif
 }

 static inline
 __wsum csum_partial_copy_from_user(const void __user *src, void *dst,
                                    int len, __wsum sum, int *err_ptr)
 {
+#ifdef CONFIG_AVR32_UNALIGNED
     return csum_partial_copy_generic((const void __force *) src, dst, len,
                                      sum, err_ptr, NULL);
+#else
+    int missing;
+    if (((unsigned long)src & 3) == 0 && ((unsigned long)dst & 3) == 0) {
+        /* Both src & dst are aligned. Do it the fast way. */
+        return csum_partial_copy_generic(src, dst, len, sum, NULL, NULL);
+    }
+    missing = copy_from_user(dst, src, len);
+    if (missing) {
+        memset(dst + len - missing, 0, missing);
+        *err_ptr = -EFAULT;
+    }
+    return csum_partial(dst, len, sum);
+#endif
 }
 
 /* checksum complete words , aligned or not */
 3: sub r11, 4
 69: brlt 5f 
 70: *
+  #ifndef CONFIG_AVR32_UNALIGNED
  /* check whether the buffer is aligned */
  73: mov r8, r12 
  74:  andl r8, 3, COH 
  75:  brne 8f 
  76:  
  77:  
  78:  #endif
  
 78: 4:  ld.w r9, r12++ 
  79:  add r10, r9 
  80:  acr r10 
  81:  @@ -33,7 +41,13 @@ csum_partial: 
  82:  mov r8, 0 
  83:  cp r11, 2 
  84:  brlt 6f 
  85:  @#ifndef CONFIG_AVR32_UNALIGNED
  86:  ld.ub r9, r12[f] 
  87:  1dins.b r9:1, r12[0] 
  88:  sub r12, -2 
  89:  
  90:  @#else 
  91:  
  92:  4:  ld.ub r9, r12++ 
  93:  
  94:  
  95:  
  96:  
  97:  
  98:  
  99: 

D.15 Avoid unaligned access in uaccess.h

The patch fixes __get_user_check by calling copy_from_user if the pointer is unaligned. Note that three more macros need to be changed: get_user_nocheck, put_user_check and put_user_nocheck.

This patch really needs a better solution that doesn't involve calling copy_from_user or copy_to_user.

---

```c
    arch/avr32/include/asm/uaccess.h | 17 ++++++++++++++++ -
  1 files changed, 16 insertions(+), 1 deletions(-)

diff --git a/arch/avr32/include/asm/uaccess.h b/arch/avr32/include/asm/uaccess.h
index ed09239..99652f2 100644
--- a/arch/avr32/include/asm/uaccess.h
+++ b/arch/avr32/include/asm/uaccess.h
@@ -179,6 +179,13 @@ static inline __kernel_size_t __copy_from_user (void *to,
 extern int __get_user_bad (void);
 extern int __put_user_bad (void);

/* We need a simple way to test this flag in the following macros. */
#define __get_user_nocheck (x, ptr, size) \
({ 
    unsigned long __gu_val = 0; 
    int __gu_err = 0; 
    if (access_ok (VERIFY_READ, __gu_addr, size)) { 
        if (!AVR32_UNALIGNED && (unsigned long)__gu_addr % (size)) { 
            unsigned long count;
            count = copy_from_user (&__gu_val, __gu_addr, size);
            if (count == size) { 
                __gu_val >>= 8 * (4 - (size));
            } else { 
                __gu_err = -EFAULT;
            }
        } else { 
            __gu_err = -EFAULT;
        }
    } else if (access_ok (VERIFY_READ, __gu_addr, size)) { 
        switch (size) { 
        case 1: 
            __get_user_asm("ub", __gu_val, __gu_addr,
```
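
The shift `__gu_val >>= 8 * (4 - (size))` in the patch above is easier to follow in a stand-alone model. The following sketch makes several assumptions: `sketch_get_unaligned` is a hypothetical name, `memcpy` stands in for `copy_from_user` and is assumed to succeed, and the target is treated as big-endian, as AVR32 is. Copying `size` bytes to the start of a zeroed 4-byte value places them in the most significant bytes, so they must be shifted down afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reads a 1, 2 or 4 byte big-endian value from a possibly unaligned address. */
static uint32_t sketch_get_unaligned(const void *addr, size_t size)
{
    unsigned char buf[4] = { 0, 0, 0, 0 };
    uint32_t word;

    assert(size == 1 || size == 2 || size == 4);

    /* Stand-in for copy_from_user(&__gu_val, __gu_addr, size). */
    memcpy(buf, addr, size);

    /* Interpret the buffer the way a big-endian CPU would load it. */
    word = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8) | (uint32_t)buf[3];

    /* The copied bytes occupy the top of the word: move them down. */
    return word >> (8 * (4 - size));
}
```

A byte-wise copy avoids the unaligned load entirely, at the cost of a function call for every unaligned access, which is why the commit message asks for a better solution.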
D.16. memcpys for `CONFIG_NOUNALIGNED`

```
From 2809377f4f84e0f10e0f50a759211503824730b5 Mon Sep 17 00:00:00 2001
Message-Id: <2809377f4f84e0f10e0f50a759211503824730b5.1242388774.git.rangoy@mnops.(none)>
In-Reply-To: <cover.1242388773.git.rangoy@mnops.(none)>
References: <cover.1242388773.git.rangoy@mnops.(none)>
Date: Fri, 24 Apr 2009 15:28:14 +0200
Subject: [PATCH 16/29] AVR32: memcpy implementation for chips that cannot do unaligned memory accesses.

---
 arch/avr32/lib/Makefile             |  3 +--
 arch/avr32/lib/memcpy-nounaligned.S | 86 +++++++++++++++++++++++++++++++++++
 create mode 100644 arch/avr32/lib/memcpy-nounaligned.S

diff --git a/arch/avr32/lib/memcpy-nounaligned.S b/arch/avr32/lib/memcpy-nounaligned.S
new file mode 100644
index 0000000..c10fcde
--- /dev/null
+++ b/arch/avr32/lib/memcpy-nounaligned.S
@@ -0,0 +1,86 @@
+/*
+ * Copyright (C) 2004-2006 Atmel Corporation
+ */
+
+ /*
+  * void *memcpy(void *to, const void *from, unsigned long n)
+  *
+  * This implementation does word-aligned loads and stores if possible,
+  * and falls back to byte-copy if not.
+  *
+  * Hopefully, in most cases, both "to" and "from" will be
+  * word-aligned to begin with.
+  */
```
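
The header comments in the listing above describe the copy strategy: use word-sized loads and stores when both pointers can be word-aligned, and fall back to a byte copy otherwise. A minimal C sketch of that idea follows. It is not a translation of the assembly in the patch; the names `memcpy_nounaligned_sketch` and `memcpy_sketch_selftest` are hypothetical, and the inner `memcpy` calls merely stand in for a single `ld.w`/`st.w` pair.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical C rendering of the strategy described in the patch comments.
 * Illustration only; the real implementation is the AVR32 assembly.
 */
static void *memcpy_nounaligned_sketch(void *to, const void *from, size_t n)
{
    unsigned char *d = to;
    const unsigned char *s = from;

    /*
     * Both pointers can only reach a word boundary together when their
     * two lowest address bits agree.
     */
    if ((((uintptr_t)d ^ (uintptr_t)s) & 3u) == 0) {
        /* Copy leading bytes until "from" (and thus "to") is word-aligned. */
        while (n > 0 && ((uintptr_t)s & 3u) != 0) {
            *d++ = *s++;
            n--;
        }
        /* Copy whole 32-bit words; each step stands in for ld.w + st.w. */
        while (n >= 4) {
            uint32_t word;
            memcpy(&word, s, sizeof word);
            memcpy(d, &word, sizeof word);
            s += 4;
            d += 4;
            n -= 4;
        }
    }

    /* Trailing bytes, or the whole buffer when alignment is impossible. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return to;
}

/* Small self-check covering equal and differing pointer offsets. */
static void memcpy_sketch_selftest(void)
{
    unsigned char src[32], dst[32];
    for (int i = 0; i < 32; i++)
        src[i] = (unsigned char)(i + 1);

    memset(dst, 0, sizeof dst);
    memcpy_nounaligned_sketch(dst + 1, src + 1, 13); /* same offset */
    assert(memcmp(dst + 1, src + 1, 13) == 0);

    memset(dst, 0, sizeof dst);
    memcpy_nounaligned_sketch(dst + 2, src + 1, 9);  /* offsets differ */
    assert(memcmp(dst + 2, src + 1, 9) == 0);
}
```

The sketch never issues a word access through a misaligned pointer, which is the property a core without unaligned access support needs.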
```diff
+4:      neg     r10
+        reteq   r9
+        /* Handle unaligned count */
+        lsl     r10, 2
+        add     pc, pc, r10
+        ld.ub   r8, r11++
+        st.b    r12++, r8
+        ld.ub   r8, r11++
+        st.b    r12++, r8
+        ld.ub   r8, r11++
+        st.b    r12++, r8
+        retal   r9
+
+1:      sub     r10, 4
+        brlt    4b
+        add     r10, r9
+        lsl     r9, 2
+        add     pc, pc, r9
+        ld.ub   r8, r11++
+        st.b    r12++, r8
+        ld.ub   r8, r11++
+        st.b    r12++, r8
+        ld.ub   r8, r11++
+        st.b    r12++, r8
+        rjmp    2b
+
+6:      /* Impossible to align both "from" and "to" on a word boundary */
+        mov     r9, r12
+        cp.w    r10, 0
+        reteq   r9
+        ld.ub   r8, r11++
+        st.b    r12++, r8
+        sub     r10, 1
+        rjmp    7b
--
1.6.2.2
```

D.17 Mark AVR32B code with subarch flag

From 79853dc97d2a55d4457b5ebcee832d68d05bada7 Mon Sep 17 00:00:00 2001
Message-Id: <7985d3c97d2a55d4457b5ebcee832d68d05bada7.1242388774.git.rangoy@mnops.(none)>
In-Reply-To: <cover.1242388773.git.rangoy@mnops.(none)>
References: <cover.1242388773.git.rangoy@mnops.(none)>
From: =?utf-8?q?Gunnar=20Rang=C3=B8y?= <gunnar@rangoy.com>
Date: Fri, 24 Apr 2009 15:42:09 +0200
Subject: [PATCH 17/29] AVR32: Mark AVR32B specific assumptions with CONFIG_SUBARCH_AVR32B in strnlen.

```
---
 arch/avr32/lib/strnlen_user.S |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/avr32/lib/strnlen_user.S b/arch/avr32/lib/strnlen_user.S
index 65ce11a..482f9e7 100644
--- a/arch/avr32/lib/strnlen_user.S
+++ b/arch/avr32/lib/strnlen_user.S
@@ -18,10 +18,12 @@
         .type   strnlen_user, "function"
 strnlen_user:
         branch_if_kernel r8, __strnlen_user
+#ifdef CONFIG_SUBARCH_AVR32B
         sub     r8, r11, 1
         add     r8, r12
         retcs   0
         brmi    adjust_length   /* do a closer inspection */
+#endif /* CONFIG_SUBARCH_AVR32B */
 
         .global __strnlen_user
         .type   __strnlen_user, "function"
@@ -39,6 +41,7 @@ __strnlen_user:
         retal   r12
```

D.18 mm-dma-coherent.c: ifdef AVR32B code

```c
+/* No need to sync an uncached area */
+if (PXSEG(vaddr) == P2SEG)
+    return;
++} /* invalidate only */
```

D.19 Disable ret_if_privileged macro

```c
+/* No need to sync an uncached area */
+if (PXSEG(vaddr) == P2SEG)
+    return;
```
D.20 AVR32A-support in Kconfig

```diff
+config SUBARCH_AVR32A
+        bool
+
+config SUBARCH_AVR32B
+        bool
+        select MMU
```

D.21 AVR32A address space support

```diff
+#ifdef CONFIG_SUBARCH_AVR32A
+#define PHYSADDR(a)             ((unsigned long)(a))
 #define P0SEG                   0x00000000
@@ -38,6 +42,10 @@
 #define P4SEGADDR(a) (((unsigned long)(a) & 0xffffffff) | P4SEG)
 
-#endif /* CONFIG_MMU */
+#else
+
+#error Unknown AVR32 subarch.
+
+#endif /* CONFIG_SUBARCH_* */
 
 #endif /* __ASM_AVR32_ADDRSPACE_H */

+#ifdef CONFIG_SUBARCH_AVR32A
+
+static __inline__ unsigned long virt_to_phys(volatile void *address)
+{
+        return (unsigned long)address;
+}
+
+#define cached_to_phys(addr)    ((unsigned long)(addr))
+#define uncached_to_phys(addr)  ((unsigned long)(addr))
+#define phys_to_cached(addr)    ((void *)(addr))
+#define phys_to_uncached(addr)  ((void *)(addr))
+
+#elif CONFIG_SUBARCH_AVR32B
+
 /* virt_to_phys will only work when address is in P1 or P2 */
 static __inline__ unsigned long virt_to_phys(volatile void *address)
 {
         return PHYSADDR(address);
 }
+
+#else /* CONFIG_SUBARCH_* */
+
+#error Unknown AVR32 subarch.
+
+#endif /* CONFIG_SUBARCH_* */
 
 /*
  * Generic IO read/write.  These perform native-endian accesses.  Note

 #define __pa(x)         PHYSADDR(x)
+#ifdef CONFIG_SUBARCH_AVR32A
+#define __va(x)         ((void *)(x))
+#elif CONFIG_SUBARCH_AVR32B
 #define __va(x)         ((void *)(P1SEGADDR(x)))
+#else /* CONFIG_SUBARCH_* */
+#error Unknown AVR32 subarch.
+#endif /* CONFIG_SUBARCH_* */
--
```

1.6.2.2
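
The patch above makes address translation an identity mapping on AVR32A, while AVR32B keeps its segment-based translation. The following sketch contrasts the two under stated assumptions: the segment bases (`SKETCH_P1SEG`, `SKETCH_P2SEG`) and the 29-bit mask are assumed values for the AVR32B layout, and all function names are hypothetical, chosen only to mirror the macros in the listing.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed AVR32B segment layout, used for illustration only. */
#define SKETCH_P1SEG    0x80000000u /* cached, untranslated segment */
#define SKETCH_P2SEG    0xa0000000u /* uncached, untranslated segment */
#define SKETCH_SEG_MASK 0x1fffffffu /* keeps the physical address bits */

/* AVR32A: no segments, so virtual and physical addresses coincide. */
static uint32_t avr32a_virt_to_phys(uint32_t vaddr)
{
    return vaddr;
}

static uint32_t avr32a_phys_to_virt(uint32_t paddr)
{
    return paddr;
}

/* AVR32B: strip the segment bits to get the physical address. */
static uint32_t avr32b_virt_to_phys(uint32_t vaddr)
{
    return vaddr & SKETCH_SEG_MASK;
}

/* AVR32B: place a physical address in the cached segment. */
static uint32_t avr32b_phys_to_cached(uint32_t paddr)
{
    return (paddr & SKETCH_SEG_MASK) | SKETCH_P1SEG;
}

/* AVR32B: place a physical address in the uncached segment. */
static uint32_t avr32b_phys_to_uncached(uint32_t paddr)
{
    return (paddr & SKETCH_SEG_MASK) | SKETCH_P2SEG;
}

/* Self-check: identity on AVR32A, segment round trip on AVR32B. */
static void addrspace_sketch_selftest(void)
{
    assert(avr32a_virt_to_phys(0xc8000000u) == 0xc8000000u);
    assert(avr32a_phys_to_virt(0xc8000000u) == 0xc8000000u);
    assert(avr32b_virt_to_phys(avr32b_phys_to_cached(0x10000000u)) == 0x10000000u);
    assert(avr32b_virt_to_phys(avr32b_phys_to_uncached(0x10000000u)) == 0x10000000u);
}
```

With these assumed values, a physical address of 0x10000000 appears at 0x90000000 in the cached segment on AVR32B, whereas on AVR32A an address such as 0xc8000000 is used unchanged.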
D.22 Change maximum task size for AVR32A

```c
+#ifdef CONFIG_SUBARCH_AVR32A
+#define TASK_SIZE 0xffffffff
+#else
+#define TASK_SIZE 0x80000000
+#endif
```

D.23 Fix __range_ok for AVR32A in uaccess.h

```c
+#ifdef CONFIG_SUBARCH_AVR32A
+#define __range_ok(addr, size) 0
+#endif
```

```c
+#ifdef CONFIG_SUBARCH_AVR32B
+#define __range_ok(addr, size) 0
+#endif
```
D.24 Support for AVR32A entry-avr32a.S

From 1f99f4536db8830ab1817ad460627d1444e05d25 Mon Sep 17 00:00:00 2001
Message-Id: <1f99f4536db8830ab1817ad460627d1444e05d25.1242388773.git.rangoy@mnops.(none)>
In-Reply-To: <cover.1242388773.git.rangoy@mnops.(none)>
References: <cover.1242388773.git.rangoy@mnops.(none)>
From: =?utf-8?q?Gunnar=20Rang=C3=B8y?= <gunnar@rangoy.com>
Date: Fri, 24 Apr 2009 15:03:39 +0200
Subject: [PATCH 24/29] AVR32: Support for AVR32A (entry-avr32a.c)

---

 arch/avr32/kernel/Makefile       |    1 +
 arch/avr32/kernel/entry-avr32a.S |  705 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 706 insertions(+), 0 deletions(-)
 create mode 100644 arch/avr32/kernel/entry-avr32a.S

diff --git a/arch/avr32/kernel/Makefile b/arch/avr32/kernel/Makefile
index 18229d076edc210064a
--- a/arch/avr32/kernel/Makefile
+++ b/arch/avr32/kernel/Makefile
@@ -4,6 +4,7 @@
extra-y := head.o vmlinux.lds
obj-y += syscall_table.o syscall-stubs.o irq.o
obj-y += setup.o traps.o ocd.o ptrace.o

diff --git a/arch/avr32/kernel/entry-avr32a.S b/arch/avr32/kernel/entry-avr32a.S
new file mode 100644
index 00000000..2b97739
--- /dev/null
+++ b/arch/avr32/kernel/entry-avr32a.S
@@ -0,0 +1,705 @@

/*
 * Copyright (C) 2004-2006 Atmel Corporation
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License version 2 as
 * published by the Free Software Foundation.
 */

#include <linux/errno.h>
#include <asm/asm.h>
#include <asm/hardirq.h>
#include <asm/irq.h>
#include <asm/ocd.h>
#include <asm/page.h>
#include <asm/pgtable.h>
#include <asm/ptrace.h>
#include <asm/sysreg.h>
#include <asm/thread_info.h>
#include <asm/unistd.h>

+        .section .ex.text,"ax",@progbits
+        .align  2
+exception_vectors:
+        bral    handle_critical         /* (0x00) Unrecoverable exception, Internal */
+        bral    handle_critical         /* (0x04) TLB Multiple hit, Internal Signal */
+        bral    do_bus_error_write      /* (0x08) Bus error data fetch, Data bus */
+        bral    do_bus_error_read       /* (0x0c) Bus error instruction fetch, Data bus */
+        bral    do_nmi_ll               /* (0x10) NMI (Non Maskable Interrupt), External input */
+        bral    handle_address_fault    /* (0x14) Instruction address, ITLB */
+        bral    handle_protection_fault /* (0x18) ITLB Protection, ITLB */
+        .align  2
+        bral    handle_debug            /* (0x1c) Breakpoint, OCD system */
+        .align  2
+        bral    do_illegal_opcode_ll    /* (0x20) Illegal opcode, Instruction */
+        .align  2
+        bral    do_illegal_opcode_ll    /* (0x24) Unimplemented instruction, Instruction */
+        .align  2
+        bral    do_illegal_opcode_ll    /* (0x28) Privilege violation, Instruction */
+        .align  2
+        bral    do_fpe_ll               /* (0x2c) Floating-point, FP Hardware */
+        .align  2
+        bral    do_illegal_opcode_ll    /* (0x30) Coprocessor absent, Instruction */
+        .align  2
+        bral    handle_address_fault    /* (0x34) Data address (Read), DTLB */
+        .align  2
+        bral    handle_address_fault    /* (0x38) Data address (Write), DTLB */
+        .align  2
+        bral    handle_protection_fault /* (0x3c) DTLB Protection (Read), DTLB */
+        .align  2
+        bral    handle_protection_fault /* (0x40) DTLB Protection (Write), DTLB */
+        .align  2
+        bral    do_dtlb_modified        /* (0x44) DTLB Modified, DTLB */
+        .org    0x50                    /* (0x50) ITLB Miss, ITLB */
+        .global itlb_miss
+itlb_miss:
+        rjmp    tlb_miss_common
+        .org    0x60                    /* (0x60) DTLB Miss (Read), DTLB */
+dtlb_miss_read:
+        rjmp    tlb_miss_common
+        .org    0x70                    /* (0x70) DTLB Miss (Write), DTLB */
+dtlb_miss_write:
+        .global tlb_miss_common
+        .align  2
+tlb_miss_common:
+        /* this should never be called... */
+        sub     r12, pc, (. - 1f)
+        bral    panic
+        .align  2
+        /* --- System Call --- */
+        .org    0x100                   /* (0x100) Supervisor call, Instruction */
+system_call:
+        stmts   --sp, r0-lr
+        pushm   r12                     /* r12_orig */
+        zero_fp                         /* [remove comment (RC)] sets the frame pointer (r7) to zero
+                                           so that the backtrace does not follow a context switch */
+        /* Store the return value so that the correct value is loaded below */
+        stdsp   sp[REG_R12], r12
+        /* check for syscall tracing */
+        get_thread_info r0
+        ld.w    r1, r0[TI_flags]        /* RC: load TI_flags to r1 */
+        bld     r1, TIF_SYSCALL_TRACE   /* RC: set carry flag if TIF_SYSCALL_TRACE is set in thread_info */
+        brcs    syscall_trace_enter     /* RC: branch if carry is set */
+syscall_trace_cont:
+        cp.w    r8, NR_syscalls
+        brhs    syscall_badsys          /* RC: branch if system call is out of range */
+        lddpc   lr, syscall_table_addr  /* set lr to syscall base address */
+        ld.w    lr, lr[r8 << 2]         /* RC: fetch the handler address from the syscall table,
+                                           indexed by the syscall number (r8), into lr */
+        mov     r8, r5                  /* 5th argument (6th is pushed by stub) */
+        icall   lr                      /* call syscall handler */
+syscall_return:
+        get_thread_info r0
+        mask_interrupts                 /* make sure we don't miss an interrupt
+                                           setting need_resched or sigpending
+                                           between sampling and the rets */
+        stdsp   sp[REG_R12], r12
+        ld.w    r1, r0[TI_flags]
+        andl    r1, _TIF_ALLWORK_MASK, COH
+        brne    syscall_exit_work       /* RC: branch if work has to be done */
+
+syscall_exit_cont:
+        sub     sp, -4                  /* r12_orig */
+        ldmts   sp++, r0-lr             /* restoring registers */
+        rets
+
+        .align  2
+syscall_table_addr:
+        .long   sys_call_table
+
+syscall_badsys:                         /* RC: comefrom: syscall_trace_cont */
+        mov     r12, -ENOSYS
+        rjmp    syscall_return          /* RC: return -ENOSYS */
+
+syscall_trace_enter:
+        pushm   r8-r12
+        rcall   syscall_trace
+        popm    r8-r12
+        rjmp    syscall_trace_cont
+
+        .global ret_from_fork
+ret_from_fork:                          /* RC: a newborn child starts its exciting new thread here */
+        rcall   schedule_tail
+
+        /* check for syscall tracing */
+        get_thread_info r0
+        ld.w    r1, r0[TI_flags]
+        andl    r1, _TIF_ALLWORK_MASK, COH
+        breq    syscall_exit_cont
+        /*
+         * fall through to syscall_exit_work since one or more of the
+         * bits in _TIF_ALLWORK_MASK was set.
+         */
+
+syscall_exit_work:
+        bld     r1, TIF_SYSCALL_TRACE
+        brcc    syscall_exit_work_loop
+        unmask_interrupts
+        rcall   syscall_trace
+        mask_interrupts
+        ld.w    r1, r0[TI_flags]
+
+        /*
+         * This loop will run until no work-flags are set in the
+         * thread info.
+         */
+syscall_exit_work_loop:
+        bld     r1, TIF_NEED_RESCHED
+        brcc    syscall_exit_work_nosched
+        unmask_interrupts
+        rcall   schedule
+        mask_interrupts
+        ld.w    r1, r0[TI_flags]
+        rjmp    syscall_exit_work_loop
+
+syscall_exit_work_nosched:
+        mov     r2, _TIF_SIGPENDING | _TIF_RESTORE_SIGMASK
+        tst     r1, r2
+        breq    syscall_exit_work_nosigs
+        unmask_interrupts
+        mov     r12, sp
+        mov     r11, r0
+        rcall   do_notify_resume
+        mask_interrupts
+        ld.w    r1, r0[TI_flags]
+        rjmp    syscall_exit_work_loop
+
+syscall_exit_work_nosigs:
+        bld     r1, TIF_BREAKPOINT
+        brcc    syscall_exit_cont
+        rjmp    enter_monitor_mode
+
+
+        .type   save_full_context_ex, @function
+        .align  2
+save_full_context_ex:
+        /*
+         * check whether the return address of the exception is the
+         * debug_trampoline, since that would need special handling.
+         */
+        lddsp   r11, sp[REG_PC]
+        sub     r9, pc, . - debug_trampoline
+        cp.w    r9, r11
+        breq    save_full_context_dbg_trampoline
+        /* Check for kernel-mode. */
+        lddsp   r8, sp[REG_SR]
+        mov     r12, r8
+        andh    r8, (MODE_MASK >> 16), COH
+        brne    save_full_context_kernel_mode
+        /*
+         * The debug handler set up a trampoline to make us
+         * automatically enter monitor mode upon return, but since
+         * we're saving the full context, we must assume that the
+         * exception handler might want to alter the return address
+         * and/or status register. So we need to restore the original
+         * context and enter monitor mode manually after the exception
+         * has been handled.
+         */
+
+        /* Low-level exception handlers */
+handle_critical:
+        pushm   r0-r12
+        sub     sp, 12                  /* lr, sp, r12_orig */
+        mfsr    r12, SYSREG_ECR
+        mov     r11, sp
+        rcall   do_critical_exception
+        /* We should never get here... */
+        sub     r12, pc, (. - 1f)
+        bral    panic
+        .align  2
+1:      .asciz  "Return from critical exception!"
+        .align  1
+do_bus_error_write:
+        stmts   --sp, r0-lr
+        sub     sp, 4                   /* skip r12_orig */
+        rcall   save_full_context_ex
+        mov     r11, 1
+        rjmp    do_bus_error_common
+        .align  1
+do_bus_error_read:
+        stmts   --sp, r0-lr
+        sub     sp, 4                   /* skip r12_orig */
+        rcall   save_full_context_ex
+        mov     r11, 0
+        .align  1
+do_bus_error_common:
+        mfsr    r12, SYSREG_BEAR
+        mov     r10, sp
+        rcall   do_bus_error
+        rjmp    ret_from_exception
+        .align  1
+do_nmi_ll:
+        stmts   --sp, r0-lr
+        sub     sp, 4                   /* skip r12_orig */
+        /* Check for kernel-mode. */
+        lddsp   r8, sp[REG_SR]
+        bfextu  r0, r9, MODE_SHIFT, 3
+        brne    do_nmi_ll_kernel_fixup
+        .align  1
+do_nmi_ll_cont:
+        mfsr    r12, SYSREG_ECR
+        mov     r11, sp
+        rcall   do_nmi
+        tst     r0, r0
+        brne    do_nmi_ll_kernel_exit
+        sub     sp, -4                  /* skip r12_orig */
+        ldmts   sp++, r0-lr
+        rete
+        /* Kernel mode save */
+do_nmi_ll_kernel_fixup:
+        sub     r10, sp, -FRAME_SIZE_FULL
+        stdsp   sp[REG_SP], r10         /* replace saved SP */
+        rjmp    do_nmi_ll_cont
+        /* Kernel mode restore */
+do_nmi_ll_kernel_exit:
+        sub     sp, -4                  /* skip r12_orig */
+        popm    lr
+        sub     sp, -4                  /* skip sp */
+        popm    r0-r12
+        rete
+        /* Common code for returning from an exception handler. */
+ret_from_exception:
+        mask_interrupts
+        lddsp   r4, sp[REG_SR]
+        andh    r4, (MODE_MASK >> 16), COH
+        brne    fault_resume_kernel
+        get_thread_info r0
+        ld.w    r1, r0[TI_flags]


```c
+        andl    r1, _TIF_WORK_MASK, COH
+        brne    fault_exit_work
+
+fault_resume_user:
+        mask_exceptions
+        sub     sp, -4                  /* skip r12_orig */
+        ldmts   sp++, r0-lr
+        rete
+
+fault_resume_kernel:
+#ifdef CONFIG_PREEMPT
+        /* Check whether we should preempt this kernel thread. */
+        get_thread_info r0
+        ld.w    r2, r0[TI_preempt_count]
+        cp.w    r2, 0
+        brne    fault_resume_kernel_no_schedule
+        ld.w    r1, r0[TI_flags]
+        bld     r1, TIF_NEED_RESCHED
+        brcc    fault_resume_kernel_no_schedule
+        lddsp   r4, sp[REG_SR]
+        bld     r4, SYSREG_GM_OFFSET
+        brcc    fault_resume_kernel_no_schedule
+        rcall   preempt_schedule_irq
+fault_resume_kernel_no_schedule:
+#endif
+
+fault_exit_work:
+        bld     r1, TIF_NEED_RESCHED
+        brcc    fault_exit_work_no_resched
+        unmask_interrupts
+        rcall   schedule
+        mask_interrupts
+        ld.w    r1, r0[TI_flags]
+        rjmp    fault_exit_work
+
+fault_exit_work_no_resched:
+        mov     r2, _TIF_SIGPENDING | _TIF_RESTORE_SIGMASK
+        tst     r1, r2
+        breq    fault_exit_work_no_sigwork
+        unmask_interrupts
+        mov     r12, sp
+        mov     r11, r0
+        rcall   do_notify_resume
+        mask_interrupts
+        ld.w    r1, r0[TI_flags]
+        rjmp    fault_exit_work
+
+fault_exit_work_no_sigwork:
+        bld     r1, TIF_BREAKPOINT
+        brcc    fault_resume_user
+        rjmp    enter_monitor_mode
+
+        .section .kprobes.text, "ax", @progbits
+        .type   handle_debug, @function
+handle_debug:
+        sub     sp, 8                   /* Make room for REG_PC and REG_SR */
+        sub     sp, 4                   /* skip r12_orig */
+        mfsr    r8, SYSREG_RAR_DBG
+        stdsp   sp[REG_PC], r8
+        mfsr    r9, SYSREG_RSR_DBG
+        stdsp   sp[REG_SR], r9
+        unmask_exceptions
+        bfextu  r9, r9, SYSREG_MODE_OFFSET, SYSREG_MODE_SIZE
+        brne    debug_fixup_regs
+
+.Ldebug_fixup_cont:
+#ifdef CONFIG_TRACE_IRQFLAGS
+        rcall   trace_hardirqs_off
+#endif
+        mov     r12, sp
```
+        rcall   do_debug
+        mov     sp, r12
+        lddsp   r2, sp[REG_SR]
+        bfextu  r3, r2, SYSREG_MODE_OFFSET, SYSREG_MODE_SIZE
+        brne    debug_resume_kernel
+        get_thread_info r0
+        ld.w    r1, r0[TI_flags]
+        mov     r2, _TIF_DBGWORK_MASK
+        tst     r1, r2
+        brne    debug_exit_work
+        bld     r1, TIF_SINGLE_STEP
+        brcc    1f
+        mfdr    r4, OCD_DC
+        sbr     r4, OCD_DC_SS_BIT
+        mtdr    OCD_DC, r4
+        mask_exceptions
+#ifdef CONFIG_TRACE_IRQFLAGS
+        rcall   trace_hardirqs_on
+1:
+#endif
+        sub     sp, -4
+        ldmts   sp++, r0-lr
+        retd
+        .size   handle_debug, . - handle_debug
+        /* Mode of the trapped context is in r9 */
+        .type   debug_fixup_regs, @function
+debug_fixup_regs:
+        sub     r8, sp, -FRAME_SIZE_FULL
+        stdsp   sp[REG_SP], r8
+        rjmp    .Ldebug_fixup_cont
+        .size   debug_fixup_regs, . - debug_fixup_regs
+        .type   debug_resume_kernel, @function
+debug_resume_kernel:
+        mask_exceptions
+#ifdef CONFIG_TRACE_IRQFLAGS
+        bld     r11, SYSREG_GM_OFFSET
+        brcc    1f
+        rcall   trace_hardirqs_on
+1:
+#endif
+        mfsr    r2, SYSREG_SR
+        mov     r1, r2
+        bfins   r2, r3, SYSREG_MODE_OFFSET, SYSREG_MODE_SIZE
+        mtsr    SYSREG_SR, r2
+        sub     pc, -2
+        mtsr    SYSREG_SR, r1
+        sub     pc, -2                  /* flush pipeline */
+        sub     sp, -4                  /* Skip r12_orig */
+        popm    lr
+        sub     sp, -4                  /* Skip SP */
+        popm    r0-r12
+        retd
+        .size   debug_resume_kernel, . - debug_resume_kernel
+        .type   debug_exit_work, @function
+        /*end of fixups after reg change */
+debug_exit_work:
+        /*
+         * We must return from Monitor Mode using a retd, and we must
+         * not schedule since that involves the D bit in SR getting
+         * cleared by something other than the debug hardware. This
+         * may cause undefined behaviour according to the Architecture
+         * manual.
+         *
+         * So we fix up the return address and status and return to a
+         * stub below in Exception mode. From there, we can follow the
+         * normal exception return path.
+         *
+         * The real return address and status registers are stored on
+         * the stack in the way the exception return path understands,
+         * so no need to fix anything up there.
+         */
+        sub     r8, pc, . - fault_exit_work
+        st.w    sp[REG_PC], r8
+        mov     r9, 0
+        orh     r9, hi(SR_EM | SR_GM | MODE_EXCEPTION)
+        st.w    sp[REG_SR], r9
+        sub     pc, -2
+        retd
+        .size   debug_exit_work, . - debug_exit_work
+        .macro  IRQ_LEVEL level
+        .type   irq_level\level, @function
+irq_level\level:
+        /* Stack:
+         * sp+0  SR
+         * sp+4  PC
+         * sp+8  LR
+         * sp+12 R12
+         * sp+16 R11
+         * sp+20 R10
+         * sp+24 R9
+         * sp+28 R8
+         */
+        stmts   --sp, r0-lr
+        sub     sp, 4                   /* skip r12_orig */
+        lddsp   r8, sp[REG_PC]
+        lddsp   r9, sp[REG_SR]
+        mov     r11, sp
+        mov     r12, \level
+        rcall   do_IRQ
+        lddsp   r4, sp[REG_SR]
+        bfextu  r4, r4, SYSREG_M0_OFFSET, 3
+        cp.w    r4, MODE_SUPERVISOR >> SYSREG_M0_OFFSET
+        breq    2f
+        cp.w    r4, MODE_USER >> SYSREG_M0_OFFSET
+#ifdef CONFIG_PREEMPT
+        brne    3f
+#else
+        brne    1f
+#endif
+        /* Interrupt was entered from user-mode. */
+        get_thread_info r0
+        ld.w    r1, r0[TI_flags]
+        mov     r2, r1
+        andl    r2, _TIF_WORK_MASK, COH
+        brne    fault_exit_work
+        /* Exit interrupt handling. */
+1:
+#ifdef CONFIG_TRACE_IRQFLAGS
+        rcall   trace_hardirqs_on
+#endif
+        sub     sp, -4                  /* ignore r12_orig */
+        ldmts   sp++, r0-lr
+        rete
+        /*
+         * Interrupt was entered from supervisor mode. We need to check
+         * that this didn't happen while the processor was going to
+         * sleep. The power-manager will set the CPU_GOING_TO_SLEEP flag
+         * when entering sleep mode. We test that flag, and if it is
+         * set, we change the return address of the interrupt to the
+         * instruction following the sleep-instruction.
+         */
+2:      get_thread_info r0
+        ld.w    r1, r0[TI_flags]
+        bld     r1, TIF_CPU_GOING_TO_SLEEP
+#ifdef CONFIG_PREEMPT
+        brcc    3f
+#else
+        brcc    1b
+#endif
+        /*
+         * Update the return address so that the sleep-instruction
+         * isn't executed.
+         */
+        sub     r1, pc, . - cpu_idle_skip_sleep
+        stdsp   sp[REG_PC], r1
+#ifdef CONFIG_PREEMPT
+        /*
+         * When interrupts are entered from kernel mode, and preemption
+         * is enabled, we need to check whether we should schedule after
+         * executing the interrupt. This is done in this block of code.
+         */
+3:      get_thread_info r0
+        ld.w    r2, r0[TI_preempt_count]
+        cp.w    r2, 0
+        brne    1b
+        ld.w    r1, r0[TI_flags]
+        bld     r1, TIF_NEED_RESCHED
+        brcc    1b
+        lddsp   r4, sp[REG_SR]
+        bld     r4, SYSREG_GM_OFFSET
+        brcc    1b
+        rcall   preempt_schedule_irq
+
+        /*
+         * We need to enter monitor mode to do a single step. The
+         * monitor code will alter the return address so that we
+         * return directly to the user instead of returning here.
+         */
+        breakpoint
+        rjmp    breakpoint_failed
D.25 Change HIMEM_START for AVR32A

From 881604261316b978975207e22d919a07924e0927 Mon Sep 17 00:00:00 2001
Message-Id: <881604261316 b978975207e22d919a07924e0927.Mon Sep 17 00:00:00 2001
References: <cover.1242388773.git.rangoy@mnops.(none)>
From: =?utf-8?q?Gunnar=20Rang=C3=B8y?= <gunnar@rangoy.com>
Date: Mon, 16 Apr 2009 15:21:38 +0200
Subject: [PATCH 25/29] AVR32: Change HIGHMEM_START for AVR32A.

---
 arch/avr32/include/asm/page.h |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/arch/avr32/include/asm/page.h b/arch/avr32/include/asm/page.h
index ca36368..b69e6c1 100644
--- a/arch/avr32/include/asm/page.h
+++ b/arch/avr32/include/asm/page.h
@@ -106,6 +106,12 @@ static inline int get_order(unsigned long size)
+#ifdef CONFIG_SUBARCH_AVR32A
+#define HIGHMEM_START           0xffffffffUL
+#elif CONFIG_SUBARCH_AVR32B
 #define HIGHMEM_START           0x20000000UL
+#else /* CONFIG_SUBARCH_* */
+#error Unknown AVR32 subarch.
+#endif /* CONFIG_SUBARCH_* */

 #endif /* __ASM_AVR32_PAGE_H */
--

---

1.6.2.2

D.26 New pt_regs layout for AVR32A

From 814 e2ae4fa15281d76b1ee21e1f565e1880a9698 Mon Sep 17 00:00:00 2001
Message-Id: <814 e2ae4fa15281d76b1ee21e1f565e1880a9698.Mon Sep 17 00:00:00 2001
References: <cover.1242388773.git.rangoy@mnops.(none)>
From: =?utf-8?q?Gunnar=20Rang=C3=B8y?= <gunnar@rangoy.com>
Date: Mon, 27 Apr 2009 14:49:13 +0200
Subject: [PATCH 26/29] AVR32: New pt_regs layout for AVR32A.

---
arch/avr32/include/asm/ptrace.h | 79 +++++++++++++++++++++++++++++++++++++++
1 files changed, 79 insertions(+), 0 deletions(-)
diff --git a/arch/avr32/include/asm/ptrace.h b/arch/avr32/include/asm/ptrace.h
index 9e2d44f..043b873 100644
--- a/arch/avr32/include/asm/ptrace.h
+++ b/arch/avr32/include/asm/ptrace.h
@@ -61,6 +61,41 @@
 #define SR_Z_BIT        1
 #define SR_C_BIT        0
 
+#ifdef __AVR32_AVR32A__
+
+#define REG_R12_ORIG    0
+#define REG_LR          4
+#define REG_SP          8
+#define REG_R12         12
+#define REG_R11         16
+#define REG_R10         20
+#define REG_R9          24
+#define REG_R8          28
+#define REG_R7          32
+#define REG_R6          36
+#define REG_R5          40
+#define REG_R4          44
+#define REG_R3          48
+#define REG_R2          52
+#define REG_R1          56
+#define REG_R0          60
+
+#define REG_SR          64
+#define REG_PC          68
+
+#define FRAME_SIZE_MIN  8
+#define FRAME_SIZE_FULL 72
+
+#elif __AVR32_AVR32B__
+
 /*
  * The order is defined by the stmts instruction. r0 is stored first,
  * so it gets the highest address.
@@ -99,7 +128,45 @@
 #define REG_PC          4
 #define REG_SR          0
 
+#else /* __AVR32_AVR32__ */
+
+#error Unknown AVR32 subarch.
+
+#endif /* __AVR32_AVR32__ */
+
 #ifndef __ASSEMBLY__
+
+#ifdef __AVR32_AVR32A__
+
+struct pt_regs {
+        /* Only saved on system call, and is used to restart system calls. */
+        unsigned long r12_orig;
+        /* Always saved, but some might be optimized away */
+        unsigned long lr;
+        unsigned long sp;
+        unsigned long r12;
+        unsigned long r11;
+        unsigned long r10;
+        unsigned long r9;
+        unsigned long r8;
+        unsigned long r7;
+        unsigned long r6;
+        unsigned long r5;
+        unsigned long r4;
+        unsigned long r3;
+        unsigned long r2;
+        unsigned long r1;
+        unsigned long r0;
+
+        /* These are automatically saved when an interrupt or exception occurs */
+        unsigned long sr;
+        unsigned long pc;
+};
+
+#elif __AVR32_AVR32B__
+
 struct pt_regs {
         /* These are always saved */
         unsigned long sr;
         unsigned long pc;
@@ -120,6 +193,12 @@
 };
 
+#else /* __AVR32_AVR32__ */
+
+#error Unknown AVR32 subarch.
+
+#endif /* __AVR32_AVR32__ */
+
 #ifdef __KERNEL__
 
 #include <asm/ocd.h>
--
1.6.2.2
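
The `REG_*` offsets and the AVR32A `struct pt_regs` in the listing above have to describe the same memory layout, since the assembly uses the offsets while C code uses the struct. The sketch below shows one way such agreement could be checked at compile time. It is an illustration, not part of the patch: `struct pt_regs_avr32a_sketch` is a hypothetical stand-in, and `uint32_t` replaces `unsigned long` so that the check also holds on a 64-bit build host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offsets copied from the REG_* defines in the listing (subset). */
#define REG_R12_ORIG     0
#define REG_LR           4
#define REG_SP           8
#define REG_R12         12
#define REG_R0          60
#define REG_SR          64
#define REG_PC          68
#define FRAME_SIZE_FULL 72

/*
 * Hypothetical stand-in for the AVR32A struct pt_regs. Field order
 * follows the struct in the patch; the field type is an assumption.
 */
struct pt_regs_avr32a_sketch {
    uint32_t r12_orig;
    uint32_t lr;
    uint32_t sp;
    uint32_t r12, r11, r10, r9, r8, r7, r6, r5, r4, r3, r2, r1, r0;
    uint32_t sr;
    uint32_t pc;
};

/* Compile-time checks: every offset must match the struct layout. */
_Static_assert(offsetof(struct pt_regs_avr32a_sketch, r12_orig) == REG_R12_ORIG, "r12_orig");
_Static_assert(offsetof(struct pt_regs_avr32a_sketch, lr) == REG_LR, "lr");
_Static_assert(offsetof(struct pt_regs_avr32a_sketch, sp) == REG_SP, "sp");
_Static_assert(offsetof(struct pt_regs_avr32a_sketch, r12) == REG_R12, "r12");
_Static_assert(offsetof(struct pt_regs_avr32a_sketch, r0) == REG_R0, "r0");
_Static_assert(offsetof(struct pt_regs_avr32a_sketch, sr) == REG_SR, "sr");
_Static_assert(offsetof(struct pt_regs_avr32a_sketch, pc) == REG_PC, "pc");
_Static_assert(sizeof(struct pt_regs_avr32a_sketch) == FRAME_SIZE_FULL, "frame size");
```

If an offset and the struct ever disagree, the translation unit fails to compile, which is cheaper to diagnose than a corrupted register frame at run time.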

D.27 UC3A0512ES interrupt bug workaround

    diff --git a/arch/avr32/include/asm/asm.h b/arch/avr32/include/asm/asm.h
    index 1bad0c5..20f737b 100644
    --- a/arch/avr32/include/asm/asm.h
    +++ b/arch/avr32/include/asm/asm.h
    @@ -12,8 +12,30 @@
    -#define mask_interrupts         ssrf SYSREG_GM_OFFSET
    -#define mask_exceptions         ssrf SYSREG_EM_OFFSET
    +
    +        .macro mask_interrupts
    +        ssrf    SYSREG_GM_OFFSET
    +#ifdef CONFIG_CPU_AT32UC3A0XXX
    +        /*
    +         * Workaround for errata 41.4.5.5:
    +         * "Need two NOPs instruction after instructions masking interrupts"
    +         */
    +        nop
    +        nop
    +#endif
    +        .endm
    +
    +        .macro mask_exceptions
    +        ssrf    SYSREG_EM_OFFSET
    +#ifdef CONFIG_CPU_AT32UC3A0XXX
    +        /*
    +         * Workaround for errata 41.4.5.5:
    +         * "Need two NOPs instruction after instructions masking interrupts"
    +         */
    +        nop
    +        nop
    +#endif
    +        .endm
    +
     #define unmask_interrupts       csrf SYSREG_GM_OFFSET
     #define unmask_exceptions       csrf SYSREG_EM_OFFSET

    diff --git a/arch/avr32/include/asm/irqflags.h b/arch/avr32/include/asm/irqflags.h
    index 93570da..e25fc64 100644
    --- a/arch/avr32/include/asm/irqflags.h
    +++ b/arch/avr32/include/asm/irqflags.h
    @@ -33,15 +33,15 @@ static inline void raw_local_irq_restore(unsigned long flags)
     static inline void raw_local_irq_disable(void)
     {
    +#ifdef CONFIG_CPU_AT32UC3A0XXX
    +        /*
    +         * Workaround for errata 41.4.5.5:
    +         * "Need two NOPs instruction after instructions masking interrupts"
    +         */
    +        asm volatile("ssrf %0; nop; nop" : : "n"(SYSREG_GM_OFFSET) : "memory");
    +#endif

     static inline void raw_local_irq_enable(void)
    --
    1.6.2.2

D.28 UC3A0xxx support

diff --git a/arch/avr32/Kconfig b/arch/avr32/Kconfig
index e366533..631d388 100644
--- a/arch/avr32/Kconfig
+++ b/arch/avr32/Kconfig
@@ -107,6 +107,13 @@ config PLATFORM_AT32AP
         select AVR32_CACHE
         select AVR32_UNALIGNED
+
+config PLATFORM_AT32UC3A
+        bool
+        select SUBARCH_AVR32A
+        select PERFORMANCE_COUNTERS
+        select ARCH_REQUIRE_GPIOLIB
+        select GENERIC_ALLOCATOR
 
 # CPU types
@@ -125,6 +132,11 @@ config CPU_AT32AP700X
         bool
         select PLATFORM_AT32AP
 
+# UC3A0
+config CPU_AT32UC3A0XXX
+        bool
+        select PLATFORM_AT32UC3A
+
 choice
         prompt "AVR32 board type"
         default BOARD_ATSTK1000
@@ -158,18 +170,22 @@ config LOADER_U_BOOT
 endchoice
 
 source "arch/avr32/mach-at32ap/Kconfig"
+source "arch/avr32/mach-at32uc3a/Kconfig"
 
 config LOAD_ADDRESS
         hex
         default 0x10000000 if LOADER_U_BOOT=y && CPU_AT32AP700X=y
+        default 0xc8000000 if LOADER_U_BOOT=y && CPU_AT32UC3A0XXX=y
 
 config ENTRY_ADDRESS
         hex
         default 0x90000000 if LOADER_U_BOOT=y && CPU_AT32AP700X=y
+        default 0xc8000000 if LOADER_U_BOOT=y && CPU_AT32UC3A0XXX=y
 
 config PHYS_OFFSET
         hex
         default 0x10000000 if CPU_AT32AP700X=y
+        default 0xc8000000 if CPU_AT32UC3A0XXX=y
 
 source "kernel/Kconfig.preempt"

diff --git a/arch/avr32/Makefile b/arch/avr32/Makefile
index 4864cb1..ad1dd87 100644
--- a/arch/avr32/Makefile
+++ b/arch/avr32/Makefile
@@ -31,6 +31,7 @@ CFLAGS_MODULE += -mno-relax
 LDFLAGS_vmlinux += --relax
 
 cpuflags-$(CONFIG_PLATFORM_AT32AP) += -march=ap
+cpuflags-$(CONFIG_PLATFORM_AT32UC3A) += -march=uc1
 
 KBUILD_CFLAGS += $(cpuflags-y)
 KBUILD_AFLAGS += $(cpuflags-y)
@@ -38,6 +39,7 @@ KBUILD_AFLAGS += $(cpuflags-y)
 CHECKFLAGS += -D__avr32__ -D__BIG_ENDIAN
 
 machine-$(CONFIG_PLATFORM_AT32AP) := at32ap
+machine-$(CONFIG_PLATFORM_AT32UC3A) := at32uc3a
 machdirs := $(patsubst %,arch/avr32/mach-%/,$(machine-y))
 
 KBUILD_CPPFLAGS += $(patsubst %,-I$(srctree)/%include,$(machdirs))
diff --git a/arch/avr32/kernel/cpu.c b/arch/avr32/kernel/cpu.c
index e84fa1..905a920 100644
--- a/arch/avr32/kernel/cpu.c
+++ b/arch/avr32/kernel/cpu.c
@@ -208,6 +208,7 @@ struct chip_id_map {
 static const struct chip_id_map chip_names[] = {
         { .mid = 0x1f, .pn = 0x1edc, .name = "AT32AP700x" },
+        { .mid = 0x1f, .pn = 0x1e82, .name = "AT32UC3A0xxx" },

APPENDIX D. LINUX KERNEL PATCHES

84 
85 #define NR_CHIP_NAMES ARRAY_SIZE(chip_names)
86
87
diff --git a/arch/avr32/mach-at32ap/Kconfig b/arch/avr32/mach-at32uc3a/Kconfig
similarity index 62%
copy from arch/avr32/mach-at32ap/Kconfig
copy to arch/avr32/mach-at32uc3a/Kconfig
index a7bccc8..dea8d93 100644
--- a/arch/avr32/mach-at32ap/Kconfig
+++ b/arch/avr32/mach-at32uc3a/Kconfig
@@ -1,13 +1,13 @@
 
-menu "Atmel AVR32 AP options"
+menu "Atmel AVR32 UC3A options"
 
 choice
     prompt "AT32AP700x static memory bus width"
     depends on CPU_AT32AP700X
     default AP700X_16_BIT_SMC
     help
       Define the width of the AP7000 external static memory interface.
       This is used to determine how to mangle the address and/or data
       when doing little-endian port access.
 
-endif # PLATFORM_AT32AP
+endif # PLATFORM_AT32UC3A

diff --git a/arch/avr32/mach-at32uc3a/Makefile b/arch/avr32/mach-at32uc3a/Makefile
new file mode 100644
index 0000000..0bf3edc
--- /dev/null
+++ b/arch/avr32/mach-at32uc3a/Makefile
@@ -0,0 +1,9 @@
+obj-y += pdca.o clock.o intc.o extint.o gpio.o hsmc.o
+obj-y += hmatrix.o
+obj-$(CONFIG_CPU_AT32UC3A0XXX) += at32uc3a0xxx.o pm-at32uc3a0xxx.o
+obj-$(CONFIG_CPU_FREQ_AT32UC3A0) += cpufreq.o
+obj-$(CONFIG_PM) += pm.o
+
+ifeq ($(CONFIG_PM_DEBUG),y)
+CFLAGS_pm.o += -DDEBUG
+endif
diff --git a/arch/avr32/mach-at32uc3a/at32uc3a.c b/arch/avr32/mach-at32uc3a/at32uc3a.c
new file mode 100644
index 0000000..f7610f1
--- /dev/null
+++ b/arch/avr32/mach-at32uc3a/at32uc3a.c
@@ -0,0 +1,1453 @@
+/*
+ * Copyright (C) 2005-2006 Atmel Corporation
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/clk.h>
+#include <linux/delay.h>
+#include <linux/dw_dmac.h>
+#include <linux/fb.h>
+#include <linux/init.h>
+#include <linux/platform_device.h>
+#include <linux/dma-mapping.h>
+#include <linux/gpio.h>
+#include <linux/spi/spi.h>
+#include <linux/usb/atmel_usba_udc.h>
+
+#include <asm/atmel-mci.h>
+#include <asm/io.h>
+#include <asm/irq.h>
+
+#include <mach/at32uc3a0xxx.h>
+#include <mach/board.h>
+#include <mach/hmatrix.h>
+#include <mach/portmux.h>
+#include <mach/sram.h>
+
+#include "clock.h"
+#include "gpio.h"
+#include "pm.h"

+#define MEMRANGE(base, size) \
+    { \
+        .start = base, \
+        .end = base + size - 1, \
+        .flags = IORESOURCE_MEM, \
+    }
+
+#define PBMEM(base) \
+    { \
+        .start = base, \
+        .end = base + 0x3ff, \
+        .flags = IORESOURCE_MEM, \
+    }
+
+#define IRQ(num) \
+    { \
+        .start = num, \
+        .end = num, \
+        .flags = IORESOURCE_IRQ, \
+    }
+
+#define NAMED_IRQ(num, _name) \
+    { \
+        .start = num, \
+        .end = num, \
+        .name = _name, \
+        .flags = IORESOURCE_IRQ, \
+    }
+
+#define select_peripheral(pin, periph, flags) \
+    at32_select_periph(GPIO_PIN_##pin, GPIO_##periph, flags)
+
+#define DEV_CLK(_name, devname, bus, _index) \
+static struct clk devname##_##_name = { \
+    .name = #_name, \
+    .dev = &devname##_device.dev, \
+    .parent = &bus##_clk, \
+    .mode = bus##_clk_mode, \
+    .get_rate = bus##_clk_get_rate, \
+    .index = _index, \
+}
+
+/*
+ * REVISIT these assume *every* device supports DMA, but several
+ * don't ... tc, ssc, pio, rtc, watchdog, pwm, ps2, and more.
+ */
+#define DEFINE_DEV(_name, _id) \
+static u64 _name##_id##_dma_mask = DMA_32BIT_MASK; \
+static struct platform_device _name##_id##_device = { \
+    .name = #_name, \
+    .id = _id, \
+    .dev = { \
+        .dma_mask = &_name##_id##_dma_mask, \
+        .coherent_dma_mask = DMA_32BIT_MASK, \
+    }, \
+    .resource = _name##_id##_resource, \
+    .num_resources = ARRAY_SIZE(_name##_id##_resource), \
+}
+
+#define DEFINE_DEV_DATA(_name, _id) \
+static u64 _name##_id##_dma_mask = DMA_32BIT_MASK; \
+static struct platform_device _name##_id##_device = { \
+    .name = #_name, \
+    .id = _id, \
+    .dev = { \
+        .dma_mask = &_name##_id##_dma_mask, \
+        .platform_data = &_name##_id##_data, \
+        .coherent_dma_mask = DMA_32BIT_MASK, \
+    }, \
+    .resource = _name##_id##_resource, \
+    .num_resources = ARRAY_SIZE(_name##_id##_resource), \
+}
+
+static DEFINE_SPINLOCK(pm_lock);
+
+static struct clk osc0;
+static struct clk osc1;
+
+static unsigned long osc_get_rate(struct clk *clk)
+{
+    return at32_board_osc_rates[clk->index];
+}
+
+static unsigned long pll_get_rate(struct clk *clk, unsigned long control)
+{
+    unsigned long div, mul, rate;
+
+    div = PM_BFEXT(PLLDIV, control) + 1;
+    mul = PM_BFEXT(PLLMUL, control) + 1;
+
+    rate = clk->parent->get_rate(clk->parent);
+    rate = (rate + div / 2) / div;
+    rate *= mul;
+
+    return rate;
+}
+
+static long pll_set_rate(struct clk *clk, unsigned long rate, u32 *pll_ctrl)
+{
+    unsigned long mul;
+    unsigned long mul_best_fit = 0;
+    unsigned long div;
+    unsigned long div_min;
+    unsigned long div_max;
+    unsigned long div_best_fit = 0;
+    unsigned long base;
+    unsigned long pll_in;
+    unsigned long actual = 0;
+    unsigned long rate_error;
+    unsigned long rate_error_prev = ~0UL;
+    u32 ctrl;
+
+    /* Rate must be between 80 MHz and 200 MHz. */
+    if (rate < 80000000UL || rate > 200000000UL)
+        return -EINVAL;
+
+    ctrl = PM_BF(PLLOPT, 4);
+    base = clk->parent->get_rate(clk->parent);
+
+    /* PLL input frequency must be between 6 MHz and 32 MHz. */
+    div_min = DIV_ROUND_UP(base, 32000000UL);
+    div_max = base / 6000000UL;
+
+    if (div_max < div_min)
+        return -EINVAL;
+
+    for (div = div_min; div <= div_max; div++) {
+        pll_in = (base + div / 2) / div;
+        mul = (rate + pll_in / 2) / pll_in;
+
+        if (mul == 0)
+            continue;
+
+        actual = pll_in * mul;
+        rate_error = abs(actual - rate);
+
+        if (rate_error < rate_error_prev) {
+            mul_best_fit = mul;
+            div_best_fit = div;
+            rate_error_prev = rate_error;
+        }
+
+        if (rate_error == 0)
+            break;
+    }
+
+    ctrl |= PM_BF(PLLMUL, mul_best_fit - 1);
+    ctrl |= PM_BF(PLLDIV, div_best_fit - 1);
+    ctrl |= PM_BF(PLLCOUNT, 16);
+
+    if (clk->parent == &osc1)
+        ctrl |= PM_BIT(PLLOSC);
+
+    *pll_ctrl = ctrl;
+
+    return actual;
+}
+
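The divider and multiplier search performed by pll_set_rate() can be tried outside the kernel. The following is a minimal user-space sketch, not part of the patch: the function name `pll_best_fit` and its out-parameters are hypothetical, the register packing (`PM_BF`) is left out, and unlike the listing it returns the best rate found rather than the last one tried. The 80 to 200 MHz output window and the 6 to 32 MHz PLL input window are taken from the comments in the listing.

```c
#include <assert.h>

/*
 * Hypothetical stand-alone version of the search loop in pll_set_rate().
 * base: rate of the parent oscillator in Hz
 * rate: requested PLL output rate in Hz
 * Returns the closest achievable rate, or 0 if the request is out of range.
 * On success *best_div and *best_mul hold the chosen divider and multiplier.
 */
unsigned long pll_best_fit(unsigned long base, unsigned long rate,
                           unsigned long *best_div, unsigned long *best_mul)
{
    unsigned long div, div_min, div_max;
    unsigned long best_err = ~0UL;
    unsigned long best_rate = 0;

    assert(best_div && best_mul);

    /* Output rate window, as stated in the listing. */
    if (base == 0 || rate < 80000000UL || rate > 200000000UL)
        return 0;

    /* The PLL input (base / div) must stay between 6 MHz and 32 MHz. */
    div_min = (base + 32000000UL - 1) / 32000000UL; /* round up */
    div_max = base / 6000000UL;

    for (div = div_min; div <= div_max; div++) {
        unsigned long pll_in = (base + div / 2) / div;    /* rounded */
        unsigned long mul = (rate + pll_in / 2) / pll_in; /* rounded */
        unsigned long actual, err;

        if (mul == 0)
            continue;

        actual = pll_in * mul;
        err = actual > rate ? actual - rate : rate - actual;

        if (err < best_err) {
            best_err = err;
            best_rate = actual;
            *best_div = div;
            *best_mul = mul;
        }
        if (err == 0)
            break;
    }

    return best_rate;
}
```

With a 12 MHz oscillator, a request for 132 MHz is met exactly with divider 1 and multiplier 11, while a request for 100 MHz settles on divider 2 and multiplier 17, which gives 102 MHz.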
+static unsigned long pll0_get_rate(struct clk *clk)
+{
+    u32 control;
+
+    control = pm_readl(PLL0);
+
+    return pll_get_rate(clk, control);
+}
+
+static void pll1_mode(struct clk *clk, int enabled)
+{
+    unsigned long timeout;
+    u32 status;
+    u32 ctrl;
+
+    ctrl = pm_readl(PLL1);
+
+    if (enabled) {
+        if (!PM_BFEXT(PLLMUL, ctrl) && !PM_BFEXT(PLLDIV, ctrl)) {
+            pr_debug("clk %s: failed to enable, rate not set\n",
+                     clk->name);
+            return;
+        }
+
+        ctrl |= PM_BIT(PLLEN);
+        pm_writel(PLL1, ctrl);
+
+        /* Wait for PLL lock. */
+        for (timeout = 10000; timeout; timeout--) {
+            status = pm_readl(ISR);
+            if (status & PM_BIT(LOCK1))
+                break;
+            udelay(10);
+        }
+
+        if (!(status & PM_BIT(LOCK1)))
+            printk(KERN_ERR "clk %s: timeout waiting for lock\n",
+                   clk->name);
+    } else {
+        ctrl &= ~PM_BIT(PLLEN);
+        pm_writel(PLL1, ctrl);
+    }
+}
+
+static unsigned long pll1_get_rate(struct clk *clk)
+{
+    u32 control;
+
+    control = pm_readl(PLL1);
+
+    return pll_get_rate(clk, control);
+}
+
+static long pll1_set_rate(struct clk *clk, unsigned long rate, int apply)
+{
+    u32 ctrl = 0;
+    unsigned long actual_rate;
+
+    actual_rate = pll_set_rate(clk, rate, &ctrl);
+
+    if (apply) {
+        if (actual_rate != rate)
+            return -EINVAL;
+        if (clk->users > 0)
+            return -EBUSY;
+        pr_debug(KERN_INFO "clk %s: new rate %lu (actual rate %lu)\n",
+                 clk->name, rate, actual_rate);
+        pm_writel(PLL1, ctrl);
+    }
+
+    return actual_rate;
+}
+
+static int pll1_set_parent(struct clk *clk, struct clk *parent)
+{
+    u32 ctrl;
+
+    if (clk->users > 0)
+        return -EBUSY;
+
+    ctrl = pm_readl(PLL1);
+    WARN_ON(ctrl & PM_BIT(PLLEN));
+
+    if (parent == &osc0)
+        ctrl &= ~PM_BIT(PLLOSC);
+    else if (parent == &osc1)
+        ctrl |= PM_BIT(PLLOSC);
+    else
+        return -EINVAL;
+
+    pm_writel(PLL1, ctrl);
+    clk->parent = parent;
+
+    return 0;
+}
+
+/*
+ * The AT32UC3A0512 has six primary clock sources: One 32kHz oscillator,
+ * one external slow-clock, two crystal oscillators and two PLLs.
+ */
+
+static struct clk osc32k = {
+    .name = "osc32k",
+    .get_rate = osc_get_rate,
+    .users = 1,
+    .index = 0,
+};
+static struct clk osc0 = {
+    .name = "osc0",
+    .get_rate = osc_get_rate,
+    .users = 1,
+    .index = 1,
+};
+static struct clk osc1 = {
+    .name = "osc1",
+    .get_rate = osc_get_rate,
+    .index = 2,
+};
+static struct clk pll0 = {
+    .name = "pll0",
+    .get_rate = pll0_get_rate,
+    .parent = &osc0,
+};
+static struct clk pll1 = {
+    .name = "pll1",
+    .mode = pll1_mode,
+    .get_rate = pll1_get_rate,
+    .set_rate = pll1_set_rate,
+    .set_parent = pll1_set_parent,
+    .parent = &osc0,
+};
+
+static struct clk *main_clock;
+
+/*
+ * Synchronous clocks are generated from the main clock. The clocks
+ * must satisfy the constraint
+ *   fCPU >= fHSB >= fPB
+ * i.e. each clock must not be faster than its parent.
+ */
+static unsigned long bus_clk_get_rate(struct clk *clk, unsigned int shift)
+{
+    return main_clock->get_rate(main_clock) >> shift;
+}
+
+static void cpu_clk_mode(struct clk *clk, int enabled)
+{
+    unsigned long flags;
+    u32 mask;
+
+    spin_lock_irqsave(&pm_lock, flags);
+    mask = pm_readl(CPU_MASK);
+    if (enabled)
+        mask |= 1 << clk->index;
+    else
+        mask &= ~(1 << clk->index);
+    pm_writel(CPU_MASK, mask);
+    spin_unlock_irqrestore(&pm_lock, flags);
+}

+ static unsigned long cpu_clk_get_rate(struct clk *clk)
+ {
+   unsigned long cksel, shift = 0;
+   cksel = pm_readl(CKSEL);
+   if (cksel & PM_BIT(CPUDIV))
+     shift = PM_BFEXT(CPUSEL, cksel) + 1;
+   return bus_clk_get_rate(clk, shift);
+ }

+ static long cpu_clk_set_rate(struct clk *clk, unsigned long rate, int apply)
+ {
+   u32 control;
+   unsigned long parent_rate, child_div, actual_rate, div;
+   parent_rate = clk->parent->get_rate(clk->parent);
+   control = pm_readl(CKSEL);
+   if (control & PM_BIT(HSBDIV))
+     child_div = 1 << (PM_BFEXT(HSBSEL, control) + 1);
+   else
+     child_div = 1;
+   if (rate > 3 * (parent_rate / 4) || child_div == 1) {
+     actual_rate = parent_rate;
+     control &= ~PM_BIT(CPUDIV);
+   } else {
+     unsigned int cpusel;
+     div = (parent_rate + rate / 2) / rate;
+     if (div > child_div)
+       div = child_div;
+     cpusel = (div > 1) ? (fls(div) - 2) : 0;
+     control = PM_BIT(CPUDIV) | PM_BFINS(CPUSEL, cpusel, control);
+     actual_rate = parent_rate / (1 << (cpusel + 1));
+   }
+   pr_debug("clk %s: new rate %lu (actual rate %lu)\n",
+             clk->name, rate, actual_rate);
+   if (apply)
+     pm_writel(CKSEL, control);
+   return actual_rate;
+ }
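The choice of the power-of-two CPU divider in cpu_clk_set_rate() is easier to follow in isolation. The sketch below is an illustration under stated assumptions, not code from the patch: `cpu_div_shift` and `fls_sketch` are hypothetical names, `fls_sketch` stands in for the kernel's `fls()`, and the function returns the resulting right-shift of the parent clock (0 meaning undivided) instead of writing CKSEL.

```c
#include <assert.h>

/* Portable stand-in for the kernel's fls(): 1-based index of the highest
 * set bit, or 0 when x is 0. */
static int fls_sketch(unsigned int x)
{
    int pos = 0;

    while (x) {
        pos++;
        x >>= 1;
    }
    return pos;
}

/*
 * Hypothetical helper mirroring the divider choice in cpu_clk_set_rate().
 * parent_rate: main clock rate in Hz
 * rate:        requested CPU clock rate in Hz (must be non-zero)
 * child_div:   divider currently applied to the HSB clock; the CPU clock
 *              may not be divided further than its child
 * Returns the shift applied to parent_rate, i.e. CPUSEL + 1, or 0 when the
 * divider is disabled.
 */
unsigned int cpu_div_shift(unsigned long parent_rate, unsigned long rate,
                           unsigned long child_div)
{
    unsigned long div;
    unsigned int cpusel;

    assert(rate > 0);

    /* Close enough to the parent rate, or no room to divide. */
    if (rate > 3 * (parent_rate / 4) || child_div == 1)
        return 0;

    div = (parent_rate + rate / 2) / rate; /* rounded division */
    if (div > child_div)
        div = child_div;

    /* Only power-of-two dividers 2, 4, 8, ... can be selected. */
    cpusel = (div > 1) ? (unsigned int)(fls_sketch((unsigned int)div) - 2) : 0;

    return cpusel + 1;
}
```

For a 66 MHz main clock and an HSB divider of 4, a request for 33 MHz yields a shift of 1 and a request for 8 MHz is clamped to a shift of 2 (16.5 MHz), because the CPU clock must not run slower than the HSB clock.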

+ static void hsb_clk_mode(struct clk *clk, int enabled)
+ {
+   unsigned long flags;
+   u32 mask;
+   spin_lock_irqsave(&pm_lock, flags);
+   mask = pm_readl(HSB_MASK);
+   if (enabled)
+     mask |= 1 << clk->index;
+   else
+     mask &= ~(1 << clk->index);
+   pm_writel(HSB_MASK, mask);
+   spin_unlock_irqrestore(&pm_lock, flags);
+ }

+ static unsigned long hsb_clk_get_rate(struct clk *clk)
+ {
+   unsigned long cksel, shift = 0;
+   cksel = pm_readl(CKSEL);
+   if (cksel & PM_BIT(HSBDIV))
+     shift = PM_BFEXT(HSBSEL, cksel) + 1;
+   return bus_clk_get_rate(clk, shift);
+ }

+ static void pba_clk_mode(struct clk *clk, int enabled)
+ {
+   unsigned long flags;
+   u32 mask;
+   spin_lock_irqsave(&pm_lock, flags);
+   mask = pm_readl(PBA_MASK);
+   if (enabled)
+     mask |= 1 << clk->index;
+   else
+     mask &= ~(1 << clk->index);
+   pm_writel(PBA_MASK, mask);
+   spin_unlock_irqrestore(&pm_lock, flags);
+ }

+
+static unsigned long pba_clk_get_rate(struct clk *clk)
+{
+    unsigned long cksel, shift = 0;
+
+    cksel = pm_readl(CKSEL);
+    if (cksel & PM_BIT(PBADIV))
+        shift = PM_BFEXT(PBASEL, cksel) + 1;
+    return bus_clk_get_rate(clk, shift);
+}
+
+static void pbb_clk_mode(struct clk *clk, int enabled)
+{
+    unsigned long flags;
+    u32 mask;
+
+    spin_lock_irqsave(&pm_lock, flags);
+    mask = pm_readl(PBB_MASK);
+    if (enabled)
+        mask |= 1 << clk->index;
+    else
+        mask &= ~(1 << clk->index);
+    pm_writel(PBB_MASK, mask);
+    spin_unlock_irqrestore(&pm_lock, flags);
+}
+
+static unsigned long pbb_clk_get_rate(struct clk *clk)
+{
+    unsigned long cksel, shift = 0;
+
+    cksel = pm_readl(CKSEL);
+    if (cksel & PM_BIT(PBBDIV))
+        shift = PM_BFEXT(PBBSEL, cksel) + 1;
+    return bus_clk_get_rate(clk, shift);
+}
+
+static struct clk cpu_clk = {
+    .name = "cpu",
+    .get_rate = cpu_clk_get_rate,
+    .set_rate = cpu_clk_set_rate,
+    .users = 1,
+};
+
+static struct clk hsb_clk = {
+    .name = "hsb",
+    .parent = &cpu_clk,
+    .get_rate = hsb_clk_get_rate,
+};
+
+static struct clk pba_clk = {
+    .name = "pba",
+    .parent = &hsb_clk,
+    .mode = hsb_clk_mode,
+    .get_rate = pba_clk_get_rate,
+    .users = 1,
+    .index = 1,
+};
+
+static struct clk pbb_clk = {
+    .name = "pbb",
+    .parent = &hsb_clk,
+    .mode = hsb_clk_mode,
+    .get_rate = pbb_clk_get_rate,
+    .users = 1,
+    .index = 2,
+};
+
+ /* --------------------------------------------------------------------
+ * Generic Clock operations
+ * -------------------------------------------------------------------- */
+ 
+static void genclk_mode(struct clk *clk, int enabled)
+{
+    u32 control;
+
+    control = pm_readl(GCCTRL(clk->index));
+    if (enabled)
+        control |= PM_BIT(CEN);
+    else
+        control &= ~PM_BIT(CEN);
+    pm_writel(GCCTRL(clk->index), control);
+}
+
+static unsigned long genclk_get_rate(struct clk *clk)
+{
+    u32 control;
+    unsigned long div = 1;
+
+    control = pm_readl(GCCTRL(clk->index));
+    if (control & PM_BIT(DIVEN))
+        div = 2 * (PM_BFEXT(DIV, control) + 1);
+
+    return clk->parent->get_rate(clk->parent) / div;
+}
+
+static long genclk_set_rate(struct clk *clk, unsigned long rate, int apply)
+{
+    u32 control;
+    unsigned long parent_rate, actual_rate, div;
+
+    parent_rate = clk->parent->get_rate(clk->parent);
+    control = pm_readl(GCCTRL(clk->index));
+
+    if (rate > 3 * parent_rate / 4) {
+        actual_rate = parent_rate;
+        control &= ~PM_BIT(DIVEN);
+    } else {
+        div = (parent_rate + rate) / (2 * rate) - 1;
+        control = PM_BFINS(DIV, div, control) | PM_BIT(DIVEN);
+        actual_rate = parent_rate / (2 * (div + 1));
+    }
+
+    dev_dbg(clk->dev, "clk %s: new rate %lu (actual rate %lu)\n",
+            clk->name, rate, actual_rate);
+
+    if (apply)
+        pm_writel(GCCTRL(clk->index), control);
+
+    return actual_rate;
+}
+
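The divider arithmetic in genclk_set_rate() follows the relation f = f_parent / (2 * (DIV + 1)) whenever the divider is enabled. As a rough illustration only, the following user-space sketch reproduces that arithmetic; `genclk_pick_divider` and its out-parameters are hypothetical names, and no register access is modelled.

```c
#include <assert.h>

/*
 * Hypothetical stand-alone version of the divider choice in
 * genclk_set_rate().
 * parent_rate: rate of the selected source clock in Hz
 * rate:        requested generic clock rate in Hz (must be non-zero)
 * *diven:      set to 1 if the divider is enabled, 0 otherwise
 * *div_field:  value for the DIV bit field (only meaningful if *diven is 1)
 * Returns the rate that would actually be generated.
 */
unsigned long genclk_pick_divider(unsigned long parent_rate,
                                  unsigned long rate,
                                  int *diven, unsigned long *div_field)
{
    assert(rate > 0 && diven && div_field);

    /* Requests above 3/4 of the parent rate bypass the divider. */
    if (rate > 3 * parent_rate / 4) {
        *diven = 0;
        *div_field = 0;
        return parent_rate;
    }

    /* Rounded division by 2 * rate, minus one for the register encoding. */
    *div_field = (parent_rate + rate) / (2 * rate) - 1;
    *diven = 1;

    return parent_rate / (2 * (*div_field + 1));
}
```

With a 48 MHz parent, a request for 12 MHz gives DIV = 1 and exactly 12 MHz, while a request for 5 MHz gives DIV = 4 and an actual rate of 4.8 MHz.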
+int genclk_set_parent(struct clk *clk, struct clk *parent)
+{
+    u32 control;
+
+    dev_dbg(clk->dev, "clk %s: new parent %s (was %s)\n",
+            clk->name, parent->name, clk->parent->name);
+
+    control = pm_readl(GCCTRL(clk->index));
+
+    if (parent == &osc1 || parent == &pll1)
+        control |= PM_BIT(OSCSEL);
+    else if (parent == &osc0 || parent == &pll0)
+        control &= ~PM_BIT(OSCSEL);
+    else
+        return -EINVAL;
+
+    if (parent == &pll0 || parent == &pll1)
+        control |= PM_BIT(PLLSEL);
+    else
+        control &= ~PM_BIT(PLLSEL);
+
+    pm_writel(GCCTRL(clk->index), control);
+    clk->parent = parent;
+
+    return 0;
+}
+
+static void __init genclk_init_parent(struct clk *clk)
+{
+    u32 control;
+    struct clk *parent;
+
+    BUG_ON(clk->index > 7);
+
+    control = pm_readl(GCCTRL(clk->index));
+    if (control & PM_BIT(OSCSEL))
+        parent = (control & PM_BIT(PLLSEL)) ? &pll1 : &osc1;
+    else
+        parent = (control & PM_BIT(PLLSEL)) ? &pll0 : &osc0;
+
+    clk->parent = parent;
+}
+
static struct resource at32_pm0_resource[] = {
    MEMRANGE(0xffff0c00, 0x100),
    IRQ(1),
};

static struct resource at32uc3a0xxx_rtc0_resource[] = {
    MEMRANGE(0xffff0d00, 0x24),
    IRQ(1),
};

static struct resource at32_wdt0_resource[] = {
    MEMRANGE(0xffff0d30, 0x8),
};

static struct resource at32_eic0_resource[] = {
    MEMRANGE(0xffff0d80, 0x3c),
    IRQ(1),
};

DEFINE_DEV(at32_pm, 0);
DEFINE_DEV(at32uc3a0xxx_rtc, 0);
DEFINE_DEV(at32_wdt, 0);
DEFINE_DEV(at32_eic, 0);

/* Peripheral clock for PM, RTC and EIC. PM will ensure that this
   is always running.
*/
static struct clk at32_pm_pclk = {
    .name = "pclk",
    .dev = &at32_pm0_device.dev,
    .parent = &pba_clk,
    .get_rate = pba_clk_get_rate,
    .users = 1,
    .index = 3,
};

static struct resource intc0_resource[] = {
    PBMEM(0xffff0800),
};

struct platform_device at32_intc0_device = {
    .name = "intc",
    .id = 0,
    .resource = intc0_resource,
    .num_resources = ARRAY_SIZE(intc0_resource),
};

DEV_CLK(pclk, at32_intc0, pba, 0);

static struct clk ebi_clk = {
    .name = "ebi",
    .parent = &hsb_clk,
    .mode = hsb_clk_mode,
    .get_rate = hsb_clk_get_rate,
    .users = 6,
};

static struct clk sdranc_clk = {
    .name = "sdranc_clk",
    .parent = &pbb_clk,
    .mode = pbb_clk_mode,
    .get_rate = pbb_clk_get_rate,
    .users = 1,
    .index = 5,
};

static struct resource smc0_resource[] = {
    PBMEM(0xfffe1c00),
};

DEFINE_DEV(smc, 0);

static struct clk smc0_pclk = {
    .name = "pclk",
    .dev = &smc0_device.dev,
    .parent = &pbb_clk,
    .mode = pbb_clk_mode,
    .get_rate = pbb_clk_get_rate,
};

static struct clk smc0_mck = {
  .name = "mck",
  .dev = &smc0_device.dev,
  .parent = &hsb_clk,
  .mode = hsb_clk_mode,
  .get_rate = hsb_clk_get_rate,
  .users = 1,
  .index = 6,
};

static struct platform_device pdca_device = {
  .name = "pdca",
  .id = 0,
};

DEV_CLK(pclk, pdca, pba, 2);

/* -------------------------------------------------------------------- */
/* HMATRIX */
/* -------------------------------------------------------------------- */

struct clk at32_hmatrix_clk = {
  .name = "hmatrix_clk",
  .parent = &pbb_clk,
  .mode = pbb_clk_mode,
  .get_rate = pbb_clk_get_rate,
  .index = 0,
  .users = 1,
};

static inline void set_ebi_sfr_bits(u32 mask)
{
  hmatrix_sfr_set_bits(HMATRIX_SLAVE_EBI, mask);
}

/* -------------------------------------------------------------------- */
/* Timer/Counter (TC) */
/* -------------------------------------------------------------------- */

static struct resource at32_tc0_resource[] = {
  PBMEM(0xffff3800),
  IRQ(14),
};

static struct platform_device at32_tc0_device = {
  .name = "atmel_tc",
  .id = 0,
  .resource = at32_tc0_resource,
  .num_resources = ARRAY_SIZE(at32_tc0_resource),
};

DEV_CLK(tc0_clk, at32_tc0, pbb, 3);

/* -------------------------------------------------------------------- */
/* On-Chip Debug */
/* -------------------------------------------------------------------- */

static struct resource at32_ocd0_resource[] = {
  IRQ(14),
};

static struct platform_device at32_ocd0_device = {
  .name = "atmel_ocd",
  .id = 0,
  .resource = at32_ocd0_resource,
  .num_resources = ARRAY_SIZE(at32_ocd0_resource),
};

struct clk at32_ocd0_clk = {
  .name = "at32_ocd0_clk",
  .parent = &cpu_clk,
  .mode = cpu_clk_mode,
  .get_rate = cpu_clk_get_rate,
  .index = 1,
  .users = 1,
};

/* -------------------------------------------------------------------- */
/* GPIO */
/* -------------------------------------------------------------------- */
static struct resource gpio0_resource[] = {
    MEMRANGE(0xffff1000, 0x100),
    IRQ(2),
};

+static struct resource gpio1_resource[] = {
    MEMRANGE(0xffff1100, 0x100),
    IRQ(2),
};

+static struct resource gpio2_resource[] = {
    MEMRANGE(0xffff1200, 0x100),
    IRQ(2),
};

+static struct resource gpio3_resource[] = {
    MEMRANGE(0xffff1300, 0x100),
    IRQ(2),
};

+void __init at32_add_system_devices(void)
{
    platform_device_register(&at32_pm0_device);
    platform_device_register(&at32_intc0_device);
    platform_device_register(&at32uc3a0xxx_rtc0_device);
    platform_device_register(&at32_wdt0_device);
    platform_device_register(&at32_eic0_device);
    platform_device_register(&at32_ocd0_device);
    platform_device_register(&at32_tc0_device);
    platform_device_register(&gpio0_device);
    platform_device_register(&gpio1_device);
    platform_device_register(&gpio2_device);
    platform_device_register(&gpio3_device);
}

/* USART */

static struct atmel_uart_data atmel_usart0_data = {
    .use_dma_tx = 1,
    .use_dma_rx = 1,
};
static struct resource atmel_usart0_resource[] = {
    PBMEM(0xffff1400),
    IRQ(5),
};

+DEFINE_DEV_DATA(atmel_usart, 0);
+DEV_CLK(usart, atmel_usart0, pba, 8);

+static struct atmel_uart_data atmel_usart1_data = {
    .use_dma_tx = 1,
    .use_dma_rx = 1,
};
+static struct resource atmel_usart1_resource[] = {
    PBMEM(0xffff1800),
    IRQ(6),
};

+DEFINE_DEV_DATA(atmel_usart, 1);
+DEV_CLK(usart, atmel_usart1, pba, 9);

+static struct atmel_uart_data atmel_usart2_data = {
    .use_dma_tx = 1,
    .use_dma_rx = 1,
};
+static struct resource atmel_usart2_resource[] = {
    PBMEM(0xffff1c00),
    IRQ(7),
};

+DEFINE_DEV_DATA(atmel_usart, 2);
+DEV_CLK(usart, atmel_usart2, pba, 10);
static struct atmel_uart_data atmel_usart3_data = {
    .use_dma_tx = 1,
    .use_dma_rx = 1,
};

static struct resource atmel_usart3_resource[] = {
    PBMEM(0xffff2000),
    IRQ(8),
};

+DEFINE_DEV_DATA(atmel_usart, 3);
+DEV_CLK(usart, atmel_usart3, pba, 11);

+static inline void configure_usart0_pins(void)
+{
+    select_peripheral(PA(0), PERIPH_A, 0); /* RXD */
+    select_peripheral(PA(1), PERIPH_A, 0); /* TXD */
+}
+
+static inline void configure_usart1_pins(void)
+{
+    select_peripheral(PA(5), PERIPH_A, 0); /* RXD */
+    select_peripheral(PA(6), PERIPH_A, 0); /* TXD */
+}
+
+static inline void configure_usart2_pins(void)
+{
+    select_peripheral(PB(29), PERIPH_A, 0); /* RXD */
+    select_peripheral(PB(30), PERIPH_A, 0); /* TXD */
+}
+
+static inline void configure_usart3_pins(void)
+{
+    select_peripheral(PB(10), PERIPH_B, 0); /* RXD */
+    select_peripheral(PB(11), PERIPH_B, 0); /* TXD */
+}

+static struct platform_device *__initdata at32_usarts[4];
+
+void __init at32_map_usart(unsigned int hw_id, unsigned int line)
+{
+    struct platform_device *pdev;
+    struct atmel_uart_data *data;
+
+    switch (hw_id) {
+    case 0:
+        pdev = &atmel_usart0_device;
+        configure_usart0_pins();
+        break;
+    case 1:
+        pdev = &atmel_usart1_device;
+        configure_usart1_pins();
+        break;
+    case 2:
+        pdev = &atmel_usart2_device;
+        configure_usart2_pins();
+        break;
+    case 3:
+        pdev = &atmel_usart3_device;
+        configure_usart3_pins();
+        break;
+    default:
+        return;
+    }
+
+    pdev->id = line;
+    at32_usarts[line] = pdev;
+}
+
+struct platform_device *__init at32_add_device_usart(unsigned int id)
+{
+    platform_device_register(at32_usarts[id]);
+    return at32_usarts[id];
+}

+static struct platform_device *atmel_default_console_device;

+void __init at32_setup_serial_console(unsigned int usart_id)
+{
    + atmel_default_console_device = at32_usarts[usart_id];
+}
+/* --------------------------------------------------------------------
+ *  Ethernet
+ * -------------------------------------------------------------------- */
+static struct eth_platform_data macb0_data;
+static struct resource macb0_resource[] = {
+    PBMEM(0xfffe1800),
+    IRQ(16),
+};
+DEFINE_DEV_DATA(macb, 0);
+DEV_CLK(hclk, macb0, hsb, 4);
+DEV_CLK(pclk, macb0, pbb, 3);
+
+struct platform_device *__init
+at32_add_device_eth(unsigned int id, struct eth_platform_data *data)
+{
+    struct platform_device *pdev;
+
+    switch (id) {
+    case 0:
+        pdev = &macb0_device;
+
+        select_peripheral(PC(0), PERIPH_A, 0); /* COL */
+        select_peripheral(PC(1), PERIPH_A, 0); /* CRS */
+        select_peripheral(PC(2), PERIPH_A, 0); /* TIER */
+        select_peripheral(PC(3), PERIPH_A, 0); /* TXE */
+        select_peripheral(PC(4), PERIPH_A, 0); /* TXD2 */
+        select_peripheral(PC(5), PERIPH_A, 0); /* TXD3 */
+        select_peripheral(PC(6), PERIPH_A, 0); /* RXD2 */
+        select_peripheral(PC(7), PERIPH_A, 0); /* RXD3 */
+        select_peripheral(PC(8), PERIPH_A, 0); /* RXC */
+        select_peripheral(PC(9), PERIPH_A, 0); /* RXD */
+        select_peripheral(PC(10), PERIPH_A, 0); /* RXD */
+        select_peripheral(PC(11), PERIPH_A, 0); /* RXD */
+        select_peripheral(PC(12), PERIPH_A, 0); /* RXD */
+        select_peripheral(PC(13), PERIPH_A, 0); /* RXD */
+        select_peripheral(PC(14), PERIPH_A, 0); /* RXD */
+        select_peripheral(PC(15), PERIPH_A, 0); /* RXD */
+        select_peripheral(PC(16), PERIPH_A, 0); /* RXD */
+        select_peripheral(PC(17), PERIPH_A, 0); /* RXD */
+        select_peripheral(PC(18), PERIPH_A, 0); /* SPD */
+        break;
+    default:
+        return NULL;
+    }
+
+    memcpy(pdev->dev.platform_data, data, sizeof(struct eth_platform_data));
+    platform_device_register(pdev);
+
+    return pdev;
+}
+
+/* --------------------------------------------------------------------
+ *  SPI
+ * -------------------------------------------------------------------- */
+static struct resource atmel_spi0_resource[] = {
+    PBMEM(0xffff2400),
+    IRQ(9),
+};
+DEFINE_DEV(atmel_spi, 0);
+DEV_CLK(spi_clk, atmel_spi0, pba, 5);
+
+static struct resource atmel_spi1_resource[] = {
+    PBMEM(0xffff2800),
+    IRQ(10),
+};
+DEFINE_DEV(atmel_spi, 1);
+DEV_CLK(spi_clk, atmel_spi1, pba, 6);
+
+static void __init
+at32_spi_setup_slaves(unsigned int bus_num, struct spi_board_info *b,
+                      unsigned int n, const u8 *pins)
+{
+    unsigned int pin, mode;
+
+    for (; n; n--, b++) {
+        b->bus_num = bus_num;
+        if (b->chip_select >= 4)
+            continue;
+        pin = (unsigned)b->controller_data;
+        if (!pin) {
+            pin = pins[b->chip_select];
+            b->controller_data = (void *)pin;
+        }
+        mode = AT32_GPIOF_OUTPUT;
+        if (!(b->mode & SPI_CS_HIGH))
+            mode |= AT32_GPIOF_HIGH;
+        at32_select_gpio(pin, mode);
+    }
+}
+                ARRAY_SIZE(atmel_twi0_resource)))
+        goto err_add_resources;
+
+    select_peripheral(PA(6), PERIPH_A, 0); /* SDA */
+    select_peripheral(PA(7), PERIPH_A, 0); /* SDL */
+
+    atmel_twi0_pclk.dev = &pdev->dev;
+
+    if (b)
+        i2c_register_board_info(id, b, n);
+
+    platform_device_add(pdev);
+    return pdev;
+
+err_add_resources:
+    platform_device_put(pdev);
+    return NULL;
+}
+
+struct platform_device *__init at32_add_device_pwm(u32 mask)
+{
+    struct platform_device *pdev;
+
+    if (!mask)
+        return NULL;
+
+    pdev = platform_device_alloc("atmel_pwm", 0);
+    if (!pdev)
+        return NULL;
+
+    if (platform_device_add_resources(pdev, atmel_pwm0_resource,
+                ARRAY_SIZE(atmel_pwm0_resource)))
+        goto out_free_pdev;
+
+    if (platform_device_add_data(pdev, &mask, sizeof(mask)))
+        goto out_free_pdev;
+
+    if (mask & (1 << 0))
+        select_peripheral(PA(28), PERIPH_A, 0);
+    if (mask & (1 << 1))
+        select_peripheral(PA(29), PERIPH_A, 0);
+    if (mask & (1 << 2))
+        select_peripheral(PA(21), PERIPH_B, 0);
+    if (mask & (1 << 3))
+        select_peripheral(PA(22), PERIPH_B, 0);
+
+    atmel_pwm0_mck.dev = &pdev->dev;
+
+    platform_device_add(pdev);
+    return pdev;
+
+out_free_pdev:
+    platform_device_put(pdev);
+    return NULL;
+}
+
+/* -------------------------------------------------------------------- */
+
+ static struct resource atmel_pwm0_resource[] __initdata = {
+ PBMEM(0xffff3000),
+ IRQ(12),
+};
+
+ DEFINE_DEV(ssc, 0);
+ DEV_CLK(pclk, ssc0, pba, 13);
+struct platform_device *__init
+at32_add_device_ssc(unsigned int id, unsigned int flags)
+{
+    struct platform_device *pdev;
+
+    switch (id) {
+    case 0:
+        pdev = &ssc0_device;
+        if (flags & ATMEL_SSC_RF)
+            select_peripheral(PA(21), PERIPH_A, 0); /* RF */
+        if (flags & ATMEL_SSC_RK)
+            select_peripheral(PA(22), PERIPH_A, 0); /* RK */
+        if (flags & ATMEL_SSC_TK)
+            select_peripheral(PA(23), PERIPH_A, 0); /* TK */
+        if (flags & ATMEL_SSC_TF)
+            select_peripheral(PA(24), PERIPH_A, 0); /* TF */
+        if (flags & ATMEL_SSC_TD)
+            select_peripheral(PA(25), PERIPH_A, 0); /* TD */
+        if (flags & ATMEL_SSC_RD)
+            select_peripheral(PA(26), PERIPH_A, 0); /* RD */
+        break;
+    default:
+        return NULL;
+    }
+
+    platform_device_register(pdev);
+    return pdev;
+}
+
+/* --------------------------------------------------------------------
+ *  USB Device Controller
+ * -------------------------------------------------------------------- */
+static struct resource usba0_resource[] __initdata = {
+    {
+        .start = 0xe0000000,
+        .end = 0xefffffff,
+    },
+    {
+        .start = 0xfffe0000,
+        .end = 0xfffe0fff,
+    },
+    IRQ(17),
+};
+static struct clk usba0_pclk = {
+    .name = "pclk",
+    .parent = &pbb_clk,
+    .mode = pbb_clk_mode,
+    .get_rate = pbb_clk_get_rate,
+    .index = 2,
+};
+static struct clk usba0_hclk = {
+    .name = "hclk",
+    .parent = &hsb_clk,
+    .mode = hsb_clk_mode,
+    .get_rate = hsb_clk_get_rate,
+    .index = 3,
+};
+
+#define EP(nam, idx, maxpkt, maxbk, dma, isoc) \
+    [idx] = { \
+        .name = nam, \
+        .index = idx, \
+        .fifo_size = maxpkt, \
+        .nr_banks = maxbk, \
+        .can_dma = dma, \
+        .can_isoc = isoc, \
+    }
+
+static struct usba_ep_data at32_usba_ep[] __initdata = {
+    EP("ep0", 0, 64, 1, 0, 0),
+    EP("ep1", 1, 512, 2, 1, 1),
+    EP("ep2", 2, 512, 2, 1, 1),
+    EP("ep3-int", 3, 64, 3, 1, 0),
+    EP("ep4-int", 4, 64, 3, 1, 0),
+    EP("ep5", 5, 1024, 3, 1, 1),
+    EP("ep6", 6, 1024, 3, 1, 1),
+};
+
+#undef EP
+
+struct platform_device *__init
+at32_add_device_usba(unsigned int id, struct usba_platform_data *data)
+{
+    /*
+     * pdata doesn't have room for any endpoints, so we need to
+     * append room for the ones we need right after it.
+     */
+    struct {
+        struct usba_platform_data pdata;
+        struct usba_ep_data ep[7];
+    } usba_data;
+    struct platform_device *pdev;
+
+    if (id != 0)
+        return NULL;
+
+    pdev = platform_device_alloc("atmel_usba_udc", 0);
+    if (!pdev)
+        return NULL;
+
+    if (platform_device_add_resources(pdev, usba0_resource,
+                ARRAY_SIZE(usba0_resource)))
+        goto out_free_pdev;
+
+    if (data)
+        usba_data.pdata.vbus_pin = data->vbus_pin;
+    else
+        usba_data.pdata.vbus_pin = -EINVAL;
+
+    data = &usba_data.pdata;
+    data->num_ep = ARRAY_SIZE(at32_usba_ep);
+    memcpy(data->ep, at32_usba_ep, sizeof(at32_usba_ep));
+
+    if (platform_device_add_data(pdev, data, sizeof(usba_data)))
+        goto out_free_pdev;
+
+    if (data->vbus_pin >= 0)
+        at32_select_gpio(data->vbus_pin, 0);
+
+    usba0_pclk.dev = &pdev->dev;
+    usba0_hclk.dev = &pdev->dev;
+
+    platform_device_add(pdev);
+    return pdev;
+
+out_free_pdev:
+    platform_device_put(pdev);
+    return NULL;
+}

/* --------------------------------------------------------------------
 * GCLK
 * -------------------------------------------------------------------- */

static struct clk gclk0 = {
    .name = "gclk0",
    .mode = genclk_mode,
    .get_rate = genclk_get_rate,
    .set_rate = genclk_set_rate,
    .set_parent = genclk_set_parent,
    .index = 0,
};

struct clk *at32_clock_list[] = {
    &osc32k,
    &osc0,
    &osc1,
    &pll0,
    &pll1,
    &cpu_clk,
    &hsb_clk,
    &pba_clk,
    &pbb_clk,
    &at32_pm_pclk,
    &at32_intc0_pclk,
    &at32_hmatrix_clk,
    &ebi_clk,
    &sdranc_clk,
    &smc0_pclk,
    &smc0_mck,
    &pdca_pclk,
    &at32_ocd0_clk,
    &gpio0_mck,
    &gpio1_mck,
    &gpio2_mck,
    &gpio3_mck,
    &at32_tc0_tc0_clk,
};
unsigned int at32_nr_clocks = ARRAY_SIZE(at32_clock_list);

void __init setup_platform(void)
{
    u32 cpu_mask = 0, hsb_mask = 0, pba_mask = 0, pbb_mask = 0;
    int i;
    
    if (pm_readl(MCCTRL) & PM_BIT(PLLSEL)) {
        main_clock = &pll0;
        cpu_clk.parent = &pll0;
    } else {
        main_clock = &osc0;
        cpu_clk.parent = &osc0;
    }  

    if (pm_readl(PLL0) & PM_BIT(PLLOSC))  
        pll0.parent = &osc1;
    if (pm_readl(PLL1) & PM_BIT(PLLOSC))  
        pll1.parent = &osc1;
    
    genclk_init_parent(&gclk0);
    
    /* Initialize the port muxes */
    at32_init_gpio(&gpio0_device);
    at32_init_gpio(&gpio1_device);
    at32_init_gpio(&gpio2_device);
}

/* All UC3A chips currently have at least 32 KiB of internal SRAM. */
if (gen_pool_add(pool, 0x00000000, 32*1024, -1))
    goto err_pool_add;

sram_pool = pool;
return 0;

err_pool_add:
    gen_pool_destroy(pool);

...
static int clk_show(struct seq_file *s, void *unused)
    seq_printf(s, "\n");
    /* show clock tree as derived from the three oscillators
     * we "know" are at the head of the list
     */
    r.s = s;
    r.nest = 0;
    /* protected from changes on the list while dumping */
    spin_lock(&clk_list_lock);
    /* show clock tree as derived from the three oscillators */
    clk = clk_get(NULL, "osc32k");
    dump_clock(clk, &r);
    clk_put(clk);
    clk = clk_get(NULL, "osc0");
    dump_clock(clk, &r);
    clk_put(clk);
    clk = clk_get(NULL, "osc1");
    dump_clock(clk, &r);
    clk_put(clk);
    /* protected from changes on the list while dumping */
    spin_unlock(&clk_list_lock);
    dump_clock(at32_clock_list[0], &r);
    dump_clock(at32_clock_list[1], &r);
    dump_clock(at32_clock_list[2], &r);

    return 0;
}
```c
-	cpufreq_notify_transition(&freqs, CPUFREQ_PRECHANGE);
-	if (freqs.old < freqs.new)
-		boot_cpu_data.loops_per_jiffy = cpufreq_scale(
-				loops_per_jiffy_ref, ref_freq, freqs.new);
-	clk_set_rate(cpuclk, freq);
-	if (freqs.new < freqs.old)
-		boot_cpu_data.loops_per_jiffy = cpufreq_scale(
-				loops_per_jiffy_ref, ref_freq, freqs.new);
-	cpufreq_notify_transition(&freqs, CPUFREQ_POSTCHANGE);

 	pr_debug("cpufreq: set frequency %lu Hz\n", freq);
@@ -101,6 +87,7 @@ static int __init at32_cpufreq_driver_init (struct cpufreq_policy *policy)
    policy->cur = at32_get_speed(0);
    policy->min = policy->cpuinfo.min_freq;
    policy->max = policy->cpuinfo.max_freq;
+    policy->governor = CPUFREQ_DEFAULT_GOVERNOR;

     printk("cpufreq: AT32AP CPU frequency driver\n");
```

```c
new file mode 100644
index 0000000..0d3d4e6
--- /dev/null
+++ b/arch/avr32/mach-at32uc3a/gpio.c
@@ -0,0 +1,453 @@
+/*
+ * Atmel GPIO Port Multiplexer support
+ *
+ * Copyright (C) 2004-2006 Atmel Corporation
+ */
+#include <linux/clk.h>
+#include <linux/irq.h>
+#include <linux/platform_device.h>
```
+#include <linux/irq.h>
+#include <asm/gpio.h>
+#include <asm/io.h>
+#include <mach/portmux.h>
+#include "gpio.h"

#define MAX_NR_GPIO_DEVICES 5

struct gpio_device {
    struct gpio_chip chip;
    void __iomem *regs;
    const struct platform_device *pdev;
    struct clk *clk;
    u32 pinmux_mask;
    char name[8];
};

static struct gpio_device gpio_dev[MAX_NR_GPIO_DEVICES];

static struct gpio_device *gpio_pin_to_dev(unsigned int gpio_pin)
{
    struct gpio_device *gpio;
    unsigned int index;

    index = gpio_pin >> 5;
    if (index >= MAX_NR_GPIO_DEVICES)
        return NULL;
    gpio = &gpio_dev[index];
    if (!gpio->regs)
        return NULL;

    return gpio;
}
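
The lookup in `gpio_pin_to_dev()` relies on a flat pin numbering in which each port owns 32 consecutive pins. The following is a minimal user-space sketch of that decomposition, not part of the patch; `locate_pin`, `struct pin_location`, `PINS_PER_BANK` and `NR_BANKS` are hypothetical names chosen for illustration.

```c
#include <assert.h>

#define PINS_PER_BANK 32u /* each GPIO port controls 32 pins */
#define NR_BANKS      5u  /* mirrors MAX_NR_GPIO_DEVICES in the listing */

struct pin_location {
	unsigned int bank; /* index into the per-port device table */
	unsigned int bit;  /* bit position inside the port registers */
	unsigned int mask; /* single-bit mask for the set/clear registers */
};

/* Returns 0 on success, -1 if the pin lies beyond the last bank. */
static int locate_pin(unsigned int pin, struct pin_location *loc)
{
	unsigned int bank = pin >> 5; /* same as pin / PINS_PER_BANK */

	if (bank >= NR_BANKS)
		return -1;

	loc->bank = bank;
	loc->bit = pin & (PINS_PER_BANK - 1); /* same as pin % 32 */
	loc->mask = 1u << loc->bit;
	return 0;
}
```

Under this numbering, pin 59 resolves to bank 1 (port B), bit 27, which is why the pin index must be masked with `0x1f` rather than `0xf`.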

/* Pin multiplexing API */

void __init at32_select_periph(unsigned int pin, unsigned int periph,
    unsigned long flags)
{
    struct gpio_device *gpio;
    unsigned int pin_index = pin & 0x1f;
    u32 mask = 1 << pin_index;

    gpio = gpio_pin_to_dev(pin);
    if (unlikely(!gpio)) {
        printk("gpio: invalid pin %u\n", pin);
        goto fail;
    }

    if (unlikely(test_and_set_bit(pin_index, &gpio->pinmux_mask)
                 || gpiochip_is_requested(&gpio->chip, pin_index))) {
        printk("%s: pin %u is busy\n", gpio->name, pin_index);
        goto fail;
    }

    /* PMR1:PMR0 together select peripheral function 0..3 for the pin. */
    switch (periph) {
    case 0:
        gpio_writel(gpio, PMR0C, mask);
        gpio_writel(gpio, PMR1C, mask);
        break;
    case 1:
        gpio_writel(gpio, PMR0S, mask);
        gpio_writel(gpio, PMR1C, mask);
        break;
    case 2:
        gpio_writel(gpio, PMR0C, mask);
        gpio_writel(gpio, PMR1S, mask);
        break;
    case 3:
        gpio_writel(gpio, PMR0S, mask);
        gpio_writel(gpio, PMR1S, mask);
        break;
    default:
        printk("%s: invalid peripheral %u\n", gpio->name, periph);
        goto fail;
    }

    /* Hand the pin over to the peripheral by clearing its GPIO enable bit. */
    gpio_writel(gpio, GPERC, mask);
    if (!(flags & AT32_GPIOF_PULLUP))
        gpio_writel(gpio, PUERC, mask);

    return;

fail:
    dump_stack();
}

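
The peripheral selection in `at32_select_periph()` amounts to writing a two-bit function number per pin, with bit 0 stored in PMR0 and bit 1 in PMR1. The sketch below models this with plain variables instead of hardware registers; `struct mux_regs`, `mux_select` and `mux_read` are hypothetical helpers, and the bit assignment is an assumption read off the `case 0`, `case 2` and `case 3` branches of the listing.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical in-memory stand-in for one port's PMR0/PMR1 registers. */
struct mux_regs {
	uint32_t pmr0; /* bit 0 of the function number, one bit per pin */
	uint32_t pmr1; /* bit 1 of the function number, one bit per pin */
};

/* Select peripheral function 0..3 for one pin; returns -1 on bad input. */
static int mux_select(struct mux_regs *regs, unsigned int bit,
		      unsigned int periph)
{
	uint32_t mask;

	if (bit > 31 || periph > 3)
		return -1;
	mask = UINT32_C(1) << bit;

	if (periph & 1)
		regs->pmr0 |= mask;
	else
		regs->pmr0 &= ~mask;

	if (periph & 2)
		regs->pmr1 |= mask;
	else
		regs->pmr1 &= ~mask;

	return 0;
}

/* Read back the function number currently selected for a pin. */
static unsigned int mux_read(const struct mux_regs *regs, unsigned int bit)
{
	return (((regs->pmr1 >> bit) & 1u) << 1) | ((regs->pmr0 >> bit) & 1u);
}
```

With this encoding, functions 1 and 3 differ only in PMR1, so two `case` branches that perform the same pair of writes cannot both be correct.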
void __init at32_select_gpio(unsigned int pin, unsigned long flags)
{
    struct gpio_device *gpio;
    unsigned int pin_index = pin & 0x1f;
    u32 mask = 1 << pin_index;

    gpio = gpio_pin_to_dev(pin);
    if (unlikely(!gpio)) {
        printk("gpio: invalid pin %u\n", pin);
        goto fail;
    }

    if (unlikely(test_and_set_bit(pin_index, &gpio->pinmux_mask))) {
        printk("%s: pin %u is busy\n", gpio->name, pin_index);
        goto fail;
    }

    if (flags & AT32_GPIOF_OUTPUT) {
        if (flags & AT32_GPIOF_HIGH)
            gpio_writel(gpio, OVRS, mask);
        else
            gpio_writel(gpio, OVRC, mask);
        if (flags & AT32_GPIOF_OPENDRAIN)
            gpio_writel(gpio, ODMERS, mask);
        else
            gpio_writel(gpio, ODMERC, mask);
        /* Outputs need no pull-up; enable the output driver. */
        gpio_writel(gpio, PUERC, mask);
        gpio_writel(gpio, ODERS, mask);
    } else {
        if (flags & AT32_GPIOF_PULLUP)
            gpio_writel(gpio, PUERS, mask);
        else
            gpio_writel(gpio, PUERC, mask);
        if (flags & AT32_GPIOF_DEGLITCH)
            gpio_writel(gpio, GFERS, mask);
        else
            gpio_writel(gpio, GFERC, mask);
        /* Inputs must not drive the pin. */
        gpio_writel(gpio, ODERC, mask);
    }

    /* Put the pin under GPIO control. */
    gpio_writel(gpio, GPERS, mask);

    return;

fail:
    dump_stack();
}

/* Reserve a pin, preventing anyone else from changing its configuration. */
void __init at32_reserve_pin(unsigned int pin)
{
    struct gpio_device *gpio;
    unsigned int pin_index = pin & 0x1f;

    gpio = gpio_pin_to_dev(pin);
    if (unlikely(!gpio)) {
        printk("gpio: invalid pin %u\n", pin);
        goto fail;
    }

    if (unlikely(test_and_set_bit(pin_index, &gpio->pinmux_mask))) {
        printk("%s: pin %u is busy\n", gpio->name, pin_index);
        goto fail;
    }

    return;

fail:
    dump_stack();
}

/* GPIO API */
static int direction_input(struct gpio_chip *chip, unsigned offset)
{
    struct gpio_device *gpio = container_of(chip, struct gpio_device, chip);
    u32 mask = 1 << offset;
    if (!(gpio_readl(gpio, GPER) & mask))
        return -EINVAL;
    gpio_writel(gpio, ODERC, mask);
    return 0;
}

static int gpio_get(struct gpio_chip *chip, unsigned offset)
{
    struct gpio_device *gpio = container_of(chip, struct gpio_device, chip);
    return (gpio_readl(gpio, PVR) >> offset) & 1;
}

static void gpio_set(struct gpio_chip *chip, unsigned offset, int value)
{
    struct gpio_device *gpio = container_of(chip, struct gpio_device, chip);
    u32 mask = 1 << offset;
    if (value)
        gpio_writel(gpio, OVRS, mask);
    else
        gpio_writel(gpio, OVRC, mask);
}
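
`gpio_set()` never reads the output register: it writes a single-bit mask to either the set (`OVRS`) or the clear (`OVRC`) alias, so no read-modify-write sequence is needed. The sketch below simulates such a register group in user space, assuming the value/set/clear/toggle layout at offsets 0x0, 0x4, 0x8 and 0xc that the header in this patch uses for `GPER`; `struct sim_reg` and `sim_write` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Offsets of the four aliases inside one register group (assumed layout). */
enum {
	REG_VALUE  = 0x0, /* plain read/write access */
	REG_SET    = 0x4, /* writing 1 sets the corresponding bit */
	REG_CLEAR  = 0x8, /* writing 1 clears the corresponding bit */
	REG_TOGGLE = 0xc  /* writing 1 inverts the corresponding bit */
};

/* Hypothetical software model of one register group. */
struct sim_reg {
	uint32_t value;
};

static void sim_write(struct sim_reg *reg, unsigned int offset, uint32_t data)
{
	switch (offset) {
	case REG_VALUE:
		reg->value = data;
		break;
	case REG_SET:
		reg->value |= data;
		break;
	case REG_CLEAR:
		reg->value &= ~data;
		break;
	case REG_TOGGLE:
		reg->value ^= data;
		break;
	default:
		assert(!"unknown register offset");
	}
}
```

Because each write touches only the bits set in the mask, two contexts can update different pins of the same port without taking a lock around the register access.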

static int direction_output(struct gpio_chip *chip, unsigned offset, int value)
{
    struct gpio_device *gpio = container_of(chip, struct gpio_device, chip);
    u32 mask = 1 << offset;
    if (!(gpio_readl(gpio, GPER) & mask))
        return -EINVAL;
    gpio_set(chip, offset, value);
    gpio_writel(gpio, ODERS, mask);
    return 0;
}

static int gpio_irq_type(unsigned irq, unsigned type)
{
    if (type != IRQ_TYPE_EDGE_BOTH && type != IRQ_TYPE_NONE)
        return -EINVAL;
    return 0;
}

static struct irq_chip gpio_irqchip = {
    .name = "gpio",
    .mask = gpio_irq_mask,
    .unmask = gpio_irq_unmask,
    .set_type = gpio_irq_type,
};

static void gpio_irq_handler(unsigned irq, struct irq_desc *desc)
{
    struct gpio_device *gpio = get_irq_chip_data(irq);
    unsigned gpio_irq;
    gpio_irq = (unsigned) get_irq_data(irq);
    for (;;) {
        u32 isr;
        struct irq_desc *d;

        /* ack pending GPIO interrupts */
        isr = gpio_readl(gpio, IFR) & gpio_readl(gpio, IER);
        if (!isr)
            break;
        gpio_writel(gpio, IFRC, isr);
        do {
            int i;

            i = ffs(isr) - 1;
            isr &= ~(1 << i);

            /* translate the bit number into the chained IRQ number */
            i += gpio_irq;
            d = &irq_desc[i];
            d->handle_irq(i, d);
        } while (isr);
    }
}

static void __init gpio_irq_setup(struct gpio_device *gpio, int irq, int gpio_irq)
{
  unsigned i;
  set_irq_chip_data(irq, gpio);
  set_irq_data(irq, (void*)gpio_irq);
  for (i = 0; i < 32; i++, gpio_irq++) {
    set_irq_chip_data(gpio_irq, gpio);
    set_irq_chip_and_handler(gpio_irq, &gpio_irqchip,
                            handle_simple_irq);
  }
  set_irq_chained_handler(irq, gpio_irq_handler);
}
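
The chained handler above turns one port interrupt into up to 32 per-pin interrupts by walking the set bits of the pending mask and adding the port's IRQ base to each bit number. A minimal user-space sketch of that demultiplexing loop follows; `demux_pending`, `lowest_set_bit`, `record_irq` and the `seen` array are hypothetical names used only to make the example testable.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the least significant set bit; word must be non-zero. */
static int lowest_set_bit(uint32_t word)
{
	int bit = 0;

	assert(word != 0);
	while (!(word & 1u)) {
		word >>= 1;
		bit++;
	}
	return bit;
}

/*
 * Call handler(irq_base + bit) for every set bit in pending, lowest
 * bit first. Returns the number of interrupts dispatched.
 */
static int demux_pending(uint32_t pending, int irq_base,
			 void (*handler)(int irq))
{
	int handled = 0;

	while (pending) {
		int bit = lowest_set_bit(pending);

		pending &= ~(UINT32_C(1) << bit);
		handler(irq_base + bit);
		handled++;
	}
	return handled;
}

/* Test helper: remembers the IRQ numbers it was called with. */
static int seen[32];
static int nr_seen;

static void record_irq(int irq)
{
	seen[nr_seen++] = irq;
}
```

The addition of `irq_base` is the step that maps a bit position to the descriptor of the chained interrupt; indexing the descriptor table with the bare bit number would dispatch to unrelated low-numbered IRQs.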

/*--------------------------------------------------------------------------*/
#ifdef CONFIG_DEBUG_FS

#include <linux/seq_file.h>

static void gpio_bank_show(struct seq_file *s, struct gpio_chip *chip)
{
  struct gpio_device *gpio = container_of(chip, struct gpio_device, chip);
  u32 oder, ier, pvr, puer, gfer, odmer;
  unsigned i;
  u32 mask;
  char bank;

  oder = gpio_readl(gpio, ODER);
  ier = gpio_readl(gpio, IER);
  pvr = gpio_readl(gpio, PVR);
  puer = gpio_readl(gpio, PUER);
  gfer = gpio_readl(gpio, GFER);
  odmer = gpio_readl(gpio, ODMER);

  bank = 'A' + gpio->pdev->id;

  for (i = 0, mask = 1; i < 32; i++, mask <<= 1) {
    const char *label;

    /* Show unrequested pins only if they have an interrupt enabled. */
    label = gpiochip_is_requested(chip, i);
    if (!label && (ier & mask))
      label = "[irq]";
    if (!label)
      continue;

    seq_printf(s, " gpio-%-3d P%c%-2d (%-12s) %s %s %s",
               chip->base + i, bank, i,
               label,
               (oder & mask) ? "out" : "in ",
               (pvr & mask) ? "hi" : "lo",
               (puer & mask) ? "up" : "  ");
    if (gfer & mask)
      seq_printf(s, " deglitch");
    if ((oder & odmer) & mask)
      seq_printf(s, " open-drain");
    if (ier & mask)
      seq_printf(s, " irq-%d edge-both",
                 gpio_to_irq(chip->base + i));
    seq_printf(s, "\n");
  }
}
#else
#define gpio_bank_show NULL
#endif
 /*--------------------------------------------------------------------------*/
static int __init gpio_probe(struct platform_device *pdev)
{
  struct gpio_device *gpio = NULL;
  int irq = platform_get_irq(pdev, 0);
  int gpio_irq_base = GPIO_IRQ_BASE + pdev->id * 32;
  BUG_ON(pdev->id >= MAX_NR_GPIO_DEVICES);
  gpio = & gpio_dev[pdev->id];
  BUG_ON(! gpio->regs);
  gpio->chip.label = gpio->name;
  gpio->chip.base = pdev->id * 32;
  gpio->chip.ngpio = 32;
  gpio->chip.dev = &pdev->dev;
  gpio->chip.owner = THIS_MODULE;
  gpio->chip.direction_input = direction_input;
  gpio->chip.get = gpio_get;
  gpio->chip.direction_output = direction_output;
  gpio->chip.set = gpio_set;
  gpio->chip.dbg_show = gpio_bank_show;
  gpiochip_add(&gpio->chip);
  gpio_irq_setup(gpio, irq, gpio_irq_base);
  platform_set_drvdata(pdev, gpio);
  printk(KERN_DEBUG "%s: base 0x%p, irq %d chains %d..%d\n",
          gpio->name, gpio->regs, irq, gpio_irq_base, gpio_irq_base + 31);
  return 0;
}
static struct platform_driver gpio_driver = {
  .probe = gpio_probe,
  .driver = {
    .name = "gpio",
  },
};
static int __init gpio_init(void)
{
  return platform_driver_register(&gpio_driver);
}
postcore_initcall(gpio_init);
void __init at32_init_gpio(struct platform_device *pdev)
{
  struct resource *regs;
  struct gpio_device *gpio;
  if (pdev->id >= MAX_NR_GPIO_DEVICES) {
    dev_err(&pdev->dev, "only %d GPIO devices supported\n",
            MAX_NR_GPIO_DEVICES);
    return;
  }
  gpio = &gpio_dev[pdev->id];
  snprintf(gpio->name, sizeof(gpio->name), "gpio%d", pdev->id);
  regs = platform_get_resource(pdev, IORESOURCE_MEM, 0);
  if (!regs) {
    dev_err(&pdev->dev, "no mmio resource defined\n");
    return;
  }
  gpio->clk = clk_get(&pdev->dev, "mck");
  if (IS_ERR(gpio->clk))
    /*
     * This is a fatal error, but if we continue we might
     * be so lucky that we manage to initialize the
     * console and display this message...
     */
    dev_err(&pdev->dev, "no mck clock defined\n");
  else
    clk_enable(gpio->clk);

  gpio->pdev = pdev;
  gpio->regs = ioremap(regs->start, regs->end - regs->start + 1);

  /* start with irqs disabled and acked */
  gpio_writel(gpio, IERC, ~0UL);
  (void) gpio_readl(gpio, IER);
}

diff --git a/arch/avr32/mach-at32uc3a/gpio.h b/arch/avr32/mach-at32uc3a/gpio.h
index 0000000..5016db9
--- /dev/null
+++ b/arch/avr32/mach-at32uc3a/gpio.h
@@ -0,0 +1,77 @@
+/*
+ * Atmel GPIO Port Multiplexer support
+ *
+ * Copyright (C) 2004-2006 Atmel Corporation
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#ifndef __ARCH_AVR32_AT32UC_GPIO_H__
+#define __ARCH_AVR32_AT32UC_GPIO_H__
+
+/* GPIO register offsets: value, set, clear and toggle alias per register */
+#define GPIO_GPER	0x00
+#define GPIO_GPERS	0x04
+#define GPIO_GPERC	0x08
+#define GPIO_GPERT	0x0c
+#define GPIO_PMR0	0x10
+#define GPIO_PMR0S	0x14
+#define GPIO_PMR0C	0x18
+#define GPIO_PMR0T	0x1c
+#define GPIO_PMR1	0x20
+#define GPIO_PMR1S	0x24
+#define GPIO_PMR1C	0x28
+#define GPIO_PMR1T	0x2c
+#define GPIO_ODER	0x40
+#define GPIO_ODERS	0x44
+#define GPIO_ODERC	0x48
+#define GPIO_ODERT	0x4c
+#define GPIO_OVR	0x50
+#define GPIO_OVRS	0x54
+#define GPIO_OVRC	0x58
+#define GPIO_OVRT	0x5c
+#define GPIO_PVR	0x60
+#define GPIO_PUER	0x70
+#define GPIO_PUERS	0x74
+#define GPIO_PUERC	0x78
+#define GPIO_PUERT	0x7c
+#define GPIO_ODMER	0x80
+#define GPIO_ODMERS	0x84
+#define GPIO_ODMERC	0x88
+#define GPIO_ODMERT	0x8c
+#define GPIO_IER	0x90
+#define GPIO_IERS	0x94
+#define GPIO_IERC	0x98
+#define GPIO_IERT	0x9c
+#define GPIO_IMR0	0xa0
+#define GPIO_IMR0S	0xa4
+#define GPIO_IMR0C	0xa8
+#define GPIO_IMR0T	0xac
+#define GPIO_IMR1	0xb0
+#define GPIO_IMR1S	0xb4
+#define GPIO_IMR1C	0xb8
+#define GPIO_IMR1T	0xbc
+#define GPIO_GFER	0xc0
+#define GPIO_GFERS	0xc4
+#define GPIO_GFERC	0xc8
+#define GPIO_GFERT	0xcc
+#define GPIO_IFR	0xd0
+#define GPIO_IFRC	0xd8

/* Bit manipulation macros */
#define GPIO_BIT(name) (1 << GPIO_##name##_OFFSET)
#define GPIO_BF(name, value) (((value) & ((1 << GPIO_##name##_SIZE) - 1)) << GPIO_##name##_OFFSET)
#define GPIO_BFEXT(name, value) (((value) >> GPIO_##name##_OFFSET) & ((1 << GPIO_##name##_SIZE) - 1))
#define GPIO_BFINS(name, value, old) (((old) & ~(((1 << GPIO_##name##_SIZE) - 1) << GPIO_##name##_OFFSET)) | GPIO_BF(name, value))

/* Register access macros */
#define gpio_readl(port, reg) __raw_readl((port)->regs + GPIO_##reg)
#define gpio_writel(port, reg, value) __raw_writel((value), (port)->regs + GPIO_##reg)

void at32_init_gpio(struct platform_device *pdev);

#endif /* __ARCH_AVR32_AT32UC_GPIO_H__ */

/* Pin definitions for AT32AP7000. */

/* Copyright (C) 2006 Atmel Corporation */

/* This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License version 2 as
 * published by the Free Software Foundation. */

/* Pin numbers identifying specific GPIO pins on the chip. They can
 * also be converted to IRQ numbers by passing them through
 * gpio_to_irq(). */

/* DMAC peripheral hardware handshaking interfaces, used with dw_dmac */
#define DMA_DMAREQ_0 7
#define DMA_DMAREQ_1 8
#define DMA_DMAREQ_2 9
#define DMA_DMAREQ_3 10

+/* HSB masters */
+#define HMATRIX_MASTER_CPU_DATA 0
+#define HMATRIX_MASTER_CPU_INSTRUCTIONS 1
+#define HMATRIX_MASTER_CPU_SAB 2
+#define HMATRIX_MASTER_CPU_DPCA 3
+#define HMATRIX_MASTER_MACB_DMA 4
+#define HMATRIX_MASTER_USB_DPAM 5
+
+/* HSB slaves */
+#define HMATRIX_SLAVE_INT_FLASH 0
+#define HMATRIX_SLAVE_HSB_PB_BR0 1
+#define HMATRIX_SLAVE_HSB_PB_BR1 2
+#define HMATRIX_SLAVE_INT_SRAM 3
+#define HMATRIX_SLAVE_USB_DPAM 4
+#define HMATRIX_SLAVE_EBI 5
+
+#endif /* __ASM_ARCH_AT32UC3A0XXX_H__ */

diff --git a/arch/avr32/mach-at32ap/include/mach/board.h b/arch/avr32/mach-at32uc3a/include/mach/board.h
similarity index 93%
copy from arch/avr32/mach-at32ap/include/mach/board.h
copy to arch/avr32/mach-at32uc3a/include/mach/board.h
index afaf7a...e6e9b07 100644
--- a/arch/avr32/mach-at32ap/include/mach/board.h
+++ b/arch/avr32/mach-at32uc3a/include/mach/board.h
@@ -14,14 +14,8 @@
+ extern unsigned long at32_board_osc_rates[];
+ /*
+ * This used to add essential system devices, but this is now done
+ * automatically. Please don’t use it in new board code.
+ */
+ static inline void __deprecated at32_add_system_devices(void)
+ {
+ #define ATMEL_MAX_UART 4
+ extern struct platform_device *atmel_default_console_device;
diff --git a/arch/avr32/mach-at32ap/include/mach/chip.h b/arch/avr32/mach-at32uc3a/include/mach/chip.h
similarity index 87%
copy from arch/avr32/mach-at32ap/include/mach/chip.h
copy to arch/avr32/mach-at32uc3a/include/mach/chip.h
index 5efca6...b29e181 100644
--- a/arch/avr32/mach-at32ap/include/mach/chip.h
+++ b/arch/avr32/mach-at32uc3a/include/mach/chip.h
@@ -12,6 +12,8 @@
+ struct usba_platform_data;
+ struct platform_device *;
diff --git a/arch/avr32/mach-at32ap/include/mach/cpu.h b/arch/avr32/mach-at32uc3a/include/mach/cpu.h
similarity index 88%
copy from arch/avr32/mach-at32ap/include/mach/cpu.h
copy to arch/avr32/mach-at32uc3a/include/mach/cpu.h
index 5efca6...b29e181 100644
--- a/arch/avr32/mach-at32ap/include/mach/cpu.h
+++ b/arch/avr32/mach-at32uc3a/include/mach/cpu.h
@@ -12,6 +12,8 @@
+ #endif
#include <linux/platform_device.h>
#include <linux/init.h>

#define AT32_GPIOF_PULLUP 0x00000001 /* (not-OUT) Enable pull-up */
#define AT32_GPIOF_OUTPUT 0x00000002 /* (OUT) Enable output driver */
#define AT32_GPIOF_HIGH 0x00000004 /* (OUT) Set output high */
#define AT32_GPIOF_DEGLITCH 0x00000008 /* (IN) Filter glitches */
#define AT32_GPIOF_MULTIDRV 0x00000010 /* Enable multidriver option */
#define AT32_GPIOF_OPENDRAIN 0x00000010 /* Enable open drain mode option */

void at32_select_periph(unsigned int pin, unsigned int periph, unsigned long flags);
void at32_deselect_periph(unsigned int port, unsigned int pin);
void at32_select_gpio(unsigned int pin, unsigned long flags);
void at32_deselect_pin(unsigned int pin);
void at32_reserve_pin(unsigned int pin);

D.29  Board support for ATEVK1100

```
commit 6677f489f529d76a17c6ab6900f81dbcfbc8b5d1
Author: Gunnar Rangoy <gunnar@rangoy.com>
Date: Tue May 5 14:23:43 2009 +0200

AVR32: Board support for ATEVK1100

diff --git a/arch/avr32/Kconfig b/arch/avr32/Kconfig
index 631d389..fcec5a1 100644
--- a/arch/avr32/Kconfig
+++ b/arch/avr32/Kconfig
@@ -155,6 +155,10 @@ config BOARD_FAVR32
 config BOARD_MIMC200
 	bool "MIMC200 CPU board"
 	select CPU_AT32AP7000
+
+config BOARD_ATEVK1100
+	bool "ATEVK1100 Evaluation Kit"
+	select CPU_AT32UC3A0XXX
 endchoice

 source "arch/avr32/boards/atstk1000/Kconfig"
diff --git a/arch/avr32/Makefile b/arch/avr32/Makefile
index ad1dd87..0a8c3eb 100644
--- a/arch/avr32/Makefile
+++ b/arch/avr32/Makefile
@@ -51,6 +51,7 @@ core-$(CONFIG_BOARD_ATSTK1000) += arch/avr32/boards/atstk1000/
 core-$(CONFIG_BOARD_FAVR32) += arch/avr32/boards/favr32/
 core-$(CONFIG_BOARD_MIMC200) += arch/avr32/boards/mimc200/
+core-$(CONFIG_BOARD_ATEVK1100) += arch/avr32/boards/atevk1100/
 core-$(CONFIG_LOADER_U_BOOT) += arch/avr32/boot/u-boot/
 core-y += arch/avr32/kernel/
 core-y += arch/avr32/mm/
diff --git a/arch/avr32/boards/atevk1100/Makefile b/arch/avr32/boards/atevk1100/Makefile
new file mode 100644
index 0000000..beeeb77
--- /dev/null
+++ b/arch/avr32/boards/atevk1100/Makefile
@@ -0,0 +1 @@
```
diff --git a/arch/avr32/boards/atevk1100/setup.c b/arch/avr32/boards/atevk1100/setup.c
new file mode 100644
index 0000000..4e83a3de
--- /dev/null
+++ b/arch/avr32/boards/atevk1100/setup.c
@@ -0,0 +1,121 @@
+/*
+ * Board - specific setup code for the ATEVK1100 Evaluation Kit
+ *
+ * Copyright (C) 2005 - 2006 Atmel Corporation
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/clk.h>
+#include <linux/etherdevice.h>
+#include <linux/irq.h>
+#include <linux/i2c.h>
+#include <linux/i2c-gpio.h>
+#include <linux/init.h>
+#include <linux/linkage.h>
+#include <linux/platform_device.h>
+#include <linux/types.h>
+#include <linux/leds.h>
+#include <linux/spi/spi.h>
+#include <asm/atmel-mci.h>
+#include <asm/io.h>
+#include <asm/setup.h>
+
+/* Oscillator frequencies. These are board-specific */
+unsigned long at32_board_osc_rates[3] = {
+	[0] = 32768,	/* 32.768 kHz on RTC osc */
+	[1] = 12000000,	/* 12 MHz on osc0 */
+	[2] = 0,
+};
+
+/* Initialized by bootloader-specific startup code. */
+struct tag *bootloader_tags __initdata;
+
+static struct eth_platform_data __initdata eth_data = {
+ .is_rmii = 1,
+};
+static struct spi_board_info spi0_board_info[] __initdata = {
+ {
+ .modalias = "mtd_dataflash",
+ .max_speed_hz = 8000000,
+ .chip_select = 0,
+ },
+};
+static struct gpio_led evk1100_leds[] = {
+ { .name = "led1", .gpio = GPIO_PIN_PB(27) , .active_low = 1,
+ .default_trigger = "heartbeat",
+ },
+ { .name = "led2", .gpio = GPIO_PIN_PB(28) , .active_low = 1, },
+ /* Disabled, as it sits on the CS2, which is used for SRAM. */
+ { .name = "led3", .gpio = GPIO_PIN_PB(29) , .active_low = 1, },
+ { .name = "led4", .gpio = GPIO_PIN_PB(30) , .active_low = 1, },
+ { .name = "led5r", .gpio = GPIO_PIN_PB(19) , .active_low = 1, },
+ { .name = "led5g", .gpio = GPIO_PIN_PB(20) , .active_low = 1, },
+ { .name = "led6r", .gpio = GPIO_PIN_PB(21) , .active_low = 1, },
+ { .name = "led6g", .gpio = GPIO_PIN_PB(22) , .active_low = 1, },
+};
+static const struct gpio_led_platform_data evk1100_led_data = {
+ .num_leds = ARRAY_SIZE(evk1100_leds),
+ .leds = (void *) evk1100_leds,
+};
+void __init setup_board(void)
+{
+	at32_map_uart(0, 0);	/* USART 0: /dev/ttyS0, DB9 */
+	at32_setup_serial_console(0);
+}
static struct platform_device evk1100_gpio_leds = {
    .name = "leds-gpio",
    .id = -1,
    .dev = {
        .platform_data = (void*) &evk1100_led_data,
    }
};

static int __init atevk1100_init(void)
{
    unsigned i;

    /*
     * atevk1100 uses 16-bit SDRAM interface, so we don't need to
     * reserve any pins for it.
     */
    at32_add_system_devices();
    at32_add_device_usart(0);
    at32_add_device_eth(0, &eth_data);
    at32_add_device_spi(0, spi0_board_info, ARRAY_SIZE(spi0_board_info));
    at32_add_device_usba(0, NULL);
    /* Claim every LED pin first, then register the LED device once. */
    for (i = 0; i < ARRAY_SIZE(evk1100_leds); i++) {
        at32_select_gpio(evk1100_leds[i].gpio, AT32_GPIOF_OUTPUT | AT32_GPIOF_HIGH);
    }
    platform_device_register(&evk1100_gpio_leds);
    return 0;
}

/*postcore_initcall(atevk1100_init);*/

static int __init atevk1100_arch_init(void)
{
    /* set_irq_type() after the arch_initcall for EIC has run, and
     * before the I2C subsystem could try using this IRQ.
     */
    return set_irq_type(AT32_EXTINT(3), IRQ_TYPE_EDGE_FALLING);
}

/*arch_initcall(atevk1100_arch_init);*/
Appendix E
PDCA, SPI and DataFlash support

```c
static struct spi_board_info spi0_board_info[] __initdata = {
  /*
   .modalias = "mtd_dataflash",
   .max_speed_hz = 8000000,
   .chip_select = 0,
  */
};

static struct spi_board_info spi1_board_info[] __initdata = {
  /*
   { .modalias = "mtd_dataflash",
     .max_speed_hz = 8000000,
     .chip_select = 0,
   },
  */
};
```

```c
+ at32_add_device_spi(1, spi1_board_info, ARRAY_SIZE(spi1_board_info));
+ at32_add_device_usba(0, NULL);
```
```c
static struct resource pdca_resource[] = {
    .start = num,
    .end = num,
    .flags = IORESOURCE_PDCA_TX,
};

static struct platform_device pdca_device = {
    .name = "pdca",
    .id = 0,
    .resource = pdca_resource,
    .num_resources = ARRAY_SIZE(pdca_resource),
};

DEV_CLK(pclk, pdca, pba, 2);
+ DEV_CLK(hclk, pdca, hsb, 5);

static struct resource atmel_spi0_resource[] = {
    PBMEM(0xffff2400),
    IRQ(9),
    PDCA_RX(7),
    PDCA_TX(15),
};
+ DEFINE_DEV(atmel_spi0, 0);
+ DEV_CLK(spi_clk, atmel_spi0, pba, 5);

static struct resource atmel_spi1_resource[] = {
    PBMEM(0xffff2800),
    IRQ(10),
    PDCA_RX(8),
    PDCA_TX(16),
};

```
gpio_writel(gpio, PUERC, mask);

if (flags & AT32_GPIOF_DEGLITCH)
    gpio_writel(gpio, GFERS, mask);
else
    if (value)
        gpio_writel(gpio, OVRS, mask);
    else
        void *a;
        u32 v;
        a = gpio->regs + GPIO_OVRC;
        v = mask;
        __raw_writel(v, a);
    }

static void gpio_set(struct gpio_chip *chip, unsigned offset, int value) {
    if (value)
        gpio_writel(gpio, OVRS, mask);
    else {
        void *a;
        u32 v;
        a = gpio->regs + GPIO_OVRC;
        v = mask;
        __raw_writel(v, a);
    }
}

static int direction_output(struct gpio_chip *chip, unsigned offset, int value) {
    gpio->pdev = pdev;
    gpio->regs = ioremap(regs->start, regs->end - regs->start + 1);
    if (!gpio->regs)
        dev_err(&pdev->dev, "unable to map memory (%p, %u)\n", (void *)regs->start, regs->end -
        regs->start + 1);
    /* start with irqs disabled and acked */
    gpio_writel(gpio, IERC, ~0UL);
}

/* Peripheral DMA abstraction layer for UC3A.*/

#include <linux/io.h>

#define PDCA_MAR 0x00
#define PDCA_PSR 0x04
#define PDCA_TCR 0x08
#define PDCA_MARR 0x0c
#define PDCA_TCRR 0x10
#define PDCA_CR 0x14
#define PDCA_MR 0x18
#define PDCA_SR 0x1c
#define PDCA_SLOT_SIZE 0x40

/* Bits in CR */
#define PDCA_CR_TEN 0x00000001
#define PDCA_CR_TDIS 0x00000002
#define PDCA_CR_ECLR 0x00000100

/* Bits in SR */
#define PDCA_SR_TEN 0x00000001


/* Resource types for PDCA RX peripheral ID and PDCA TX peripheral id. */

#define IORESOURCE_PDCA_RX 0x000000e0
#define IORESOURCE_PDCA_TX 0x000000f0

+struct pdma_channel {
+	int rx_slot;
+	int tx_slot;
+};
+
+int pdma_init(struct pdma_channel *channel, struct platform_device *pdev);
+void pdma_release(struct pdma_channel *channel);
+void pdma_set_rx(struct pdma_channel *channel, dma_addr_t addr, u32 counter);
+void pdma_set_next_rx(struct pdma_channel *channel, dma_addr_t addr, u32 counter);
+void pdma_set_tx(struct pdma_channel *channel, dma_addr_t addr, u32 counter);
+void pdma_set_next_tx(struct pdma_channel *channel, dma_addr_t addr, u32 counter);
+void pdma_enable_rx(struct pdma_channel *channel);
+void pdma_disable_rx(struct pdma_channel *channel);
+void pdma_enable_tx(struct pdma_channel *channel);
+void pdma_disable_tx(struct pdma_channel *channel);
+int pdma_tx_enabled(struct pdma_channel *channel);
+u32 pdma_get_rx_counter(struct pdma_channel *channel);
+u32 pdma_get_tx_counter(struct pdma_channel *channel);

#define __ASM_ARCH_PDMA_H__
#include <linux/ioport.h>
#include <linux/clk.h>
#include <linux/init.h>
#include <linux/platform_device.h>
#include <mach/pdma.h>

void __iomem *pdma_regs;
static DEFINE_MUTEX(pdma_lock);
static unsigned long allocated_slots;
#define PDCA_SLOTS 15

static int allocate_slot(void)
{
	int slot;

	mutex_lock(&pdma_lock);
	/* ffz() yields the first clear bit, i.e. the first free slot. */
	slot = ffz(allocated_slots);
	if (slot >= PDCA_SLOTS) {
		slot = -ENOSPC;
		printk(KERN_ERR "No free pdca slots.\n");
	} else {
		__set_bit(slot, &allocated_slots);
	}
	mutex_unlock(&pdma_lock);
	return slot;
}

static void free_slot(int slot)
{
	BUG_ON(slot < 0 || slot >= PDCA_SLOTS);
	mutex_lock(&pdma_lock);
	__clear_bit(slot, &allocated_slots);
	mutex_unlock(&pdma_lock);
}
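
The slot bookkeeping above is a small bitmap allocator: bit n of `allocated_slots` is set while PDCA channel n is in use, and allocation must pick the lowest clear bit. The sketch below shows the same idea in portable user-space C, without the mutex; `slot_alloc`, `slot_free`, `slot_bitmap` and `NR_SLOTS` are hypothetical names, and the linear scan stands in for a find-first-zero primitive.

```c
#include <assert.h>

#define NR_SLOTS 15 /* mirrors PDCA_SLOTS in the listing */

static unsigned long slot_bitmap; /* bit n set means slot n is in use */

/* Returns the lowest free slot and marks it used, or -1 if all are taken. */
static int slot_alloc(void)
{
	int slot;

	for (slot = 0; slot < NR_SLOTS; slot++) {
		if (!(slot_bitmap & (1UL << slot))) {
			slot_bitmap |= 1UL << slot;
			return slot;
		}
	}
	return -1;
}

/* Marks a previously allocated slot as free again. */
static void slot_free(int slot)
{
	assert(slot >= 0 && slot < NR_SLOTS);
	slot_bitmap &= ~(1UL << slot);
}
```

Searching for the first set bit instead of the first clear bit would hand out the same slot repeatedly once any slot is taken, which is why the search direction matters here.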

int pdma_init(struct pdma_channel *channel, struct platform_device *pdev)
{
    int ret;
    struct resource *pdca_rx;
    struct resource *pdca_tx;
    pdca_rx = platform_get_resource(pdev, IORESOURCE_PDCA_RX, 0);
```c
+ void pdma_set_rx(struct pdma_channel *channel, dma_addr_t addr, u32 counter)
+ {
+   BUG_ON(channel->rx_slot == -1);
+   BUG_ON(counter > 0xffff);
+   pdca_writel(channel->rx_slot, MAR, addr);
+   pdca_writel(channel->rx_slot, TCR, counter);
+ }
+
+ void pdma_set_tx(struct pdma_channel *channel, dma_addr_t addr, u32 counter)
+ {
+   BUG_ON(channel->tx_slot == -1);
+ BUG_ON(counter > 0xffff);
+ pdca_writel(channel->tx_slot, MAR, addr);
+ pdca_writel(channel->tx_slot, TCR, counter);
+
+}
+
+ void pdma_set_next_tx(struct pdma_channel *channel, dma_addr_t addr, u32 counter)
+ {
+   BUG_ON(channel->tx_slot == -1);
+   BUG_ON(counter > 0xffff);
+   /* The "next" buffer goes into the reload registers. */
+   pdca_writel(channel->tx_slot, MARR, addr);
+   pdca_writel(channel->tx_slot, TCRR, counter);
+ }
+
+ void pdma_set_next_rx(struct pdma_channel *channel, dma_addr_t addr, u32 counter)
+ {
+   BUG_ON(channel->rx_slot == -1);
+   BUG_ON(counter > 0xffff);
+   pdca_writel(channel->rx_slot, MARR, addr);
+   pdca_writel(channel->rx_slot, TCRR, counter);
+ }
+
+ void pdma_enable_rx(struct pdma_channel *channel)
+ {
+   BUG_ON(channel->rx_slot == -1);
+   pdca_writel(channel->rx_slot, CR, PDCA_CR_TEN);
+ }
+
+ void pdma_disable_rx(struct pdma_channel *channel)
+ {
+   BUG_ON(channel->rx_slot == -1);
+   pdca_writel(channel->rx_slot, CR, PDCA_CR_TDIS);
+ }
+
+ int pdma_rx_enabled(struct pdma_channel *channel)
+ {
+   BUG_ON(channel->rx_slot == -1);
+   return !!(pdca_readl(channel->rx_slot, SR) & PDCA_SR_TEN);
+ }
+
+ void pdma_enable_tx(struct pdma_channel *channel)
+ {
+   BUG_ON(channel->tx_slot == -1);
+   pdca_writel(channel->tx_slot, CR, PDCA_CR_TEN);
+ }
+
+ void pdma_disable_tx(struct pdma_channel *channel)
+ {
+   BUG_ON(channel->tx_slot == -1);
+   pdca_writel(channel->tx_slot, CR, PDCA_CR_TDIS);
+ }
+
+ int pdma_tx_enabled(struct pdma_channel *channel)
+ {
+   BUG_ON(channel->tx_slot == -1);
+   return !!(pdca_readl(channel->tx_slot, SR) & PDCA_SR_TEN);
+ }
+
+ u32 pdma_get_rx_counter(struct pdma_channel *channel)
+ {
+   BUG_ON(channel->rx_slot == -1);
+   return pdca_readl(channel->rx_slot, TCR);
+ }
+
+ u32 pdma_get_tx_counter(struct pdma_channel *channel)
+ {
+   BUG_ON(channel->tx_slot == -1);
+   return pdca_readl(channel->tx_slot, TCR);
+ }
+
+ static int __init pdca_probe(struct platform_device *pdev)
+ {
+   struct resource *regs;
+   struct clk *pclk, *hclk;
+   regs = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+   if (!regs) {
+     dev_err(&pdev->dev, "no memory defined\n");
+     return -ENOMEM;
+   }
+   pclk = clk_get(&pdev->dev, "pclk");
+   if (IS_ERR(pclk)) {
+     dev_err(&pdev->dev, "no pclk defined\n");
+     return -EINVAL;
+   }
+   hclk = clk_get(&pdev->dev, "hclk");
+   if (IS_ERR(hclk)) {
+     dev_err(&pdev->dev, "no hclk defined\n");
+     return PTR_ERR(hclk);
+   }
```
```c
index 6dd9aff..653eae2 100644
--- a/drivers/mtd/devices/mtd_dataflash.c
+++ b/drivers/mtd/devices/mtd_dataflash.c
@@ -867,6 +867,7 @@ static int __devinit dataflash_probe(struct spi_device *spi)
    * capacity using bits in the status byte.
    */
    status = dataflash_status(spi);
+   if (status <= 0 || status == 0xff) {
    DEBUG(MTD_DEBUG_LEVEL1, "%s: status error %d\n",
        spi->dev.bus_id, status);
```
   if (msg->spi->bits_per_word > 8)
     len >>= 1;
-  spi_writel(as, RNCR, len);
-  spi_writel(as, TNCR, len);
+  pdma_set_next_rx(&as->dma, rx_dma, len);
+  pdma_set_next_tx(&as->dma, tx_dma, len);
   dev_dbg(&msg->spi->dev, "next xfer %p: len %u tx %p/0x%x rx %p/0x%x\n",
     xfer, xfer->len, xfer->tx_buf, xfer->tx_dma,
     xfer->rx_buf, xfer->rx_dma);
   ieval = SPI_BIT(ENDRX) | SPI_BIT(OVRES);
 } else {
-  spi_writel(as, RNCR, 0);
-  spi_writel(as, TNCR, 0);
+  pdma_set_next_rx(&as->dma, NULL, 0);
+  pdma_set_next_tx(&as->dma, NULL, 0);
   ieval = SPI_BIT(RXBUFF) | SPI_BIT(ENDRX) | SPI_BIT(OVRES);
 }

static void atmel_spi_next_message(struct spi_master *master)
    * It should be doable, though. Just not now...
    */
   spi_writel(as, IER, ieval);
-  spi_writel(as, PTCR, SPI_BIT(TXTEN) | SPI_BIT(RXTEN));
+  pdma_enable_rx(&as->dma);
+  pdma_enable_tx(&as->dma);
 }

static void atmel_spi_next_xfer(struct spi_master *master, struct spi_message *msg)
   /* continue if needed */
-  if (list_empty(&as->queue) || as->stopping)
-    spi_writel(as, PTCR, SPI_BIT(RXTDIS) | SPI_BIT(TXTDIS));
-  else
+  if (list_empty(&as->queue) || as->stopping) {
+    pdma_disable_rx(&as->dma);
+    pdma_disable_tx(&as->dma);
+  } else
     atmel_spi_next_message(master);
 }

@@ -390,7 +397,6 @@ atmel_spi_interrupt(int irq, void *dev_id)
   int ret = IRQ_NONE;
   spin_lock(&as->lock);
   xfer = as->current_transfer;
   msg = list_entry(as->queue.next, struct spi_message, queue);

   if (pending & SPI_BIT(OVRES)) {
     int timeout;
     ret = IRQ_HANDLED;
     spi_writel(as, IDR, (SPI_BIT(RXBUFF) | SPI_BIT(ENDRX)
                          | SPI_BIT(OVRES)));
      * First, stop the transfer and unmap the DMA buffers.
      */
-    spi_writel(as, PTCR, SPI_BIT(RXTDIS) | SPI_BIT(TXTDIS));
+    pdma_disable_rx(&as->dma);
+    pdma_disable_tx(&as->dma);
     if (!msg->is_dma_mapped)
       atmel_spi_dma_unmap_xfer(master, xfer);

@@ -426,16 +433,17 @@ atmel_spi_interrupt(int irq, void *dev_id)
       udelay(xfer->delay_usecs);

     dev_warn(master->dev.parent, "overrun (%u/%u remaining)\n",
-      spi_readl(as, TCR), spi_readl(as, RCR));
+      pdma_get_tx_counter(&as->dma),
+      pdma_get_rx_counter(&as->dma));

     /*
      * Clean up DMA registers and make sure the data
      * registers are empty.
      */
-    spi_writel(as, RNCR, 0);
-    spi_writel(as, TNCR, 0);
-    spi_writel(as, RCR, 0);
-    spi_writel(as, TCR, 0);
+    pdma_set_next_rx(&as->dma, NULL, 0);
+    pdma_set_next_tx(&as->dma, NULL, 0);
+    pdma_set_rx(&as->dma, NULL, 0);
+    pdma_set_tx(&as->dma, NULL, 0);
     for (timeout = 1000; timeout; timeout--)
       if (spi_readl(as, SR) & SPI_BIT(TXEMPTY))
         break;

     goto out_unmap_regs;

+  ret = pdma_init(&as->dma, pdev);
+  if (ret)
+    goto out_free_irq;
+
   /* Initialize the hardware */
   clk_enable(clk);
   spi_writel(as, CR, SPI_BIT(SWRST));
   spi_writel(as, CR, SPI_BIT(SWRST)); /* AT91SAM9263 Rev B workaround */
   spi_writel(as, MR, SPI_BIT(MSTR) | SPI_BIT(MODFDIS));
-  spi_writel(as, PTCR, SPI_BIT(RXTDIS) | SPI_BIT(TXTDIS));
+  pdma_disable_rx(&as->dma);
+  pdma_disable_tx(&as->dma);
   spi_writel(as, CR, SPI_BIT(SPIEN));

   /* go! */
 +
 + pdma_release(&as->dma);
 +
 + free_irq(irq, master);
 +
 + out_unmap_regs:
 +
 + out_reset_hw:
 +
 + pdma_release(&as->dma);
 +
 + clk_disable(clk);
 +
 + free_irq(as->irq, master);
 +
 + dma_free_coherent(&pdev->dev, BUFFER_SIZE, as->buffer,
 +   as->buffer_dma);
 +
 + pdma_release(&as->dma);
 +
 + clk_put(as->clk);
 +
 + free_irq(as->irq, master);
 +
diff --git a/drivers/spi/atmel_spi.h b/drivers/spi/atmel_spi.h
index 6e06b6a..95dbc0c 100644
--- a/drivers/spi/atmel_spi.h
+++ b/drivers/spi/atmel_spi.h
@@ -23,16 +23,6 @@
 #define SPI_CSR1 0x0034
 #define SPI_CSR2 0x0038
 #define SPI_CSR3 0x003c
-#define SPI_RPR 0x0100
-#define SPI_RCR 0x0104
-#define SPI_TPR 0x0108
-#define SPI_TCR 0x010c
-#define SPI_RNPR 0x0110
-#define SPI_RNCR 0x0114
-#define SPI_TNPR 0x0118
-#define SPI_TNCR 0x011c
-#define SPI_PTCR 0x0120
-#define SPI_PTSR 0x0124

 /* Bitfields in CR */
 #define SPI_SPIEN_OFFSET 0

Appendix F

Toolchain patches

F.1 Cover letter

From 4912c9e615f5c2f6e55838e4895004b3149f08f8 Mon Sep 17 00:00:00 2001
Date: Tue, 26 May 2009 16:21:02 +0200
Subject: [PATCH] Toolchain support for AVR32A UC3 Linux programs

These patches make it possible to compile Linux programs for AVR32A UC3.
A lot of work still remains, but they actually work.
We are able to use the toolchain to compile BusyBox.

What works:
* Compiling statically linked FDPIC ELF programs.

What should be done / What does not work:
* Shared library support
* Some cleanup
* Linking of some programs (e.g. BusyBox) does not work entirely correctly.
  * The PT_GNU_STACK size is not always copied.
  * There are probably other bugs in the code.

These patches were developed during a master's thesis at NTNU. We hope that
someone else can use them as a starting point for getting full support
for FDPIC ELF into the avr32 toolchain.

We made changes to the following tools:
* GCC-4.2.2-atmel.1.1.3
* Binutils-2.18 with patches from buildroot-avr32-v2.3.0
* uClibc-0.9.30

We attached the script we used to build the toolchain, to show which
options we used to compile the various tools. We have also attached
the script we used to build BusyBox.

F.2 GCC changes

From 70e21bc96a9d938d86ae366b930b37dd784f364d Mon Sep 17 00:00:00 2001
Date: Tue, 26 May 2009 15:08:26 +0200
Subject: [PATCH] GCC: Add support for FDPIC ELF for AVR32.

This patch makes a few changes to GCC, mostly to add support for the
-mfdpic flag. There are also a few changes to crti.asm, to prevent it
from replacing the GOT pointer during _init and _fini.

Unfortunately, we haven't found a way to compile several variants of
crti.o from crti.asm, so a single GCC cannot be used for both FDPIC
and normal compiles.

To compile GCC for FDPIC, make must be invoked like this:
make CFLAGS_FOR_TARGET=-mfdpic

This makes crti.asm compile with __AVR32_FDPIC__ defined.
---

 gcc/config/avr32/avr32.opt   |  3 +++
 gcc/config/avr32/crti.asm    |  4 ++++
 gcc/config/avr32/linux-elf.h | 15 ++++++++++++++-
 3 files changed, 21 insertions(+), 1 deletions(-)

diff --git a/gcc/config/avr32/avr32.opt b/gcc/config/avr32/avr32.opt
index a9a1d5a..d4c62f3 100644
--- a/gcc/config/avr32/avr32.opt
+++ b/gcc/config/avr32/avr32.opt
@@ -84,3 +84,6 @@ Target Report Mask(RMW_ADDRESSABLE_DATA)
 Signal that all data is in range for the Atomic Read-Modify-Write memory instructions, and that
 gcc can safely generate these whenever possible.

+mfdpic
+Target Report Mask(FDPIC)
+Enable Function Descriptor PIC mode

diff --git a/gcc/config/avr32/crti.asm b/gcc/config/avr32/crti.asm
index 4c31f49..634adc3 100644
--- a/gcc/config/avr32/crti.asm
+++ b/gcc/config/avr32/crti.asm
@@ -40,6 +40,7 @@ _init:
+#ifndef __AVR32_FDPIC__
   lddpc r6, 1f
 0:
@@ -47,6 +48,7 @@ _init:
   .section ".fini"
 /* Just load the GOT */
@@ -54,6 +56,7 @@ _init:
+#ifndef __AVR32_FDPIC__
   lddpc r6, 1f
 0:
@@ -61,4 +64,5 @@ _fini:
# undef ASM_SPEC
# define ASM_SPEC " %{mfpic:!:fpic:%{fpic:!:fPIC:%{fPIC:!:fno-PIC:%{fno-PIC:!:fno-PIC:-fno-PIC}}}}\%
+
 #undef LINK_SPEC
 #define LINK_SPEC "%{version:-v} \
+   %{mfdpic:-mavr32linuxfdpic} \
    %{static:-Bstatic} \
    %{shared:-Bshared} \
    %{symbolic:-Bsymbolic} \
@@ -122,6 +133,8 @@
     builtin_define("__AVR32_HAS_BRANCH_PRED__");
   if (TARGET_FLOAT)
     builtin_define("__AVR32_FAST_FLOAT__");

F.3 GNU binutils changes

From 4912c96e1f5f2c2f6ee5583e4895004b3149f08f8 Mon Sep 17 00:00:00 2001
Date: Tue, 26 May 2009 16:21:02 +0200
Subject: [PATCH] Binutils support for FDPIC ELF on AVR32 UC3

This patch adds support for statically linked FDPIC ELF targets on AVR32. It mostly works, but there is a lack of error checking on input file types, which means that if the linker is invoked incorrectly, it will fail in strange ways.

For example, if one fails to specify -I elf32-avr32fdpic to strip/objcopy, it will pretend that the file is a normal elf32-avr32 file, and "ruin" the PT_GNU_STACK program header.
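
The reason a damaged PT_GNU_STACK header matters is that the stack size of an FDPIC program travels in that header's p_memsz field. The following is a minimal sketch, not code from the patch, of how a consumer of the binary might read it. The names `struct phdr32`, `PT_GNU_STACK_TYPE`, `FALLBACK_STACK_SIZE` and `requested_stack_size` are made up for this illustration; the struct mirrors the field order of a 32-bit ELF program header, and the fallback value is the DEFAULT_STACK_SIZE used in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for a 32-bit ELF program header (same field order as
   Elf32_Phdr), defined here so the sketch is self-contained. */
struct phdr32 {
    uint32_t p_type;
    uint32_t p_offset;
    uint32_t p_vaddr;
    uint32_t p_paddr;
    uint32_t p_filesz;
    uint32_t p_memsz;
    uint32_t p_flags;
    uint32_t p_align;
};

#define PT_GNU_STACK_TYPE   0x6474e551u /* numeric value of PT_GNU_STACK */
#define FALLBACK_STACK_SIZE 0x10000u    /* DEFAULT_STACK_SIZE in the patch */

/* Return the stack size requested by the binary: p_memsz of the first
   PT_GNU_STACK header with a non-zero size, or the fallback otherwise. */
static uint32_t requested_stack_size(const struct phdr32 *phdrs, size_t count)
{
    assert(phdrs != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (phdrs[i].p_type == PT_GNU_STACK_TYPE && phdrs[i].p_memsz != 0)
            return phdrs[i].p_memsz;
    }
    return FALLBACK_STACK_SIZE;
}
```

If strip or objcopy rewrites the header without copying p_memsz, a reader like this silently falls back to the default, which is the kind of hard-to-diagnose failure described above.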

Some functions are (almost) direct copies from elf32-bfin.c and elf32-frv.c, which are two architectures with FDPIC support. The code for creating the .rofixup-section is however mostly new.
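
To make the role of the .rofixup section more concrete, here is a small sketch under simplifying assumptions. It models the section as a list of 32-bit link-time addresses closed by the terminator value 0xffffffff that avr32_rofixup_terminate writes in the patch, and it sizes the section the same way the patch does (one entry per counted relocation, one per 4-byte GOT slot, plus the terminator). The helper names `rofixup_section_size` and `apply_rofixups`, and the use of a single load offset, are assumptions for illustration only; a real FDPIC start-up routine relocates through a per-segment load map.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ROFIXUP_TERMINATOR 0xffffffffu /* end-of-list marker used by the patch */

/* Size in bytes of a .rofixup section: 4 bytes per counted relocation,
   one entry per 4-byte GOT slot, and 4 bytes for the terminator. */
static uint32_t rofixup_section_size(uint32_t reloc_count, uint32_t got_size)
{
    return 4u + 4u * reloc_count + got_size;
}

/* Hypothetical start-up helper. `image` models the loaded data as an
   array of 32-bit words linked at address 0. Each list entry is the
   link-time byte address of a word that holds a pointer; every such
   word is adjusted by `load_offset`. Returns the number of words patched. */
static size_t apply_rofixups(uint32_t *image, size_t image_words,
                             const uint32_t *list, uint32_t load_offset)
{
    size_t patched = 0;

    assert(image != NULL && list != NULL);
    for (; *list != ROFIXUP_TERMINATOR; list++) {
        size_t index = *list / 4u;

        assert(*list % 4u == 0 && index < image_words);
        image[index] += load_offset;
        patched++;
    }
    return patched;
}
```

The point of the design is that all addresses needing adjustment are collected in one read-only list at link time, so the program can be relocated at start-up without a dynamic linker.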

---

 bfd/config.bfd                   |   2
 bfd/configure                    |   1
 bfd/configure.in                 |   1
 bfd/elf32-avr32.c                | 343
 bfd/targets.c                    |   2
 gas/config/tc-avr32.c            |   8
 include/elf/avr32.h              |   4
 ld/Makefile.am                   |   5
 ld/Makefile.in                   |   5
 ld/configure.tgt                 |   4
 ld/emulparams/avr32linux.sh      |   1
 ld/emulparams/avr32linuxfdpic.sh |  10
 12 files changed, 382 insertions(+), 1 deletions(-)

 create mode 100644 ld/emulparams/avr32linuxfdpic.sh

diff --git a/bfd/config.bfd b/bfd/config.bfd
index 90350d7..193fd54 100644
--- a/bfd/config.bfd
+++ b/bfd/config.bfd
@@ -337,6 +337,8 @@ case "$targ" in
   avr32-*-*)
     targ_defvec=bfd_elf32_avr32_vec
+    targ_selvecs=bfd_elf32_avr32fdpic_vec
+    targ_underscore=yes
     ;;
diff --git a/bfd/configure.in b/bfd/configure.in
--- a/bfd/configure.in
+++ b/bfd/configure.in
@@ -509,7 +509,8 @@ do
     bfd_elf32_am33lin_vec) tb="$tb elf32-am33lin.lo elf32.lo $elf" ;;
     bfd_elf32_avr32_vec) tb="$tb elf32-avr32.lo elf32.lo $elf" ;;
+    bfd_elf32_avr32fdpic_vec) tb="$tb elf32-avr32.lo elf32.lo $elf" ;;
     bfd_elf32_bfin_vec) tb="$tb elf32-bfin.lo elf32.lo $elf" ;;
     bfd_elf32_bfinfdpic_vec) tb="$tb elf32-bfin.lo elf32.lo $elf" ;;
     bfd_elf32_big_generic_vec) tb="$tb elf32-gen.lo elf32.lo $elf" ;;

diff --git a/bfd/configure b/bfd/configure
index 92d10ba..90a52e7 100755
--- a/bfd/configure
+++ b/bfd/configure
@@ -19042,6 +19042,7 @@ do
     bfd_elf32_am33lin_vec) tb="$tb elf32-am33lin.lo elf32.lo $elf" ;;
     bfd_elf32_avr32_vec) tb="$tb elf32-avr32.lo elf32.lo $elf" ;;
+    bfd_elf32_avr32fdpic_vec) tb="$tb elf32-avr32.lo elf32.lo $elf" ;;
     bfd_elf32_bfin_vec) tb="$tb elf32-bfin.lo elf32.lo $elf" ;;
     bfd_elf32_bfinfdpic_vec) tb="$tb elf32-bfin.lo elf32.lo $elf" ;;
     bfd_elf32_big_generic_vec) tb="$tb elf32-gen.lo elf32.lo $elf" ;;
diff --git a/bfd/elf32-avr32.c b/bfd/elf32-avr32.c
index e45134c..f882331 100644
--- a/bfd/elf32-avr32.c
+++ b/bfd/elf32-avr32.c
@@ -59,6 +59,8 @@
     /* The name of the dynamic interpreter. This is put in the .interp section. */
 #define ELF_DYNAMIC_INTERPRETER         
     "lib/ld.so.1"
@@ -68,6 +70,9 @@
     #define NOP_OPCODE 0xd703

 #define DEFAULT_STACK_SIZE 0x10000
+     #define AVR32_GOT_HEADER_SIZE 8
+     #define AVR32_FUNCTION_STUB_SIZE 8

 /* Mapping between BFD relocations and ELF relocations */
@@ -327,6 +332,10 @@
     asection *sgot;
     asection *srelgot;
     asection *sstub;
+    asection *rofixup;
+    unsigned int rofixup_count;
+    unsigned int rofixup_added;
+
     /* We use a variation of Pigeonhole Sort to sort the GOT. After the
      * initial refcounts have been determined, we initialize
@@ -547,6 +556,39 @@
     if (!IS_FDPIC(abfd)) {
         return TRUE;
     }
+    htab = avr32_elf_hash_table(info);
+    if (htab->rofixup) {
+        /* Already created. */
+        return TRUE;
+    }
+    dynobj = elf_hash_table(info)->dynobj;
+    if (info->relocatable) {
+        if (!avr32_rofixup_create(abfd, info)) {
+            return FALSE;
+        }
+        dynobj = elf_hash_table(info)->dynobj;
+        symtab_hdr = &elf_tdata(abfd)->symtab_hdr;
+    }
+    return TRUE;
+}
sym_hashes = elf_sym_hashes(abfd);

@@ -577,6 +623,11 @@ avr32_check_relocs (bfd *abfd, struct bfd_link_info *info, asection *sec, 
    local_gotents = elf_local_gotents(abfd);
    got = htab->got;

+if (IS_FDPIC(abfd) && dynobj == NULL) {
+    elf_hash_table(info)->dynobj = dynobj = abfd;
+}
+
 rel_end = relocs + sec->reloc_count;
 for (rel = relocs; rel < rel_end; rel++)
@@ -727,6 +778,21 @@ avr32_check_relocs (bfd *abfd, struct bfd_link_info *info, asection *sec, 
            }
        }
    }

+if (IS_FDPIC(abfd) && !info->shared && (sec->flags & SEC_ALLOC))
+{
+    htab->rofixup_count++;
+    if (h != NULL)
+        pr_debug("Non-GOT reference to symbol %s\n",
+            h->root.root.string);
+    else
+        pr_debug("Non-GOT reference to local symbol %lu\n",
+            r_symndx);
+}
+break;

/* TODO : GNU_VTINHERIT and GNU_VTENTRY */

@@ -1265,6 +1331,23 @@ avr32_elf_size_dynamic_sections (bfd *output_bfd, 
    }

 #undef add_dynamic_entry

+if (IS_FDPIC(output_bfd)) {
+    /* Time to find the size of the .rofixup-section. */
+    /* Terminator element. */
+    htab->rofixup->size = 4;
+
+    /* We need one entry for each R_AVR32_32 reloc. */
+    htab->rofixup->size += 4 * htab->rofixup_count;
+
+    /* We also need one entry for each got entry. */
+    htab->rofixup->size += htab->sgot->size;
+
+    htab->rofixup->contents = (bfd_byte *) bfd_zalloc(dynobj, htab->rofixup->size);
+    if (htab->rofixup->contents == NULL)
+        return FALSE;
+}
   
    return TRUE;

@@ -3234,6 +3317,110 @@ avr32_final_link_relocate(reloc_howto_type *howto, 
     return status;
 }

+static void
+avr32_rofixup_add_entry(bfd *output_bfd, struct bfd_link_info *info, 
+    asection *section, bfd_vma section_offset)
+{
+    struct elf_avr32_link_hash_table *htab;
+    bfd_vma offset;
+    bfd_vma rofixup_entry_offset;
+    
+    htab = avr32_elf_hash_table(info);
+    
+    BFD_ASSERT(htab->rofixup);
+    BFD_ASSERT(htab->rofixup->contents);
+
+    /* Calculate the offset in the output VMA. */
+    offset = section_offset + section->vma + section->output_offset;
+
+    /* Add that offset to the .rofixup-section. */
+    rofixup_entry_offset = htab->rofixup_added * 4;
+
+    BFD_ASSERT(rofixup_entry_offset < htab->rofixup->size);
+
+    bfd_put_32(output_bfd, offset, htab->rofixup->contents + rofixup_entry_offset);
+    htab->rofixup_added++;
+}
```c
+static void
+avr32_rofixup_add_relocation(bfd *output_bfd, struct bfd_link_info *info,
+                             asection *input_section,
+                             Elf_Internal_Rela *reloc)
+{
+  bfd_vma offset;
+
+  if (!IS_FDPIC(output_bfd))
+    return;
+  if (!(input_section->flags & SEC_ALLOC))
+    return;
+
+  /* Find the offset of the symbol in the output file. */
+  offset = _bfd_elf_section_offset(output_bfd, info,
+                                   input_section,
+                                   reloc->r_offset);
+  if (offset == (bfd_vma) -1)
+    return;
+  if (offset == (bfd_vma) -2)
+    return;
+
+  if (input_section->flags & SEC_CODE)
+    {
+      /* This should only occur for three symbols: __GLOBAL_OFFSET_TABLE__,
+         __ROFIXUP_LIST__ and __ROFIXUP_END__. */
+      pr_debug("Skipping relocation for text segment (vma %08lx).\n", offset);
+      return;
+    }
+
+  avr32_rofixup_add_entry(output_bfd, info, input_section, offset);
+}
+
+static void
+avr32_rofixup_add_got(bfd *output_bfd, struct bfd_link_info *info)
+{
+  struct elf_avr32_link_hash_table *htab;
+  bfd_vma offset;
+
+  htab = avr32_elf_hash_table(info);
+
+  if (!IS_FDPIC(output_bfd))
+    return;
+  if (!htab->sgot)
+    return;
+  if (!(htab->sgot->flags & SEC_ALLOC))
+    return;
+
+  /* One .rofixup entry for every 4-byte GOT slot. */
+  for (offset = 0; offset < htab->sgot->size; offset += 4)
+    avr32_rofixup_add_entry(output_bfd, info, htab->sgot, offset);
+}
+
+static void
+avr32_rofixup_terminate(bfd *output_bfd, struct bfd_link_info *info)
+{
+  struct elf_avr32_link_hash_table *htab;
+  bfd_vma rofixup_entry_offset;
+
+  htab = avr32_elf_hash_table(info);
+
+  BFD_ASSERT(htab->rofixup);
+  BFD_ASSERT(htab->rofixup->contents);
+
+  rofixup_entry_offset = htab->rofixup_added * 4;
+  bfd_put_32(output_bfd, 0xffffffff, htab->rofixup->contents + rofixup_entry_offset);
+  pr_debug("Added rofixup terminator.\n");
+  htab->rofixup_added++;
+}
+
 /* (6) Apply relocations to the normal (non-dynamic) sections */
```

@@ -3435,6 +3622,9 @@ avr32_elf_relocate_section(bfd *output_bfd, struct bfd_link_info *info,
     break;

+  /* First: FDPIC handling... */
+  avr32_rofixup_add_relocation(output_bfd, info, input_section, rel);
+
   /* We need to emit a run-time relocation in the following cases:
      - we're creating a shared library
      - the symbol is not defined in any regular objects
@@ -3700,6 +3890,8 @@ avr32_elf_finish_dynamic_sections(bfd *output_bfd, struct bfd_link_info *info)
   if (sgot)
     elf_section_data(sgot->output_section)->this_hdr.sh_entsize = 4;
+  avr32_rofixup_add_got(output_bfd, info);
+  avr32_rofixup_terminate(output_bfd, info);
   return TRUE;
 }

+static bfd_boolean
+avr32_fdpic_always_size_sections(bfd *output_bfd,
+                                 struct bfd_link_info *info)
+{
+  if (!info->relocatable)
+    {
+      struct elf_link_hash_entry *h;
+
+      /* Force a PT_GNU_STACK segment to be created. */
+      if (!elf_tdata(output_bfd)->stack_flags)
+        elf_tdata(output_bfd)->stack_flags = PF_R | PF_W | PF_X;
+
+      /* Define __stacksize if it's not defined yet. */
+      h = elf_link_hash_lookup(elf_hash_table(info), "__stacksize",
+                               FALSE, FALSE, FALSE);
+      if (!h || h->root.type != bfd_link_hash_defined
+          || h->type != STT_OBJECT
+          || !h->def_regular)
+        {
+          struct bfd_link_hash_entry *bh = NULL;
+
+          if (!(_bfd_generic_link_add_one_symbol
+                (info, output_bfd, "__stacksize",
+                 BSF_GLOBAL, bfd_abs_section_ptr, DEFAULT_STACK_SIZE,
+                 (const char *) NULL, FALSE,
+                 get_elf_backend_data(output_bfd)->collect, &bh)))
+            return FALSE;
+
+          h = (struct elf_link_hash_entry *) bh;
+          h->def_regular = 1;
+          h->type = STT_OBJECT;
+        }
+    }
+
+  return TRUE;
+}
+
+static bfd_boolean
+avr32_fdpic_modify_program_headers(bfd *output_bfd,
+                                   struct bfd_link_info *info)
+{
+  struct elf_obj_tdata *tdata = elf_tdata(output_bfd);
+  struct elf_segment_map *m;
+  Elf_Internal_Phdr *p;
+
+  if (!info)
+    return TRUE;
+
+  for (p = tdata->phdr, m = tdata->segment_map; m != NULL; m = m->next, p++)
+    if (m->p_type == PT_GNU_STACK)
+      break;
+
+  if (m)
+    {
+      struct elf_link_hash_entry *h;
+
+      /* Obtain the pointer to the __stacksize symbol. */
+      h = elf_link_hash_lookup(elf_hash_table(info), "__stacksize",
+                               FALSE, FALSE, FALSE);
+      if (h)
+        {
+          while (h->root.type == bfd_link_hash_indirect
+                 || h->root.type == bfd_link_hash_warning)
+            h = (struct elf_link_hash_entry *) h->root.u.i.link;
+          BFD_ASSERT(h->root.type == bfd_link_hash_defined);
+        }
+
+      /* Set the header p_memsz from the symbol value. We
+         intentionally ignore the symbol section. */
+      if (h && h->root.type == bfd_link_hash_defined)
+        p->p_memsz = h->root.u.def.value;
+      else
+        p->p_memsz = DEFAULT_STACK_SIZE;
+
+      p->p_align = 8;
+    }
+
+  return TRUE;
+}
+
+static bfd_boolean
+avr32_fdpic_copy_private_bfd_data(bfd *ibfd, bfd *obfd)
+{
+  unsigned i;
+
+  if (bfd_get_flavour(ibfd) != bfd_target_elf_flavour
+      || bfd_get_flavour(obfd) != bfd_target_elf_flavour)
+    return TRUE;
+
+  if (!avr32_elf_copy_private_bfd_data(ibfd, obfd))
+    return FALSE;
+
+  if (!elf_tdata(ibfd) || !elf_tdata(ibfd)->phdr
+      || !elf_tdata(obfd) || !elf_tdata(obfd)->phdr)
+    return TRUE;
+
+  /* Copy the stack size. */
+  for (i = 0; i < elf_elfheader(ibfd)->e_phnum; i++)
+    if (elf_tdata(ibfd)->phdr[i].p_type == PT_GNU_STACK)
+      {
+        Elf_Internal_Phdr *iphdr = &elf_tdata(ibfd)->phdr[i];
+
+        for (i = 0; i < elf_elfheader(obfd)->e_phnum; i++)
+          if (elf_tdata(obfd)->phdr[i].p_type == PT_GNU_STACK)
+            {
+              memcpy(&elf_tdata(obfd)->phdr[i], iphdr, sizeof(*iphdr));
+
+              /* Rewrite the phdrs, since we're only called after they
+                 were first written. */
+              if (bfd_seek(obfd, (bfd_signed_vma) get_elf_backend_data(obfd)
+                           ->s->sizeof_ehdr, SEEK_SET) != 0
+                  || get_elf_backend_data(obfd)->s
+                     ->write_out_phdrs(obfd, elf_tdata(obfd)->phdr,
+                                       elf_elfheader(obfd)->e_phnum) != 0)
+                return FALSE;
+              break;
+            }
+
+        break;
+      }
+
+  return TRUE;
+}
+ #define ELF_ARCH bfd_arch_avr32
+ #define ELF_MACHINE_CODE EM_AVR32
+ #define ELF_MAXPAGESIZE 0x1000
+ #define ELF32Elf_grok_psinfo(bfd *abfd, Elf_Internal_Note *note)
+ #define elf_backend_get_header_size AVR32_GOT_HEADER_SIZE
 #include "elf32-target.h"
+
+/* FDPIC target */
+#undef TARGET_BIG_SYM
+#define TARGET_BIG_SYM bfd_elf32_avr32fdpic_vec
+#undef TARGET_BIG_NAME
+#define TARGET_BIG_NAME "elf32-avr32fdpic"
+#undef elf32_bed
+#define elf32_bed elf32_avr32fdpic_bed
+
+#undef elf_backend_always_size_sections
+#define elf_backend_always_size_sections \
+  avr32_fdpic_always_size_sections
+#undef elf_backend_modify_program_headers
+#define elf_backend_modify_program_headers \
+  avr32_fdpic_modify_program_headers
+#undef bfd_elf32_bfd_copy_private_bfd_data
+#define bfd_elf32_bfd_copy_private_bfd_data \
+  avr32_fdpic_copy_private_bfd_data
+
+#include "elf32-target.h"

diff --git a/bfd/targets.c b/bfd/targets.c
index 975b9b4..70189ff 100644
--- a/bfd/targets.c
+++ b/bfd/targets.c
@@ -565,6 +565,7 @@ extern const bfd_target bfd_efi_app_x86_64_vec;
 extern const bfd_target bfd_efi_app_ia64_vec;
 extern const bfd_target bfd_elf32_avr_vec;
 extern const bfd_target bfd_elf32_avr32_vec;
+extern const bfd_target bfd_elf32_avr32fdpic_vec;
 extern const bfd_target bfd_elf32_bfin_vec;
 extern const bfd_target bfd_elf32_bfinfdpic_vec;
 extern const bfd_target bfd_elf32_big_generic_vec;
@@ -886,6 +887,7 @@ static const bfd_target *const bfd_target_vector[] =
  &bfd_elf32_avr_vec,
  &bfd_elf32_avr32_vec,
+ &bfd_elf32_avr32fdpic_vec,
  &bfd_elf32_bfin_vec,
  &bfd_elf32_bfinfdpic_vec,
  &bfd_elf32_big_generic_vec,

diff --git a/gas/config/tc-avr32.c b/gas/config/tc-avr32.c
index 2703ac2..4ff610f 100644
--- a/gas/config/tc-avr32.c
+++ b/gas/config/tc-avr32.c
@@ -49,6 +49,7 @@ int avr32_pic = FALSE;
 static int avr32_iarcompat = FALSE;
+static int avr32_fdpic = FALSE;
 /* This array holds the chars that always start a comment. */
 const char comment_chars[] = "#";
 #define OPTION_LINKRELAX (OPTION_NOPIC + 1)
 #define OPTION_NOLINKRELAX (OPTION_LINKRELAX + 1)
 #define OPTION_DIRECT_DATA_REFS (OPTION_NOLINKRELAX + 1)
+#define OPTION_FDPIC (OPTION_DIRECT_DATA_REFS + 1)
   {"march", required_argument, NULL, OPTION_ARCH},
   {"part", required_argument, NULL, OPTION_PART},
   {"iar", no_argument, NULL, OPTION_IAR},
   {"mcpu", required_argument, NULL, OPTION_MCPU},
+  {"fdpic", no_argument, NULL, OPTION_FDPIC},
   {NULL, no_argument, NULL, 0}
 };
@@ -380,6 +383,9 @@ md_parse_option (int c, char *arg ATTRIBUTE_UNUSED)
  case OPTION_NOLINKRELAX:
     linkrelax = 0;
     break;
+ case OPTION_FDPIC:
+     avr32_fdpic = 1;
+     break;
     default:
         return 0;
@@ -3672,6 +3678,8 @@ md_begin (void)
 flags |= EF_AVR32_LINKRELAX;
 if (avr32_pic)
     flags |= EF_AVR32_PIC;
+ if (avr32_fdpic)
+     flags |= EF_AVR32_FDPIC;
 bfd_set_private_flags(stdoutput, flags);

diff --git a/include/elf/avr32.h b/include/elf/avr32.h
--- a/include/elf/avr32.h
+++ b/include/elf/avr32.h
@@ -25,6 +25,7 @@
 /* CPU-specific flags for the ELF header e_flags field */
 #define EF_AVR32_LINKRELAX 0x01
 #define EF_AVR32_PIC 0x02
+#define EF_AVR32_FDPIC 0x04

 START_RELOC_NUMBERS (elf_avr32_reloc_type)
   RELOC_NUMBER (R_AVR32_NONE, 0)
diff --git a/ld/Makefile.am b/ld/Makefile.am
index 58c3f2c..3b064a6 100644
--- a/ld/Makefile.am
+++ b/ld/Makefile.am
@@ -165,6 +165,7 @@ ALL_EMULATIONS = \
   eavr32elf_uc3b1256es.o \
   eavr32elf_uc3b1256.o \
   eavr32linux.o \
+  eavr32linuxfdpic.o \
   ecoff_i860.o \
   ecoff_sparc.o \
   eelf32_spu.o \
@@ -757,6 +758,10 @@ eavr32linux.c: $(srcdir)/emulparams/avr32linux.sh \
   $(srcdir)/emultempl/elf32.em $(srcdir)/emultempl/avr32elf.em \
   $(srcdir)/scripttempl/elf.sc ${GEN_DEPENDS}
   ${GENSCRIPTS} avr32linux "$(tdir_avr32)"
+eavr32linuxfdpic.c: $(srcdir)/emulparams/avr32linuxfdpic.sh \
+  $(srcdir)/emultempl/elf32.em $(srcdir)/emultempl/avr32elf.em \
+  $(srcdir)/scripttempl/elf.sc ${GEN_DEPENDS}
+  ${GENSCRIPTS} avr32linuxfdpic "$(tdir_avr32)"
 ecoff_i860.c: $(srcdir)/emulparams/coff_i860.sh \
   $(srcdir)/emultempl/generic.em $(srcdir)/scripttempl/i860coff.sc ${GEN_DEPENDS}
   ${GENSCRIPTS} coff_i860 "$(tdir_coff_i860)"
diff --git a/ld/Makefile.in b/ld/Makefile.in
index 1193a74..5cacda9 100644
--- a/ld/Makefile.in
+++ b/ld/Makefile.in
@@ -412,6 +412,7 @@ ALL_EMULATIONS = \
   eavr32elf_uc3b1256es.o \
   eavr32elf_uc3b1256.o \
   eavr32linux.o \
+  eavr32linuxfdpic.o \
   ecoff_i860.o \
   ecoff_sparc.o \
   eelf32_spu.o \
@@ -1583,6 +1584,10 @@ eavr32linux.c: $(srcdir)/emulparams/avr32linux.sh \
   $(srcdir)/emultempl/elf32.em $(srcdir)/emultempl/avr32elf.em \
   $(srcdir)/scripttempl/elf.sc ${GEN_DEPENDS}
   ${GENSCRIPTS} avr32linux "$(tdir_avr32)"
+eavr32linuxfdpic.c: $(srcdir)/emulparams/avr32linuxfdpic.sh \
+  $(srcdir)/emultempl/elf32.em $(srcdir)/emultempl/avr32elf.em \
+  $(srcdir)/scripttempl/elf.sc ${GEN_DEPENDS}
+  ${GENSCRIPTS} avr32linuxfdpic "$(tdir_avr32)"
 ecoff_i860.c: $(srcdir)/emulparams/coff_i860.sh \
   $(srcdir)/emultempl/generic.em $(srcdir)/scripttempl/i860coff.sc ${GEN_DEPENDS}
   ${GENSCRIPTS} coff_i860 "$(tdir_coff_i860)"
diff --git a/ld/configure.tgt b/ld/configure.tgt
index c0c74f3..2012162 100644
--- a/ld/configure.tgt
+++ b/ld/configure.tgt
@@ -111,7 +111,9 @@ avr-*-*)
   targ_emul=avr2
   ;;
 avr32-*-none) targ_emul=avr32elf_ap7000
   targ_extra_emuls="avr32elf_ap7001 avr32elf_ap7002 avr32elf_ap7200 avr32elf_uc3a0128 avr32elf_uc3a0256 avr32elf_uc3a0512 avr32elf_uc3a0512es avr32elf_uc3a1128 avr32elf_uc3a1256 avr32elf_uc3a1512es avr32elf_uc3a1512 avr32elf_uc3a364 avr32elf_uc3a364s avr32elf_uc3a3128 avr32elf_uc3a3128s avr32elf_uc3a3256 avr32elf_uc3a3256s avr32elf_uc3b064 avr32elf_uc3b0128 avr32elf_uc3b0256es avr32elf_uc3b0256 avr32elf_uc3b164 avr32elf_uc3b1128 avr32elf_uc3b1256es avr32elf_uc3b1256" ;;
-avr32-*-linux*) targ_emul=avr32linux ;;
+avr32-*-linux* | avr32-*-uclinux*) targ_emul=avr32linux
+  targ_extra_emuls="avr32linuxfdpic"
+  ;;
 bfin-*-elf) targ_emul=elf32bfin;
   targ_extra_emuls="elf32bfinfd"
   targ_extra_libpath=$targ_extra_emuls
diff --git a/ld/emulparams/avr32linux.sh b/ld/emulparams/avr32linux.sh
index f281f9d..fd36e7d 100644
--- a/ld/emulparams/avr32linux.sh
+++ b/ld/emulparams/avr32linux.sh
@@ -4,6 +4,7 @@ TEMPLATE_NAME=elf32
 EXTRA_EM_FILE=avr32elf
 OUTPUT_FORMAT="elf32-avr32"
 GENERATE_SHLIB_SCRIPT=yes


---

**F.4. uClibc changes**

This patch enables uClibc to be linked statically into an FDPIC ELF binary on AVR32. It does not update the parts needed for dynamic linking.

There are also a few simple changes to memcmp, memcpy and memmove that make them work on the UC3, a CPU that cannot perform unaligned memory accesses.
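The idea behind the string-function changes can be illustrated in C. The following is a minimal sketch, assuming that a byte-at-a-time copy is an acceptable fallback; `bytewise_memmove` is a hypothetical name used only for illustration and is not part of uClibc, where the actual change is made in AVR32 assembly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of uClibc: copy len bytes one byte at a
 * time, so no load or store is wider than a byte and the alignment of
 * src and dst cannot cause an exception. Overlap is handled as in memmove. */
static void *bytewise_memmove(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(len == 0 || (dst != NULL && src != NULL));

    if (d < s) {
        /* Destination starts below source: copy forwards. */
        while (len--)
            *d++ = *s++;
    } else if (d > s) {
        /* Destination starts above source: copy backwards, in the same
         * spirit as a pre-decrement load/store loop. */
        d += len;
        s += len;
        while (len--)
            *--d = *--s;
    }
    return dst;
}
```

The trade-off is speed: word-sized copies move four bytes per iteration, while this variant moves one, so it only makes sense where unaligned word accesses are not available.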

```
---
 Rules.mk                               |  7 +++
 extra/Config/Config.avr32              |  3 +
 libc/string/avr32/memcmp.S             | 11 ++++
 libc/string/avr32/memmove.S            | 16 ++++
 libc/sysdeps/linux/avr32/Makefile.arch |  2 -
 libc/sysdeps/linux/avr32/crt1.S        | 40 ++++++++++++++++------
 libc/sysdeps/linux/avr32/crti.S        |  4 ++
 libc/sysdeps/linux/avr32/crtreloc.c    | 85 +++++++++++++++++++++++++++-
 libc/sysdeps/linux/avr32/syscall.S     |  6 ++
 libc/sysdeps/linux/avr32/vfork.S       |  4 ++
 11 files changed, 191 insertions(+), 2 deletions(-)
```
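The new crtreloc.c file in the summary above translates link-time addresses into run-time addresses using the load map supplied by the kernel. The sketch below illustrates that lookup on plain integers. The type and function names (`loadseg`, `loadmap`, `reloc_address`) are simplified, hypothetical stand-ins and not the definitions used in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the FDPIC load map structures. */
struct loadseg {
    uintptr_t addr;    /* run-time address the segment was mapped to */
    uintptr_t p_vaddr; /* link-time address from the program header */
    uintptr_t p_memsz; /* size of the segment in memory */
};

struct loadmap {
    unsigned short nsegs;       /* number of entries in segs */
    const struct loadseg *segs; /* ordered by p_vaddr */
};

/* Translate a link-time address p into its run-time address.
 * Returns (uintptr_t)-1 if p lies in no segment. */
static uintptr_t reloc_address(uintptr_t p, const struct loadmap *map)
{
    unsigned c;

    /* The map is ordered by p_vaddr, so stop at the first segment
     * that starts above p. */
    for (c = 0; c < map->nsegs && p >= map->segs[c].p_vaddr; c++) {
        uintptr_t offset = p - map->segs[c].p_vaddr;

        /* One-past-the-end is accepted only for the last segment. */
        if (offset < map->segs[c].p_memsz
            || (offset == map->segs[c].p_memsz && c + 1 == map->nsegs))
            return map->segs[c].addr + offset;
    }
    return (uintptr_t)-1;
}
```

With only a handful of segments per binary, a linear scan like this is likely cheaper than a binary search.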

---

```diff
diff --git a/ld/emulparams/avr32linuxfdpic.sh b/ld/emulparams/avr32linuxfdpic.sh
--- /dev/null
+++ b/ld/emulparams/avr32linuxfdpic.sh
@@ -0,0 +1,10 @@
+. ${srcdir}/emulparams/avr32linux.sh
+OUTPUT_FORMAT="elf32-avr32fdpic"
+OTHER_READONLY_SECTIONS="
+  rofixup : { *(.rofixup)
+  }"
```

---

```diff
+ diff --git a/extra/Config/Config.avr32 b/extra/Config/Config.avr32
+++ a/extra/Config/Config.avr32
@@ -399,8 +399,15 @@ endif
create mode 100644 libc/sysdeps/linux/avr32/crtreloc.c
```

---

```diff
+ diff --git a/Rules.mk b/Rules.mk
+++ a/Rules.mk
@@ -399,8 +399,15 @@ endif
```
```c
+ #ifdef __CONFIG_AVR32_UC3__
+ /* This CPU cannot do unaligned accesses */
+ -1:
+ sub len, 1
+ brlt 2f
+ ld.ub r0, --src
+ st.b --dst, r0
+ rjmp 1b
+ 2:
+ +##else /* __CONFIG_AVR32_UC3__ */
+ -sub len, 4
+ brlt 2f
+ ld.w r0, --src
+ @@ -109,6 +124,7 @@ memmove:
+ 1: ld.ub r0, --src
+ st.b --dst, r0
+ .endr
+ +#endif /* __CONFIG_AVR32_UC3__ */
+ -popm r0-r7, pc
+ .size memmove, .-memmove
+ diff --git a/libc/sysdeps/linux/avr32/Makefile.arch b/libc/sysdeps/linux/avr32/Makefile.arch
+ index 44fc01e..0d906f8 100644
+ --- a/libc/sysdeps/linux/avr32/Makefile.arch
+ +++ b/libc/sysdeps/linux/avr32/Makefile.arch
+ @@ -5,7 +5,7 @@
+ -CSRC := brk.c clone.c mmap.c sigaction.c
+ +CSRC := brk.c clone.c mmap.c sigaction.c crtreloc.c
+ +#ifdef __PIC__
+ +#ifdef __AVR32_FDPIC__
+ +/* We need to save r10 & r11 until after relocation */
+ +mov r3, r10
+ +mov r4, r11
+ +#endif __AVR32_FDPIC__
+ +#endif __PIC__
+ +/* FDPIC handing */
+ -mov r12, r0
+ +#ifdef __PIC__
+ +#ifdef __AVR32_FDPIC__
+ +/* Find the rofixup address */
+lddpc r11, __original_rofixup
+lddpc r10, __original_rofixup
+ +#endif __AVR32_FDPIC__
+ +#endif __PIC__
+ +#ifdef __PIC__
+ +#ifdef __AVR32_FDPIC__
+ +/* Find the got */
+lddpc r10, __original_got
+lddpc r9, __original_got
+ +#endif __AVR32_FDPIC__
+ +#endif __PIC__
+ +#ifdef __PIC__
+ +#ifdef __AVR32_FDPIC__
+ +/* Relocate ROC fixes */
+ +rcall __rel_rofixup
+ +#endif __AVR32_FDPIC__
+ +#endif __PIC__
+ +/* Restore R10 & R11 */
+ +mov r10, r3
+ +mov r11, r4
+ +#ifdef __PIC__
+ +#ifdef __AVR32_FDPIC__
+/* Ok, now run uClibc's main() -- should not return */
+call __uClibc_main
+ +#endif __AVR32_FDPIC__
+ +#endif __PIC__
+ +mov r9, _init
+ +lda.w r9, __init
+ +mov r9, _fini
+ +lda.w r9, __fini
+ +lda.w r12, main
+ +lda.w r12, main
+ +#ifdef __PIC__
+ +#ifdef __AVR32_FDPIC__
+/* Ok, now run uClibc's main() -- should not return */
+call __uClibc_main
+ +#endif __AVR32_FDPIC__
+ +#endif __PIC__
+ +ld.w r0, _init
+ +ld.w r0, _fini
+ +ld.w r12, main
+ +ld.w r12, main
+ +#ifdef __PIC__
+ +#ifdef __AVR32_FDPIC__
+/* Ok, now run uClibc's main() -- should not return */
+call __uClibc_main
+ +#endif __AVR32_FDPIC__
+ +#endif __PIC__
+ +.align 2
+ +L__original_rofixup:
+ +.long __ROFIXUP_LIST__
+ +.long __GLOBAL_OFFSET_TABLE__
+# elif defined (__ PIC __)
+lddpc r6, .L_GOT
+.L_RGOT :
+rsub r6, pc
diff --git a/libc/sysdeps/linux/avr32/crti.S b/libc/sysdeps/linux/avr32/crti.S
index 660f47c..b39c4abf 100644
--- a/libc/sysdeps/linux/avr32/crti.S
+++ b/libc/sysdeps/linux/avr32/crti.S
@@ -5,12 +5,14 @@
+lddpc r6, 2f
+1: rsub r6, pc
+2: .align 2
+2: .long 1b - _GLOBAL_OFFSET_TABLE_
+3:
+#ifndef __ AVR32 _ FDPIC __
+2: .align 2
+3: .long 1b - _GLOBAL_OFFSET_TABLE_
+endif /* __ AVR32 _ FDPIC __ */
+endif /* __ AVR32 _ FDPIC __ */
+}
+section .fini
+align 2
@@ -18,9 +20,11 @@
+._fini, @function
+section .fini
+align 2
+endif /* __ AVR32 _ FDPIC __ */
+endif /* __ AVR32 _ FDPIC __ */
+}
+section .fini
+align 2
@@ -0,0 +1,85 @@
+#include <sys/types.h>
+#include <link.h>
+
+/* This data structure represents a PT_LOAD segment. */
+struct elf32_fdpic_loadseg {
+  /* Core address to which the segment is mapped. */
+  unsigned long addr;
+  /* VMA recorded in the program header. */
+  unsigned long p_vaddr;
+  /* Size of this segment in memory. */
+  unsigned long p_memsz;
+};
+
+struct elf32_fdpic_loadmap {
+  /* Protocol version number, must be zero. */
+  unsigned short version;
+  /* Number of segments in this map. */
+  unsigned short nsegs;
+  /* The actual memory map. */
+  struct elf32_fdpic_loadseg segs[/*nsegs*/];
+};
+
+static __always_inline void *
+__reloc_pointer (void *p,
+                 const struct elf32_fdpic_loadmap *map)
+{
+  int c;
+
+#if 0
+  if (map->version != 0)
+    /* Crash. */
+    ((void(*)())0)();
+#endif
+
+  /* No special provision is made for NULL. We don't want NULL
+     addresses to go through relocation, so they shouldn't be in
+     .rofixup sections, and, if they're present in dynamic
+     relocations, they shall be mapped to the NULL address without
+     undergoing relocations. */
+
+  for (c = 0;
+       /* Take advantage of the fact that the loadmap is ordered by
+          virtual addresses. In general there will only be 2 entries,
+          so it's not profitable to do a binary search. */
+       c < map->nsegs && p >= (void*)map->segs[c].p_vaddr;
+       c++)
+    {
+      unsigned long offset = p - (void*)map->segs[c].p_vaddr;
+      /* We only check for one-past-the-end for the last segment,
+         assumed to be the data segment, because other cases are
+         ambiguous in the absence of padding between segments, and
+         rofixup already serves as padding between text and data.
+         Unfortunately, unless we special-case the last segment, we
+         fail to relocate the _end symbol. */
+      if (offset < map->segs[c].p_memsz
+          || (offset == map->segs[c].p_memsz && c + 1 == map->nsegs))
+        return (char*)map->segs[c].addr + offset;
+    }
+
+  return (void*) -1;
+}
+
++ # ifdef __ PIC __
+ # ifndef __ AVR32_FDPIC __
+ lddpc r6, .Lgot
+ .align 2
+ # endif /* __ AVR32_FDPIC __ */
+ # else
+ # ifdef __ UCLIBC_HAS_THREADS__
+ rsub r6, r12, 0
+ __call r6[@_errno_location@got]
+ #endif /* __ UCLIBC_HAS_THREADS__ */
+ #endif /* __ PIC __ */
+ #endif /* __AVR32_FDPIC__ */
+ .Lgot: .long .Lgotcalc - _GLOBAL_OFFSET_TABLE_
+ #endif /* __AVR32_FDPIC__ */
+ #else
+ __asm__ __volatile__
+ .Lerrno_location: #endif /* __ UCLIBC_HAS_THREADS__ */
+ .label .Lerrno_location: #endif /* __ PIC__ */
+ .Lerrno_location:
+ .long .Lgotcalc - _GLOBAL_OFFSET_TABLE_
+ #endif /* __AVR32_FDPIC__ */
+ #else
+ __asm__ __volatile__
+ .L_errno_location: #endif /* __ PIC__ */
+ .L_errno_location:
+ .long .L_errno_location - _GLOBAL_OFFSET_TABLE_
+ #endif /* __AVR32_FDPIC__ */
```

F.5 Unsubmitted GCC change

From ec741a7b83a01e8f316f6f649cf984a0601f864a Mon Sep 17 00:00:00 2001
From: Gunnar Rangøy <rangoy@mnops.(none)>
Subject: [PATCH] Set -mno-init-got if -mfdpic is specified.

This patch changes GCC so that specifying the -mfdpic flag automatically adds the
-mno-init-got flag.

---

diff --git a/gcc/config/avr32/linux-elf.h b/gcc/config/avr32/linux-elf.h
index cb206a1..5dd7dbb 100644
--- a/gcc/config/avr32/linux-elf.h
+++ b/gcc/config/avr32/linux-elf.h
@@ -70,6 +70,7 @@
 # define DRIVER_SELF_SPECS "
 %mfdpic:p:%{fPIC:%{fPIE:
   %fno-pie:p:%{fno-PIC:%{fno-PIE:-fpie}}}))))} \
+  %{mfdpic:-mno-init-got} \
 "
---

1.5.4.3
Appendix G

Patch for elf2flt

This appendix lists the patch containing the modifications made to the elf2flt utility while experimenting with the flat format. The patch is based on a CVS snapshot from 6 March 2009. These changes were not submitted to the maintainers, and probably never will be, since no useful results were achieved.

```diff
diff --git a/config.sub b/config.sub
index 4279c84..ed9cbb6 100755
--- a/config.sub
+++ b/config.sub
@@ -230,6 +230,7 @@ case $basic_machine in
   | alpha | alphaev[4-8] | alphaev56 | alphaev6[78] | alphapca5[67] \
   | alpha64 | alpha64ev[4-8] | alpha64ev56 | alpha64ev6[78] | alpha64pca5[67] \
   | am33_2.0 \
+  | avr32 \
   | arc | arm | arm[bl]e | arme[lb] | armv[2345] | armv[345][lb] | avr \
   | bfin \
   | c4x | clipper \
@@ -425,6 +426,10 @@ case $basic_machine in
     basic_machine=m68k-apple
     os=-aux
     ;;
+  avr32)
+    basic_machine=avr32
+    os=-linux
+    ;;
   balance)
     basic_machine=ns32k-sequent
     os=-dynix
diff --git a/elf2flt.c b/elf2flt.c
index 546305f..9d97c39 100644
--- a/elf2flt.c
+++ b/elf2flt.c
@@ -64,6 +64,8 @@ #include <elf/microblaze.h> /* TARGET_* ELF support for the BFD library */
#define ARCH "nios2"
#endif
```
int load_to_ram = 0; /* instruct loader to allocate everything into RAM */
int ktrace = 0; /* instruct loader output kernel trace on load */

/* instruct loader to allocate everything into RAM */
int bad_relocs = 0;

asymbol ** symb;
long nsymb;

int i;

#ifndef TARGET_avr32_disable_
  flat_relocs = realloc(flat_relocs, (flat_reloc_count + got_size) * sizeof(uint32_t));
  for (i = 0; i < got_size / sizeof(uint32_t); i++) {
    uint32_t value = ntohl(((uint32_t *) data)[i]);
    fprintf(stderr, " Add GOT reloc at 0x%08x ( value : 0x%08x)\n", offset, value);
    flat_relocs[flat_reloc_count] = pflags | offset;
    flat_reloc_count++;
  }
#endif /* TARGET_avr32 */

fprintf(stderr, " casd : %lu\n", (unsigned long) flat_reloc_count);
for (a = abs_bfd->sections; (a != (asection *) NULL); a = a->next) {
  section_vma = bfd_section_vma(abs_bfd, a);
  for (a = abs_bfd->sections; (a != (asection *) NULL); a = a->next) {
/* Switching on : %d", q->howto->type */
  }
#endif /* TARGET_v850 */
case R_V850_H116_S:
  break;
#endif
  case R_V850_H16_S:
  break;
#endif
  case R_AVR32_32:
  printf("relocating switch(AVR32_32), typenr: %d\n", q->howto->type);
  relocation_needed = 1;
  break;
  case R_AVR32_DIFF32:
  printf("relocating switch(DIFF32), typenr: %d\n", q->howto->type);
  relocation_needed = 0;
  break;
  case R_AVR32_GOT16S:
  printf("relocating switch(GOT), typenr: %d\n", q->howto->type);
  relocation_needed = 0;
  break;
  default:
  printf("relocating switch(DEFAULT), typenr: %d\n", q->howto->type);
  goto bad_resolved_reloc;
#endif
  case R_AVR32_32:
  goto good_32bit_resolved_reloc;
#endif
  case R_AVR32_32:
  goto good_32bit_resolved_reloc;
```c
+ sym_vma = bfd_section_vma(abs_bfd, sym_section);
+ sym_addr += sym_vma + q->addend;
+ printf("real reloacting switch(AVR32_32), typenr: %d\n", q->
+ howto->type);
+ relocation_needed = 1;
+ break;
+ case R_AVR32_DIFF32:
+ printf("real reloacting switch(AVR32_DIFF32), typenr: %d\n", q->
+ howto->type);
+ relocation_needed = 1;
+ break;
+ case R_AVR32_GOTPC:
+ case R_AVR32_GOT16S:
+ printf("real reloacting switch(GOT), typenr: %d\n", q->howto->type);
+ break;
```

Appendix H

EVK1100 SRAM expansion board
Appendix I

Test source code

I.1 Linux exception tests

I.1.1 Unaligned read

```c
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>

static void sigbus_handler(int ignored)
{
    fprintf(stderr, "Got SIGBUS exception.\n");
    exit(1);
}

static char buffer[16];

int main()
{
    int *p = (int *)&buffer[1]; /* Create unaligned pointer. */
    signal(SIGBUS, sigbus_handler);
    fprintf(stderr, "Triggering SIGBUS exception (unaligned read):\n");
    printf("*p is: %d\n", *p);
    fprintf(stderr, "Exception didn't trigger.\n");
    return 0;
}
```
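The test above relies on `&buffer[1]` not being suitably aligned for an `int` access. A minimal sketch of how such a condition could be checked in C before dereferencing is shown below; `is_misaligned` is a hypothetical helper that is not part of the test suite, and it assumes the alignment is a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns non-zero if p is not a multiple of
 * alignment. The alignment must be a non-zero power of two. */
static int is_misaligned(const void *p, uintptr_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)p & (alignment - 1)) != 0;
}
```

Note that a `char` array carries no alignment guarantee of its own, so a check of this kind could be used to confirm that the pointer under test really is misaligned on a given build.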

I.1.2 Unaligned write

```c
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>

static void sigbus_handler(int ignored)
{
    fprintf(stderr, "Got SIGBUS exception.\n");
    exit(1);
}

static char buffer[16];

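int main()
{
    /* Body reconstructed by analogy with I.1.1 and I.1.4; the listing is cut off after main(). */
    int *p = (int *)&buffer[1]; /* Create unaligned pointer. */
    signal(SIGBUS, sigbus_handler);
    fprintf(stderr, "Triggering SIGBUS exception (unaligned write):\n");
    *p = 42;
    fprintf(stderr, "Exception didn't trigger.\n");
    return 0;
}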
```
I.1.3 Invalid read

```c
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>

static void sigbus_handler(int ignored)
{
    fprintf(stderr, "Got SIGBUS exception.\n");
    exit(1);
}

int main()
{
    int *p = (int *)(0x100000);
    signal(SIGBUS, sigbus_handler);
    fprintf(stderr, "Triggering SIGBUS exception (invalid read):\n");
    fprintf(stderr, "*p is: %d\n", *p);
    fprintf(stderr, "Exception didn't trigger.\n");
    return 0;
}
```

I.1.4 Invalid write

```c
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>

static void sigbus_handler(int ignored)
{
    fprintf(stderr, "Got SIGBUS exception.\n");
    exit(1);
}

int main()
{
    int *p = (int *)(0x100000);
    signal(SIGBUS, sigbus_handler);
    fprintf(stderr, "Triggering SIGBUS exception (invalid write):\n");
    *p = 42;
    fprintf(stderr, "Exception didn't trigger.\n");
    return 0;
}
```

I.1.5 Invalid opcode (aligned)

```c
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>

static void sigill_handler(int ignored)
{
    fprintf(stderr, "Got SIGILL exception.\n");
    exit(1);
}

int main()
{
    signal(SIGILL, sigill_handler);

    fprintf(stderr, "Triggering SIGILL exception (rsubeq instruction):\n");
    asm(".balignw 4, 0xd703"); /* Align on 4 bytes, pad with NOPs. */
    asm("rsubeq r0, 42"); /* Illegal opcode. */
    fprintf(stderr, "Exception didn't trigger.\n");

    return 0;
}
```

I.1.6 Invalid opcode (unaligned)

```c
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>

static void sigill_handler(int ignored)
{
    fprintf(stderr, "Got SIGILL exception.\n");
    exit(1);
}

static void sigsegv_handler(int ignored)
{
    fprintf(stderr, "Got SIGSEGV exception.\n");
    exit(1);
}

int main()
{
    signal(SIGILL, sigill_handler);
    signal(SIGSEGV, sigsegv_handler);

    fprintf(stderr, "Triggering SIGILL exception (halfword aligned rsubeq instruction):\n");
    asm(".balignw 4, 0xd703"); /* Align on 4 bytes, pad with NOPs. */
    asm("nop"); /* Make sure that the illegal opcode is aligned at a half-word boundary. */
    asm("rsubeq r0, 42"); /* Illegal opcode. */
    fprintf(stderr, "Exception didn't trigger.\n");

    return 0;
}
```

I.2 Toolchain tests

I.2.1 Simple program
```c
#include <unistd.h>

int main(int argc, char *argv[])
{
    write(1, "Hello!\n", 7);
    return 0;
}
```

I.2.2 More complex program

```c
#include <stdio.h>

int main(int argc, char *argv[])
{
    printf("Hello world! %d\n", 42);
    return 0;
}
```
Appendix J

Digital appendices

This appendix lists the digital appendices.

J.1 Linux patches
This is a directory with the patches for Linux.

J.2 U-Boot patches
This is a directory with the patches we submitted for U-Boot.

J.3 U-Boot unsubmitted changes
This is a patch with the unsubmitted changes for U-Boot.

J.4 Toolchain patches
This directory contains the patches we submitted for GCC, GNU Binutils and uClibc.

J.5 elf2flt changes
This is a patch with the changes we made to elf2flt.

J.6 SPI DMA changes
This patch contains the changes we made to the SPI driver and the peripheral DMA controller.

J.7 Tests
This directory contains the source code for the tests.