1 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
2 % Copyright (c) 2011, ETH Zurich.
5 % This file is distributed under the terms in the attached LICENSE file.
6 % If you do not find this file, copies can be found by writing to:
7 % ETH Zurich D-INFK, Universitaetstr. 6, CH-8092 Zurich. Attn: Systems Group.
8 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
10 \documentclass{scrreprt}
11 \usepackage[utf8]{inputenc}
12 \usepackage[T1]{fontenc}
13 \usepackage[pdftex]{graphicx}
14 \usepackage{times,helvet,url,color,listings,ctable}
15 \usepackage[pdftex]{hyperref}
16 \usepackage{cite} % must be after hyperref
18 % koma typearea: recalculate with new fonts
19 \typearea[current]{10}
22 \definecolor{boxshade}{gray}{0.85}
25 \definecolor{linkcol}{rgb}{0,0,0.7}
27 pdftitle={Barrelfish Specification},
31 linkcolor=linkcol,citecolor=linkcol,pagecolor=linkcol,urlcolor=linkcol
32 %breaklinks=true,pagebackref=true
35 % Default language for code listings is C
41 showstringspaces=false,
46 \lstdefinelanguage{Mackerel}{
47 morekeywords={datatype,device,register,regtype,constants,type,at,
51 morecomment=[s]{/*}{*/},
58 % autoref (from hyperref) setup
59 \def\chapterautorefname{Chapter}
60 \def\sectionautorefname{Section}
61 \def\subsectionautorefname{Section}
62 \expandafter\def\csname section*autorefname\endcsname{Section}
64 \newcommand{\note}[1]{ [\textcolor{red}{\emph{#1}}]}
65 \newcommand{\todo}[1]{\note{\textbf{TODO:} #1}}
67 \newcommand{\noarginvocation}[1]{\paragraph{#1 invocation}}
68 \newcounter{invocArgCnt}
69 \newenvironment{invocation}[1]{%
72 \begin{list}{\emph{Argument~\arabic{invocArgCnt}:}}{%
73 \usecounter{invocArgCnt}%
74 \setlength{\rightmargin}{\leftmargin}%
75 \setlength{\itemsep}{0ex}%
77 \renewcommand{\arg}{\item}
82 % environment to keep track of the stuff I've clagged from the seL4 spec
84 \newenvironment{sel4}{%
85 \begin{lrbox}{\@tempboxa}\begin{minipage}{\columnwidth}
87 \end{minipage}\end{lrbox}%
88 \noindent\colorbox{boxshade}{\usebox{\@tempboxa}}
91 \title{Barrelfish Specification}
92 \author{Andrew Baumann \and Simon Peter \and Timothy Roscoe \and
93 Adrian Sch\"upbach \and Akhilesh Singhania}
94 %% \date{\input{specdate}}
99 \chapter*{Acknowledgements}
101 Paul, Rebecca, Tim, et al.
109 \chapter{Introduction}
113 \chapter{Barrelfish Kernel API}
115 \section{System Calls}\label{sec:syscalls}
117 \subsection{Invoke}\label{sec:sys_invoke}
119 This system call takes at least one argument, which must the address of a
120 capability in the caller's CSpace. The remaining arguments, if any, are
121 interpreted based on the type of this first capability.
123 Other than yielding, all kernel operations including IDC are
124 provided by capability invocation, and make use of this call. The
125 possible invocations for every capability type are described in
126 the capability management document.
128 This system call may only be used while the caller is
129 \emph{enabled}. The reason is that the caller must be prepared to
130 receive a reply immediately and that is only possible when
131 enabled, as it requires the kernel to enter the dispatcher at the
134 \subsection{Yield}\label{sec:sys_yield}
136 This system call yields the CPU. It takes a single argument, which must be
137 either the CSpace address of a dispatcher capability, or \verb CPTR_NULL .
138 In the first case, the given dispatcher is run unconditionally; in the
139 latter case, the scheduler picks which dispatcher to run.
141 This system call may only be used while the caller is \emph{disabled}.
142 Furthermore, it clears the caller's \emph{disabled} flag, so the next time
143 it will be entered is at the run entry point.
145 \subsection{Debug system calls}
147 The following debug system calls may also be supported, depending on
148 build options, but are not part of the regular kernel interface.
150 \subsubsection{No-op}
151 This call takes no arguments, and returns directly to the
152 caller. It always succeeds.
154 \subsubsection{Print}
155 This call takes two arguments: an address in the caller's vspace, which
156 must be mapped, and a size, and prints the string found at that address
157 to the console. It may fail if any part of the string is not accessible
158 to the calling domain.
160 \subsubsection{Reboot}
161 This call unconditionally hard reboots the system.
162 \note{This call should be removed -AB}
164 \subsubsection{Debug}
167 \section{Dispatch and Execution}\label{sec:dispatch}
169 A dispatcher consists of code executing at user-level and a data
170 structure located in pinned memory, split into two regions. One
171 region is only accessible from the kernel, the other region is
172 shared read/write between user and kernel. The fields in the
173 kernel-defined part of the structure are described in
174 \autoref{tab:dispcb}.
177 caption=Dispatcher control structure,
180 ]{lll>{\raggedright}X}{}{
182 Field name & Size & Kernel R/W & Short description
184 \lstinline+disabled+ & word & R/W & If non-zero, the kernel will not
185 upcall the dispatcher, except to deliver a trap.
187 \lstinline+haswork+ & pointer & R & If non-zero, the kernel will
188 consider this dispatcher eligible to run.
190 \lstinline+crit_pc_low+ & pointer & R & Address of first instruction
191 in dispatcher's critical code section.
193 \lstinline+crit_pc_high+ & pointer & R & Address immediately after
194 last instruction of dispatcher's critical code section.
196 entry points & 4 function descriptors & R & Functions at which
197 the dispatcher code may be invoked
199 \lstinline+enabled_save_area+ & arch specific & W & Area for kernel
200 to save register state when enabled
202 \lstinline+disabled_save_area+ & arch specific & R/W & Area for
203 kernel to save and restore register state when disabled
205 \lstinline+trap_save_area+ & arch specific & W & Area for kernel to
206 save register state when a trap or a pagefault while disabled occurs
208 \lstinline+recv_cptr+ & capability pointer & R & Address of CNode to
209 store received capabilities of next local IDC into
211 \lstinline+recv_bits+ & word & R & Number of valid bits within
212 \lstinline+recv_cptr+
214 \lstinline+recv_slot+ & word & R & Slot within CNode to store
215 received capability of next local IDC into
219 Beyond these fields, the user may define and use their own data
220 structures (eg. a stack for the dispatcher code to execute on,
221 thread management structures, etc).
223 \subsection{Disabled}
225 A dispatcher is considered disabled by the kernel if either of the
226 following conditions is true:
229 \item its disabled word is non-zero
230 \item its program counter is within the range specified by the
231 \lstinline+crit_pc_low+ and \lstinline+crit_pc_high+ fields
234 The disabled state of a dispatcher controls where the kernel saves
235 its registers, and is described in the following subsection. When
236 the kernel resumes a dispatcher that was last running while
237 disabled, it restores its machine state and resumes execution at the
238 saved instruction, rather than upcalling it at an entry point.
240 \subsection{Register save areas}
242 The dispatcher structure contains enough space for three full copies
243 of the machine register state to be saved. The \lstinline+trap_save_area+
244 is used whenever the dispatcher takes a trap, regardless of whether
245 it is enabled or disabled. Otherwise, the \lstinline+disabled_save_area+
246 is used whenever the dispatcher is disabled (see above), and the
247 \lstinline+enabled_save_area+ is used in all other cases.
249 \autoref{fig:dispstatesaves} (Trap and PageFault states have
250 been left out for brevity) shows important dispatcher states and into
251 which register save area state is saved upon a state transition. The
252 starting state for a domain is ``notrunning'' and depicted with a
253 bold border in the Figure.
257 \includegraphics[width=\textwidth]{disp_states_simple_save_area_analysis}
258 \caption[Dispatcher state save areas]{Dispatcher state save areas.
259 Trap and PageFault states
260 omitted for brevity. Regular text and lines denote state changes
261 by the kernel. Dashed lines and italic text denote state changes
262 by user-space, which do not necessarily have to use the denoted
263 save area. The starting state is in the bold
264 node.}\label{fig:dispstatesaves}
267 Arrows from right to left involve saving state into the labeled
268 area. Arrows from left to right involve restoring state from the
269 labeled area. It can be seen that no state can be overwritten. The
270 kernel can recognize a disabled dispatcher by looking at the
271 disabled flag, as well as the domain's instruction pointer. Nothing
272 else needs to be examined.
274 The dispatcher states are also depicted in \autoref{fig:dispstates}.
276 \subsection{Dispatcher Entry Points}
278 Unless restoring it from a disabled context, the kernel always
279 enters a dispatcher at one of the following entry
280 points. Whenever the kernel invokes a dispatcher at any of its entry
281 points, it sets the disabled bit on. One (ABI-specific) register
282 always points to the dispatcher structure. The value of all other
283 registers depends on the entry point at which the dispatcher is
284 invoked, and is described below.
286 The entry points are:
289 \item[Run] A dispatcher is entered at this entry point when it was
290 not previously running, the last time it ran it was either enabled or
291 yielded the CPU, and the kernel has given it the CPU. Other than the
292 register that holds a pointer to the dispatcher itself, all other registers
293 are undefined. The dispatcher's last machine state is saved in the
294 \lstinline+enabled_save_area+.
296 \item[PageFault] A dispatcher is entered at this entry point when it
297 suffers a page fault while enabled. On entry, the dispatcher register is
298 set, and the argument registers contain information about the cause of
299 the fault. Volatile registers are saved in the
300 \lstinline+enabled_save_area+; all other registers contain the user
301 state at the time of the fault.
303 \item[PageFault\_Disabled] A dispatcher is entered at this entry point when it
304 suffers a page fault while disabled. On entry, the dispatcher register is
305 set, and the argument registers contain information about the cause of
306 the fault. Volatile registers are saved in the
307 \lstinline+trap_save_area+; all other registers contain the user
308 state at the time of the fault.
310 \item[Trap] A dispatcher is entered at this entry point when it is
311 running and it raises an exception (for example, illegal
312 instruction, divide by zero, breakpoint, etc.). Unlike the other
313 entry points, a dispatcher may be entered at its trap entry even
314 when it was running disabled. The machine state at the time of the
315 trap is saved in the \lstinline+trap_save_area+, and the argument
316 registers convey information about the cause of the trap.
318 \item[LRPC] A dispatcher is entered at this entry point when an
319 LRPC message (see below) is delivered to it. This can only
320 happen when it was not previously running, and was enabled. On
321 entry, four registers are delivered containing the message payload,
322 one stores the endpoint offset, and another contains the dispatcher pointer.
325 This diagram shows the states a \emph{dispatcher} can be in and how it
326 gets there. The exceptional states Trap and PageFault have been
331 \includegraphics[width=.7\columnwidth]{disp_states_simple}
332 \caption[Typical dispatcher states]{Typical dispatcher states.
333 Trap and PageFault states
334 omitted for brevity. Regular text and lines denote state changes
335 by the kernel. Dashed lines and italic text denote state changes
336 by user-space. The starting state is in bold.}
337 \label{fig:dispstates}
340 \section{Inter-Dispatcher Communication}\label{sec:IDC}
342 Inter-dispatcher communication (IDC) is a kernel-supported mechanism
343 to allow dispatchers to communicate by sending messages. IDC is
344 executed by invoking an IDC endpoint capability referring to a
345 receiving dispatcher.
347 \subsection{Endpoints}
349 IDC communication takes place between dispatchers via endpoints.
350 An endpoint is created by retyping a dispatcher capability
351 into an IDC endpoint capability. It refers to exactly one dispatcher, and
352 to one endpoint buffer structure within that dispatcher.
353 An endpoint buffer is a kernel-specified data structure located within the
354 dispatcher frame, where the kernel delivers IDC messages.
356 The kernel guarantees messages to either be delivered to the
357 receiving dispatcher or to return to the sender with an error status
358 code in the event that the receiver is unable to receive the message.
359 This implies that messages are never dropped silently by the kernel but does
360 not guarantee that messages are never dropped on the whole communication path,
361 which involves the receiving dispatcher.
363 It should be noted that endpoint capabilities may be freely copied, and do not
364 uniquely identify a sender. An endpoint capability can be
365 transferred to several dispatchers, all of whom may use the same
366 endpoint and thus the same buffer when sending messages.
368 \subsection{Message Transfer}
370 To send IDC, a dispatcher invokes an endpoint capability to the
371 receiving dispatcher. The message it wishes to send is provided as
372 argument to the invocation, as well as flags to specify additional
373 parameters influencing the message transfer.
375 An IDC message is delivered by the kernel allocating space in the receiver's
376 endpoint buffer, and writing the message contents. The receiver must poll its
377 endpoints to detect incoming messages, and consume them in order to free space
378 in the endpoint buffer for new messages. \todo{detail!}
380 A dispatcher that executes an IDC invocation is considered to have
381 yielded the CPU while enabled. Therefore, the next time it is
382 entered may be either at the \textbf{Run} or \textbf{LRPC} entry
385 \subsection{Capability transfer}
387 IDC can also be used to transfer capabilities from the sending dispatcher's
388 domain to a receiving dispatcher's domain.\todo{document cap transfer!}
390 \subsubsection{Flags}
392 If the ``sync'' flag is set and the message transfer succeeds, the kernel
393 will immediately dispatch the receiver. Effectively, the sender yields to the
396 If the ``yield'' flag is set, and the message fails for one of the following reasons:
398 \item the receiver's message buffer is full
399 \item the sender specified a capability, but it cannot be delivered because the
400 receiver's capability receive slot is non-empty
402 \ldots then the kernel will also immediately dispatch the receiver (without
403 performing a message transfer). Again, in this case the sender effectively yields
410 In this mode of IDC, the kernel performs a controlled
411 context switch from the sending to the receiving dispatcher,
412 preserving the capability invocation register state which is used to
413 deliver the message. The sender dispatcher is not blocked, however
414 it implicitly donates the remainder of its timeslice to the
417 If the receiving dispatcher is disabled, and the ``yield'' flag was set,
418 the kernel sets the return
419 register in the sending dispatcher's \lstinline+enabled_save_area+
420 to \lstinline+SYS_ERR_TARGET_DISABLED+. The kernel then switches to
421 and resumes the target dispatcher. In effect, an LRPC
422 operation when the target is disabled becomes a directed yield of
423 the CPU to the target dispatcher. If the ``yield'' flag was not set,
424 the kernel simply returns the same error code to the sender and runs the
427 \subsection{Interrupt delivery}\label{sec:interrupts}
429 Hardware interrupts are delivered by the kernel as asynchronous IDC
430 messages to a registered dispatcher. A dispatcher can be registered
431 as for a specific IRQ by invoking the IRQTable capability,
432 passing it an IDC endpoint to the dispatcher and the IRQ
433 number. It is not possible for multiple IDC endpoints to be
434 registered with the same IRQ number at any one time.
436 Henceforth, the kernel will send an IDC message using asynchronous
437 delivery to the registered endpoint. Asynchronous
438 IDC is used as it does not cause priority inversion by directly
439 dispatching the target dispatcher.
441 Refer to \autoref{apx:arch_specific} for more information about
442 valid hardware interrupts for an architecture-specific
443 implementation of Barrelfish.
445 \subsection{Exception delivery}
447 When a CPU exception happens in user-space, it is reflected to the
448 dispatcher on which it appeared. Page
449 faults are dispatched to the pagefault entry point of the
450 dispatcher. All other exceptions are dispatched to the trap entry
451 point of the dispatcher. The disabled flag of the dispatcher is
452 ignored in all cases and state is saved to the trap save area.
454 \section{Virtual Memory}
456 \todo{Our memory model is based on capabilities and is quite similar to seL4.}
458 \section{Initial Address Space}
460 Our address space initialization is similar to the one of seL4, but
461 we do not follow their boot protocol to the word. Here is our
464 We have a special program called \lstinline+init+ that is run by the kernel
465 after bootup as an ELF64 executable. In order to function, it has to
466 receive some information by the kernel. We show first how it
467 receives this information from its (\lstinline+init+'s) own perspective and
468 then how the kernel gathers and transmits this information to
471 \subsection{User-Space Perspective}
473 \lstinline+init+'s virtual address space size at startup is at most 4 MBytes
474 (the amount of pagetable kernel memory left for it), mapped as 4K
475 pages, starting from \lstinline+0x200000+ (2 Meg). \lstinline+init+'s text/data segments
476 should be aligned consecutively and start at \lstinline+0x400000+ (4 Meg),
477 leaving it 2 MBytes for its text and data.
479 We have a \lstinline+bootinfo+ structure, shown in \autoref{lst:bootinfo}.
481 \begin{lstlisting}[float,caption={\lstinline+bootinfo+ structure},
484 // Base address of small memory caps region
485 capaddr_t small_untyped_base;
486 // Number of small memory caps
487 size_t small_untyped_count;
488 // Base address of large memory caps region
489 capaddr_t large_untyped_base;
490 // Number of large memory caps
491 size_t large_untyped_count;
492 // Number of entries in regions array
493 size_t regions_length;
494 // Memory regions array
495 struct mem_region regions[MAX_MEM_REGIONS];
499 This structure is mapped into \lstinline+init+'s virtual memory at
500 address \lstinline+0x200000+ (2 Meg) and is at most a 4K page in
501 size. \lstinline+small_untyped_base+ points to the capability to the
502 CNode, holding a number (given by \lstinline+small_untyped_count+) of
503 small untyped capabilities. These can be used for easy setup of
504 init's own address space. \lstinline+large_untyped_base+ and
505 \lstinline+large_untyped_count+ is similar for (much) larger untyped
506 capabilities. Their sizes can be found in the \lstinline+regions+ array,
507 of size \lstinline+regions_length+ entries. An entry is defined by a
508 \lstinline+mem_region+ struct, shown in \autoref{lst:mem_region}.
510 \begin{lstlisting}[float,caption={\lstinline+mem_region+ structure},
511 label=lst:mem_region]
513 paddr_t base; // Address of the start of the region
514 size_t size; // Size of region in bytes
515 enum region_type type; // Type of region
516 uint64_t data; // Additional data, based on region type
520 Its fields should be self-explanatory. The possible region types are
521 defined by \lstinline+enum region_type+, shown in \autoref{lst:region_type}.
523 \begin{lstlisting}[float,caption={\lstinline+region_type+ enumeration},
524 label=lst:region_type]
526 RegionType_Empty, // Empty memory
527 RegionType_InitCaps, // init's caps mapped here
528 RegionType_RootTask, // Code/Data of init itself
529 RegionType_Device, // Memory-mapped device
530 RegionType_CapsOnly // Kernel-reserved memory
534 These are the same as those in seL4.
536 \subsubsection{Initial Capability Address Space}
538 \lstinline+init+'s initial CSpace is shown in \autoref{fig:init_cspace}.
542 \includegraphics[width=\textwidth]{init_cspace}
543 \caption{\texttt{init}'s initial capability space layout}
544 \label{fig:init_cspace}
547 \subsection{Kernel Perspective}
549 In the following, 'cn' will be short for CNode, \lstinline+init_dcb+ is short
550 for \lstinline+init+'s DCB, \lstinline+replyep+ is short for \lstinline+init+'s system call reply
551 endpoint. The kernel sets up \lstinline+init+'s domain as follows:
554 \item It allocates physical pages for: rootcn, taskcn, smallcn,
555 supercn, \lstinline+init_dcb+, replyep.
556 \item Map bootinfo, \lstinline+init_dcb+, replyep, rootcn, pml4, pdpt, pdir and
557 ptables in that order.
558 \item Allocate 64 physical pages and put untyped caps to them into
560 \item Map taskcn, smallcn, supercn, in that order.
561 \item Setup \lstinline+init+'s DCB.
562 \item Load \lstinline+init+ ELF64 binary into memory, map memory and allocate
564 \item Add all remaining memory as untyped caps to power of two large
565 regions into supercn. They may not be more than 64.
566 \item Fill \lstinline+bootinfo+ struct.
567 \item \lstinline+schedule()+ init.
572 Upon reception of a timer interrupt, the kernel calls `schedule()`,
573 which selects the next dispatcher to run. At the moment, a simple
574 round-robin scheduler is implemented that walks a circual
575 singly-linked list forever.
580 \item virtual machine support
582 \item resource management
583 \item thread migration
584 \item event tracing / performance monitoring
587 \chapter{Barrelfish Library API}
589 \todo{documentation of libbarrelfish}
591 \section{Capabilities}
593 \subsection{Data types}
598 get\_cap\_valid\_bits
600 get\_cnode\_valid\_bits
604 \subsection{Functions}
614 async\_endpoint\_create
615 local\_endpoint\_create
617 \subsection{Invocations}
620 \subsection{Syscalls}
628 \section{VSpace management}
639 vspace\_map\_attr\_raw ?
641 \section{Dispatch and threading}
643 \section{Spawning domains}
644 \subsection{Initial capability space}
646 The initial capability space of other domains is similar, but lacks the other
647 cnodes in the root cnode, as illustrated in \autoref{fig:app_cspace}.
651 \includegraphics[width=\textwidth]{app_cspace}
652 \caption{initial capability space layout of user tasks}
653 \label{fig:app_cspace}
661 \item[Capability] Every kernel object is represented by a capability,
662 allowing the user who holds that capability to manipulate it. We use
663 partitioned capabilities: capabilities are stored in memory accessible only
664 to the kernel, and are manipulated or invoked through the use of addresses
667 \item[CSpace] The capability address space, in which all capabilities reside,
668 is constructed and managed by user-space code through a hierarchy of page
669 table-like structures, called CNodes. The protection domain of user code
670 is determined by the capabilities existing in its CSpace.
672 \item[VSpace] The virtual address space
674 \item[Dispatcher] Kernel-scheduled entity, responsible for
675 scheduling/managing the execution of user code. Dispatchers are identified
676 by DCB capabilities. Every dispatcher has a CSpace and VSpace pointer,
677 which determine its protection domain and virtual address space. Multiple
678 dispatchers may share a CSpace or VSpace.
680 \item[Domain] Although not directly part of the Barrelfish kernel API, the
681 word domain is used to refer to the user-level code sharing a protection
682 domain and (usually) an address space. A domain consists of one or more
685 \item[DCB] Dispatcher control block, the kernel object associated with
686 a dispatcher, and therefore also one of the system's capability types.
687 \note{I'd prefer to avoid this term, as it can be confusing. -AB}
689 \item[IDC] Inter-dispatcher communication, the kernel-mediated message-passing
690 primitive. There are two types of IDC: the general case of
691 \emph{asynchronous IDC}, and an optimised \emph{local IDC} variant possible
692 only when both the sender and receiver execute on the same core.
694 \item[Endpoint] A type of capability that, when invoked, performs an IDC.
695 There are two endpoint types (asynchronous and local) to match the two
698 \item[Channel] A uni-directional kernel-mediated communication path
699 between dispatchers. All messages travel over channels. Holding a
700 capability for a channel guarantees the right to send a message to it
701 (although the message may not be sent for reasons other than
704 \item[Mapping Database] The mapping database is used to facilitate
705 retype and revoke operations.
707 A capability that is not of type dispatcher, can only be retyped once.
708 The mapping database facilitates this check.
710 When a capability is revoked, all its descendants and copies are deleted.
711 The mapping database keeps track of descendants and copies of a capability
712 allowing for proper execution of a revoke operation.
714 Each core has a single private mapping database.
715 All capabilities on the core must be included in the database.
717 \item[Descendant] A capability X is a descendant of a capability A if:
720 \item X was retyped from A,
721 \item or X is a descendant of A1 and A1 is a copy of A,
722 \item or X is a descendant of B and B is a descendant of A,
723 \item or X is a copy of X1 and X1 is a descendant of A.
726 \item[Ancestor] A capability A is an ancestory of a capability X
727 if X is a descendant of A.
730 \chapter{Implementation}
732 This chapter covers the implementation and algorithm of some subsystems.
734 \section{Mapping Database}
736 This section describes the mapping database is more detail.
737 It covers the algorithms including implementation details
738 and invariants on the database.
740 \subsection{Implementation details}
742 The database implements the following functions:
744 \item \emph{is copy} Checks if two capabilities are copies of each other.
745 Two capabilities are copies if they are of the same type
746 and they refer to same kernel object.
747 PhysAddr, RAM, Frame, DevFrame, CNode, Dispatcher, Kernel, EndPoint
748 Capability types explicitly reference kernel objects
749 so capabilities of such types can be tested simply.
750 We cannot handle other capability types yet,
751 comparing two VNode, we always return false
752 and comparing two IO or IRQTable, we always return true.
754 \item \emph{is ancestor} Checks if one capability is a parent of another.
755 In our current implementation, some capability types cannot have descendants
756 and some capability types cannot have ancestors.
757 For the rest, we check if the parent child relationship is possible
758 based on the retyping type allowed
759 and check if the kernel object the child refers is inclusive
760 in the range of kernel objects the parent refers to.
762 \item \emph{has descendants} Checks if a capability has any descendants
763 Walks the entire database checking if the capability has any descendants.
764 The function returns true when the first descendant is found
765 and if a capability other than a copy is found, it returns false.
767 \item \emph{has copies} Checks if a capability has any copies
768 Walks the entire database checking if the capability has any copies.
769 The function returns true when the first copy is found
770 and if a capability other than a descendant is found, it returns false.
772 \item \emph{insert after} Inserts a set of contiguous capabilities
773 after the given capability
775 \item \emph{insert before} Inserts a set of contiguous capabilities
776 before the given capability
778 \item \emph{set init mapping} Inserts a capability into the database
779 in the appropriate location.
780 If any copies or ancestors of the capability exist,
781 the capability is inserted after a copy or after the closest ancestor.
782 If no relatives exist in the database,
783 the capability is inserted at the top of the database.
785 \item \emph{remove mapping} Removes the capability from the database
788 \subsection{Invariants}
790 Some invariants on the database that must be true at all times.
792 \item The next and prev pointers on a capability are never NULL.
793 \item There is only one database per core.
794 Any capability on a core can be reach from another on the core.
795 \item Two separate databases do not share any capabilities.
796 The set of next and prev pointers on one database
797 is disjoint from the set of next and prev pointers on another.
798 \item Each capability on a core is on the database of the core.
799 A capability will eventually be visited by starting at any other capability
800 and walking the database.
801 \item The database is circular.
802 Walking in either direction from any capability,
803 the same capability will eventually be reached again.
804 \item The head of the database cannot have any ancestors.
807 \subsection{Current Limitations}
809 The database has the following limitations.
811 \item It does not handle VNode capabilities and other types.
812 Other types are not as crucial, but VNode will become a priority shortly.
814 \item No indexing for quickly inserting brand new capabilities.
815 The implementation starts at the head of the database
816 and traverses the entire database till an appropriate location is found.
818 \item Not necessarily a limitation but an important note.
819 The current implementation can report certain capabilities as descendants
820 when one could have been created by copying another
821 and report a descendant as a copy when it was created by retyping the ancestor.
822 This requires the implementation of some functions to test for relationships
823 in the correct order.
824 This may lead to some unforeseen issues later
827 \chapter{Architecture-Specific Features}\label{apx:arch_specific}
829 This chapter covers features specific to one implementation of
830 Barrelfish on a specific hardware architecture.
834 The x86-64 implementation of Barrelfish is specific to the AMD64
835 and Intel 64 architectures. This text will refer to features of
836 those architectures. Those and further features can be found in
837 \cite{intelsa} and \cite{amdsa} for the Intel 64 and AMD64
838 architectures, respectively.
842 The page table is constructed by copying VNode capabilities into VNodes to
843 link intermediate page tables, and minting Frame / DeviceFrame capabilities
844 into leaf VNodes to perform mappings.
846 When minting a frame capability to a
847 VNode, the frame must be at least as large as the smallest page size. The
848 type-specific parameters are:
851 \item \textbf{Access flags:}
852 The permissible set of flags is PTABLE\_GLOBAL\_PAGE
853 | PTABLE\_ATTR\_INDEX | PTABLE\_CACHE\_DISABLED |
854 PTABLE\_WRITE\_THROUGH. Access flags are set from frame capability
855 access flags. All other flags are not settable from user-space (like
856 PRESENT and SUPERVISOR).
858 \item \textbf{Number of base-page-sized pages to map:} If non-zero, this
859 parameter allows the caller to prevent the entire frame capability from
860 being mapped, by specifying the number of base-page-sized pages
861 of the region (starting from offset zero) to map.
864 \subsection{IO capabilities}
866 IO capabilities provide kernel-mediated access to the legacy IO space of
867 the processor. Each IO capability allows access only to a specific range of
870 The Mint invocation (see \autoref{sec:mint}) allows the permissible
871 port range to be reduced (with the lower limit in the first
872 type-specific parameter, and the upper limit in the second type-specific
875 At boot, an IO capability for the entire port space is passed to the
876 initial user domain. Aside from being copied or minted, IO capabilities may not
879 \subsection{Interrupts and Exceptions}
881 \subsubsection{Interrupts}
883 The lower 32 interrupts are reserved as CPU exceptions. Thus, there
884 are 224 hardware interrupts, ranging from IRQ number 32 to 255.
886 The kernel delivers an interrupt that is not an exception and not
887 the local APIC timer interrupt to user-space. The local APIC timer
888 interrupt is used by the kernel for preemptive scheduling and not
889 delivered to user-space.
891 \subsubsection{Exceptions}
893 The lower 32 interrupts are reserved as CPU exceptions. Except for a
894 double fault exception, which is always handled by the kernel
895 directly, an exception is forwarded to the dispatcher handling the
896 domain on the CPU on which it appeared.
898 Page faults (interrupt 14) are dispatched to the `pagefault` entry
899 point of the dispatcher. All other exceptions are dispatched to the
900 `trap` entry point of the dispatcher.
902 \bibliographystyle{plain}
903 \bibliography{defs,barrelfish}