[sysvabi64] Add chapter on Thread Local Storage #311
smithp35 wants to merge 13 commits into ARM-software:main
sysvabi64/sysvabi64.rst
Outdated
| and ``PT_TLS`` as the program header with type PT_TLS. ``PAD`` must be
| the smallest positive integer that satisfies the following congruence:
|
| ``TP + TCB + PAD ≡ PT_TLS.p_vaddr (modulo PT_TLS.p_align)``

TP + TCB + PAD on the left could be confusing, as TCB is placed before TP. Perhaps state the requirement on TP first (TP ≡ 0 (modulo p_align)), then describe PAD and this formula.

I'll see if I can word it better. I've found it difficult to explain the formula intuitively.
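
One possible reading of the congruence, in the order the reviewer suggests, is sketched below. It assumes that TP is aligned to p_align and that the TCB is 16 bytes (the TCBsize value quoted later in this review). The p_vaddr and p_align values in the last line are made up for illustration, and the result is taken as the smallest non-negative residue, which is an assumption rather than the wording of the quoted text.

```latex
\begin{align*}
TP &\equiv 0 \pmod{p\_align} \\
TP + TCB + PAD &\equiv p\_vaddr \pmod{p\_align} \\
\Rightarrow\quad PAD &\equiv p\_vaddr - TCB \pmod{p\_align} \\
\text{e.g. } p\_vaddr = 65568,\ p\_align = 64,\ TCB = 16:\quad
PAD &\equiv 65568 - 16 = 65552 \equiv 16 \pmod{64}
\end{align*}
```

With these example values the TLS block starts 16 + 16 = 32 bytes above TP, and 32 is congruent to 65568 modulo 64, as the congruence requires.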
| add xn, tp, :tprel_hi12:var, lsl #12 // R_AARCH64_TLSLE_ADD_TPREL_HI12 var
| ldr xn, [xn, #:tprel_lo12_nc:var] // R_AARCH64_TLSLE_LDST64_TPREL_LO12_NC var
|
| Static link time TLS Relaxations

Perhaps call this "Optimization" to be consistent with x86/ppc and "Relocation optimization" (ADRP), and leave the term "relocation relaxation" for RISC-V style section shrinking.

For TLS specifically I'd prefer to keep "relaxation", as that's what it's been called in all the previous literature, such as Drepper's ELF Handling for Thread Local Storage and the TLSDESC paper. It should help people searching in the references.
I take the point that it ought to have been called optimization. I'll add a sentence to say that we're using relaxation as a term from the existing literature.
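
As a side note on the local-exec sequence quoted at the top of this thread, the sketch below shows how a TP-relative offset could be split across the two instructions. The helper names and the struct are hypothetical; the bit ranges follow the AAELF64 definitions of the two relocations (bits [23:12] for the ADD, bits [11:3] for a 64-bit LDR), and the example is not taken from the PR itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the two immediate fields a static linker
 * would fill in for the quoted add/ldr pair. */
struct tprel_fields {
    uint32_t add_imm12;  /* bits [23:12] of the offset, used by ADD ..., lsl #12 */
    uint32_t ldst64_imm; /* bits [11:3] of the offset, scaled field of a 64-bit LDR */
};

/* Split a TP-relative offset. Returns false when the offset is outside
 * the 24-bit range that the HI12 relocation is checked against. */
bool split_tprel(uint64_t tprel, struct tprel_fields *out)
{
    if (tprel >= (UINT64_C(1) << 24))
        return false;
    out->add_imm12 = (uint32_t)((tprel >> 12) & 0xfff);
    out->ldst64_imm = (uint32_t)((tprel >> 3) & 0x1ff);
    return true;
}

/* Address that the add/ldr pair accesses for a given thread pointer.
 * Only exact for offsets that are 8-byte aligned, as a 64-bit load needs. */
uint64_t tprel_address(uint64_t tp, const struct tprel_fields *f)
{
    return tp + ((uint64_t)f->add_imm12 << 12) + ((uint64_t)f->ldst64_imm << 3);
}
```

The thread pointer has to be the base of the add because both fields together encode an offset from TP, not from an arbitrary register.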
smithp35 left a comment
Thanks very much for the review.
I've updated based on this and some comments I received internally.
| AArch64 TLS SystemV design choices
|
| * AArch64 uses variant 1 TLS as described in ELFTLS_.

Perhaps mention ELFTLS in the TLS introduction, as a resource for more in-depth information.

ACK. I'll mention that the introduction in the ABI is only sufficient to describe the terms used, like Thread Control Block. A general introduction can be found in ELFTLS_.
smithp35 left a comment
Thanks very much for the comments. I'll hopefully have a new patch tomorrow.
sysvabi64/sysvabi64.rst
Outdated
| knows that the TLS variable is defined in the same module as the code
| that is accessing the variable. In this case the offset of the TLS
| variable from the start of the module's TLS block is a static link
| time constant. Instead of dynamically calculating the offset of the
smithp35 left a comment
Thanks for the review comments. I've made the following updates:
- Simple NFC text changes.
- Better description of deferred TLS and generation count.
- Reworded the padding size derivation.
- Moved paragraphs around to make the flow a bit easier.
These should be visible as 4 separate commits.
sysvabi64/sysvabi64.rst
Outdated
| * (Most local) Automatic data (stack variables, instanced once per function
| activation, per thread).
|
| Rules governing thread local storage on AArch64

This section should probably be named "Scope", and the part about thread_local and __thread should probably go elsewhere.

Will have a think to see where best to split out the source parts.

I've decided to remove the source parts as they are out of scope, and everyone knows what they are anyway.
sysvabi64/sysvabi64.rst
Outdated
| return dtv[module_id][offset];
| }
|
| The calculation in __tls_get_addr is the most general and it can be

nit:
s/__tls_get_addr/``__tls_get_addr``
sysvabi64/sysvabi64.rst
Outdated
| thread's DTV is updated, and the TLS for the ``module_id`` is
| allocated if it is not present.
|
| In pseudo code

I think this pseudo code listing is not necessary, since we already have the description of the operation above. If both are retained, the code and the description may diverge over time.

OK, I wanted to include the pseudo code in case the description wasn't good enough, but if it is then I can remove it. There are other sources for pseudo code.
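
For readers without the original listing to hand, a toy model of the operation described in this thread might look like the sketch below. All names (toy_dtv, toy_load_module, toy_tls_get_addr) are hypothetical, the fixed-size arrays are an assumption made for brevity, and the sketch ignores the TLS initialization image, module unloading, and locking that a real dynamic linker has to handle.

```c
#include <stddef.h>
#include <stdlib.h>

#define TOY_MAX_MODULES 8

/* Hypothetical loader-wide state: TLS block size per module ID and a
 * generation counter bumped each time a module with TLS is loaded. */
static size_t toy_tls_size[TOY_MAX_MODULES];
static size_t toy_global_generation;

/* Hypothetical per-thread DTV: the generation it was last synchronized
 * to, plus one TLS block pointer per module ID. */
struct toy_dtv {
    size_t generation;
    void *block[TOY_MAX_MODULES];
};

/* Record a newly loaded module that has TLS. */
void toy_load_module(size_t module_id, size_t tls_size)
{
    if (module_id < TOY_MAX_MODULES) {
        toy_tls_size[module_id] = tls_size;
        toy_global_generation++;
    }
}

/* Return the address of the variable at `offset` in the TLS block of
 * `module_id`, allocating the block on first use (deferred TLS). */
void *toy_tls_get_addr(struct toy_dtv *dtv, size_t module_id, size_t offset)
{
    if (module_id >= TOY_MAX_MODULES || toy_tls_size[module_id] == 0)
        return NULL;
    if (dtv->generation != toy_global_generation) {
        /* A real implementation would resize the DTV here. */
        dtv->generation = toy_global_generation;
    }
    if (dtv->block[module_id] == NULL) {
        /* Zero-filled; a real loader copies the module's TLS image. */
        dtv->block[module_id] = calloc(1, toy_tls_size[module_id]);
        if (dtv->block[module_id] == NULL)
            return NULL;
    }
    return (char *)dtv->block[module_id] + offset;
}
```

The generation check is what lets a thread notice, lazily, that modules were loaded after its DTV was last updated.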
sysvabi64/sysvabi64.rst
Outdated
| }
|
| The calculation in __tls_get_addr is the most general and it can be
| applied to both static and dynamic TLS. There are four defined models

> There are four defined models

This should probably start a new paragraph, and probably a new section.
sysvabi64/sysvabi64.rst
Outdated
| The calculation in __tls_get_addr is the most general and it can be
| applied to both static and dynamic TLS. There are four defined models
| of accessing TLS that trade off generality for performance. In order
| of descending generality:

> descending generality

nit: maybe "descending" is better?

Will have a think. Perhaps "In descending order of generality:"
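
To make the four models concrete, the sketch below pins one variable to each model using the GCC/Clang `tls_model` variable attribute. The variable and function names are made up for illustration. It assumes the file is linked into an executable, since the local-exec model cannot be used in a shared library; without the attribute the compiler chooses a model from options such as -fPIC and from symbol visibility.

```c
/* One thread-local counter per access model, most general first. */
__thread int counter_gd __attribute__((tls_model("global-dynamic")));
static __thread int counter_ld __attribute__((tls_model("local-dynamic")));
__thread int counter_ie __attribute__((tls_model("initial-exec")));
/* Local exec: only valid when this code ends up in the executable. */
static __thread int counter_le __attribute__((tls_model("local-exec")));

/* Increment every counter for the calling thread and return their sum.
 * Each access is compiled with the code sequence of its own model. */
int bump_all(void)
{
    counter_gd += 1;
    counter_ld += 1;
    counter_ie += 1;
    counter_le += 1;
    return counter_gd + counter_ld + counter_ie + counter_le;
}
```

The source behaves identically under every model; only the generated access sequence, and where the object may be linked, differs.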
sysvabi64/sysvabi64.rst
Outdated
| 4. Local Exec, can be used in the executable for TLS variables
| defined in the executables static TLS block.
|
| SystemV AArch64 TLS addressing

> SystemV AArch64 TLS addressing

The title of this section is very similar to the previous one. I think it makes it a bit difficult to navigate this chapter.

Will have a think to see if I can find a better one.
sysvabi64/sysvabi64.rst
Outdated
| only the descriptor dialect as this is the default dialect for GCC
| and the only dialect supported by clang.
|
| * The thread pointer (TP) is always accessible via the ``TPIDR_EL0``

> The thread pointer (TP)

nit:
s/TP/``TP``
as TP is used like inline code below.
sysvabi64/sysvabi64.rst
Outdated
| (``PADsize``) between the TCB and the executable's TLS Block. Using
| ``TCBsize`` as the size of the TCB (16 bytes), the following expression can be used to calculate ``PADsize`` from the ``PT_TLS`` program header.
|
| ``PADsize = (PT_TLS.p_vaddr - TCBsize) mod PT_TLS.p_align``.

style: add a .. code-block:: instead

Same for other formulae in this section. I think it will make reading easier.
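
The quoted PADsize expression can be checked with a few lines of C. This is a minimal sketch under stated assumptions: the function names are hypothetical, p_align is taken to be a power of two (with 0 or 1 treated as "no alignment"), and unsigned wraparound is used so that the subtraction stays correct when p_vaddr is smaller than TCBsize.

```c
#include <assert.h>
#include <stdint.h>

#define TCB_SIZE UINT64_C(16) /* TCBsize from the quoted text */

/* PADsize = (p_vaddr - TCBsize) mod p_align, for power-of-two p_align. */
uint64_t tls_pad_size(uint64_t p_vaddr, uint64_t p_align)
{
    if (p_align <= 1)
        return 0; /* no alignment requirement, so no padding */
    assert((p_align & (p_align - 1)) == 0);
    /* Wraparound modulo 2^64 preserves the residue modulo p_align,
     * because p_align divides 2^64. */
    return (p_vaddr - TCB_SIZE) & (p_align - 1);
}

/* Offset of the executable's TLS block from the thread pointer. */
uint64_t tls_block_offset(uint64_t p_vaddr, uint64_t p_align)
{
    return TCB_SIZE + tls_pad_size(p_vaddr, p_align);
}
```

For example, with p_vaddr = 0x10020 and p_align = 64 the padding is 16 bytes, so the TLS block starts 32 bytes above TP, and 32 is congruent to 0x10020 modulo 64.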
sysvabi64/sysvabi64.rst
Outdated
| resolver function.
|
| The static relocations with a prefix of ``R_AARCH64_TLSDESC_``
| targeting TLS symbol ``var``, instruct the static linker to create a

nit: superfluous comma after var?
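
As background for the TLSDESC text quoted in this thread, a descriptor can be modelled as a resolver function pointer paired with an argument, where the resolver returns the variable's offset from the thread pointer. The sketch below is a toy model with hypothetical names. Real resolvers are written in assembly with a special calling convention that ordinary C cannot express, and the PR's own commit messages note that the resolver functions are not part of the ABI.

```c
#include <stdint.h>

/* Hypothetical in-memory form of a TLS descriptor: two adjacent words,
 * a resolver entry point and an argument for it. */
struct toy_tlsdesc {
    uint64_t (*resolver)(struct toy_tlsdesc *);
    uint64_t arg;
};

/* Resolver for a variable in static TLS: the dynamic linker has already
 * stored the offset from TP in the argument, so just return it. */
uint64_t toy_tlsdesc_static(struct toy_tlsdesc *desc)
{
    return desc->arg;
}

/* What the calling code does with the result: add it to TP. */
uint64_t toy_tls_address(uint64_t tp, struct toy_tlsdesc *desc)
{
    return tp + desc->resolver(desc);
}
```

A resolver for dynamic TLS would instead treat the argument as a pointer to a module ID and offset pair and fall back to a lookup through the DTV.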
sysvabi64/sysvabi64.rst
Outdated
| .. code-block:: asm
|
| adrp xn, :gottprel: var // R_AARCH64_TLSIE_ADR_GOTTPREL_PAGE21 var

nit: superfluous space before var
yury-khrustalev left a comment
I've added a few minor comments. Otherwise this LGTM and should be approved and merged after the small fixes requested in the comments. I think this chapter is very useful and provides a good level of detail, suitable for this ABI documentation.
smithp35 left a comment
Thanks for the comments. I'll prepare a new patch, most likely on Wednesday.
The thread local storage chapter contains:
* A description of Thread Local Storage based on addenda32.
* The key design decisions of AArch64 TLS, such as TLS variant, TLS dialect, and TCB size.
* The ABI-required code sequence for TLSDESC that must be emitted exactly, as GNU ld requires it to be.
* Sequences for the different code models.
* Relaxations for GD->IE, GD->LE and IE->LE.
* Synchronization requirements for lazy TLSDESC, with advice not to support it due to the overhead of synchronization.

* Edits to split up the bullet points in "How to denote TLS in source".
* Changed program-own state to process-state, as the thread-id may not be stored separately from the program's data.
* Removed "typically" from some of the descriptions, as the typical case will almost always be the case for a sysvabi platform.
* Linked alignment padding to the definition.
* Provided a bit more information about generation counters.

* Rearranged formulas and used TCBsize to make them clearer.
* Taken out "significant" from "a significant number of dynamic linkers".
* Give the reason for using relaxation rather than optimization.
* Clarify that there is no requirement to implement any TLSDESC resolver given in the sysvabi.

Change the input register in ``add xn, xn, :tprel_hi12:var, lsl #12`` to the thread pointer ``tp``. We want to calculate the offset from the thread pointer, so it needs to be an input of the add.
Document the decision in the GCC mailing list thread "TLSDESC clobber ABI stability/futureproofness?" https://gcc.gnu.org/legacy-ml/gcc/2018-10/msg00112.html
TLSDESC resolver functions assume that any registers added by an extension are caller saved for a TLSDESC call.
A brief summary: dynamic TLS may be lazily allocated upon the first use of a TLSDESC resolver. This may involve calls to heap allocation functions provided by the user, which may use registers from extensions like SVE and SME. As the resolver function can't know what is saved, it would have to save all SVE and SME state. This would be far more expensive than a caller save, and an older libc written prior to the introduction of the extension would be unaware of the new registers, so the caller has to do the save.
* The SVE and SME state is already
Include a pseudo code description of __tls_get_addr with deferred TLS for dynamic modules.
Use integers modulo m to avoid excess use of (modulo m). Explain the congruence symbol. Put expression first so derivation is optional.
The TLSDESC resolver functions are not ABI, so we can move them out of the sysvabi64 document. Providing some examples that can be used by a dynamic linker is still useful, so move this to the design documents section.
Add a comment about DTV surplus TLS that permits a dynamic loader to dlopen a DSO with initial-exec TLS. There can be a small number of performance-critical shared libraries that use initial-exec TLS but are expected to be opened via dlopen, particularly by scripting languages like Python.
* Added `` `` to some variables.
* Added some more section headings.
* Used code-blocks for formulas.
* Fixed reference to design document.
Made changes to address review comments and rebase.
Previous review comments changed name of a section.