

Retrocomputing

Why does the x86 not have an instruction to obtain its instruction pointer?

Asked 1 year, 10 months ago · Modified 1 month ago · Viewed 6k times

This has always confused me. Why can you not directly obtain the IP, and instead have to go through some odd assembly hoops, such as calling a function whose only purpose is to push its own return address onto the stack?

I'm asking about the historical reason, since this decision was probably made back in the time of the 8086.

instruction-set design-choices x86

asked May 20, 2022 at 13:24 by Michael Stachowsky; edited May 21, 2022 at 20:45 by user3840170

Comments are not for extended discussion; this conversation has been moved to chat. – Chenmunka ♦ May 21,
2022 at 7:52

13 Worth noting is that x86-64 does have such an instruction: lea eax,[eip+0] – jpa May 21, 2022 at 9:12

3 BTW, there are plenty of other CPUs where you cannot do that directly, either. – dirkt May 21, 2022 at 11:33

4 The problem with asking why is that only the engineers who made the decisions can really answer. The rest of us
can make guesses: they could not at the time think of any use for it, it would take too many transistors or slow
down something, they might be planning for the future with more address bits or modifiers, they knew of a good
workaround, the marketing people did not require it, programmers managed anyway, it might create other
problems somewhere, ... – ghellquist May 21, 2022 at 16:37

10 @ghellquist Many such engineers are still alive. Engineers can be and have been interviewed. Some engineers
are interested in the retro scene. Engineers must not be seen as 100% opaque and inaccessible godlike figures
we can never know anything about. – hippietrail May 22, 2022 at 1:22

There's a huge difference between the engineers determining required addressing modes (incl. PC-relative) for e.g.
the 6809 and the engineers creating the 8086. The first did a wonderful job (even though their speculation about
purchasable ROMs being used in designs was not borne out), while the latter did not properly anticipate the
importance of PC-relative addressing. – lvd May 22, 2022 at 8:26
To be honest I don't remember any 8-bit CPU with such an instruction, including the 8080, Z80, 6800, 6502, etc. Given
the history of the 8086 it's no surprise that such an instruction is missing. I am not perfectly sure, but I am afraid that the 68000
has no such instruction either... Maybe there is another question on the table: was there any CPU (8/16-bit) with such
an instruction? (And yes, it's sometimes useful, but not on a daily basis.) – Martin Maly May 23, 2022 at 13:00

@MartinMaly: On the CDP1802, the program counter is just like any other address register, save only for the fact
that its number is stored in the P register. If one knows which register is being used as the program counter, its
value may be captured via GHI/GLO instructions. – supercat May 23, 2022 at 16:48

2 Hardware engineering POV: Why have an instruction (and expend all the resources which that implies: in design,
expending an instruction slot, testing in development, QA, production test, etc., all of which, overall, adds some
marginal increased cost to every chip) for something that A) can be done with existing instructions without too
much difficulty and B) is very rarely desired? [Yeah, I know it's CISC, but you've got to draw
the line somewhere.] – Makyen May 23, 2022 at 17:53

@supercat Yes, thank you, I wasn't sure about the CDP1802 at the moment I wrote that comment. I only worked
with the CDP briefly, ten years ago... – Martin Maly May 24, 2022 at 5:31

8 Answers, sorted by: highest score (default)

As Thorbjørn Ravn Andersen already put it nicely:

What would you need it for?

There is almost no practical (*1) need to obtain the PC address at runtime (*2) - it's a value to be
obtained at assembly time, provided by the Assembler and/or Linker. A simple

HERE: LEA AX,HERE ;(*3)

will make sure the Assembler and/or Linker puts the actual instruction's address into a register (AX in
this case).

Now, if you really want to do the trick, then best do it like it would work with any CPU: jump one
instruction ahead by using a subroutine call and then pop the return address.

CALL NEAR PTR NEXT ; Make sure it's not a far call (*4)
NEXT: POP AX

Except, there's a major caveat:

The above is trouble-free only in clean 16-bit code. Different addressing modes may require the use of 32-bit
registers and more.

Going for tricks like this is a sure way to introduce incompatibilities. It's the old story of programming
what you want to do, not how to do it. Letting your tools, compiler/assembler/linker, do the 'dirty' work
ensures it gets performed the best possible way.

Further Reading:
As said, there is no real use for the IP being one of the accessible registers; many quite successful
architectures (/360, 68k, 8080, 6500, etc.) do not have a directly accessible PC. They have at most PC-relative
addressing (68k).
The PC is never a 'normal' register but is tied to the basic mechanics of a processor. In fact, having it
completely separated and not readable, as on the 8086 and others, brings a lot of advantages (*4). It offers
the simplest way to separate operational housekeeping (like (pre-)fetching) from logical operation.
The only use case for storing it is a subroutine branch, which can be handled by keeping a
shadow copy of the next address to be executed.

Architectures that allow the use of the PC in addressing may need to hold a second shadow copy with the
instruction's address, complicating matters. Architectures that include the PC in the register set, or use one of
their GP registers as PC, need to take care of several constraints (*5). One visible sign is that some
RISC implementations need to use PC-relative addressing with a constant offset from the actual
location. But there is more.
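One concrete case of such a fixed offset (brought in here for illustration; the answer itself names no specific RISC) is the classic 32-bit ARM, where reading the PC yields the current instruction's address plus 8 because of the pipeline. A small sketch of the resulting branch-target arithmetic:

```python
# Sketch: classic 32-bit ARM branch target calculation. Illustrative
# assumption: in ARM state the PC visible to an instruction reads as the
# instruction's own address + 8, and B/BL encode a signed word offset.
def arm_branch_target(insn_addr: int, signed_offset_words: int) -> int:
    return insn_addr + 8 + signed_offset_words * 4

# A branch at 0x1000 with an encoded offset of 0 lands at 0x1008, not 0x1004.
print(hex(arm_branch_target(0x1000, 0)))  # → 0x1008
```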

Long story short: Better not care for the PC at all - beside jumping that is.

*1 - The "almost" case is dynamically created code - but even then it would be more appropriate to
improve the generator.

*2 - And no, the standard use of BALR Rx,0 by /370 modules to get a local reference for jumps and
constants is an oddity due to there being neither an absolute nor a PC relative addressing, nor the ability
to load immediate word (address) values.

*3 - And yes, a MOV could be used as well in nearly all (simple) cases. I still prefer the LEA as it allows
even more weird address generations :))

*4 - It still may run into trouble depending on addressing modes and memory model.

*5 - A complex issue in CPUs with a certain level of asynchronous operation, like having prefetch or
speculative operation.

answered May 20, 2022 at 14:21 by Raffzahn; edited Feb 13 at 12:23

Comments are not for extended discussion; this conversation has been moved to chat. – Chenmunka ♦ May 24,
2022 at 6:22

1 Practically every modern operating system requires such an instruction to implement shared libraries and position-
independent code, to the extent of CPU manufacturers heavily optimizing the call/pop instruction sequence, which
tended to be very slow as it confuses the return stack cache. Also, one of the major improvements of amd64 over
x86 was the addition of rip-relative addressing. – Remember Monica Jul 18, 2022 at 21:10

1 @RememberMonica Position-independent code does not need to access the IP, it only needs IP-relative (or entry-
relative) addressing. Likewise, shared libraries only need it (if at all) during initialization, so again there is no need to
have a dedicated instruction. Adding one would be ISA bloat. – Raffzahn Jul 18, 2022 at 23:54

*5 doesn't seem to be used, neither now, nor through the edit history. – tevemadar Feb 13 at 11:18

@tevemadar Think I found where it belongs. Thanks – Raffzahn Feb 13 at 12:24

OP specifically clarifies interest in a historical reason. Intel would have to give their exact reasoning, but
the following points are worth noting.

Intel's 8086 and 8088 were outgrowths of their earlier 8080 and, before that, the 8008 and 4004 microprocessors -
these architectures all had an address space that required more bits than their 16-, 8- or 4-bit data width.

On the other hand, minicomputers of the day, including their microcomputer outgrowths, tended to have
an orthogonal approach to registers, in which the Program Counter and Stack Pointer were numbered in
with the general-purpose registers and could be accessed or used in a variety of addressing modes (cf.
PDP-11 vs LSI-11 and TI990 vs TMS9900 - both originally true 16-bit architectures addressing 2¹⁶
bytes).

The ability to access the Program Counter was very useful and was a common idiom in accessing
parameters and local variables stored with the code, and in implementing Position Independent Code or
in contexts relating to linking and/or shifting and/or overlaying code blocks (and there were also various
useful use-cases involving self-modifying code and dynamically generated/updated code).

One of the big issues with the PDP-11 and TI990 16-bit architectures, as with earlier 8-bit and 16-bit
microprocessors generally, was the inability to index more than 2¹⁶ units of memory. The PDP-11 family
introduced models with separate Instruction and Data address spaces, while the 8086 took a more general
approach: segments allowed for separate code and stack segments and multiple data segments, and with
its segment registers the 8086 was able to address 2²⁰ bytes of physical memory. Segmentation also
introduced the ability to provide different Read/Write/Execute permissions for different segments.

This, and the increasing use of recursion (which requires a stack), made it inappropriate to store code
and data/parameters/locals together in adjacent addresses, and meant that addresses can't be
interpreted without their segment address. That put an end to much of the need for, and utility of, direct
access to the Instruction Pointer.

answered May 21, 2022 at 9:23 by David M W Powers; edited May 24, 2022 at 9:47 by Toby Speight

1 This is a great answer to the question, with loads of supporting detail. Welcome to RCSE! – Mark Williams May 21,
2022 at 10:11

1 Somewhat misleading argument. The 68k managed to address up to 2^32 bytes without segmentation and also had PC-
relative addressing, including the ability to load an address from the PC into a GPR. – lvd May 22, 2022 at 8:17

@lvd The 68000 is a couple of generations (a decade and a half) later and one of the first 32-bit microchips
(although with a 16-bit data bus, much as the 8088 is a 16-bit machine like the 8086 but with an 8-bit data bus). So it
didn't need segmentation to address 2^32 bytes. Of course, paging is another approach to mapping logical address
space into a potentially larger physical address and is easier for compilers to manage (so segmentation is phased
out in x64). – David M W Powers May 22, 2022 at 11:30

2 68000 was quite contemporary to 8086. – lvd May 22, 2022 at 12:55

@DavidMWPowers: In what sense is paging "easier for compilers to manage"? Having a logical address space
that's large enough to allow compilers to ignore paging is easy, but if one needs a larger address space
segmentation is easier to work with. The only systems I know of where compilers "manage paging" have very small
pages (256 bytes or less), and only allow one page to be active at a time. – supercat May 24, 2022 at 17:05

1 @supercat Segmentation means that the same 'address' points to different locations and thus pointers are
more complex to manage; conversely, it also introduces the possibility of two different addresses referring to the
same location (aliasing). Although in fact, the main reason for x64 dropping segmentation seems to have been that
OS's didn't use it and so the cost wasn't worth it. As a language developer I'd rather we still had segments
irrespective of the matching to a larger address space (and being able to manage paging helps manage
expandable arrays more efficiently). – David M W Powers May 25, 2022 at 12:49

@DavidMWPowers: Segmentation on 8086 means that pointers generally need to be treated as 32-bit quantities
rather than 16-bit quantities, except that operations like pointer addition only need to affect the bottom 16 bits. As
for the possibility that two different segment-offset combinations might access the same storage, that's a non-issue
for compilers since any particular region of storage will, within its lifetime, always be accessed using the same
segment:offset combination except in cases where it's interacting with outside hardware (and needs to be treated
as volatile anyway). – supercat May 25, 2022 at 14:46
@DavidMWPowers: The 8086 works out especially nicely, in fact, because it's possible to manage memory as a
linear collection of 65536 16-byte segments that may be individually available or free. If one honors a 4096-byte
allocation request by allocating a 16-byte header at 0x1234:0 and 4096 bytes from 0x1235:0 to 0x1235:0x0FFF,
the header at 0x1234:0 would need to record that the next available segment was at 0x1335:0 and intervening
segments were unavailable, but the memory manager wouldn't need to care that user code would access all of
those segments using 0x1235:0 to 0x1235:0x0FFF. – supercat May 25, 2022 at 15:48
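The segment arithmetic in the comments above can be sketched in a few lines (illustrative only; the allocator layout and numbers are the ones from the comment, and `physical` is a name invented here):

```python
# Sketch: 8086 real-mode address formation, physical = segment * 16 + offset,
# truncated to a 20-bit physical address.
def physical(segment: int, offset: int) -> int:
    return ((segment << 4) + offset) & 0xFFFFF

# Two different segment:offset pairs can alias the same byte...
assert physical(0x1235, 0x0000) == physical(0x1234, 0x0010)

# ...and the allocator layout from the comment works out: a 16-byte header at
# 0x1234:0, a 4096-byte block at 0x1235:0 .. 0x1235:0x0FFF, and the next
# available segment immediately after it at 0x1335:0.
assert physical(0x1235, 0x0FFF) + 1 == physical(0x1335, 0x0000)
```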

The m68k is not a very good example when you want to boast about its linear address space. Because PC-relative
addressing on the original 68000 has a 16-bit limit, that tends to create a "floating segment" of +/- 32k for purely
relocatable programs. The classic MacOS, for example, knows the notion of a "code segment" (a separate code
resource outside the main segment), and quite a few classic Mac development systems limited the size of
additional (to "main") code segments to 32k to be able to generate fully relocatable code without a loader (which
also suited the fact that early Macs were short on memory). – tofro Feb 13 at 12:52

@tofro: The classic Macintosh had a lot of 32K limitations in situations where the 8088 would have had 64K
limitations. Applications on the Classic Mac used A5-relative addressing for what C would call "static duration"
objects. Incidentally, C compilers for the 68000, especially when using 16-bit int, could have really benefited
from a [[ ]] operator which was like [ ], but with a non-scaled displacement measured in bytes, since
intPtr[i] would require converting i to a 32-bit int, and doing a 32-bit add with itself before using it as an
index, while a byte-indexed... – supercat Feb 13 at 16:43

...operation equivalent to *(int*)((char*)intPtr + i) could be done directly if intPtr and i were in
registers, and even if i were unscaled, code equivalent to *(int*)((char*)intPtr + (i+i)) would handle
cases where i is always -16384..+16383 faster than any possible code that doesn't explicitly use character-based
displacement. Sure, one can use the latter ugly syntax to do the same thing, but it would be easier for a compiler to
handle [[ ]] operators than to try to recognize code patterns using the latter syntax. – supercat Feb 13 at 16:46

So long as a stack exists, the IP address may easily be obtained via the byte sequence "E8 00 00 5B"
[ CALL $+3 : POP BX ], because near calls use PC-relative addressing. On the other hand, the normal
state of affairs for position-independent code on the 8086 is to be located at a fixed address within an
arbitrary segment, which allows zero-effort relocation on any 16-byte boundary.

Because such methods are available (in particular, the PC-relative addressing that enables this within a
code block), there is less need to provide a specific instruction to access the instruction pointer.
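The effect of those four bytes can be modeled in a few lines (a toy model of near-call semantics, not a real emulator; `call_pop` is a name invented here):

```python
# Toy model: a near CALL pushes the address of the instruction after it.
# "E8 00 00" is CALL with a zero 16-bit displacement (i.e. CALL $+3),
# so execution falls through to the next byte, "5B" (POP BX).
def call_pop(ip_of_call: int) -> int:
    call_len = 3                        # E8 plus a two-byte displacement
    return_addr = ip_of_call + call_len
    stack = [return_addr]               # CALL pushes the return address...
    return stack.pop()                  # ...and POP BX retrieves it into BX

print(hex(call_pop(0x0100)))  # → 0x103, the address of the POP itself
```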

answered May 20, 2022 at 15:07 by supercat; edited May 25, 2022 at 6:19 by Toby Speight

1 Potential catch-22 in some scenarios: with PIC, you may wish to have your stack set to a fixed offset from your
code. Which requires knowing the IP to calculate the load address to set SS appropriately... – TLW May 21, 2022 at
20:36

Just a note: NEVER do it this way in real code! The proper way is to call a procedure that contains mov
reg,[esp] : ret – lvd May 22, 2022 at 8:08

2 @lvd Never say Never. The procedureless way is quite proper, and the most efficient way of doing it in certain
use cases - e.g. code in ROM where every byte counted (like on a computer with 2K RAM, 2K ROM and 2K
EPROM and no disk, as I had in the late 70s; mid 70s, my machines had <1K). – David M W Powers May 22, 2022
at 12:00

As it was already mentioned here, it would cause the HW call stack to mispredict ALL returns from the first occurrence of
such code up to the eviction of all wrong call-stack entries. So it's a strict no-no for any i386 code since the
times of the Pentium II (I might be wrong here, but only a little). – lvd May 22, 2022 at 12:49

BTW I've just googled that x86 macos did exactly the same:
reverseengineering.stackexchange.com/questions/1654/… which adds up to the long story of idiotic technical
decisions in macos (the first being artificial segmentation in early 68000 macoses) – lvd May 22, 2022 at 12:50
@lvd: Would performance have been better if code did something like stc/call/jnc done/mov eax,
[esp]/clc/ret/done: ? As for "artificial" segmentation, I don't see what the objection is, given that it allowed the
use of 16-bit branch offsets within segments, and the use of the "A5 world" allowed use of 16-bit data
displacements for a program's static data. Sure it made it necessary to sometimes replace what would have been a
large static-duration array with a heap-allocated object, but that was a small price to pay in exchange for the
improved efficiency one would receive in exchange. – supercat May 22, 2022 at 16:21

1 @supercat: what's the efficiency you're talking about? Not a single dynamic linked list could be made on early 68k
macs without double dereferencing addresses through "master pointers" -- provided you've got enough master
pointers, which is not always the case! Every time one works with already dereferenced pointers and does function
calls, there's a hazard that function call might result in heap 'defragmentation'. All that stuff made both 68k macos
programming more cumbersome and the resulting programs slower -- is that an improved efficiency indeed? – lvd
May 23, 2022 at 7:06

@supercat I didn't get the first part of your comment. What is all that stuff starting from stc ? If one needs PC
value in i386 code, the right way to get that is to call a procedure containing exactly this: mov reg,[esp]: ret .
– lvd May 23, 2022 at 7:12

@lvd: I was thinking that latter approach would require having a routine someplace, but that could easily be worked
around by having a relative short branch ahead 3 followed by a RETN instruction (skipped by the branch) and a
relative call back 1 (hitting the RETN). – supercat May 23, 2022 at 14:54

2 This answer makes no attempt to address the question "why is there no direct way to access the instruction
pointer?". Whilst it might be interesting to show an indirect way to do it, that is an answer to an entirely different
question. – Toby Speight May 24, 2022 at 9:49

@TobySpeight: The question of whether to support something via "direct" means is usually coupled to the question
of whether it can be practically accomplished via other means. – supercat May 24, 2022 at 13:43

1 @TobySpeight: On the Z80, the only call instructions use a fixed target address (either following the opcode or
embedded within it), which means one needs to know of a fixed address where one can find or place code to
inspect the stack, but on the 8086 one can place the stack-inspection code in the same blob as the call instruction
without having to know where it is. The ability to do that means that direct instruction to access the PC has far less
value than it would on e.g. the Z80. – supercat May 24, 2022 at 14:55

On the 8086, the instruction pointer is not a general-purpose register you can freely access for reading. On
earlier 808x models this was also the case, even though the program counter was used directly to fetch
instructions, without a prefetch queue, and it was settable via the PCHL instruction. Because the CPUs
natively supported a stack, jumping, and subroutines, the programming model for writing ordinary
programs just did not need to read the IP, so opcodes and their parameters could be used for other, more
useful things. And it is still possible to read the position where the CPU is currently executing indirectly if
there is a need. At a quick glance, many other CPUs from approximately the same era (Z80, 6500, 6800) also
don't have an opcode to read the PC/IP, likely for the same reason.

The CPU does not directly use the instruction pointer for execution, as the executed instructions are
fetched from prefetch queue, and the queue is filled from memory.

The instruction pointer (IP) does not reside on the Execution Unit (EU) side, but on the Bus Interface
Unit (BIU) side, with the segment registers.

So just like there is no instruction to directly set/store IP, because a jump or call must clear prefetch
queue to make sure instructions are fetched for execution from correct address, there is also no
instruction to get/load IP, because it likely won't point to the currently executed instruction.

So, whenever the actual value of IP is needed, such as when a CALL to a subroutine has to push the correct
value of IP onto the stack, the value is adjusted as needed and then stored to memory. So there is some logic
to keep track of how the IP should be modified when the value is needed.

But internally, that's how the CPU works according to the user manual, with the IP pointing to the
address of memory to be next fetched into the queue.

answered May 20, 2022 at 14:47 by Justme; edited May 21, 2022 at 10:57

1 "[the IP] likely won't point to the currently executed instruction" - The IP associated with the currently executed
instruction must exist somewhere, else how could a relative branch work? – Wayne Conrad May 20, 2022 at 17:00

4 In order for the CPU to be able to correctly process a CALL or INT instruction, it must know the address of the
following instruction, even if instructions after that have been fetched and will need to be discarded. By my
understanding, the 8086 maintains a Program Counter within the bus interface, and an Instruction Pointer within
the execution unit. CALL and INT instructions save the Instruction Pointer; control-flow instructions load both the
PC and IP with the new address. – supercat May 20, 2022 at 17:01

Of course there needs to be some logic for how it works. However, the 8086 user manual says the instruction pointer
is in the BIU, and it also says the IP points to the next instruction to be fetched by the BIU (emphasis from the manual).
The IP is adjusted to point to the next instruction to be executed if the IP needs to be saved on the stack. – Justme May 20, 2022 at
17:27

Forgot to mention IP will be adjusted by jumps too. And the prefetch queue is the reason why self-modifying code
must ensure that the modified code is not already fetched into queue before modifying the memory. And why "POP
CS" did exist but was made undocumented. – Justme May 20, 2022 at 17:43

@Justme: That seems a bit weird, since it would require that the BIU either have a full 16-bit ALU to perform
relative jumps, or would require that the BIU adjust the PC, transfer it to the main execution unit, have it compute
the new address, and send the PC back to the BIU. Also, what I remember from back in the day is that the reason
the IP was called the IP rather than the PC was to distinguish the fact that the PC would often point a few bytes
ahead of the IP. – supercat May 21, 2022 at 17:35

@supercat The BIU already has to have some kind of adder to generate physical address from segment and offset.
Or it may use the EU ALU which is anyway used for EA calculation. Or something different. Based on 8086
manuals, we can't know how exactly it works, it just says the IP is adjusted accordingly. The prefetch queue keeps
track how full it is and thus can be used to know how much to adjust. And 8086 has IP, it is not called PC anywhere
in docs. – Justme May 21, 2022 at 18:35

@Justme: Segment+offset calculations need a 12-bit full ALU plus a 4-bit half adder (one operand plus carry from
lower half). Does the 8086 use a separate ALU for effective address calculations as for other register operations? I
guess it could but that seems surprising. – supercat May 22, 2022 at 0:24

There's a highly-upvoted comment, also requoted in this question's accepted answer, saying What
would you need it for? For what it's worth, here is a real, practical, albeit reasonably obscure example.

Once upon a time I wrote a C interpreter. One of its goals was to allow interoperation between
interpreted and previously compiled code. It contained its own dynamic linker, so that it could read in
object and library files, and call functions in them. (This is the same sort of thing that dlopen does
today.)

Besides interpreted code making calls to compiled functions, it was also possible for compiled code to
call back to interpreted code. Without going into all the gory details, this meant that a pointer to the
interpreter's data structure describing an interpreted function had to also be usable as an actual function
pointer. This data structure therefore began (that is, at offset 0) with a little data block containing
trampoline code which was contrived to fire up the interpreter on the just-called function. And the very
first thing the trampoline code had to do was, naturally enough, fetch its own PC, because that value
was actually the pointer to the data structure describing the function to be interpreted. (Needless to say
this was long before execute protection on data pages, and stuff like that.)
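The trampoline idea above, where a data structure's first bytes are executable code that recovers a pointer to the structure itself, has a rough modern analogue in callable objects. A hypothetical Python sketch of the dispatch (all names here are invented for illustration, not taken from the interpreter described):

```python
# Sketch: a callable function-descriptor object, standing in for the original
# trick of placing trampoline code at offset 0 of the descriptor struct.
def run_interpreter(body, args):
    # Stub standing in for the real interpreter entry point.
    return ("interpreted", body, args)

class InterpretedFunction:
    def __init__(self, body):
        self.body = body  # the interpreter's representation of the function

    def __call__(self, *args):
        # Where the PDP-11/x86 trampoline had to fetch its own PC to find the
        # enclosing descriptor, Python hands us `self` directly.
        return run_interpreter(self.body, args)

f = InterpretedFunction("return a + b")
print(f(1, 2))  # → ('interpreted', 'return a + b', (1, 2))
```

The key property is the same in both designs: the object handed to compiled code is directly callable, with no wrapper the caller has to know about.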
The first processor I wrote this for was the PDP-11, where I found myself able to do something
straightforward like mov pc,r0 . Not long after, I ported it to the VAX, which trapped on me when I tried to
access my own PC in such a crude and obvious way, but after some digging around I was able to
achieve the desired effect via a mild circumlocution such as lea 0@pc or something like that. I was
certainly aware that what I was doing was dicey and difficult: the PC is not a general-purpose register,
for all the sorts of reasons described in the other answers here. So I wasn't surprised I needed a
circumlocution, and I was pleased to find one that worked.

(Later on I managed to port the interpreter, with some new trampoline code, to an 80286 or 80386 under
MS-DOS, but I don't remember which instructions I used or how clever I had to be. Much later I
somehow got it working on the same x86-based Mac where I'm typing this today. The gory, handcrafted
trampoline code is gone, replaced by a magic libffi closure.)

So, anyway, this is an example of why someone might legitimately need — or once have needed — to
directly access the program counter.

answered May 21, 2022 at 6:51 by Steve Summit; edited May 26, 2022 at 12:16 by Omar and Lorraine

1 Well, nice to know why it would be useful to have such an instruction, but it still does not answer the question why
there is no x86 opcode for directly getting the current value of IP, and even without such an instruction, it is easy to
indirectly fetch it. Regardless of the CPU architecture, you have to work with the registers and instructions you
happen to have. – Justme May 21, 2022 at 7:56

2 [cont] If (as is often the case: no reflection on you or the original OP) it's an "X-Y" problem (an OP wants to do
"something", and thinks knowing the PC will help), then people may be able to point the OP to better ways of
solving the core problem. In your particular case, depending when you asked, you might have been told of LEA
0@PC, have been directed to libffi, or – indeed – your genuine need might have sparked the development of
something like libffi in the first place! – TripeHound May 21, 2022 at 8:06

On a side note, I have used Assm as my main tool for 40+ years on a plethora of CPUs (pick any), for anything from
bare metal to OS development and system routines to accounting applications (yup, those were the days when
adding VAT was done in assembly :)) and never found a generic reason to copy the PC, unless there's a specific
CPU-related issue - like not having immediate values past a single byte on a /360. If there's a need to do so, then
either tools (Assembler/Linker) or more advanced instructions (LEA) will be the way to go. – Raffzahn May 21,
2022 at 9:10

2 Second, I understand why you are proud of your clever hack even though I do not fully understand which original
problem you solved (avoiding having to run the compiler?). I noted, however, that you found the required
handcrafted (probably meaning raw assembly) trampoline code difficult to maintain and replaced it.
– Thorbjørn Ravn Andersen May 21, 2022 at 10:51

2 Third, "fell into disuse" sounds like it wasn't used by many others than you. How come?
– Thorbjørn Ravn Andersen May 21, 2022 at 10:54

3 Fourth, I agree with Raffzahn that this does not answer the original question at all (which may cause it to be closed
and deleted) and therefore is probably better placed somewhere else. Perhaps on your personal blog with a lot
more details? – Thorbjørn Ravn Andersen May 21, 2022 at 11:16

I don't understand why the data structure needed to follow the trampoline code rather than the code which invoked
the trampoline, whose address would be pushed in the stack when the trampoline was called. – supercat May 21,
2022 at 17:36

1 @supercat Not sure I understand the question, and this isn't the place for a long discussion of that pet project,
which I've said too much about here already, but what I glossed over is that the interpreter has one global symbol
table, containing addresses which can be (a) addresses of data objects, (b) pointers to interpreted-function objects,
or (c) compiled functions. If I say signal(SIGINT, f), where f is an interpreted function, the kernel gets a
pointer to the interp-function object, which it blindly uses a conventional call instruction on, but that works, because of
the trampoline. – Steve Summit May 21, 2022 at 17:50
@ThorbjørnRavnAndersen It was a toy I wrote for fun, to see if I could. But an interpreter can be very useful. The
description at the top of its man page was going to be "Eliminate files x.c from /tmp". Not sure what a particular
printf format does? Just fire up the interpreter and try it. But there were practical uses as well. For example, my
answer to unix.stackexchange.com/questions/703141 (though it's hardly Posix-compliant) would be ci -c
'time(0)' . And for years I think ~/bin/lseek was a shell script that invoked ci to invoke lseek , not the
small C program it is today. – Steve Summit May 22, 2022 at 11:15

@ThorbjørnRavnAndersen It fell into disuse because that's the fate of many back-burner projects. Also it was the
most rampantly nonportable program I've ever written, and the only one with significant (let alone numerous)
scraps of assembler to do things that simply can't be done in C. But it was just way too much work to port it to new
platforms. Eventually, though, libffi came along, which does everything it needs. :-) And it's occasionally handy for
answering C questions on Stack Overflow, because again, I can just type things in, no need to create and compile
x.c . – Steve Summit May 22, 2022 at 11:22

@SteveSummit Thank you. I was really interested to see if this technique had been absolutely necessary in
production and if so, why. I am fine with this being a toy project you've played with over the years, especially as you
decided the approach was unmaintainable and ported it successfully to another technology. That is not something
everyone gets to do! That said, your description reminded me of bellard.org/tcc which is a tiny, fast C-compiler
compiling to memory, almost giving an interpreter. You might find it interesting. – Thorbjørn Ravn Andersen May 23,
2022 at 9:18

For future readers, this answer has been heavily edited, and many of the comments above (plus some that were
deleted) were written in response to an earlier version of the answer. – Thorbjørn Ravn Andersen Jun 21, 2022 at
21:41

The 8086 uses a segmented memory model, with the address that instructions are fetched from calculated as CS
* 16 + IP. For position-independent code one would fix IP at link time and choose CS at runtime.
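That scheme can be illustrated numerically (a toy calculation, not emulator code; `fetch_address` is a name invented here):

```python
# Sketch: real-mode fetch address is CS * 16 + IP, truncated to 20 bits, so
# code linked at a fixed IP runs unchanged at any 16-byte boundary simply by
# choosing CS at load time.
def fetch_address(cs: int, ip: int) -> int:
    return ((cs << 4) + ip) & 0xFFFFF

ip = 0x0100                                    # fixed at link time
assert fetch_address(0x1000, ip) == 0x10100    # loaded at one place...
assert fetch_address(0x2000, ip) == 0x20100    # ...or another, IP untouched
```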
It was not until the 386, and operating systems that used a 32-bit flat memory model, that reading IP
was useful for position-independent code.

answered Feb 17 at 23:13 by Timothy Baldwin

A readable PC may require extra hardware resources, perhaps a dedicated set of unidirectional wires
from the PC to some write/Dest-reg data path. Fetching instructions and pushing the PC only requires
wires from the PC to the memory access unit.

So why add wires to the chip-area budget unless there's a profitable payback (performance, etc.)?

answered May 28, 2022 at 2:03 by hotpaw2

The "wires" to put the PC register on the ALU bus already exist even on the 8086, see e.g. the patent. – dirkt Feb
18 at 7:34

It does. call 1f; 1: is "push ip".

answered May 22, 2022 at 15:23 by R.. GitHub STOP HELPING ICE
4 This was already mentioned in supercat’s answer. – Stephen Kitt May 22, 2022 at 16:58

1 @wizzwizz4: Calling to a label immediately after the call instruction does not change the flow of execution but
pushes the address immediately after the call instruction (the new value of ip) onto the stack (as the "return
address"). – R.. GitHub STOP HELPING ICE May 22, 2022 at 17:19

@wizzwizz4: Nothing to do with linker. It's just how you write a call with zero displacement in asm mnemonics.
– R.. GitHub STOP HELPING ICE May 22, 2022 at 17:21
