15254 %ymm registers not restored after signal handler
15367 x86 getfpregs() summons corrupting %xmm ghosts
15333 want x86 /proc xregs support (libc_db, libproc, mdb, etc.)
15336 want libc functions for extended ucontext_t
15334 want ps_lwphandle-specific reg routines
15328 FPU_CW_INIT mistreats reserved bit
15335 i86pc fpu_subr.c isn't really platform-specific
15332 setcontext(2) isn't actually noreturn
15331 need <sys/stdalign.h>
Change-Id: I7060aa86042dfb989f77fc3323c065ea2eafa9ad
Conflicts:
    usr/src/uts/common/fs/proc/prcontrol.c
    usr/src/uts/intel/os/archdep.c
    usr/src/uts/intel/sys/ucontext.h
    usr/src/uts/intel/syscall/getcontext.c


1237      asrset_t structure, defined in <sys/regset.h>, containing the values of
1238      the lwp's platform-dependent ancillary state registers.  If the lwp is
1239      not stopped, all register values are undefined.  See also the PCSASRS
1240      control operation, below.
1241 
1242    spymaster
1243      For an agent lwp (see PCAGENT), this file contains a psinfo_t structure
1244      that corresponds to the process that created the agent lwp at the time
1245      the agent was created.  This structure is identical to that retrieved via
1246      the psinfo file, with one modification: the pr_time field does not
1247      correspond to the CPU time for the process, but rather to the creation
1248      time of the agent lwp.
1249 
1250    templates
1251      A directory which contains references to the active templates for the
1252      lwp, named by the contract type.  Changes made to an active template
1253      descriptor do not affect the original template which was activated,
1254      though they do affect the active template.  It is not possible to
1255      activate an active template descriptor.  See contract(5).
1256 
1257 CONTROL MESSAGES
1258      Process state changes are effected through messages written to a
1259      process's ctl file or to an individual lwp's lwpctl file.  All control
1260      messages consist of a long that names the specific operation followed by
1261      additional data containing the operand, if any.
1262 
1263      Multiple control messages may be combined in a single write(2) (or
1264      writev(2)) to a control file, but no partial writes are permitted.  That
1265      is, each control message, operation code plus operand, if any, must be
1266      presented in its entirety to the write(2) and not in pieces over several
1267      system calls.  If a control operation fails, no subsequent operations
1268      contained in the same write(2) are attempted.
1269 
1270      Descriptions of the allowable control messages follow.  In all cases,
1271      writing a message to a control file for a process or lwp that has
1272      terminated elicits the error ENOENT.
1273 
1274    PCSTOP PCDSTOP PCWSTOP PCTWSTOP
1275      When applied to the process control file, PCSTOP directs all lwps to stop
1276      and waits for them to stop, PCDSTOP directs all lwps to stop without
1277      waiting for them to stop, and PCWSTOP simply waits for all lwps to stop.
1278      When applied to an lwp control file, PCSTOP directs the specific lwp to
1279      stop and waits until it has stopped, PCDSTOP directs the specific lwp to
1280      stop without waiting for it to stop, and PCWSTOP
1281       simply waits for the specific lwp to stop.  When applied to an lwp
1282      control file, PCSTOP and PCWSTOP complete when the lwp stops on an event
1283      of interest, immediately if already so stopped; when applied to the
1284      process control file, they complete when every lwp has stopped either on
1285      an event of interest or on a PR_SUSPENDED stop.
1286 
1287      PCTWSTOP is identical to PCWSTOP except that it enables the operation to
1288      time out, to avoid waiting forever for a process or lwp that may never
1289      stop on an event of interest.  PCTWSTOP takes a long operand specifying a
1290      number of milliseconds; the wait will terminate successfully after the
1291      specified number of milliseconds even if the process or lwp has not
1292      stopped; a timeout value of zero makes the operation identical to
1293      PCWSTOP.
1294 
1295      An "event of interest" is either a PR_REQUESTED stop or a stop that has
1296      been specified in the process's tracing flags (set by PCSTRACE, PCSFAULT,
1297      PCSENTRY, and PCSEXIT).  PR_JOBCONTROL
1298       and PR_SUSPENDED stops are specifically not events of interest.  (An lwp
1299      may stop twice due to a stop signal, first showing PR_SIGNALLED if the
1300      signal is traced and again showing PR_JOBCONTROL if the lwp is set
1301      running without clearing the signal.)  If PCSTOP or PCDSTOP is applied to
1302      an lwp that is stopped, but not on an event of interest, the stop
1303      directive takes effect when the lwp is restarted by the competing
1304      mechanism.  At that time, the lwp enters a PR_REQUESTED stop before
1305      executing any user-level code.
1306 
1307      A write of a control message that blocks is interruptible by a signal so
1308      that, for example, an alarm(2) can be set to avoid waiting forever for a
1309      process or lwp that may never stop on an event of interest.  If PCSTOP is
1310      interrupted, the lwp stop directives remain in effect even though the
1311      write(2) returns an error.  (Use of PCTWSTOP with a non-zero timeout is
1312      recommended over PCWSTOP with an alarm(2).)
1313 
1314      A system process (indicated by the PR_ISSYS flag) never executes at user
1315      level, has no user-level address space visible through /proc, and cannot
1316      be stopped.  Applying one of these operations to a system process or any
1317      of its lwps elicits the error EBUSY.
1318 
1319    PCRUN
1320      Make an lwp runnable again after a stop.  This operation takes a long
1321      operand containing zero or more of the following flags:
1322 
1323          PRCSIG    clears the current signal, if any (see PCCSIG).
1324 
1325          PRCFAULT  clears the current fault, if any (see PCCFAULT).


1659 
1660    PCSVADDR
1661      Set the address at which execution will resume for the specific or
1662      representative lwp from the operand long.  On SPARC based systems, both
1663      %pc and %npc are set, with %npc set to the instruction following the
1664      virtual address.  On x86-based systems, only %eip is set.  PCSVADDR fails
1665      with EBUSY if the lwp is not stopped on an event of interest.
1666 
1667    PCSFPREG
1668      Set the floating-point registers for the specific or representative lwp
1669      according to the operand prfpregset_t structure.  An error (EINVAL) is
1670      returned if the system does not support floating-point operations (no
1671      floating-point hardware and the system does not emulate floating-point
1672      machine instructions).  PCSFPREG fails with EBUSY if the lwp is not
1673      stopped on an event of interest.
1674 
1675    PCSXREG
1676      Set the extra state registers for the specific or representative lwp
1677      according to the architecture-dependent operand prxregset_t structure.
1678      An error (EINVAL) is returned if the system does not support extra state
1679      registers.  PCSXREG fails with EBUSY if the lwp is not stopped on an
1680      event of interest.
1681 
1682    PCSASRS
1683      Set the ancillary state registers for the specific or representative lwp
1684      according to the SPARC V9 platform-dependent operand asrset_t structure.
1685      An error (EINVAL) is returned if either the target process or the
1686      controlling process is not a 64-bit SPARC V9 process.  Most of the
1687      ancillary state registers are privileged registers that cannot be
1688      modified.  Only those that can be modified are set; all others are
1689      silently ignored.  PCSASRS fails with EBUSY if the lwp is not stopped on
1690      an event of interest.
1691 
1692    PCAGENT
1693      Create an agent lwp in the controlled process with register values from
1694      the operand prgregset_t structure (see PCSREG, above).  The agent lwp is
1695      created in the stopped state showing PR_REQUESTED and with its held
1696      signal set (the signal mask) having all signals except SIGKILL and
1697      SIGSTOP blocked.
1698 
1699      The PCAGENT operation fails with EBUSY unless the process is fully
1700      stopped via /proc, that is, unless all of the lwps in the process are




1237      asrset_t structure, defined in <sys/regset.h>, containing the values of
1238      the lwp's platform-dependent ancillary state registers.  If the lwp is
1239      not stopped, all register values are undefined.  See also the PCSASRS
1240      control operation, below.
1241 
1242    spymaster
1243      For an agent lwp (see PCAGENT), this file contains a psinfo_t structure
1244      that corresponds to the process that created the agent lwp at the time
1245      the agent was created.  This structure is identical to that retrieved via
1246      the psinfo file, with one modification: the pr_time field does not
1247      correspond to the CPU time for the process, but rather to the creation
1248      time of the agent lwp.
1249 
1250    templates
1251      A directory which contains references to the active templates for the
1252      lwp, named by the contract type.  Changes made to an active template
1253      descriptor do not affect the original template which was activated,
1254      though they do affect the active template.  It is not possible to
1255      activate an active template descriptor.  See contract(5).
1256 
1257 ARCHITECTURE-SPECIFIC STRUCTURES
1258    x86
1259      The x86 prxregset_t structure is opaque and is made up of several
1260      different components due to the fact that different x86 processors
1261      enumerate different architectural extensions.
1262 
1263      The structure begins with a header, the prxregset_hdr_t, which is
1264      followed by a number of different information sections which describe
1265      different possible extended registers.  Each of those is covered by a
1266      prxregset_info_t, and then finally there are different data payloads that
1267      represent each extended register.
1268 
1269      The number of different informational entries varies from system to
1270      system based on the set of architectural features that the system
1271      supports and the corresponding OS enablement for them.  This structure is
1272      built around the idea of the x86 xsave structure.  That is, there is a
1273      central header which describes a bit-vector of what extended features are
1274      present and have valid state.
1275 
1276      Each x86 xregs file begins with the prxregset_hdr_t which looks like:
1277 
1278       typedef struct prxregset_hdr {
1279               uint32_t        pr_type;
1280               uint32_t        pr_size;
1281               uint32_t        pr_flags;
1282               uint32_t        pr_pad[4];
1283               uint32_t        pr_ninfo;
1284               prxregset_info_t pr_info[];
1285       } prxregset_hdr_t;
1286 
1287      The pr_type member is always set to PR_TYPE_XSAVE.  This is used to
1288      indicate the type of file that is present.  There may be different file
1289      types in the future on x86 so this value should always be checked.  If it
1290      is not PR_TYPE_XSAVE then the rest of the structure may look different.
1291      The pr_size member indicates the size in bytes of the overall structure.
1292      The pr_flags and pr_pad values are currently reserved for future use.
1293      They will be set to zero right now when read and must be set to zero when
1294      writing the data.  The pr_ninfo member indicates the number of
1295      informational items that are present in pr_info.  There will be one
1296      informational item for each register set that exists.
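
The header rules above can be sketched as a small sanity check that a consumer might run before trusting the rest of an xregs buffer. This is only an illustration: demo_xregs_hdr_t is a local mirror of the fixed-size part of prxregset_hdr_t, and DEMO_TYPE_XSAVE is a placeholder for PR_TYPE_XSAVE, whose real value comes from the system headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Placeholder for PR_TYPE_XSAVE; the real value lives in the system headers. */
#define	DEMO_TYPE_XSAVE	1u
/* Size of one prxregset_info_t: four uint32_t members. */
#define	DEMO_INFO_SIZE	16u

/* Local mirror of the fixed-size part of prxregset_hdr_t (no pr_info[]). */
typedef struct demo_xregs_hdr {
	uint32_t	pr_type;
	uint32_t	pr_size;
	uint32_t	pr_flags;
	uint32_t	pr_pad[4];
	uint32_t	pr_ninfo;
} demo_xregs_hdr_t;

/* Return true if the first len bytes of buf hold a plausible header. */
bool
demo_hdr_ok(const void *buf, size_t len)
{
	demo_xregs_hdr_t hdr;
	size_t i;

	assert(buf != NULL);
	if (len < sizeof (hdr))
		return (false);
	memcpy(&hdr, buf, sizeof (hdr));

	/* Other file types may exist in the future, so always check. */
	if (hdr.pr_type != DEMO_TYPE_XSAVE)
		return (false);
	/* The claimed overall size must cover the header and fit in buf. */
	if (hdr.pr_size < sizeof (hdr) || hdr.pr_size > len)
		return (false);
	/* Reserved members must be zero. */
	if (hdr.pr_flags != 0)
		return (false);
	for (i = 0; i < 4; i++) {
		if (hdr.pr_pad[i] != 0)
			return (false);
	}
	/* All pr_ninfo informational entries must fit inside pr_size. */
	if (hdr.pr_ninfo > (hdr.pr_size - sizeof (hdr)) / DEMO_INFO_SIZE)
		return (false);
	return (true);
}
```

A caller would read the whole xregs file into a buffer and call demo_hdr_ok(buf, nread) before walking the informational entries.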
1297 
1298      The pr_info member points to an array of informational members.  These
1299      immediately follow the structure, though the pr_info member may not be
1300      available directly if not in an environment compatible with some C99
1301      features.  Each prxregset_info_t structure looks like:
1302 
1303       typedef struct prxregset_info {
1304               uint32_t pri_type;
1305               uint32_t pri_flags;
1306               uint32_t pri_size;
1307               uint32_t pri_offset;
1308       } prxregset_info_t;
1309 
1310      The pri_type member is used to indicate the type of data and its format
1311      that this represents.  Types are listed below.  The pri_flags member is
1312      used to indicate future extensions or information about these items.
1313      Right now, these are all zero.  The pri_size member indicates the size in
1314      bytes of the type's data.  The pri_offset member indicates the offset to
1315      the start of the data section from the beginning of the xregs file.  That
1316      is, an offset of 0 would be the first byte of the prxregset_hdr_t.
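
As a rough sketch of how pri_offset and pri_size might be used, the function below walks the informational entries that follow the header and returns a pointer to the payload of a requested type. The demo_info_t type mirrors prxregset_info_t, DEMO_HDR_SIZE assumes the eight uint32_t members shown above precede pr_info[], and the type value passed in is whatever PRX_INFO_* constant the caller is interested in; none of these names come from the system headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of prxregset_info_t. */
typedef struct demo_info {
	uint32_t	pri_type;
	uint32_t	pri_flags;
	uint32_t	pri_size;
	uint32_t	pri_offset;
} demo_info_t;

/* Eight uint32_t members precede pr_info[]; pr_ninfo is the last one. */
#define	DEMO_HDR_SIZE		32u
#define	DEMO_NINFO_OFFSET	28u

/*
 * Find the payload for the given type in an xregs image of len bytes.
 * Returns NULL if the type is absent or the entry points outside the
 * buffer.  On success *sizep is set to the payload size in bytes.
 */
const uint8_t *
demo_find_payload(const uint8_t *buf, size_t len, uint32_t type,
    uint32_t *sizep)
{
	uint32_t ninfo, i;

	assert(buf != NULL && sizep != NULL);
	if (len < DEMO_HDR_SIZE)
		return (NULL);
	memcpy(&ninfo, buf + DEMO_NINFO_OFFSET, sizeof (ninfo));
	if (ninfo > (len - DEMO_HDR_SIZE) / sizeof (demo_info_t))
		return (NULL);

	for (i = 0; i < ninfo; i++) {
		demo_info_t info;

		memcpy(&info, buf + DEMO_HDR_SIZE + i * sizeof (info),
		    sizeof (info));
		if (info.pri_type != type)
			continue;
		/* pri_offset counts from the first byte of the header. */
		if (info.pri_offset > len ||
		    info.pri_size > len - info.pri_offset)
			return (NULL);
		*sizep = info.pri_size;
		return (buf + info.pri_offset);
	}
	return (NULL);
}
```

Copying each entry out with memcpy() avoids relying on the flexible array member, which matters in environments where pr_info is not directly usable.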
1317 
1318      The following types of structures and their corresponding data structures
1319      are currently defined:
1320 
1321      PRX_INFO_XCR - prxregset_xcr_t
1322              This structure provides a read-only view of the
1323              CPU's settings for this thread.  In particular, it lets you see
1324              what is set in the x86 %xcr0 register which is the extended
1325              feature control register and controls what extended features the
1326              CPU actually uses.  It also contains the x86 extended feature
1327              disable MSR which controls features that are ignored.  The
1328              prxregset_xcr_t looks like:
1329 
1330                     typedef struct prxregset_xcr {
1331                             uint64_t        prx_xcr_xcr0;
1332                             uint64_t        prx_xcr_xfd;
1333                             uint64_t        prx_xcr_pad[2];
1334                     } prxregset_xcr_t;
1335 
1336              When setting the xregs, this entry can be left out.  If it is
1337              included, it must match the existing entries, otherwise an error
1338              will be generated.
1339 
1340      PRX_INFO_XSAVE - prxregset_xsave_t
1341              This structure mirrors the actual Intel xsave
1342              structure, which has both the traditional XMM state that comes
1343              from the fxsave instruction and then also contains the xsave
1344              header itself.  The structure varies between 32-bit and 64-bit
1345              applications.  The structure itself looks like:
1346 
1347              typedef struct prxregset_xsave {
1348                      uint16_t        prx_fx_fcw;
1349                      uint16_t        prx_fx_fsw;
1350                      uint16_t        prx_fx_fctw;    /* compressed tag word */
1351                      uint16_t        prx_fx_fop;
1352              #if defined(__amd64)
1353                      uint64_t        prx_fx_rip;
1354                      uint64_t        prx_fx_rdp;
1355              #else
1356                      uint32_t        prx_fx_eip;
1357                      uint16_t        prx_fx_cs;
1358                      uint16_t        __prx_fx_ign0;
1359                      uint32_t        prx_fx_dp;
1360                      uint16_t        prx_fx_ds;
1361                      uint16_t        __prx_fx_ign1;
1362              #endif
1363                      uint32_t        prx_fx_mxcsr;
1364                      uint32_t        prx_fx_mxcsr_mask;
1365                      union {
1366                              uint16_t prx_fpr_16[5]; /* 80-bits of x87 state */
1367                              u_longlong_t prx_fpr_mmx;       /* 64-bit mmx register */
1368                              uint32_t _prx__fpr_pad[4];      /* (pad out to 128-bits) */
1369                      } fx_st[8];
1370              #if defined(__amd64)
1371                      upad128_t       prx_fx_xmm[16]; /* 128-bit registers */
1372                      upad128_t       __prx_fx_ign2[6];
1373              #else
1374                      upad128_t       prx_fx_xmm[8];  /* 128-bit registers */
1375                      upad128_t       __prx_fx_ign2[14];
1376              #endif
1377                      uint64_t        prx_xsh_xstate_bv;
1378                      uint64_t        prx_xsh_xcomp_bv;
1379                      uint64_t        prx_xsh_reserved[6];
1380              } prxregset_xsave_t;
1381 
1382              In the classical fxsave portion of the structure, most of the
1383              members follow the same meaning and match their presence in the
1384              fpregs file and their use as discussed in the Intel and AMD
1385              software developer manuals.  The one exception is that when
1386              setting the prx_fx_mxcsr member reserved bits that are set will
1387              be masked off and ignored.
1388 
1389              The most notable fields to consider here right now are the last
1390              few members which are part of the xsave header itself.  In
1391              particular, the prx_xsh_xstate_bv component is used to track the
1392              actual features whose contents are valid.  When reading the
1393              registers, if a given entry is not valid, the register state will
1394              write out the informational entry in its default state.  When
1395              setting the extended registers, this notes which features will be
1396              loaded from their default state (as defined by Intel and AMD's
1397              manuals) and which will be loaded from the informational entries.
1398              If a bit is set in the prx_xsh_xstate_bv entry, then it must be
1399              present as its own informational entry; otherwise a write will
1400              fail.  If an informational entry is present in a write, but not
1401              set in the prx_xsh_xstate_bv then its contents will be ignored.
1402 
1403              The xregs format currently does not support any compressed items
1404              being specified nor does it specify any, so the prx_xsh_xcomp_bv
1405              member will always be set to zero, and it and the reserved members
1406              prx_xsh_reserved must all be left as zero.
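
The write rule for prx_xsh_xstate_bv can be sketched as a pre-flight check. The bit positions below are the XSAVE state-component numbers from the Intel manuals (x87 = 0, SSE = 1, AVX = 2, opmask = 5, ZMM_Hi256 = 6, Hi16_ZMM = 7). The demo_comp_t tags are hypothetical stand-ins for the PRX_INFO_* types, and the pairing of each bit with an informational entry is an assumption made for illustration; the x87 and SSE bits are assumed to be covered by the xsave entry itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* XSAVE state-component bits, as numbered in the Intel manuals. */
#define	XBIT_X87	(1ULL << 0)
#define	XBIT_SSE	(1ULL << 1)
#define	XBIT_AVX	(1ULL << 2)
#define	XBIT_OPMASK	(1ULL << 5)
#define	XBIT_ZMM_HI256	(1ULL << 6)
#define	XBIT_HI16_ZMM	(1ULL << 7)

/* Hypothetical tags standing in for the PRX_INFO_* entry types. */
typedef enum demo_comp {
	DEMO_XSAVE,
	DEMO_YMM,
	DEMO_OPMASK,
	DEMO_ZMM,
	DEMO_HI_ZMM
} demo_comp_t;

static bool
demo_has(const demo_comp_t *present, size_t n, demo_comp_t want)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (present[i] == want)
			return (true);
	}
	return (false);
}

/*
 * Return true if every feature marked valid in xstate_bv is backed by
 * an informational entry.  Entries whose bit is clear are permitted;
 * their contents would simply be ignored.
 */
bool
demo_write_ok(uint64_t xstate_bv, const demo_comp_t *present, size_t n)
{
	static const struct {
		uint64_t	bit;
		demo_comp_t	comp;
	} map[] = {
		{ XBIT_AVX,		DEMO_YMM },
		{ XBIT_OPMASK,		DEMO_OPMASK },
		{ XBIT_ZMM_HI256,	DEMO_ZMM },
		{ XBIT_HI16_ZMM,	DEMO_HI_ZMM }
	};
	size_t i;

	/* The xsave entry carries xstate_bv itself, so it must exist. */
	if (!demo_has(present, n, DEMO_XSAVE))
		return (false);
	for (i = 0; i < sizeof (map) / sizeof (map[0]); i++) {
		if ((xstate_bv & map[i].bit) != 0 &&
		    !demo_has(present, n, map[i].comp))
			return (false);
	}
	return (true);
}
```

This sketch does not reject bits outside the listed set; a real consumer would likely also compare xstate_bv against the features the system reports as enabled.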
1407 
1408      PRX_INFO_YMM - prxregset_ymm_t
1409              This structure contains the upper 128-bits of the first 16 %ymm
1410              registers (8 for 32-bit applications).  To construct a full
1411              vector register, it must be combined with the prx_fx_xmm member
1412              of the PRX_INFO_XSAVE data.  In 32-bit applications, the reserved
1413              registers must be written as zero.  The structure itself looks
1414              like:
1415 
1416               typedef struct prxregset_ymm {
1417               #if defined(__amd64)
1418                       upad128_t       prx_ymm[16];
1419               #else
1420                       upad128_t       prx_ymm[8];
1421                       upad128_t       prx_rsvd[8];
1422               #endif
1423               } prxregset_ymm_t;
1424 
1425      PRX_INFO_OPMASK - prxregset_opmask_t
1426              This structure represents one portion of Intel's AVX-512 state:
1427              the 8 64-bit mask registers, %k0 through %k7.  The structure
1428              looks like:
1429 
1430               typedef struct prxregset_opmask {
1431                       uint64_t        prx_opmask[8];
1432               } prxregset_opmask_t;
1433 
1434      PRX_INFO_ZMM - prxregset_zmm_t
1435              This structure represents one portion of Intel's AVX-512 state:
1436              the upper 256 bits of the 512-bit %zmm0 through %zmm15 registers.
1437              Bits 0-127 are found in the prx_fx_xmm member of the
1438              PRX_INFO_XSAVE data and bits 128-255 are found in the prx_ymm
1439              member of the PRX_INFO_YMM.  32-bit applications only have access
1440              to %zmm0 through %zmm7.  This structure looks like:
1441 
1442               typedef struct prxregset_zmm {
1443               #if defined(__amd64)
1444                       upad256_t       prx_zmm[16];
1445               #else
1446                       upad256_t       prx_zmm[8];
1447                       upad256_t       prx_rsvd[8];
1448               #endif
1449               } prxregset_zmm_t;
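
To make the split across entries concrete, here is a minimal sketch of reassembling one full 512-bit register from its three pieces. The demo_pad*_t types are plain byte arrays standing in for upad128_t, upad256_t, and upad512_t, and the function name is hypothetical. Because x86 is little-endian, the lowest-numbered bits occupy the lowest byte offsets.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Byte-array stand-ins for upad128_t, upad256_t, and upad512_t. */
typedef struct demo_pad128 { uint8_t b[16]; } demo_pad128_t;
typedef struct demo_pad256 { uint8_t b[32]; } demo_pad256_t;
typedef struct demo_pad512 { uint8_t b[64]; } demo_pad512_t;

/*
 * Build register %zmmN (N in 0..15) from:
 *   xmm     prx_fx_xmm[N] from the xsave entry   (bits 0-127)
 *   ymm_hi  prx_ymm[N] from the ymm entry        (bits 128-255)
 *   zmm_hi  prx_zmm[N] from the zmm entry        (bits 256-511)
 */
void
demo_join_zmm(const demo_pad128_t *xmm, const demo_pad128_t *ymm_hi,
    const demo_pad256_t *zmm_hi, demo_pad512_t *out)
{
	assert(xmm != NULL && ymm_hi != NULL && zmm_hi != NULL && out != NULL);
	memcpy(&out->b[0], xmm->b, sizeof (xmm->b));
	memcpy(&out->b[16], ymm_hi->b, sizeof (ymm_hi->b));
	memcpy(&out->b[32], zmm_hi->b, sizeof (zmm_hi->b));
}
```

Stopping after the first two copies gives the 256-bit %ymm view of the same register.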
1450 
1451      PRX_INFO_HI_ZMM - prxregset_hi_zmm_t
1452              This structure represents the third portion of Intel's AVX-512
1453              state: the additional 16 512-bit registers that are available to
1454              64-bit applications, but not 32-bit applications.  This
1455              represents %zmm16 through %zmm31.  This structure looks like:
1456 
1457                    typedef struct prxregset_hi_zmm {
1458                    #if defined(__amd64)
1459                            upad512_t       prx_hi_zmm[16];
1460                    #else
1461                            upad512_t       prx_rsvd[16];
1462                    #endif
1463                    } prxregset_hi_zmm_t;
1464 
1465              Unlike the lower %zmm0 through %zmm15 registers, this contains
1466              the entire 512-bit register in one spot, so there is no need to
1467              look at other information items to reconstitute the vector.
1468 
1469              When setting the extended registers, at least the PRX_INFO_XSAVE
1470              component must be present.  None of the component offsets may
1471              overlap with the prxregset_hdr_t or any of the prxregset_info_t
1472              structures.  In the written data file, it is expected that the
1473              various structures start with their naturally expected alignment,
1474              which is most often 16 bytes (that is the value that the C
1475              alignof() keyword will return).  The structures that we use are
1476              all multiples of 16 bytes to make this easier.  The kernel will
1477              write out structures with a greater alignment such that the
1478              portions of registers are aligned and safely usable with
1479              instructions that move aligned integers such as vmovdqa64.
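
One way to satisfy the layout rules above when building data to write is to place the header first, then the informational entries, then each payload at the next 16-byte boundary. The sketch below computes such offsets. The names are hypothetical, the header and entry sizes are derived from the structure definitions shown earlier, and 16 bytes is assumed to be a sufficient alignment for every payload.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	DEMO_HDR_SIZE	32u	/* fixed part of prxregset_hdr_t */
#define	DEMO_INFO_SIZE	16u	/* one prxregset_info_t */
#define	DEMO_ALIGN	16u	/* assumed payload alignment */

/* Round v up to the next multiple of align (a power of two). */
static uint32_t
demo_roundup(uint32_t v, uint32_t align)
{
	return ((v + align - 1) & ~(align - 1));
}

/*
 * Given the payload sizes of n components, fill offs[i] with the value
 * to store in pri_offset for component i and return the total size to
 * store in pr_size.  Payloads follow the header and entries, so none
 * of them can overlap those structures.
 */
uint32_t
demo_layout(const uint32_t *sizes, uint32_t n, uint32_t *offs)
{
	uint32_t off = DEMO_HDR_SIZE + n * DEMO_INFO_SIZE;
	uint32_t i;

	assert(sizes != NULL && offs != NULL);
	for (i = 0; i < n; i++) {
		off = demo_roundup(off, DEMO_ALIGN);
		offs[i] = off;
		off += sizes[i];
	}
	return (off);
}
```

For example, an xsave payload of 576 bytes (the 512-byte fxsave area plus the 64-byte xsave header) followed by a 256-byte ymm payload lands at offsets 64 and 640, for a total of 896 bytes.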
1480 
1481 CONTROL MESSAGES
1482      Process state changes are effected through messages written to a
1483      process's ctl file or to an individual lwp's lwpctl file.  All control
1484      messages consist of a long that names the specific operation followed by
1485      additional data containing the operand, if any.
1486 
1487      Multiple control messages may be combined in a single write(2) (or
1488      writev(2)) to a control file, but no partial writes are permitted.  That
1489      is, each control message, operation code plus operand, if any, must be
1490      presented in its entirety to the write(2) and not in pieces over several
1491      system calls.  If a control operation fails, no subsequent operations
1492      contained in the same write(2) are attempted.
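
The framing described above, a long operation code followed directly by its operand, can be sketched as a helper that appends whole messages to one buffer so they can be handed to a single write(2). The helper name is hypothetical and the DEMO_* codes are placeholders; real code would use the PC* constants from the system headers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Placeholder operation codes; real values come from the system headers. */
#define	DEMO_PCTWSTOP	4L
#define	DEMO_PCRUN	5L

/*
 * Append one control message (op plus optional operand) at offset used
 * in buf, which holds cap bytes.  Returns the new number of bytes used,
 * or 0 if the complete message does not fit.  A message is never split.
 */
size_t
demo_ctl_append(unsigned char *buf, size_t cap, size_t used, long op,
    const void *arg, size_t arglen)
{
	assert(buf != NULL);
	if (used > cap || arglen > cap - used ||
	    cap - used - arglen < sizeof (op))
		return (0);
	memcpy(buf + used, &op, sizeof (op));
	if (arglen != 0)
		memcpy(buf + used + sizeof (op), arg, arglen);
	return (used + sizeof (op) + arglen);
}
```

A caller could append a timed wait with a 2000 millisecond operand, then a run request with a zero flags operand, and pass the resulting byte count and buffer to one write(2) on the control file. If the first operation fails, the second is not attempted.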
1493 
1494      Descriptions of the allowable control messages follow.  In all cases,
1495      writing a message to a control file for a process or lwp that has
1496      terminated elicits the error ENOENT.
1497 
1498    PCSTOP PCDSTOP PCWSTOP PCTWSTOP
1499      When applied to the process control file, PCSTOP directs all lwps to stop
1500      and waits for them to stop, PCDSTOP directs all lwps to stop without
1501      waiting for them to stop, and PCWSTOP simply waits for all lwps to stop.
1502      When applied to an lwp control file, PCSTOP directs the specific lwp to
1503      stop and waits until it has stopped, PCDSTOP directs the specific lwp to
1504      stop without waiting for it to stop, and PCWSTOP simply waits for the
1505      specific lwp to stop.  When applied to an lwp control file, PCSTOP and
1506      PCWSTOP complete when the lwp stops on an event of interest, immediately
1507      if already so stopped; when applied to the process control file, they
1508      complete when every lwp has stopped either on an event of interest or on
1509      a PR_SUSPENDED stop.
1510 
1511      PCTWSTOP is identical to PCWSTOP except that it enables the operation to
1512      time out, to avoid waiting forever for a process or lwp that may never
1513      stop on an event of interest.  PCTWSTOP takes a long operand specifying a
1514      number of milliseconds; the wait will terminate successfully after the
1515      specified number of milliseconds even if the process or lwp has not
1516      stopped; a timeout value of zero makes the operation identical to
1517      PCWSTOP.
1518 
1519      An "event of interest" is either a PR_REQUESTED stop or a stop that has
1520      been specified in the process's tracing flags (set by PCSTRACE, PCSFAULT,
1521      PCSENTRY, and PCSEXIT).  PR_JOBCONTROL and PR_SUSPENDED stops are
1522      specifically not events of interest.  (An lwp may stop twice due to a
1523      stop signal, first showing PR_SIGNALLED if the signal is traced and again
1524      showing PR_JOBCONTROL if the lwp is set running without clearing the
1525      signal.)  If PCSTOP or PCDSTOP is applied to an lwp that is stopped, but
1526      not on an event of interest, the stop directive takes effect when the lwp
1527      is restarted by the competing mechanism.  At that time, the lwp enters a
1528      PR_REQUESTED stop before executing any user-level code.
1529 
1530      A write of a control message that blocks is interruptible by a signal so
1531      that, for example, an alarm(2) can be set to avoid waiting forever for a
1532      process or lwp that may never stop on an event of interest.  If PCSTOP is
1533      interrupted, the lwp stop directives remain in effect even though the
1534      write(2) returns an error.  (Use of PCTWSTOP with a non-zero timeout is
1535      recommended over PCWSTOP with an alarm(2).)
1536 
1537      A system process (indicated by the PR_ISSYS flag) never executes at user
1538      level, has no user-level address space visible through /proc, and cannot
1539      be stopped.  Applying one of these operations to a system process or any
1540      of its lwps elicits the error EBUSY.
1541 
1542    PCRUN
1543      Make an lwp runnable again after a stop.  This operation takes a long
1544      operand containing zero or more of the following flags:
1545 
1546          PRCSIG    clears the current signal, if any (see PCCSIG).
1547 
1548          PRCFAULT  clears the current fault, if any (see PCCFAULT).


1882 
1883    PCSVADDR
1884      Set the address at which execution will resume for the specific or
1885      representative lwp from the operand long.  On SPARC based systems, both
1886      %pc and %npc are set, with %npc set to the instruction following the
1887      virtual address.  On x86-based systems, only %eip is set.  PCSVADDR fails
1888      with EBUSY if the lwp is not stopped on an event of interest.
1889 
1890    PCSFPREG
1891      Set the floating-point registers for the specific or representative lwp
1892      according to the operand prfpregset_t structure.  An error (EINVAL) is
1893      returned if the system does not support floating-point operations (no
1894      floating-point hardware and the system does not emulate floating-point
1895      machine instructions).  PCSFPREG fails with EBUSY if the lwp is not
1896      stopped on an event of interest.
1897 
1898    PCSXREG
1899      Set the extra state registers for the specific or representative lwp
1900      according to the architecture-dependent operand prxregset_t structure.
1901      An error (EINVAL) is returned if the system does not support extra state
1902      registers or the register state is invalid.  PCSXREG fails with EBUSY if
1903      the lwp is not stopped on an event of interest.
1904 
1905    PCSASRS
1906      Set the ancillary state registers for the specific or representative lwp
1907      according to the SPARC V9 platform-dependent operand asrset_t structure.
1908      An error (EINVAL) is returned if either the target process or the
1909      controlling process is not a 64-bit SPARC V9 process.  Most of the
1910      ancillary state registers are privileged registers that cannot be
1911      modified.  Only those that can be modified are set; all others are
1912      silently ignored.  PCSASRS fails with EBUSY if the lwp is not stopped on
1913      an event of interest.
1914 
1915    PCAGENT
1916      Create an agent lwp in the controlled process with register values from
1917      the operand prgregset_t structure (see PCSREG, above).  The agent lwp is
1918      created in the stopped state showing PR_REQUESTED and with its held
1919      signal set (the signal mask) having all signals except SIGKILL and
1920      SIGSTOP blocked.
1921 
1922      The PCAGENT operation fails with EBUSY unless the process is fully
1923      stopped via /proc, that is, unless all of the lwps in the process are