15254 %ymm registers not restored after signal handler
15367 x86 getfpregs() summons corrupting %xmm ghosts
15333 want x86 /proc xregs support (libc_db, libproc, mdb, etc.)
15336 want libc functions for extended ucontext_t
15334 want ps_lwphandle-specific reg routines
15328 FPU_CW_INIT mistreats reserved bit
15335 i86pc fpu_subr.c isn't really platform-specific
15332 setcontext(2) isn't actually noreturn
15331 need <sys/stdalign.h>
Change-Id: I7060aa86042dfb989f77fc3323c065ea2eafa9ad
Conflicts:
    usr/src/uts/common/fs/proc/prcontrol.c
    usr/src/uts/intel/os/archdep.c
    usr/src/uts/intel/sys/ucontext.h
    usr/src/uts/intel/syscall/getcontext.c

          --- old/usr/src/man/man5/proc.5.man.txt
          +++ new/usr/src/man/man5/proc.5.man.txt
[1246 lines elided]
1247 1247       correspond to the CPU time for the process, but rather to the creation
1248 1248       time of the agent lwp.
1249 1249  
1250 1250     templates
1251 1251       A directory which contains references to the active templates for the
1252 1252       lwp, named by the contract type.  Changes made to an active template
1253 1253       descriptor do not affect the original template which was activated,
1254 1254       though they do affect the active template.  It is not possible to
1255 1255       activate an active template descriptor.  See contract(5).
1256 1256  
     1257 +ARCHITECTURE-SPECIFIC STRUCTURES
     1258 +   x86
     1259 +     The x86 prxregset_t structure is opaque and is made up of several
     1260 +     different components because different x86 processors enumerate
     1261 +     different architectural extensions.
     1262 +
     1263 +     The structure begins with a header, the prxregset_hdr_t, which is
     1264 +     followed by a number of different information sections which describe
     1265 +     different possible extended registers.  Each of those is covered by a
     1266 +     prxregset_info_t, and then finally there are different data payloads that
     1267 +     represent each extended register.
     1268 +
     1269 +     The number of different informational entries varies from system to
     1270 +     system based on the set of architectural features that the system
     1271 +     supports and the corresponding OS enablement for them.  This structure is
     1272 +     built around the idea of the x86 xsave structure.  That is, there is a
     1273 +     central header which describes a bit-vector of what extended features are
     1274 +     present and have valid state.
     1275 +
     1276 +     Each x86 xregs file begins with the prxregset_hdr_t which looks like:
     1277 +
     1278 +      typedef struct prxregset_hdr {
     1279 +              uint32_t        pr_type;
     1280 +              uint32_t        pr_size;
     1281 +              uint32_t        pr_flags;
     1282 +              uint32_t        pr_pad[4];
     1283 +              uint32_t        pr_ninfo;
     1284 +              prxregset_info_t pr_info[];
     1285 +      } prxregset_hdr_t;
     1286 +
     1287 +     The pr_type member is always set to PR_TYPE_XSAVE.  This is used to
     1288 +     indicate the type of file that is present.  There may be different file
     1289 +     types in the future on x86 so this value should always be checked.  If it
     1290 +     is not PR_TYPE_XSAVE then the rest of the structure may look different.
     1291 +     The pr_size member indicates the size in bytes of the overall structure.
     1292 +     The pr_flags and pr_pad values are currently reserved for future use.
     1293 +     They are currently set to zero when read and must be set to zero when
     1294 +     writing the data.  The pr_ninfo member indicates the number of
     1295 +     informational items that are present in pr_info.  There will be one
     1296 +     informational item for each register set that exists.
     1297 +
     1298 +     The pr_info member is an array of informational entries.  These
     1299 +     immediately follow the header, though the pr_info member itself may not
     1300 +     be available when compiling in an environment that lacks C99 flexible
     1301 +     array members.  Each prxregset_info_t structure looks like:
     1302 +
     1303 +      typedef struct prxregset_info {
     1304 +              uint32_t pri_type;
     1305 +              uint32_t pri_flags;
     1306 +              uint32_t pri_size;
     1307 +              uint32_t pri_offset;
     1308 +      } prxregset_info_t;
     1309 +
     1310 +     The pri_type member indicates the type and format of the data that this
     1311 +     entry describes.  Types are listed below.  The pri_flags member is
     1312 +     reserved for future extensions or information about these items.
     1313 +     Right now, it is always zero.  The pri_size member indicates the size in
     1314 +     bytes of the type's data.  The pri_offset member indicates the offset to
     1315 +     the start of the data section from the beginning of the xregs file.  That
     1316 +     is, an offset of 0 would be the first byte of the prxregset_hdr_t.
     1317 +
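The header and informational-entry layout described above can be walked with a few bounds checks. The following is a minimal sketch, not code from this change: the xr_* names are hypothetical local mirrors of the quoted structures so that the example builds without <sys/procfs.h>, and the value given for XR_TYPE_XSAVE is a placeholder assumption standing in for the real PR_TYPE_XSAVE.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical local mirrors of prxregset_info_t and prxregset_hdr_t. */
typedef struct xr_info {
	uint32_t pri_type;
	uint32_t pri_flags;
	uint32_t pri_size;
	uint32_t pri_offset;
} xr_info_t;

typedef struct xr_hdr {
	uint32_t pr_type;
	uint32_t pr_size;
	uint32_t pr_flags;
	uint32_t pr_pad[4];
	uint32_t pr_ninfo;
} xr_hdr_t;

/* ASSUMPTION: placeholder value; real code uses PR_TYPE_XSAVE. */
#define	XR_TYPE_XSAVE	1u

/*
 * Find the payload for informational type `want` in `len` bytes read from
 * an xregs file.  Returns a pointer into `buf` and stores the payload size
 * in *sizep, or returns NULL if the type is absent or the data is malformed.
 */
static const void *
xr_find(const void *buf, size_t len, uint32_t want, uint32_t *sizep)
{
	const uint8_t *base = buf;
	xr_hdr_t hdr;
	xr_info_t info;

	if (len < sizeof (hdr))
		return (NULL);
	memcpy(&hdr, base, sizeof (hdr));

	/* The type must be checked before trusting the rest of the layout. */
	if (hdr.pr_type != XR_TYPE_XSAVE)
		return (NULL);
	if (hdr.pr_size < sizeof (hdr) || hdr.pr_size > len)
		return (NULL);
	/* The info array immediately follows the header. */
	if (hdr.pr_ninfo > (hdr.pr_size - sizeof (hdr)) / sizeof (info))
		return (NULL);

	for (uint32_t i = 0; i < hdr.pr_ninfo; i++) {
		memcpy(&info, base + sizeof (hdr) + i * sizeof (info),
		    sizeof (info));
		if (info.pri_type != want)
			continue;
		/* pri_offset counts from byte 0 of the header. */
		if (info.pri_offset > hdr.pr_size ||
		    info.pri_size > hdr.pr_size - info.pri_offset)
			return (NULL);
		*sizep = info.pri_size;
		return (base + info.pri_offset);
	}
	return (NULL);
}
```

Copying the header and each entry out with memcpy() avoids depending on the alignment of the caller's buffer and on the availability of the flexible pr_info member.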
     1318 +     The following informational types and their corresponding data
     1319 +     structures are currently defined:
     1320 +
     1321 +     PRX_INFO_XCR - prxregset_xcr_t
     1322 +             This structure provides read-only access to the CPU's settings
     1323 +             for this thread.  In particular, it lets you see what is set in
     1324 +             the x86 %xcr0 register, which is the extended feature control
     1325 +             register and controls what extended features the CPU actually
     1326 +             uses.  It also contains the x86 extended feature disable MSR,
     1327 +             which controls features that are ignored.  The prxregset_xcr_t
     1328 +             looks like:
     1329 +
     1330 +                    typedef struct prxregset_xcr {
     1331 +                            uint64_t        prx_xcr_xcr0;
     1332 +                            uint64_t        prx_xcr_xfd;
     1333 +                            uint64_t        prx_xcr_pad[2];
     1334 +                    } prxregset_xcr_t;
     1335 +
     1336 +             When setting the xregs, this entry can be left out.  If it is
     1337 +             included, it must match the thread's existing values; otherwise
     1338 +             an error will be generated.
     1339 +
     1340 +     PRX_INFO_XSAVE - prxregset_xsave_t
     1341 +             This structure represents the same data as the actual Intel xsave
     1342 +             structure, which has both the traditional XMM state that comes
     1343 +             from the fxsave instruction and then also contains the xsave
     1344 +             header itself.  The structure varies between 32-bit and 64-bit
     1345 +             applications.  The structure itself looks like:
     1346 +
     1347 +             typedef struct prxregset_xsave {
     1348 +                     uint16_t        prx_fx_fcw;
     1349 +                     uint16_t        prx_fx_fsw;
     1350 +                     uint16_t        prx_fx_fctw;    /* compressed tag word */
     1351 +                     uint16_t        prx_fx_fop;
     1352 +             #if defined(__amd64)
     1353 +                     uint64_t        prx_fx_rip;
     1354 +                     uint64_t        prx_fx_rdp;
     1355 +             #else
     1356 +                     uint32_t        prx_fx_eip;
     1357 +                     uint16_t        prx_fx_cs;
     1358 +                     uint16_t        __prx_fx_ign0;
     1359 +                     uint32_t        prx_fx_dp;
     1360 +                     uint16_t        prx_fx_ds;
     1361 +                     uint16_t        __prx_fx_ign1;
     1362 +             #endif
     1363 +                     uint32_t        prx_fx_mxcsr;
     1364 +                     uint32_t        prx_fx_mxcsr_mask;
     1365 +                     union {
     1366 +                             uint16_t prx_fpr_16[5]; /* 80-bits of x87 state */
     1367 +                             u_longlong_t prx_fpr_mmx;       /* 64-bit mmx register */
     1368 +                             uint32_t _prx__fpr_pad[4];      /* (pad out to 128-bits) */
     1369 +                     } fx_st[8];
     1370 +             #if defined(__amd64)
     1371 +                     upad128_t       prx_fx_xmm[16]; /* 128-bit registers */
     1372 +                     upad128_t       __prx_fx_ign2[6];
     1373 +             #else
     1374 +                     upad128_t       prx_fx_xmm[8];  /* 128-bit registers */
     1375 +                     upad128_t       __prx_fx_ign2[14];
     1376 +             #endif
     1377 +                     uint64_t        prx_xsh_xstate_bv;
     1378 +                     uint64_t        prx_xsh_xcomp_bv;
     1379 +                     uint64_t        prx_xsh_reserved[6];
     1380 +             } prxregset_xsave_t;
     1381 +
     1382 +             In the classical fxsave portion of the structure, most of the
     1383 +             members have the same meaning as their counterparts in the
     1384 +             fpregs file and are used as discussed in the Intel and AMD
     1385 +             software developer manuals.  The one exception is that when
     1386 +             setting the prx_fx_mxcsr member, reserved bits that are set will
     1387 +             be masked off and ignored.
     1388 +
     1389 +             The most notable fields to consider here right now are the last
     1390 +             few members which are part of the xsave header itself.  In
     1391 +             particular, the prx_xsh_xstate_bv component is used to track the
     1392 +             actual features whose contents are valid.  When reading the
     1393 +             registers, if a given entry is not valid, the kernel will
     1394 +             write out the informational entry in its default state.  When
     1395 +             setting the extended registers, this notes which features will be
     1396 +             loaded from their default state (as defined by Intel and AMD's
     1397 +             manuals) and which will be loaded from the informational entries.
     1398 +             If a bit is set in the prx_xsh_xstate_bv entry, then it must be
     1399 +             present as its own informational entry; otherwise a write will
     1400 +             fail.  If an informational entry is present in a write but its bit
     1401 +             is not set in prx_xsh_xstate_bv, its contents will be ignored.
     1402 +
     1403 +             The xregs format currently does not support any compressed items
     1404 +             being specified nor does it specify any, so the prx_xsh_xcomp_bv
     1405 +             member will always be set to zero, and both it and the reserved
     1406 +             prx_xsh_reserved members must be left as zero.
     1407 +
     1408 +     PRX_INFO_YMM - prxregset_ymm_t
     1409 +             This structure contains the upper 128-bits of the first 16 %ymm
     1410 +             registers (8 for 32-bit applications).  To construct a full
     1411 +             vector register, it must be combined with the prx_fx_xmm member
     1412 +             of the PRX_INFO_XSAVE data.  In 32-bit applications, the reserved
     1413 +             registers must be written as zero.  The structure itself looks
     1414 +             like:
     1415 +
     1416 +              typedef struct prxregset_ymm {
     1417 +              #if defined(__amd64)
     1418 +                      upad128_t       prx_ymm[16];
     1419 +              #else
     1420 +                      upad128_t       prx_ymm[8];
     1421 +                      upad128_t       prx_rsvd[8];
     1422 +              #endif
     1423 +              } prxregset_ymm_t;
     1424 +
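To make the split between the two components concrete, the sketch below assembles the 32-byte memory image of one %ymm register from its two halves. It is an illustration only: xr_u128_t is a hypothetical stand-in for upad128_t, and the caller is assumed to have already located the prx_fx_xmm and prx_ymm arrays in the PRX_INFO_XSAVE and PRX_INFO_YMM payloads.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for upad128_t: 16 raw bytes. */
typedef struct xr_u128 {
	uint8_t b[16];
} xr_u128_t;

/*
 * Build the in-memory image of %ymm<n>.  Bits 0-127 come from the xmm
 * array of the PRX_INFO_XSAVE data and bits 128-255 from the array in the
 * PRX_INFO_YMM data.  x86 is little-endian, so the low half comes first.
 */
static void
xr_join_ymm(const xr_u128_t *xmm, const xr_u128_t *ymm_hi, unsigned int n,
    uint8_t out[32])
{
	memcpy(out, xmm[n].b, sizeof (xmm[n].b));
	memcpy(out + 16, ymm_hi[n].b, sizeof (ymm_hi[n].b));
}
```

The caller is responsible for keeping n below 16 (or below 8 for a 32-bit target), matching the array bounds quoted above.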
     1425 +     PRX_INFO_OPMASK - prxregset_opmask_t
     1426 +             This structure represents one portion of Intel's AVX-512 state:
     1427 +             the 8 64-bit mask registers, %k0 through %k7.  The structure
     1428 +             looks like:
     1429 +
     1430 +              typedef struct prxregset_opmask {
     1431 +                      uint64_t        prx_opmask[8];
     1432 +              } prxregset_opmask_t;
     1433 +
     1434 +     PRX_INFO_ZMM - prxregset_zmm_t
     1435 +             This structure represents one portion of Intel's AVX-512 state:
     1436 +             the upper 256 bits of the 512-bit %zmm0 through %zmm15 registers.
     1437 +             Bits 0-127 are found in the prx_fx_xmm member of the
     1438 +             PRX_INFO_XSAVE data and bits 128-255 are found in the prx_ymm
     1439 +             member of the PRX_INFO_YMM.  32-bit applications only have access
     1440 +             to %zmm0 through %zmm7.  This structure looks like:
     1441 +
     1442 +              typedef struct prxregset_zmm {
     1443 +              #if defined(__amd64)
     1444 +                      upad256_t       prx_zmm[16];
     1445 +              #else
     1446 +                      upad256_t       prx_zmm[8];
     1447 +                      upad256_t       prx_rsvd[8];
     1448 +              #endif
     1449 +              } prxregset_zmm_t;
     1450 +
     1451 +     PRX_INFO_HI_ZMM - prxregset_hi_zmm_t
     1452 +             This structure represents the third portion of Intel's AVX-512
     1453 +             state: the additional 16 512-bit registers that are available to
     1454 +             64-bit applications, but not 32-bit applications.  This
     1455 +             represents %zmm16 through %zmm31.  This structure looks like:
     1456 +
     1457 +                   typedef struct prxregset_hi_zmm {
     1458 +                   #if defined(__amd64)
     1459 +                           upad512_t       prx_hi_zmm[16];
     1460 +                   #else
     1461 +                           upad512_t       prx_rsvd[16];
     1462 +                   #endif
     1463 +                   } prxregset_hi_zmm_t;
     1464 +
     1465 +                   Unlike the lower registers, %zmm0 through %zmm15, this contains the
     1466 +                   entire 512-bit register in one spot and there is no need to look at other
     1467 +                   information items to reconstitute the entire vector.
     1468 +
     1469 +             When setting the extended registers, at least the PRX_INFO_XSAVE
     1470 +             component must be present.  None of the component offsets may
     1471 +             overlap with the prxregset_hdr_t or any of the prxregset_info_t
     1472 +             structures.  In the written data file, it is expected that the
     1473 +             various structures start at their natural alignment, which is
     1474 +             most often 16 bytes (that is, the value that the C alignof
     1475 +             operator will return).  The structures that we use are
     1476 +             all multiples of 16 bytes to make this easier.  The kernel will
     1477 +             write out structures with a greater alignment such that the
     1478 +             portions of registers are aligned and safely usable with
     1479 +             instructions that move aligned integers such as vmovdqa64.
     1480 +
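The layout rules for a written xregs file (PRX_INFO_XSAVE present, no payload overlapping the header or the info array, payloads aligned) lend themselves to a pre-flight check before issuing the write. This is a rough sketch under stated assumptions, not the kernel's validation logic: xr_info_t is a hypothetical mirror of prxregset_info_t, XR_INFO_XSAVE is a placeholder for the real PRX_INFO_XSAVE value, and a flat 16-byte alignment is applied to every payload, which is stricter than necessary for types whose natural alignment is smaller.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical local mirror of prxregset_info_t. */
typedef struct xr_info {
	uint32_t pri_type;
	uint32_t pri_flags;
	uint32_t pri_size;
	uint32_t pri_offset;
} xr_info_t;

#define	XR_HDR_SIZE	32u	/* eight uint32_t fields precede pr_info */
#define	XR_ALIGN	16u	/* ASSUMPTION: flat alignment for all types */
#define	XR_INFO_XSAVE	2u	/* ASSUMPTION: placeholder for PRX_INFO_XSAVE */

/*
 * Check a proposed set of info entries for a file of `total` bytes.
 * Returns true only if every payload lies after the header and info array,
 * fits inside the file, is aligned, and an xsave entry is present.
 */
static bool
xr_layout_ok(const xr_info_t *info, uint32_t ninfo, uint32_t total)
{
	uint64_t meta_end = XR_HDR_SIZE + (uint64_t)ninfo * sizeof (xr_info_t);
	bool have_xsave = false;

	for (uint32_t i = 0; i < ninfo; i++) {
		uint64_t start = info[i].pri_offset;
		uint64_t end = start + info[i].pri_size;

		if (start < meta_end || end > total)
			return (false);
		if (start % XR_ALIGN != 0)
			return (false);
		if (info[i].pri_type == XR_INFO_XSAVE)
			have_xsave = true;
	}
	return (have_xsave);
}
```

The check does not look for payloads that overlap one another, and it does not compare prx_xsh_xstate_bv against the entries present; both would be natural extensions.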
1257 1481  CONTROL MESSAGES
1258 1482       Process state changes are effected through messages written to a
1259 1483       process's ctl file or to an individual lwp's lwpctl file.  All control
1260 1484       messages consist of a long that names the specific operation followed by
1261 1485       additional data containing the operand, if any.
1262 1486  
1263 1487       Multiple control messages may be combined in a single write(2) (or
1264 1488       writev(2)) to a control file, but no partial writes are permitted.  That
1265 1489       is, each control message, operation code plus operand, if any, must be
1266 1490       presented in its entirety to the write(2) and not in pieces over several
[3 lines elided]
1270 1494       Descriptions of the allowable control messages follow.  In all cases,
1271 1495       writing a message to a control file for a process or lwp that has
1272 1496       terminated elicits the error ENOENT.
1273 1497  
1274 1498     PCSTOP PCDSTOP PCWSTOP PCTWSTOP
1275 1499       When applied to the process control file, PCSTOP directs all lwps to stop
1276 1500       and waits for them to stop, PCDSTOP directs all lwps to stop without
1277 1501       waiting for them to stop, and PCWSTOP simply waits for all lwps to stop.
1278 1502       When applied to an lwp control file, PCSTOP directs the specific lwp to
1279 1503       stop and waits until it has stopped, PCDSTOP directs the specific lwp to
1280      -     stop without waiting for it to stop, and PCWSTOP
1281      -      simply waits for the specific lwp to stop.  When applied to an lwp
1282      -     control file, PCSTOP and PCWSTOP complete when the lwp stops on an event
1283      -     of interest, immediately if already so stopped; when applied to the
1284      -     process control file, they complete when every lwp has stopped either on
1285      -     an event of interest or on a PR_SUSPENDED stop.
     1504 +     stop without waiting for it to stop, and PCWSTOP simply waits for the
     1505 +     specific lwp to stop.  When applied to an lwp control file, PCSTOP and
     1506 +     PCWSTOP complete when the lwp stops on an event of interest, immediately
     1507 +     if already so stopped; when applied to the process control file, they
     1508 +     complete when every lwp has stopped either on an event of interest or on
     1509 +     a PR_SUSPENDED stop.
1286 1510  
1287 1511       PCTWSTOP is identical to PCWSTOP except that it enables the operation to
1288 1512       time out, to avoid waiting forever for a process or lwp that may never
1289 1513       stop on an event of interest.  PCTWSTOP takes a long operand specifying a
1290 1514       number of milliseconds; the wait will terminate successfully after the
1291 1515       specified number of milliseconds even if the process or lwp has not
1292 1516       stopped; a timeout value of zero makes the operation identical to
1293 1517       PCWSTOP.
1294 1518  
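The framing rule for control messages (a long naming the operation, then the operand, delivered whole in one write(2) or writev(2)) can be sketched as a small helper. This is an illustration under assumptions rather than code from this change: ctl_send is a hypothetical name, and the caller supplies the opcode from <sys/procfs.h> (for example PCTWSTOP with a long millisecond operand).

```c
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Send one control message, opcode plus optional operand, in a single
 * writev(2) call so that it is never presented in pieces.  Returns 0 on
 * success and -1 on error or on a short write.
 */
static int
ctl_send(int fd, long op, const void *arg, size_t arglen)
{
	struct iovec iov[2];
	int cnt = 1;
	ssize_t want = (ssize_t)sizeof (op);

	iov[0].iov_base = &op;
	iov[0].iov_len = sizeof (op);
	if (arglen != 0) {
		/* iov_base is not const-qualified; the data is only read. */
		iov[1].iov_base = (void *)arg;
		iov[1].iov_len = arglen;
		cnt = 2;
		want += (ssize_t)arglen;
	}
	return (writev(fd, iov, cnt) == want ? 0 : -1);
}
```

With an lwpctl file descriptor, a two-second timed wait would look like `long ms = 2000; ctl_send(fd, PCTWSTOP, &ms, sizeof (ms));`. The same helper could carry a prxregset_t operand for PCSXREG.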
1295 1519       An "event of interest" is either a PR_REQUESTED stop or a stop that has
1296 1520       been specified in the process's tracing flags (set by PCSTRACE, PCSFAULT,
1297      -     PCSENTRY, and PCSEXIT).  PR_JOBCONTROL
1298      -      and PR_SUSPENDED stops are specifically not events of interest.  (An lwp
1299      -     may stop twice due to a stop signal, first showing PR_SIGNALLED if the
1300      -     signal is traced and again showing PR_JOBCONTROL if the lwp is set
1301      -     running without clearing the signal.)  If PCSTOP or PCDSTOP is applied to
1302      -     an lwp that is stopped, but not on an event of interest, the stop
1303      -     directive takes effect when the lwp is restarted by the competing
1304      -     mechanism.  At that time, the lwp enters a PR_REQUESTED stop before
1305      -     executing any user-level code.
     1521 +     PCSENTRY, and PCSEXIT).  PR_JOBCONTROL and PR_SUSPENDED stops are
     1522 +     specifically not events of interest.  (An lwp may stop twice due to a
     1523 +     stop signal, first showing PR_SIGNALLED if the signal is traced and again
     1524 +     showing PR_JOBCONTROL if the lwp is set running without clearing the
     1525 +     signal.)  If PCSTOP or PCDSTOP is applied to an lwp that is stopped, but
     1526 +     not on an event of interest, the stop directive takes effect when the lwp
     1527 +     is restarted by the competing mechanism.  At that time, the lwp enters a
     1528 +     PR_REQUESTED stop before executing any user-level code.
1306 1529  
1307 1530       A write of a control message that blocks is interruptible by a signal so
1308 1531       that, for example, an alarm(2) can be set to avoid waiting forever for a
1309 1532       process or lwp that may never stop on an event of interest.  If PCSTOP is
1310 1533       interrupted, the lwp stop directives remain in effect even though the
1311 1534       write(2) returns an error.  (Use of PCTWSTOP with a non-zero timeout is
1312 1535       recommended over PCWSTOP with an alarm(2).)
1313 1536  
1314 1537       A system process (indicated by the PR_ISSYS flag) never executes at user
1315 1538       level, has no user-level address space visible through /proc, and cannot
[353 lines elided]
1669 1892       according to the operand prfpregset_t structure.  An error (EINVAL) is
1670 1893       returned if the system does not support floating-point operations (no
1671 1894       floating-point hardware and the system does not emulate floating-point
1672 1895       machine instructions).  PCSFPREG fails with EBUSY if the lwp is not
1673 1896       stopped on an event of interest.
1674 1897  
1675 1898     PCSXREG
1676 1899       Set the extra state registers for the specific or representative lwp
1677 1900       according to the architecture-dependent operand prxregset_t structure.
1678 1901       An error (EINVAL) is returned if the system does not support extra state
1679      -     registers.  PCSXREG fails with EBUSY if the lwp is not stopped on an
1680      -     event of interest.
     1902 +     registers or the register state is invalid.  PCSXREG fails with EBUSY if
     1903 +     the lwp is not stopped on an event of interest.
1681 1904  
1682 1905     PCSASRS
1683 1906       Set the ancillary state registers for the specific or representative lwp
1684 1907       according to the SPARC V9 platform-dependent operand asrset_t structure.
1685 1908       An error (EINVAL) is returned if either the target process or the
1686 1909       controlling process is not a 64-bit SPARC V9 process.  Most of the
1687 1910       ancillary state registers are privileged registers that cannot be
1688 1911       modified.  Only those that can be modified are set; all others are
1689 1912       silently ignored.  PCSASRS fails with EBUSY if the lwp is not stopped on
1690 1913       an event of interest.
[366 lines elided]