NEX-16819 loader UEFI support
Includes work by Toomas Soome <tsoome@me.com>
Upstream commits:
loader: pxe receive cleanup
9475 libefi: Do not return only if ReceiveFilter
installboot: should support efi system partition
8931 boot1.efi: scan all display modes rather than
loader: spinconsole updates
loader: gfx experiment to try GOP Blt() function.
sha1 build test
loader: add sha1 hash calculation
common/sha1: update for loader build
loader: biosdisk rework
uts: 32-bit kernel FB needs mapping in low memory
uts: add diag-device
uts: boot console mirror with diag-device
uts: enable very early console on ttya
kmdb: add diag-device as input/output device
uts: test VGA memory exclusion from mapping
uts: clear boot mapping and protect boot pages test
uts: add dboot map debug printf
uts: need to release FB pages in release_bootstrap()
uts: add screenmap ioctl
uts: update sys/queue.h
loader: add illumos uts/common to include path
loader: tem/gfx font cleanup
loader: vbe checks
uts: gfx_private set KD_TEXT when KD_RESETTEXT is
uts: gfx 8-bit update
loader: gfx 8-bit fix
loader: always set media size from partition.
uts: MB2 support for 32-bit kernel
loader: x86 should have tem 80x25
uts: x86 should have tem 80x25
uts: font update
loader: font update
uts: tem attributes
loader: tem.c comment added
uts: use font module
loader: add font module
loader: build rules for new font setup
uts: gfx_private update for new font structure
uts: early boot update for new font structure
uts: font update
uts: font build rules update for new fonts
uts: tem update to new font structure
loader: module.c needs to include tem_impl.h
uts: gfx_private 8x16 font rework
uts: make font_lookup public
loader: font rework
uts: font rework
9259 libefi: efi_alloc_and_read should check for PMBR
uts: tem utf-8 support
loader: implement tem utf-8 support
loader: tem should be able to display UTF-8
7784 uts: console input should support utf-8
7796 uts: ldterm default to utf-8
uts: do not reset serial console
uts: set up colors even if tem is not console
uts: add type for early boot properties
uts: gfx_private experiment with drm and vga
uts: gfx_private should use setmode drm callback.
uts: identify FB types and set up gfx_private based
loader: replace gop and vesa with framebuffer
uts: boot needs simple tem to support mdb
uts: boot_keyboard should emit esc sequences for
uts: gfx_private FB should be written by line
kmdb: set terminal window size
uts: gfx_private needs to keep track of early boot FB
pnglite: move pnglite to usr/src/common
loader: gfx_fb
ficl-sys: add gfx primitives
loader: add illumos.png logo
ficl: add fb-putimage
loader: add png support
loader: add alpha blending for gfx_fb
loader: use term-drawrect for menu frame
ficl: add simple gfx words
uts: provide fb_info via fbgattr dev_specific array.
uts: gfx_private add alpha blending
uts: update sys/ascii.h
uts: tem OSC support (incomplete)
uts: implement env module support and use data from
uts: tem get colors from early boot data
loader: use crc32 from libstand (libz)
loader: optimize for size
loader: pass tem info to the environment
loader: import tem for loader console
loader: UEFI loader needs to set ISADIR based on
loader: need UEFI32 support
8918 loader.efi: add vesa edid support
uts: tem_safe_pix_clear_prom_output() should only
uts: tem_safe_pix_clear_entire_screen() should use
uts: tem_safe_check_first_time() should query cursor
uts: tem implement cls callback & visual_io v4
uts: gfx_vgatext use block cursor for vgatext
uts: gfx_private implement cls callback & visual_io
uts: gfx_private bitmap framebuffer implementation
uts: early start frame buffer console support
uts: font functions should check the input char
uts: font rendering should support 16/24/32bit depths
uts: use smallest font as fallback default.
uts: update terminal dimensions based on selected
7834 uts: vgatext should use gfx_private
uts: add spacing property to 8859-1.bdf
terminfo: add underline for sun-color
terminfo: sun-color has 16 colors
uts: add font load callback type
loader: do not repeat int13 calls with error 0x20 and
8905 loader: add skein/edonr support
8904 common/crypto: make skein and edonr loader
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Evan Layton <evan.layton@nexenta.com>
Revert "NEX-16819 loader UEFI support"
This reverts commit ec06b9fc617b99234e538bf2e7e4d02a24993e0c.
Reverting due to failures in the zfs-tests and the sharefs-tests
NEX-16819 loader UEFI support
Includes work by Toomas Soome <tsoome@me.com>
Upstream commits:
loader: pxe receive cleanup
9475 libefi: Do not return only if ReceiveFilter
installboot: should support efi system partition
8931 boot1.efi: scan all display modes rather than
loader: spinconsole updates
loader: gfx experiment to try GOP Blt() function.
sha1 build test
loader: add sha1 hash calculation
common/sha1: update for loader build
loader: biosdisk rework
uts: 32-bit kernel FB needs mapping in low memory
uts: add diag-device
uts: boot console mirror with diag-device
uts: enable very early console on ttya
kmdb: add diag-device as input/output device
uts: test VGA memory exclusion from mapping
uts: clear boot mapping and protect boot pages test
uts: add dboot map debug printf
uts: need to release FB pages in release_bootstrap()
uts: add screenmap ioctl
uts: update sys/queue.h
loader: add illumos uts/common to include path
loader: tem/gfx font cleanup
loader: vbe checks
uts: gfx_private set KD_TEXT when KD_RESETTEXT is
uts: gfx 8-bit update
loader: gfx 8-bit fix
loader: always set media size from partition.
uts: MB2 support for 32-bit kernel
loader: x86 should have tem 80x25
uts: x86 should have tem 80x25
uts: font update
loader: font update
uts: tem attributes
loader: tem.c comment added
uts: use font module
loader: add font module
loader: build rules for new font setup
uts: gfx_private update for new font structure
uts: early boot update for new font structure
uts: font update
uts: font build rules update for new fonts
uts: tem update to new font structure
loader: module.c needs to include tem_impl.h
uts: gfx_private 8x16 font rework
uts: make font_lookup public
loader: font rework
uts: font rework
libefi: efi_alloc_and_read should check for PMBR
uts: tem utf-8 support
loader: implement tem utf-8 support
loader: tem should be able to display UTF-8
7784 uts: console input should support utf-8
7796 uts: ldterm default to utf-8
uts: do not reset serial console
uts: set up colors even if tem is not console
uts: add type for early boot properties
uts: gfx_private experiment with drm and vga
uts: gfx_private should use setmode drm callback.
uts: identify FB types and set up gfx_private based
loader: replace gop and vesa with framebuffer
uts: boot needs simple tem to support mdb
uts: boot_keyboard should emit esc sequences for
uts: gfx_private FB should be written by line
kmdb: set terminal window size
uts: gfx_private needs to keep track of early boot FB
pnglite: move pnglite to usr/src/common
loader: gfx_fb
ficl-sys: add gfx primitives
loader: add illumos.png logo
ficl: add fb-putimage
loader: add png support
loader: add alpha blending for gfx_fb
loader: use term-drawrect for menu frame
ficl: add simple gfx words
uts: provide fb_info via fbgattr dev_specific array.
uts: gfx_private add alpha blending
uts: update sys/ascii.h
uts: tem OSC support (incomplete)
uts: implement env module support and use data from
uts: tem get colors from early boot data
loader: use crc32 from libstand (libz)
loader: optimize for size
loader: pass tem info to the environment
loader: import tem for loader console
loader: UEFI loader needs to set ISADIR based on
loader: need UEFI32 support
8918 loader.efi: add vesa edid support
uts: tem_safe_pix_clear_prom_output() should only
uts: tem_safe_pix_clear_entire_screen() should use
uts: tem_safe_check_first_time() should query cursor
uts: tem implement cls callback & visual_io v4
uts: gfx_vgatext use block cursor for vgatext
uts: gfx_private implement cls callback & visual_io
uts: gfx_private bitmap framebuffer implementation
uts: early start frame buffer console support
uts: font functions should check the input char
uts: font rendering should support 16/24/32bit depths
uts: use smallest font as fallback default.
uts: update terminal dimensions based on selected
7834 uts: vgatext should use gfx_private
uts: add spacing property to 8859-1.bdf
terminfo: add underline for sun-color
terminfo: sun-color has 16 colors
uts: add font load callback type
loader: do not repeat int13 calls with error 0x20 and
8905 loader: add skein/edonr support
8904 common/crypto: make skein and edonr loader
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Evan Layton <evan.layton@nexenta.com>
NEX-14317 HVM with more than 2 VCPUs hangs on Xen 4.7
Reviewed by: Alex Deiter <alex.deiter@nexenta.com>
Reviewed by: Evan Layton <evan.layton@nexenta.com>
re #13613 rb4516 Tunables needs volatile keyword
re #13140 rb4270 hvm_sd module missing dependencies on scsi and cmlb
re #13166 rb4270 Check for Xen HVM even if CPUID signature returns Microsoft Hv
re #13187 rb4270 Fix Xen HVM related warnings
re #11780 rb3700 Improve hypervisor environment detection
--- old/usr/src/uts/i86pc/os/startup.c
+++ new/usr/src/uts/i86pc/os/startup.c
1 1 /*
2 2 * CDDL HEADER START
3 3 *
4 4 * The contents of this file are subject to the terms of the
5 5 * Common Development and Distribution License (the "License").
6 6 * You may not use this file except in compliance with the License.
7 7 *
8 8 * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
9 9 * or http://www.opensolaris.org/os/licensing.
10 10 * See the License for the specific language governing permissions
11 11 * and limitations under the License.
12 12 *
13 13 * When distributing Covered Code, include this CDDL HEADER in each
14 14 * file and include the License file at usr/src/OPENSOLARIS.LICENSE.
15 15 * If applicable, add the following below this CDDL HEADER, with the
16 16 * fields enclosed by brackets "[]" replaced with your own identifying
17 17 * information: Portions Copyright [yyyy] [name of copyright owner]
18 18 *
19 19 * CDDL HEADER END
20 20 */
21 21
22 22 /*
23 23 * Copyright (c) 1993, 2010, Oracle and/or its affiliates. All rights reserved.
24 24 * Copyright 2012 DEY Storage Systems, Inc. All rights reserved.
25 25 * Copyright 2017 Nexenta Systems, Inc.
26 26 * Copyright 2015 Joyent, Inc.
27 27 * Copyright (c) 2015 by Delphix. All rights reserved.
28 28 */
29 +
29 30 /*
30 31 * Copyright (c) 2010, Intel Corporation.
31 32 * All rights reserved.
32 33 */
33 34
34 35 #include <sys/types.h>
35 36 #include <sys/t_lock.h>
36 37 #include <sys/param.h>
37 38 #include <sys/sysmacros.h>
38 39 #include <sys/signal.h>
39 40 #include <sys/systm.h>
40 41 #include <sys/user.h>
41 42 #include <sys/mman.h>
42 43 #include <sys/vm.h>
43 44 #include <sys/conf.h>
44 45 #include <sys/avintr.h>
45 46 #include <sys/autoconf.h>
46 47 #include <sys/disp.h>
47 48 #include <sys/class.h>
48 49 #include <sys/bitmap.h>
49 50
50 51 #include <sys/privregs.h>
51 52
52 53 #include <sys/proc.h>
53 54 #include <sys/buf.h>
54 55 #include <sys/kmem.h>
55 56 #include <sys/mem.h>
56 57 #include <sys/kstat.h>
57 58
58 59 #include <sys/reboot.h>
59 60
60 61 #include <sys/cred.h>
61 62 #include <sys/vnode.h>
62 63 #include <sys/file.h>
63 64
64 65 #include <sys/procfs.h>
65 66
66 67 #include <sys/vfs.h>
67 68 #include <sys/cmn_err.h>
68 69 #include <sys/utsname.h>
69 70 #include <sys/debug.h>
70 71 #include <sys/kdi.h>
71 72
72 73 #include <sys/dumphdr.h>
73 74 #include <sys/bootconf.h>
74 75 #include <sys/memlist_plat.h>
75 76 #include <sys/varargs.h>
76 77 #include <sys/promif.h>
77 78 #include <sys/modctl.h>
78 79
79 80 #include <sys/sunddi.h>
80 81 #include <sys/sunndi.h>
81 82 #include <sys/ndi_impldefs.h>
82 83 #include <sys/ddidmareq.h>
83 84 #include <sys/psw.h>
84 85 #include <sys/regset.h>
85 86 #include <sys/clock.h>
86 87 #include <sys/pte.h>
87 88 #include <sys/tss.h>
88 89 #include <sys/stack.h>
89 90 #include <sys/trap.h>
90 91 #include <sys/fp.h>
91 92 #include <vm/kboot_mmu.h>
92 93 #include <vm/anon.h>
93 94 #include <vm/as.h>
94 95 #include <vm/page.h>
95 96 #include <vm/seg.h>
96 97 #include <vm/seg_dev.h>
97 98 #include <vm/seg_kmem.h>
98 99 #include <vm/seg_kpm.h>
99 100 #include <vm/seg_map.h>
100 101 #include <vm/seg_vn.h>
101 102 #include <vm/seg_kp.h>
102 103 #include <sys/memnode.h>
103 104 #include <vm/vm_dep.h>
104 105 #include <sys/thread.h>
105 106 #include <sys/sysconf.h>
106 107 #include <sys/vm_machparam.h>
107 108 #include <sys/archsystm.h>
108 109 #include <sys/machsystm.h>
109 110 #include <vm/hat.h>
110 111 #include <vm/hat_i86.h>
111 112 #include <sys/pmem.h>
112 113 #include <sys/smp_impldefs.h>
113 114 #include <sys/x86_archext.h>
114 115 #include <sys/cpuvar.h>
115 116 #include <sys/segments.h>
116 117 #include <sys/clconf.h>
117 118 #include <sys/kobj.h>
118 119 #include <sys/kobj_lex.h>
119 120 #include <sys/cpc_impl.h>
120 121 #include <sys/cpu_module.h>
121 122 #include <sys/smbios.h>
122 123 #include <sys/debug_info.h>
123 124 #include <sys/bootinfo.h>
124 125 #include <sys/ddi_periodic.h>
125 126 #include <sys/systeminfo.h>
126 127 #include <sys/multiboot.h>
127 128 #include <sys/ramdisk.h>
129 +#include <sys/framebuffer.h>
128 130
129 131 #ifdef __xpv
130 132
131 133 #include <sys/hypervisor.h>
132 134 #include <sys/xen_mmu.h>
133 135 #include <sys/evtchn_impl.h>
134 136 #include <sys/gnttab.h>
135 137 #include <sys/xpv_panic.h>
136 138 #include <xen/sys/xenbus_comms.h>
137 139 #include <xen/public/physdev.h>
138 140
139 141 extern void xen_late_startup(void);
140 142
141 143 struct xen_evt_data cpu0_evt_data;
142 144
143 145 #else /* __xpv */
144 146 #include <sys/memlist_impl.h>
145 147
146 148 extern void mem_config_init(void);
147 149 #endif /* __xpv */
148 150
149 151 extern void progressbar_init(void);
150 152 extern void brand_init(void);
151 153 extern void pcf_init(void);
152 154 extern void pg_init(void);
153 155 extern void ssp_init(void);
154 156
155 157 extern int size_pse_array(pgcnt_t, int);
156 158
157 159 #if defined(_SOFT_HOSTID)
158 160
159 161 #include <sys/rtc.h>
160 162
161 163 static int32_t set_soft_hostid(void);
162 164 static char hostid_file[] = "/etc/hostid";
163 165
164 166 #endif
165 167
166 168 void *gfx_devinfo_list;
167 169
168 170 #if defined(__amd64) && !defined(__xpv)
169 171 extern void immu_startup(void);
170 172 #endif
171 173
172 174 /*
173 175 * XXX make declaration below "static" when drivers no longer use this
174 176 * interface.
175 177 */
176 178 extern caddr_t p0_va; /* Virtual address for accessing physical page 0 */
177 179
178 180 /*
179 181 * segkp
180 182 */
181 183 extern int segkp_fromheap;
182 184
183 185 static void kvm_init(void);
184 186 static void startup_init(void);
185 187 static void startup_memlist(void);
186 188 static void startup_kmem(void);
187 189 static void startup_modules(void);
188 190 static void startup_vm(void);
189 191 static void startup_end(void);
190 192 static void layout_kernel_va(void);
191 193
192 194 /*
193 195 * Declare these as initialized data so we can patch them.
194 196 */
195 197 #ifdef __i386
196 198
197 199 /*
198 200 * Due to virtual address space limitations running in 32 bit mode, restrict
199 201 * the amount of physical memory configured to a max of PHYSMEM pages (16g).
200 202 *
201 203 * If the physical max memory size of 64g were allowed to be configured, the
202 204 * size of user virtual address space will be less than 1g. A limited user
203 205 * address space greatly reduces the range of applications that can run.
204 206 *
205 207 * If more physical memory than PHYSMEM is required, users should preferably
206 208 * run in 64 bit mode which has far looser virtual address space limitations.
207 209 *
208 210 * If 64 bit mode is not available (as in IA32) and/or more physical memory
209 211 * than PHYSMEM is required in 32 bit mode, physmem can be set to the desired
210 212 * value or to 0 (to configure all available memory) via eeprom(1M). kernelbase
211 213 * should also be carefully tuned to balance out the need of the user
212 214 * application while minimizing the risk of kernel heap exhaustion due to
213 215 * kernelbase being set too high.
214 216 */
215 217 #define PHYSMEM 0x400000
216 218
217 219 #else /* __amd64 */
218 220
219 221 /*
220 222 * For now we can handle memory with physical addresses up to about
221 223 * 64 Terabytes. This keeps the kernel above the VA hole, leaving roughly
222 224 * half the VA space for seg_kpm. When systems get bigger than 64TB this
223 225 * code will need revisiting. There is an implicit assumption that there
224 226 * are no *huge* holes in the physical address space too.
225 227 */
226 228 #define TERABYTE (1ul << 40)
227 229 #define PHYSMEM_MAX64 mmu_btop(64 * TERABYTE)
228 230 #define PHYSMEM PHYSMEM_MAX64
229 231 #define AMD64_VA_HOLE_END 0xFFFF800000000000ul
230 232
231 233 #endif /* __amd64 */
232 234
233 -pgcnt_t physmem = PHYSMEM;
235 +volatile pgcnt_t physmem = PHYSMEM;
234 236 pgcnt_t obp_pages; /* Memory used by PROM for its text and data */
235 237
236 238 char *kobj_file_buf;
237 239 int kobj_file_bufsize; /* set in /etc/system */
238 240
239 241 /* Global variables for MP support. Used in mp_startup */
240 242 caddr_t rm_platter_va = 0;
241 243 uint32_t rm_platter_pa;
242 244
243 245 int auto_lpg_disable = 1;
244 246
245 247 /*
246 248 * Some CPUs have holes in the middle of the 64-bit virtual address range.
247 249 */
248 250 uintptr_t hole_start, hole_end;
249 251
250 252 /*
251 253 * kpm mapping window
252 254 */
253 255 caddr_t kpm_vbase;
254 256 size_t kpm_size;
255 257 static int kpm_desired;
256 258 #ifdef __amd64
257 259 static uintptr_t segkpm_base = (uintptr_t)SEGKPM_BASE;
258 260 #endif
259 261
260 262 /*
261 263 * Configuration parameters set at boot time.
262 264 */
263 265
264 266 caddr_t econtig; /* end of first block of contiguous kernel */
265 267
266 268 struct bootops *bootops = 0; /* passed in from boot */
267 269 struct bootops **bootopsp;
268 270 struct boot_syscalls *sysp; /* passed in from boot */
269 271
270 272 char bootblock_fstype[16];
271 273
272 274 char kern_bootargs[OBP_MAXPATHLEN];
273 275 char kern_bootfile[OBP_MAXPATHLEN];
274 276
275 277 /*
276 278 * ZFS zio segment. This allows us to exclude large portions of ZFS data that
277 279 * gets cached in kmem caches on the heap. If this is set to zero, we allocate
278 280 * zio buffers from their own segment, otherwise they are allocated from the
279 281 * heap. The optimization of allocating zio buffers from their own segment is
280 282 * only valid on 64-bit kernels.
281 283 */
282 284 #if defined(__amd64)
283 285 int segzio_fromheap = 0;
284 286 #else
285 287 int segzio_fromheap = 1;
286 288 #endif
287 289
288 290 /*
289 291 * Give folks an escape hatch for disabling SMAP via kmdb. Doesn't work
290 292 * post-boot.
291 293 */
292 294 int disable_smap = 0;
293 295
294 296 /*
295 297 * new memory fragmentations are possible in startup() due to BOP_ALLOCs. this
296 298 * depends on number of BOP_ALLOC calls made and requested size, memory size
297 299 * combination and whether boot.bin memory needs to be freed.
298 300 */
299 301 #define POSS_NEW_FRAGMENTS 12
300 302
301 303 /*
302 304 * VM data structures
303 305 */
304 306 long page_hashsz; /* Size of page hash table (power of two) */
305 307 unsigned int page_hashsz_shift; /* log2(page_hashsz) */
306 308 struct page *pp_base; /* Base of initial system page struct array */
307 309 struct page **page_hash; /* Page hash table */
308 310 pad_mutex_t *pse_mutex; /* Locks protecting pp->p_selock */
309 311 size_t pse_table_size; /* Number of mutexes in pse_mutex[] */
310 312 int pse_shift; /* log2(pse_table_size) */
311 313 struct seg ktextseg; /* Segment used for kernel executable image */
312 314 struct seg kvalloc; /* Segment used for "valloc" mapping */
313 315 struct seg kpseg; /* Segment used for pageable kernel virt mem */
314 316 struct seg kmapseg; /* Segment used for generic kernel mappings */
315 317 struct seg kdebugseg; /* Segment used for the kernel debugger */
316 318
317 319 struct seg *segkmap = &kmapseg; /* Kernel generic mapping segment */
318 320 static struct seg *segmap = &kmapseg; /* easier to use name for in here */
319 321
320 322 struct seg *segkp = &kpseg; /* Pageable kernel virtual memory segment */
321 323
322 324 #if defined(__amd64)
323 325 struct seg kvseg_core; /* Segment used for the core heap */
324 326 struct seg kpmseg; /* Segment used for physical mapping */
325 327 struct seg *segkpm = &kpmseg; /* 64bit kernel physical mapping segment */
326 328 #else
327 329 struct seg *segkpm = NULL; /* Unused on IA32 */
328 330 #endif
329 331
330 332 caddr_t segkp_base; /* Base address of segkp */
331 333 caddr_t segzio_base; /* Base address of segzio */
332 334 #if defined(__amd64)
333 -pgcnt_t segkpsize = btop(SEGKPDEFSIZE); /* size of segkp segment in pages */
335 +volatile pgcnt_t segkpsize = btop(SEGKPDEFSIZE); /* size of segkp segment in */
336 + /* pages */
334 337 #else
335 -pgcnt_t segkpsize = 0;
338 +volatile pgcnt_t segkpsize = 0;
336 339 #endif
337 340 pgcnt_t segziosize = 0; /* size of zio segment in pages */
338 341
339 342 /*
340 343 * A static DR page_t VA map is reserved that can map the page structures
341 344 * for a domain's entire RA space. The pages that back this space are
342 345 * dynamically allocated and need not be physically contiguous. The DR
343 346 * map size is derived from KPM size.
344 347 * This mechanism isn't used by x86 yet, so just stubs here.
345 348 */
346 349 int ppvm_enable = 0; /* Static virtual map for page structs */
347 350 page_t *ppvm_base = NULL; /* Base of page struct map */
348 351 pgcnt_t ppvm_size = 0; /* Size of page struct map */
349 352
350 353 /*
351 354 * VA range available to the debugger
352 355 */
353 356 const caddr_t kdi_segdebugbase = (const caddr_t)SEGDEBUGBASE;
354 357 const size_t kdi_segdebugsize = SEGDEBUGSIZE;
355 358
356 359 struct memseg *memseg_base;
357 360 struct vnode unused_pages_vp;
358 361
359 362 #define FOURGB 0x100000000LL
360 363
361 364 struct memlist *memlist;
362 365
363 366 caddr_t s_text; /* start of kernel text segment */
364 367 caddr_t e_text; /* end of kernel text segment */
365 368 caddr_t s_data; /* start of kernel data segment */
366 369 caddr_t e_data; /* end of kernel data segment */
367 370 caddr_t modtext; /* start of loadable module text reserved */
368 371 caddr_t e_modtext; /* end of loadable module text reserved */
369 372 caddr_t moddata; /* start of loadable module data reserved */
370 373 caddr_t e_moddata; /* end of loadable module data reserved */
371 374
372 375 struct memlist *phys_install; /* Total installed physical memory */
373 376 struct memlist *phys_avail; /* Total available physical memory */
374 377 struct memlist *bios_rsvd; /* Bios reserved memory */
375 378
376 379 /*
377 380 * kphysm_init returns the number of pages that were processed
378 381 */
379 382 static pgcnt_t kphysm_init(page_t *, pgcnt_t);
380 383
381 384 #define IO_PROP_SIZE 64 /* device property size */
382 385
383 386 /*
384 387 * a couple useful roundup macros
385 388 */
386 389 #define ROUND_UP_PAGE(x) \
387 390 ((uintptr_t)P2ROUNDUP((uintptr_t)(x), (uintptr_t)MMU_PAGESIZE))
388 391 #define ROUND_UP_LPAGE(x) \
389 392 ((uintptr_t)P2ROUNDUP((uintptr_t)(x), mmu.level_size[1]))
390 393 #define ROUND_UP_4MEG(x) \
391 394 ((uintptr_t)P2ROUNDUP((uintptr_t)(x), (uintptr_t)FOUR_MEG))
392 395 #define ROUND_UP_TOPLEVEL(x) \
393 396 ((uintptr_t)P2ROUNDUP((uintptr_t)(x), mmu.level_size[mmu.max_level]))
394 397
395 398 /*
396 399 * 32-bit Kernel's Virtual memory layout.
397 400 * +-----------------------+
398 401 * | |
399 402 * 0xFFC00000 -|-----------------------|- ARGSBASE
400 403 * | debugger |
401 404 * 0xFF800000 -|-----------------------|- SEGDEBUGBASE
402 405 * | Kernel Data |
403 406 * 0xFEC00000 -|-----------------------|
404 407 * | Kernel Text |
405 408 * 0xFE800000 -|-----------------------|- KERNEL_TEXT (0xFB400000 on Xen)
406 409 * |--- GDT ---|- GDT page (GDT_VA)
407 410 * |--- debug info ---|- debug info (DEBUG_INFO_VA)
408 411 * | |
409 412 * | page_t structures |
410 413 * | memsegs, memlists, |
411 414 * | page hash, etc. |
412 415 * --- -|-----------------------|- ekernelheap, valloc_base (floating)
413 416 * | | (segkp is just an arena in the heap)
414 417 * | |
415 418 * | kvseg |
416 419 * | |
417 420 * | |
418 421 * --- -|-----------------------|- kernelheap (floating)
419 422 * | Segkmap |
420 423 * 0xC3002000 -|-----------------------|- segmap_start (floating)
421 424 * | Red Zone |
422 425 * 0xC3000000 -|-----------------------|- kernelbase / userlimit (floating)
423 426 * | | ||
424 427 * | Shared objects | \/
425 428 * | |
426 429 * : :
427 430 * | user data |
428 431 * |-----------------------|
429 432 * | user text |
430 433 * 0x08048000 -|-----------------------|
431 434 * | user stack |
432 435 * : :
433 436 * | invalid |
434 437 * 0x00000000 +-----------------------+
435 438 *
436 439 *
437 440 * 64-bit Kernel's Virtual memory layout. (assuming 64 bit app)
438 441 * +-----------------------+
439 442 * | |
440 443 * 0xFFFFFFFF.FFC00000 |-----------------------|- ARGSBASE
441 444 * | debugger (?) |
442 445 * 0xFFFFFFFF.FF800000 |-----------------------|- SEGDEBUGBASE
443 446 * | unused |
444 447 * +-----------------------+
445 448 * | Kernel Data |
446 449 * 0xFFFFFFFF.FBC00000 |-----------------------|
447 450 * | Kernel Text |
448 451 * 0xFFFFFFFF.FB800000 |-----------------------|- KERNEL_TEXT
449 452 * |--- GDT ---|- GDT page (GDT_VA)
450 453 * |--- debug info ---|- debug info (DEBUG_INFO_VA)
451 454 * | |
452 455 * | Core heap | (used for loadable modules)
453 456 * 0xFFFFFFFF.C0000000 |-----------------------|- core_base / ekernelheap
454 457 * | Kernel |
455 458 * | heap |
456 459 * 0xFFFFFXXX.XXX00000 |-----------------------|- kernelheap (floating)
457 460 * | segmap |
458 461 * 0xFFFFFXXX.XXX00000 |-----------------------|- segmap_start (floating)
459 462 * | device mappings |
460 463 * 0xFFFFFXXX.XXX00000 |-----------------------|- toxic_addr (floating)
461 464 * | segzio |
462 465 * 0xFFFFFXXX.XXX00000 |-----------------------|- segzio_base (floating)
463 466 * | segkp |
464 467 * --- |-----------------------|- segkp_base (floating)
465 468 * | page_t structures | valloc_base + valloc_sz
466 469 * | memsegs, memlists, |
467 470 * | page hash, etc. |
468 471 * 0xFFFFFF00.00000000 |-----------------------|- valloc_base (lower if >256GB)
469 472 * | segkpm |
470 473 * 0xFFFFFE00.00000000 |-----------------------|
471 474 * | Red Zone |
472 475 * 0xFFFFFD80.00000000 |-----------------------|- KERNELBASE (lower if >256GB)
473 476 * | User stack |- User space memory
474 477 * | |
475 478 * | shared objects, etc | (grows downwards)
476 479 * : :
477 480 * | |
478 481 * 0xFFFF8000.00000000 |-----------------------|
479 482 * | |
480 483 * | VA Hole / unused |
481 484 * | |
482 485 * 0x00008000.00000000 |-----------------------|
483 486 * | |
484 487 * | |
485 488 * : :
486 489 * | user heap | (grows upwards)
487 490 * | |
488 491 * | user data |
489 492 * |-----------------------|
490 493 * | user text |
491 494 * 0x00000000.04000000 |-----------------------|
492 495 * | invalid |
493 496 * 0x00000000.00000000 +-----------------------+
494 497 *
495 498 * A 32 bit app on the 64 bit kernel sees the same layout as on the 32 bit
496 499 * kernel, except that userlimit is raised to 0xfe000000
497 500 *
498 501 * Floating values:
499 502 *
500 503 * valloc_base: start of the kernel's memory management/tracking data
501 504 * structures. This region contains page_t structures for
502 505 * physical memory, memsegs, memlists, and the page hash.
503 506 *
504 507 * core_base: start of the kernel's "core" heap area on 64-bit systems.
505 508 * This area is intended to be used for global data as well as for module
506 509 * text/data that does not fit into the nucleus pages. The core heap is
507 510 * restricted to a 2GB range, allowing every address within it to be
508 511 * accessed using rip-relative addressing
509 512 *
510 513 * ekernelheap: end of kernelheap and start of segmap.
511 514 *
512 515 * kernelheap: start of kernel heap. On 32-bit systems, this starts right
513 516 * above a red zone that separates the user's address space from the
514 517 * kernel's. On 64-bit systems, it sits above segkp and segkpm.
515 518 *
516 519 * segmap_start: start of segmap. The length of segmap can be modified
517 520 * through eeprom. The default length is 16MB on 32-bit systems and 64MB
518 521 * on 64-bit systems.
519 522 *
520 523 * kernelbase: On a 32-bit kernel the default value of 0xd4000000 will be
521 524 * decreased by 2X the size required for page_t. This allows the kernel
522 525 * heap to grow in size with physical memory. With sizeof(page_t) == 80
523 526 * bytes, the following shows the values of kernelbase and kernel heap
524 527 * sizes for different memory configurations (assuming default segmap and
525 528 * segkp sizes).
526 529 *
527 530 * mem size for kernelbase kernel heap
528 531 * size page_t's size
529 532 * ---- --------- ---------- -----------
530 533 * 1gb 0x01400000 0xd1800000 684MB
531 534 * 2gb 0x02800000 0xcf000000 704MB
532 535 * 4gb 0x05000000 0xca000000 744MB
533 536 * 6gb 0x07800000 0xc5000000 784MB
534 537 * 8gb 0x0a000000 0xc0000000 824MB
535 538 * 16gb 0x14000000 0xac000000 984MB
536 539 * 32gb 0x28000000 0x84000000 1304MB
537 540 * 64gb 0x50000000 0x34000000 1944MB (*)
538 541 *
539 542 * kernelbase is less than the abi minimum of 0xc0000000 for memory
540 543 * configurations above 8gb.
541 544 *
542 545 * (*) support for memory configurations above 32gb will require manual tuning
543 546 * of kernelbase to balance out the need of user applications.
544 547 */
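
The kernelbase table above can be checked by hand. A quick sketch of the arithmetic for the 1gb row, assuming sizeof (page_t) == 80 and 4KB pages as the comment states (illustrative only, not part of startup.c):

    /*
     * 1 GB of memory     -> 1 GB / 4 KB = 262144 pages
     * space for page_t's =  262144 * 80 = 0x1400000 (20 MB)
     * kernelbase         =  0xd4000000 - 2 * 0x1400000 = 0xd1800000
     * which matches the "1gb" row of the table.
     */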
545 548
546 549 /* real-time-clock initialization parameters */
547 550 extern time_t process_rtc_config_file(void);
548 551
549 552 uintptr_t kernelbase;
550 553 uintptr_t postbootkernelbase; /* not set till boot loader is gone */
551 554 uintptr_t eprom_kernelbase;
552 555 size_t segmapsize;
553 556 uintptr_t segmap_start;
554 557 int segmapfreelists;
555 558 pgcnt_t npages;
556 559 pgcnt_t orig_npages;
557 560 size_t core_size; /* size of "core" heap */
558 561 uintptr_t core_base; /* base address of "core" heap */
559 562
560 563 /*
561 564 * List of bootstrap pages. We mark these as allocated in startup.
562 565 * release_bootstrap() will free them when we're completely done with
563 566 * the bootstrap.
564 567 */
565 568 static page_t *bootpages;
566 569
567 570 /*
568 571 * boot time pages that have a vnode from the ramdisk will keep that forever.
569 572 */
570 573 static page_t *rd_pages;
571 574
572 575 /*
573 576 * Lower 64K
574 577 */
575 578 static page_t *lower_pages = NULL;
576 579 static int lower_pages_count = 0;
577 580
578 581 struct system_hardware system_hardware;
579 582
580 583 /*
581 584 * Enable some debugging messages concerning memory usage...
582 585 */
583 586 static void
584 587 print_memlist(char *title, struct memlist *mp)
585 588 {
586 589 prom_printf("MEMLIST: %s:\n", title);
587 590 while (mp != NULL) {
588 591 prom_printf("\tAddress 0x%" PRIx64 ", size 0x%" PRIx64 "\n",
589 592 mp->ml_address, mp->ml_size);
590 593 mp = mp->ml_next;
591 594 }
592 595 }
593 596
594 597 /*
595 598 * XX64 need a comment here.. are these just default values, surely
596 599 * we read the "cpuid" type information to figure this out.
597 600 */
598 601 int l2cache_sz = 0x80000;
599 602 int l2cache_linesz = 0x40;
600 603 int l2cache_assoc = 1;
601 604
602 605 static size_t textrepl_min_gb = 10;
603 606
604 607 /*
605 608 * on 64 bit we use a predefined VA range for mapping devices in the kernel
606 609 * on 32 bit the mappings are intermixed in the heap, so we use a bit map
607 610 */
608 611 #ifdef __amd64
609 612
610 613 vmem_t *device_arena;
611 614 uintptr_t toxic_addr = (uintptr_t)NULL;
612 615 size_t toxic_size = 1024 * 1024 * 1024; /* Sparc uses 1 gig too */
613 616
614 617 #else /* __i386 */
615 618
616 619 ulong_t *toxic_bit_map; /* one bit for each 4k of VA in heap_arena */
617 620 size_t toxic_bit_map_len = 0; /* in bits */
618 621
619 622 #endif /* __i386 */
620 623
621 624 /*
622 625 * Simple boot time debug facilities
623 626 */
624 627 static char *prm_dbg_str[] = {
625 628 "%s:%d: '%s' is 0x%x\n",
626 629 "%s:%d: '%s' is 0x%llx\n"
627 630 };
628 631
629 632 int prom_debug;
630 633
631 634 #define PRM_DEBUG(q) if (prom_debug) \
632 635 prom_printf(prm_dbg_str[sizeof (q) >> 3], "startup.c", __LINE__, #q, q);
633 636 #define PRM_POINT(q) if (prom_debug) \
634 637 prom_printf("%s:%d: %s\n", "startup.c", __LINE__, q);
635 638
636 639 /*
637 640 * This structure is used to keep track of the initial allocations
638 641 * done in startup_memlist(). The value of NUM_ALLOCATIONS needs to
639 642 * be >= the number of ADD_TO_ALLOCATIONS() executed in the code.
640 643 */
641 644 #define NUM_ALLOCATIONS 8
642 645 int num_allocations = 0;
643 646 struct {
644 647 void **al_ptr;
645 648 size_t al_size;
646 649 } allocations[NUM_ALLOCATIONS];
647 650 size_t valloc_sz = 0;
648 651 uintptr_t valloc_base;
649 652
650 653 #define ADD_TO_ALLOCATIONS(ptr, size) { \
651 654 size = ROUND_UP_PAGE(size); \
652 655 if (num_allocations == NUM_ALLOCATIONS) \
653 656 panic("too many ADD_TO_ALLOCATIONS()"); \
654 657 allocations[num_allocations].al_ptr = (void**)&ptr; \
655 658 allocations[num_allocations].al_size = size; \
656 659 valloc_sz += size; \
657 660 ++num_allocations; \
658 661 }
659 662
660 663 /*
661 664 * Allocate all the initial memory needed by the page allocator.
662 665 */
663 666 static void
664 667 perform_allocations(void)
665 668 {
666 669 caddr_t mem;
667 670 int i;
668 671 int valloc_align;
669 672
670 673 PRM_DEBUG(valloc_base);
671 674 PRM_DEBUG(valloc_sz);
672 675 valloc_align = mmu.level_size[mmu.max_page_level > 0];
673 676 mem = BOP_ALLOC(bootops, (caddr_t)valloc_base, valloc_sz, valloc_align);
674 677 if (mem != (caddr_t)valloc_base)
675 678 panic("BOP_ALLOC() failed");
676 679 bzero(mem, valloc_sz);
677 680 for (i = 0; i < num_allocations; ++i) {
678 681 *allocations[i].al_ptr = (void *)mem;
679 682 mem += allocations[i].al_size;
680 683 }
681 684 }
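
ADD_TO_ALLOCATIONS() and perform_allocations() form a two-phase pattern: callers first register a pointer slot and a rounded-up size (accumulating valloc_sz), then a single BOP_ALLOC() at valloc_base is carved up in registration order. A minimal userland analogue, with malloc(3C) standing in for BOP_ALLOC and made-up names (register_alloc, slots); a sketch, not the kernel code:

    #include <stdlib.h>
    #include <string.h>

    static void **slots[8];		/* where each carved pointer lands */
    static size_t sizes[8];
    static int nslots;
    static size_t total;

    static void
    register_alloc(void **slot, size_t sz)
    {
    	slots[nslots] = slot;
    	sizes[nslots++] = sz;
    	total += sz;
    }

    int
    main(void)
    {
    	char *a, *b, *mem;
    	int i;

    	register_alloc((void **)&a, 64);
    	register_alloc((void **)&b, 128);
    	mem = malloc(total);		/* one backing allocation */
    	if (mem == NULL)
    		return (1);
    	memset(mem, 0, total);
    	for (i = 0; i < nslots; i++) {	/* carve in registration order */
    		*slots[i] = mem;
    		mem += sizes[i];
    	}
    	free(a);			/* 'a' is the base of the block */
    	return (0);
    }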
682 685
683 686 /*
684 687 * Set up and enable SMAP now before we start other CPUs, but after the kernel's
685 688 * VM has been set up so we can use hot_patch_kernel_text().
686 689 *
687 690 * We can only patch 1, 2, or 4 bytes, but not three bytes. So instead, we
688 691 * replace the four byte word at the patch point. See uts/intel/ia32/ml/copy.s
689 692 * for more information on what's going on here.
690 693 */
691 694 static void
692 695 startup_smap(void)
693 696 {
694 697 int i;
695 698 uint32_t inst;
696 699 uint8_t *instp;
697 700 char sym[128];
698 701
699 702 extern int _smap_enable_patch_count;
700 703 extern int _smap_disable_patch_count;
701 704
702 705 if (disable_smap != 0)
703 706 remove_x86_feature(x86_featureset, X86FSET_SMAP);
704 707
705 708 if (is_x86_feature(x86_featureset, X86FSET_SMAP) == B_FALSE)
706 709 return;
707 710
708 711 for (i = 0; i < _smap_enable_patch_count; i++) {
709 712 int sizep;
710 713
711 714 VERIFY3U(i, <, _smap_enable_patch_count);
712 715 VERIFY(snprintf(sym, sizeof (sym), "_smap_enable_patch_%d", i) <
713 716 sizeof (sym));
714 717 instp = (uint8_t *)(void *)kobj_getelfsym(sym, NULL, &sizep);
715 718 VERIFY(instp != 0);
716 719 inst = (instp[3] << 24) | (SMAP_CLAC_INSTR & 0x00ffffff);
717 720 hot_patch_kernel_text((caddr_t)instp, inst, 4);
718 721 }
719 722
720 723 for (i = 0; i < _smap_disable_patch_count; i++) {
721 724 int sizep;
722 725
723 726 VERIFY(snprintf(sym, sizeof (sym), "_smap_disable_patch_%d",
724 727 i) < sizeof (sym));
725 728 instp = (uint8_t *)(void *)kobj_getelfsym(sym, NULL, &sizep);
726 729 VERIFY(instp != 0);
727 730 inst = (instp[3] << 24) | (SMAP_STAC_INSTR & 0x00ffffff);
728 731 hot_patch_kernel_text((caddr_t)instp, inst, 4);
729 732 }
730 733
731 734 hot_patch_kernel_text((caddr_t)smap_enable, SMAP_CLAC_INSTR, 4);
732 735 hot_patch_kernel_text((caddr_t)smap_disable, SMAP_STAC_INSTR, 4);
733 736 setcr4(getcr4() | CR4_SMAP);
734 737 smap_enable();
735 738 }
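
The patch-word construction in startup_smap() exists because hot_patch_kernel_text() can write 1, 2, or 4 bytes but not 3: the three-byte clac/stac opcode (0f 01 ca / 0f 01 cb) is written together with a preserved copy of the byte that follows it. A hypothetical standalone illustration, assuming the *_INSTR constants carry the opcode in their low 24 bits (consistent with the 0x00ffffff mask above):

    #include <stdio.h>
    #include <stdint.h>

    int
    main(void)
    {
    	/* three NOPs at the patch point, plus the byte after them */
    	uint8_t instp[4] = { 0x90, 0x90, 0x90, 0x55 };
    	uint32_t clac = 0x00ca010f;	/* 0f 01 ca as a little-endian word */
    	uint32_t inst;

    	/* keep instp[3], replace the low three bytes with the opcode */
    	inst = ((uint32_t)instp[3] << 24) | (clac & 0x00ffffff);
    	printf("patch word: 0x%08x\n", inst);	/* prints 0x55ca010f */
    	return (0);
    }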
736 739
737 740 /*
738 741 * Our world looks like this at startup time.
739 742 *
740 743 * In a 32-bit OS, boot loads the kernel text at 0xfe800000 and kernel data
741 744 * at 0xfec00000. On a 64-bit OS, kernel text and data are loaded at
742 745 * 0xffffffff.fe800000 and 0xffffffff.fec00000 respectively. Those
743 746 * addresses are fixed in the binary at link time.
744 747 *
745 748 * On the text page:
746 749 * unix/genunix/krtld/module text loads.
747 750 *
748 751 * On the data page:
749 752 * unix/genunix/krtld/module data loads.
750 753 *
751 754 * Machine-dependent startup code
752 755 */
753 756 void
754 757 startup(void)
755 758 {
756 759 #if !defined(__xpv)
757 760 extern void startup_pci_bios(void);
758 761 #endif
759 762 extern cpuset_t cpu_ready_set;
760 763
761 764 /*
762 765 * Make sure that nobody tries to use segkpm until we have
763 766 * initialized it properly.
764 767 */
765 768 #if defined(__amd64)
766 769 kpm_desired = 1;
767 770 #endif
768 771 kpm_enable = 0;
769 772 CPUSET_ONLY(cpu_ready_set, 0); /* cpu 0 is boot cpu */
770 773
771 774 #if defined(__xpv) /* XXPV fix me! */
772 775 {
773 776 extern int segvn_use_regions;
774 777 segvn_use_regions = 0;
775 778 }
776 779 #endif
777 780 ssp_init();
778 781 progressbar_init();
779 782 startup_init();
780 783 #if defined(__xpv)
781 784 startup_xen_version();
782 785 #endif
783 786 startup_memlist();
784 787 startup_kmem();
785 788 startup_vm();
786 789 #if !defined(__xpv)
787 790 /*
788 791 * Note we need to do this even on fast reboot in order to access
789 792 * the irq routing table (used for pci labels).
790 793 */
791 794 startup_pci_bios();
792 795 startup_smap();
793 796 #endif
794 797 #if defined(__xpv)
795 798 startup_xen_mca();
796 799 #endif
797 800 startup_modules();
798 801
799 802 startup_end();
800 803 }
801 804
802 805 static void
803 806 startup_init()
804 807 {
805 808 PRM_POINT("startup_init() starting...");
806 809
807 810 /*
808 811 * Complete the extraction of cpuid data
809 812 */
810 813 cpuid_pass2(CPU);
811 814
812 815 (void) check_boot_version(BOP_GETVERSION(bootops));
813 816
814 817 /*
815 818 * Check for prom_debug in boot environment
816 819 */
817 820 if (BOP_GETPROPLEN(bootops, "prom_debug") >= 0) {
818 821 ++prom_debug;
819 822 PRM_POINT("prom_debug found in boot environment");
820 823 }
821 824
822 825 /*
823 826 * Collect node, cpu and memory configuration information.
824 827 */
825 828 get_system_configuration();
826 829
827 830 /*
828 831 * Halt if this is an unsupported processor.
829 832 */
830 833 if (x86_type == X86_TYPE_486 || x86_type == X86_TYPE_CYRIX_486) {
831 834 printf("\n486 processor (\"%s\") detected.\n",
832 835 CPU->cpu_brandstr);
833 836 halt("This processor is not supported by this release "
834 837 "of Solaris.");
835 838 }
836 839
837 840 PRM_POINT("startup_init() done");
838 841 }
839 842
840 843 /*
841 844 * Callback for copy_memlist_filter() to filter nucleus, kadb/kmdb, (ie.
842 845 * everything mapped above KERNEL_TEXT) pages from phys_avail. Note it
843 846 * also filters out physical page zero. There is some reliance on the
844 847 * boot loader allocating only a few contiguous physical memory chunks.
845 848 */
846 849 static void
847 850 avail_filter(uint64_t *addr, uint64_t *size)
848 851 {
849 852 uintptr_t va;
850 853 uintptr_t next_va;
851 854 pfn_t pfn;
852 855 uint64_t pfn_addr;
853 856 uint64_t pfn_eaddr;
854 857 uint_t prot;
855 858 size_t len;
856 859 uint_t change;
857 860
858 861 if (prom_debug)
859 862 prom_printf("\tFilter: in: a=%" PRIx64 ", s=%" PRIx64 "\n",
860 863 *addr, *size);
861 864
862 865 /*
863 866 * page zero is required for BIOS.. never make it available
864 867 */
865 868 if (*addr == 0) {
866 869 *addr += MMU_PAGESIZE;
867 870 *size -= MMU_PAGESIZE;
868 871 }
869 872
870 873 /*
871 874 * First we trim from the front of the range. Since kbm_probe()
872 875 * walks ranges in virtual order, but addr/size are physical, we need
873 876 * to walk the list until no changes are seen. This deals with the case
874 877 * where page "p" is mapped at v, page "p + PAGESIZE" is mapped at w
875 878 * but w < v.
876 879 */
877 880 do {
878 881 change = 0;
879 882 for (va = KERNEL_TEXT;
880 883 *size > 0 && kbm_probe(&va, &len, &pfn, &prot) != 0;
881 884 va = next_va) {
882 885
883 886 next_va = va + len;
884 887 pfn_addr = pfn_to_pa(pfn);
885 888 pfn_eaddr = pfn_addr + len;
886 889
887 890 if (pfn_addr <= *addr && pfn_eaddr > *addr) {
888 891 change = 1;
889 892 while (*size > 0 && len > 0) {
890 893 *addr += MMU_PAGESIZE;
891 894 *size -= MMU_PAGESIZE;
892 895 len -= MMU_PAGESIZE;
893 896 }
894 897 }
895 898 }
896 899 if (change && prom_debug)
897 900 prom_printf("\t\ttrim: a=%" PRIx64 ", s=%" PRIx64 "\n",
898 901 *addr, *size);
899 902 } while (change);
900 903
901 904 /*
902 905 * Trim pages from the end of the range.
903 906 */
904 907 for (va = KERNEL_TEXT;
905 908 *size > 0 && kbm_probe(&va, &len, &pfn, &prot) != 0;
906 909 va = next_va) {
907 910
908 911 next_va = va + len;
909 912 pfn_addr = pfn_to_pa(pfn);
910 913
911 914 if (pfn_addr >= *addr && pfn_addr < *addr + *size)
912 915 *size = pfn_addr - *addr;
913 916 }
914 917
915 918 if (prom_debug)
916 919 prom_printf("\tFilter out: a=%" PRIx64 ", s=%" PRIx64 "\n",
917 920 *addr, *size);
918 921 }
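
The do/while (change) loop in avail_filter() is needed because kbm_probe() walks mappings in virtual order while the trimming is physical. A worked example with hypothetical numbers: say the incoming range is [0x100000, 0x104000), and boot mapped pa [0x102000, 0x103000) at a lower VA than pa [0x100000, 0x102000). The first pass visits the 0x102000 mapping first and skips it (it does not yet cover *addr), then trims the front to 0x102000 via the second mapping; only on the next pass does the 0x102000 mapping overlap *addr and trim the range further to 0x103000.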
919 922
920 923 static void
921 924 kpm_init()
922 925 {
923 926 struct segkpm_crargs b;
924 927
925 928 /*
926 929 * These variables were all designed for sfmmu in which segkpm is
927 930 * mapped using a single pagesize - either 8KB or 4MB. On x86, we
928 931 * might use 2+ page sizes on a single machine, so none of these
929 932 * variables have a single correct value. They are set up as if we
930 933 * always use a 4KB pagesize, which should do no harm. In the long
931 934 * run, we should get rid of KPM's assumption that only a single
932 935 * pagesize is used.
933 936 */
934 937 kpm_pgshft = MMU_PAGESHIFT;
935 938 kpm_pgsz = MMU_PAGESIZE;
936 939 kpm_pgoff = MMU_PAGEOFFSET;
937 940 kpmp2pshft = 0;
938 941 kpmpnpgs = 1;
939 942 ASSERT(((uintptr_t)kpm_vbase & (kpm_pgsz - 1)) == 0);
940 943
941 944 PRM_POINT("about to create segkpm");
942 945 rw_enter(&kas.a_lock, RW_WRITER);
943 946
944 947 if (seg_attach(&kas, kpm_vbase, kpm_size, segkpm) < 0)
945 948 panic("cannot attach segkpm");
946 949
947 950 b.prot = PROT_READ | PROT_WRITE;
948 951 b.nvcolors = 1;
949 952
950 953 if (segkpm_create(segkpm, (caddr_t)&b) != 0)
951 954 panic("segkpm_create segkpm");
952 955
953 956 rw_exit(&kas.a_lock);
954 957 }
955 958
956 959 /*
957 960 * The debug info page provides enough information to allow external
958 961 * inspectors (e.g. when running under a hypervisor) to bootstrap
959 962 * themselves into allowing full-blown kernel debugging.
960 963 */
961 964 static void
962 965 init_debug_info(void)
963 966 {
964 967 caddr_t mem;
965 968 debug_info_t *di;
966 969
967 970 #ifndef __lint
968 971 ASSERT(sizeof (debug_info_t) < MMU_PAGESIZE);
969 972 #endif
970 973
971 974 mem = BOP_ALLOC(bootops, (caddr_t)DEBUG_INFO_VA, MMU_PAGESIZE,
972 975 MMU_PAGESIZE);
973 976
974 977 if (mem != (caddr_t)DEBUG_INFO_VA)
975 978 panic("BOP_ALLOC() failed");
976 979 bzero(mem, MMU_PAGESIZE);
977 980
978 981 di = (debug_info_t *)mem;
979 982
980 983 di->di_magic = DEBUG_INFO_MAGIC;
981 984 di->di_version = DEBUG_INFO_VERSION;
982 985 di->di_modules = (uintptr_t)&modules;
983 986 di->di_s_text = (uintptr_t)s_text;
984 987 di->di_e_text = (uintptr_t)e_text;
985 988 di->di_s_data = (uintptr_t)s_data;
986 989 di->di_e_data = (uintptr_t)e_data;
987 990 di->di_hat_htable_off = offsetof(hat_t, hat_htable);
988 991 di->di_ht_pfn_off = offsetof(htable_t, ht_pfn);
989 992 }
990 993
991 994 /*
992 995 * Build the memlists and other kernel essential memory system data structures.
993 996 * This is everything at valloc_base.
994 997 */
995 998 static void
996 999 startup_memlist(void)
997 1000 {
998 1001 size_t memlist_sz;
999 1002 size_t memseg_sz;
1000 1003 size_t pagehash_sz;
1001 1004 size_t pp_sz;
1002 1005 uintptr_t va;
1003 1006 size_t len;
1004 1007 uint_t prot;
1005 1008 pfn_t pfn;
1006 1009 int memblocks;
1007 1010 pfn_t rsvd_high_pfn;
1008 1011 pgcnt_t rsvd_pgcnt;
1009 1012 size_t rsvdmemlist_sz;
1010 1013 int rsvdmemblocks;
1011 1014 caddr_t pagecolor_mem;
1012 1015 size_t pagecolor_memsz;
1013 1016 caddr_t page_ctrs_mem;
1014 1017 size_t page_ctrs_size;
1015 1018 size_t pse_table_alloc_size;
1016 1019 struct memlist *current;
1017 1020 extern void startup_build_mem_nodes(struct memlist *);
1018 1021
1019 1022 /* XX64 fix these - they should be in include files */
1020 1023 extern size_t page_coloring_init(uint_t, int, int);
1021 1024 extern void page_coloring_setup(caddr_t);
1022 1025
1023 1026 PRM_POINT("startup_memlist() starting...");
1024 1027
1025 1028 /*
1026 1029 * Use leftover large page nucleus text/data space for loadable modules.
1027 1030 * Use at most MODTEXT/MODDATA.
1028 1031 */
1029 1032 len = kbm_nucleus_size;
1030 1033 ASSERT(len > MMU_PAGESIZE);
1031 1034
1032 1035 moddata = (caddr_t)ROUND_UP_PAGE(e_data);
1033 1036 e_moddata = (caddr_t)P2ROUNDUP((uintptr_t)e_data, (uintptr_t)len);
1034 1037 if (e_moddata - moddata > MODDATA)
1035 1038 e_moddata = moddata + MODDATA;
1036 1039
1037 1040 modtext = (caddr_t)ROUND_UP_PAGE(e_text);
1038 1041 e_modtext = (caddr_t)P2ROUNDUP((uintptr_t)e_text, (uintptr_t)len);
1039 1042 if (e_modtext - modtext > MODTEXT)
1040 1043 e_modtext = modtext + MODTEXT;
1041 1044
1042 1045 econtig = e_moddata;
1043 1046
1044 1047 PRM_DEBUG(modtext);
1045 1048 PRM_DEBUG(e_modtext);
1046 1049 PRM_DEBUG(moddata);
1047 1050 PRM_DEBUG(e_moddata);
1048 1051 PRM_DEBUG(econtig);
1049 1052
1050 1053 /*
1051 1054 * Examine the boot loader physical memory map to find out:
1052 1055 * - total memory in system - physinstalled
1053 1056 * - the max physical address - physmax
1054 1057 * - the number of discontiguous segments of memory.
1055 1058 */
1056 1059 if (prom_debug)
1057 1060 print_memlist("boot physinstalled",
1058 1061 bootops->boot_mem->physinstalled);
1059 1062 installed_top_size_ex(bootops->boot_mem->physinstalled, &physmax,
1060 1063 &physinstalled, &memblocks);
1061 1064 PRM_DEBUG(physmax);
1062 1065 PRM_DEBUG(physinstalled);
1063 1066 PRM_DEBUG(memblocks);
1064 1067
1065 1068 /*
1066 1069 * Compute maximum physical address for memory DR operations.
1067 1070 * Memory DR operations are unsupported on xpv or 32bit OSes.
1068 1071 */
1069 1072 #ifdef __amd64
1070 1073 if (plat_dr_support_memory()) {
1071 1074 if (plat_dr_physmax == 0) {
1072 1075 uint_t pabits = UINT_MAX;
1073 1076
1074 1077 cpuid_get_addrsize(CPU, &pabits, NULL);
1075 1078 plat_dr_physmax = btop(1ULL << pabits);
1076 1079 }
1077 1080 if (plat_dr_physmax > PHYSMEM_MAX64)
1078 1081 plat_dr_physmax = PHYSMEM_MAX64;
1079 1082 } else
1080 1083 #endif
1081 1084 plat_dr_physmax = 0;
1082 1085
1083 1086 /*
1084 1087 * Examine the bios reserved memory to find out:
1085 1088 * - the number of discontiguous segments of memory.
1086 1089 */
1087 1090 if (prom_debug)
1088 1091 print_memlist("boot reserved mem",
1089 1092 bootops->boot_mem->rsvdmem);
1090 1093 installed_top_size_ex(bootops->boot_mem->rsvdmem, &rsvd_high_pfn,
1091 1094 &rsvd_pgcnt, &rsvdmemblocks);
1092 1095 PRM_DEBUG(rsvd_high_pfn);
1093 1096 PRM_DEBUG(rsvd_pgcnt);
1094 1097 PRM_DEBUG(rsvdmemblocks);
1095 1098
1096 1099 /*
1097 1100 * Initialize hat's mmu parameters.
1098 1101 * Check for enforce-prot-exec in boot environment. It's used to
1099 1102 * enable/disable support for the page table entry NX bit.
1100 1103 * The default is to enforce PROT_EXEC on processors that support NX.
1101 1104 * Boot seems to round up the "len", but 8 seems to be big enough.
1102 1105 */
1103 1106 mmu_init();
1104 1107
1105 1108 #ifdef __i386
1106 1109 /*
1107 1110 * physmax is lowered if there is more memory than can be
1108 1111 * physically addressed in 32 bit (PAE/non-PAE) modes.
1109 1112 */
1110 1113 if (mmu.pae_hat) {
1111 1114 if (PFN_ABOVE64G(physmax)) {
1112 1115 physinstalled -= (physmax - (PFN_64G - 1));
1113 1116 physmax = PFN_64G - 1;
1114 1117 }
1115 1118 } else {
1116 1119 if (PFN_ABOVE4G(physmax)) {
1117 1120 physinstalled -= (physmax - (PFN_4G - 1));
1118 1121 physmax = PFN_4G - 1;
1119 1122 }
1120 1123 }
1121 1124 #endif
1122 1125
1123 1126 startup_build_mem_nodes(bootops->boot_mem->physinstalled);
1124 1127
1125 1128 if (BOP_GETPROPLEN(bootops, "enforce-prot-exec") >= 0) {
1126 1129 int len = BOP_GETPROPLEN(bootops, "enforce-prot-exec");
1127 1130 char value[8];
1128 1131
1129 1132 if (len < 8)
1130 1133 (void) BOP_GETPROP(bootops, "enforce-prot-exec", value);
1131 1134 else
1132 1135 (void) strcpy(value, "");
1133 1136 if (strcmp(value, "off") == 0)
1134 1137 mmu.pt_nx = 0;
1135 1138 }
1136 1139 PRM_DEBUG(mmu.pt_nx);
1137 1140
1138 1141 /*
1139 1142 * We will need page_t's for every page in the system, except for
1140 1143 * memory mapped at or above the start of the kernel text segment.
1141 1144 *
1142 1145 * pages above e_modtext are attributed to kernel debugger (obp_pages)
1143 1146 */
1144 1147 npages = physinstalled - 1; /* avail_filter() skips page 0, so "- 1" */
1145 1148 obp_pages = 0;
1146 1149 va = KERNEL_TEXT;
1147 1150 while (kbm_probe(&va, &len, &pfn, &prot) != 0) {
1148 1151 npages -= len >> MMU_PAGESHIFT;
1149 1152 if (va >= (uintptr_t)e_moddata)
1150 1153 obp_pages += len >> MMU_PAGESHIFT;
1151 1154 va += len;
1152 1155 }
1153 1156 PRM_DEBUG(npages);
1154 1157 PRM_DEBUG(obp_pages);
1155 1158
1156 1159 /*
1157 1160 * If physmem is patched to be non-zero, use it instead of the computed
1158 1161 * value unless it is larger than the actual amount of memory on hand.
1159 1162 */
1160 1163 if (physmem == 0 || physmem > npages) {
1161 1164 physmem = npages;
1162 1165 } else if (physmem < npages) {
1163 1166 orig_npages = npages;
1164 1167 npages = physmem;
1165 1168 }
1166 1169 PRM_DEBUG(physmem);
1167 1170
1168 1171 /*
1169 1172 * We now compute the sizes of all the initial allocations for
1170 1173 * structures the kernel needs in order to do kmem_alloc(). These
1171 1174 * include:
1172 1175 * memsegs
1173 1176 * memlists
1174 1177 * page hash table
1175 1178 * page_t's
1176 1179 * page coloring data structs
1177 1180 */
1178 1181 memseg_sz = sizeof (struct memseg) * (memblocks + POSS_NEW_FRAGMENTS);
1179 1182 ADD_TO_ALLOCATIONS(memseg_base, memseg_sz);
1180 1183 PRM_DEBUG(memseg_sz);
1181 1184
1182 1185 /*
1183 1186 * Reserve space for memlists. There's no real good way to know exactly
1184 1187 * how much room we'll need, but this should be a good upper bound.
1185 1188 */
1186 1189 memlist_sz = ROUND_UP_PAGE(2 * sizeof (struct memlist) *
1187 1190 (memblocks + POSS_NEW_FRAGMENTS));
1188 1191 ADD_TO_ALLOCATIONS(memlist, memlist_sz);
1189 1192 PRM_DEBUG(memlist_sz);
1190 1193
1191 1194 /*
1192 1195 * Reserve space for bios reserved memlists.
1193 1196 */
1194 1197 rsvdmemlist_sz = ROUND_UP_PAGE(2 * sizeof (struct memlist) *
1195 1198 (rsvdmemblocks + POSS_NEW_FRAGMENTS));
1196 1199 ADD_TO_ALLOCATIONS(bios_rsvd, rsvdmemlist_sz);
1197 1200 PRM_DEBUG(rsvdmemlist_sz);
1198 1201
1199 1202 /* LINTED */
1200 1203 ASSERT(P2SAMEHIGHBIT((1 << PP_SHIFT), sizeof (struct page)));
1201 1204 /*
1202 1205 * The page structure hash table size is a power of 2
1203 1206 * such that the average hash chain length is PAGE_HASHAVELEN.
1204 1207 */
1205 1208 page_hashsz = npages / PAGE_HASHAVELEN;
1206 1209 page_hashsz_shift = highbit(page_hashsz);
1207 1210 page_hashsz = 1 << page_hashsz_shift;
1208 1211 pagehash_sz = sizeof (struct page *) * page_hashsz;
1209 1212 ADD_TO_ALLOCATIONS(page_hash, pagehash_sz);
1210 1213 PRM_DEBUG(pagehash_sz);
1211 1214
1212 1215 /*
1213 1216 * Set aside room for the page structures themselves.
1214 1217 */
1215 1218 PRM_DEBUG(npages);
1216 1219 pp_sz = sizeof (struct page) * npages;
1217 1220 ADD_TO_ALLOCATIONS(pp_base, pp_sz);
1218 1221 PRM_DEBUG(pp_sz);
1219 1222
1220 1223 /*
1221 1224 * determine l2 cache info and memory size for page coloring
1222 1225 */
1223 1226 (void) getl2cacheinfo(CPU,
1224 1227 &l2cache_sz, &l2cache_linesz, &l2cache_assoc);
1225 1228 pagecolor_memsz =
1226 1229 page_coloring_init(l2cache_sz, l2cache_linesz, l2cache_assoc);
1227 1230 ADD_TO_ALLOCATIONS(pagecolor_mem, pagecolor_memsz);
1228 1231 PRM_DEBUG(pagecolor_memsz);
1229 1232
1230 1233 page_ctrs_size = page_ctrs_sz();
1231 1234 ADD_TO_ALLOCATIONS(page_ctrs_mem, page_ctrs_size);
1232 1235 PRM_DEBUG(page_ctrs_size);
1233 1236
1234 1237 /*
1235 1238 * Allocate the array that protects pp->p_selock.
1236 1239 */
1237 1240 pse_shift = size_pse_array(physmem, max_ncpus);
1238 1241 pse_table_size = 1 << pse_shift;
1239 1242 pse_table_alloc_size = pse_table_size * sizeof (pad_mutex_t);
1240 1243 ADD_TO_ALLOCATIONS(pse_mutex, pse_table_alloc_size);
1241 1244
1242 1245 #if defined(__amd64)
1243 1246 valloc_sz = ROUND_UP_LPAGE(valloc_sz);
1244 1247 valloc_base = VALLOC_BASE;
1245 1248
1246 1249 /*
1247 1250 * The default values of VALLOC_BASE and SEGKPM_BASE should work
1248 1251 * for values of physmax up to 256GB (1/4 TB). They need adjusting when
1249 1252 * memory is at addresses above 256GB. When adjusted, segkpm_base must
1250 1253 * be aligned on KERNEL_REDZONE_SIZE boundary (span of top level pte).
1251 1254 *
1252 1255 * In the general case (>256GB), we use (4 * physmem) for the
1253 1256 * kernel's virtual addresses, which is divided approximately
1254 1257 * as follows:
1255 1258 * - 1 * physmem for segkpm
1256 1259 * - 1.5 * physmem for segzio
1257 1260 * - 1.5 * physmem for heap
1258 1261 * Total: 4.0 * physmem
1259 1262 *
1260 1263 * Note that the segzio and heap sizes are more than physmem so that
1261 1264 * VA fragmentation does not prevent either of them from being
1262 1265 * able to use nearly all of physmem. The value of 1.5x is determined
1263 1266 * experimentally and may need to change if the workload changes.
1264 1267 */
1265 1268 if (physmax + 1 > mmu_btop(TERABYTE / 4) ||
1266 1269 plat_dr_physmax > mmu_btop(TERABYTE / 4)) {
1267 1270 uint64_t kpm_resv_amount = mmu_ptob(physmax + 1);
1268 1271
1269 1272 if (kpm_resv_amount < mmu_ptob(plat_dr_physmax)) {
1270 1273 kpm_resv_amount = mmu_ptob(plat_dr_physmax);
1271 1274 }
1272 1275
1273 1276 /*
1274 1277 * This is what actually controls the KVA : UVA split.
1275 1278 * The kernel uses high VA, and this is lowering the
1276 1279 * boundary, thus increasing the amount of VA for the kernel.
1277 1280 * This gives the kernel 4 * (amount of physical memory) VA.
1278 1281 *
1279 1282 * The maximum VA is UINT64_MAX and we are using
1280 1283 * 64-bit 2's complement math, so e.g. if you have 512GB
1281 1284 * of memory, segkpm_base = -(4 * 512GB) == -2TB ==
1282 1285 * UINT64_MAX - 2TB (approximately). So the kernel's
1283 1286 * VA is [UINT64_MAX-2TB to UINT64_MAX].
1284 1287 */
1285 1288 segkpm_base = -(P2ROUNDUP((4 * kpm_resv_amount),
1286 1289 KERNEL_REDZONE_SIZE));
1287 1290
1288 1291 /* make sure we leave some space for user apps above hole */
1289 1292 segkpm_base = MAX(segkpm_base, AMD64_VA_HOLE_END + TERABYTE);
1290 1293 if (segkpm_base > SEGKPM_BASE)
1291 1294 segkpm_base = SEGKPM_BASE;
1292 1295 PRM_DEBUG(segkpm_base);
1293 1296
1294 1297 valloc_base = segkpm_base + P2ROUNDUP(kpm_resv_amount, ONE_GIG);
1295 1298 if (valloc_base < segkpm_base)
1296 1299 panic("not enough kernel VA to support memory size");
1297 1300 PRM_DEBUG(valloc_base);
1298 1301 }
1299 1302 #else /* __i386 */
1300 1303 valloc_base = (uintptr_t)(MISC_VA_BASE - valloc_sz);
1301 1304 valloc_base = P2ALIGN(valloc_base, mmu.level_size[1]);
1302 1305 PRM_DEBUG(valloc_base);
1303 1306 #endif /* __i386 */
1304 1307
1305 1308 /*
1306 1309 * do all the initial allocations
1307 1310 */
1308 1311 perform_allocations();
1309 1312
1310 1313 /*
1311 1314 * Build phys_install and phys_avail in kernel memspace.
1312 1315 * - phys_install should be all memory in the system.
1313 1316 * - phys_avail is phys_install minus any memory mapped before this
1314 1317 * point above KERNEL_TEXT.
1315 1318 */
1316 1319 current = phys_install = memlist;
1317 1320 copy_memlist_filter(bootops->boot_mem->physinstalled, ¤t, NULL);
1318 1321 if ((caddr_t)current > (caddr_t)memlist + memlist_sz)
1319 1322 panic("physinstalled was too big!");
1320 1323 if (prom_debug)
1321 1324 print_memlist("phys_install", phys_install);
1322 1325
1323 1326 phys_avail = current;
1324 1327 PRM_POINT("Building phys_avail:\n");
1325 1328 copy_memlist_filter(bootops->boot_mem->physinstalled, &current,
1326 1329 avail_filter);
1327 1330 if ((caddr_t)current > (caddr_t)memlist + memlist_sz)
1328 1331 panic("physavail was too big!");
1329 1332 if (prom_debug)
1330 1333 print_memlist("phys_avail", phys_avail);
1331 1334 #ifndef __xpv
1332 1335 /*
1333 1336 * Free unused memlist items, which may be used by memory DR driver
1334 1337 * at runtime.
1335 1338 */
1336 1339 if ((caddr_t)current < (caddr_t)memlist + memlist_sz) {
1337 1340 memlist_free_block((caddr_t)current,
1338 1341 (caddr_t)memlist + memlist_sz - (caddr_t)current);
1339 1342 }
1340 1343 #endif
1341 1344
1342 1345 /*
1343 1346 * Build bios reserved memspace
1344 1347 */
1345 1348 current = bios_rsvd;
1346 1349 copy_memlist_filter(bootops->boot_mem->rsvdmem, &current, NULL);
1347 1350 if ((caddr_t)current > (caddr_t)bios_rsvd + rsvdmemlist_sz)
1348 1351 panic("bios_rsvd was too big!");
1349 1352 if (prom_debug)
1350 1353 print_memlist("bios_rsvd", bios_rsvd);
1351 1354 #ifndef __xpv
1352 1355 /*
1353 1356 * Free unused memlist items, which may be used by memory DR driver
1354 1357 * at runtime.
1355 1358 */
1356 1359 if ((caddr_t)current < (caddr_t)bios_rsvd + rsvdmemlist_sz) {
1357 1360 memlist_free_block((caddr_t)current,
1358 1361 (caddr_t)bios_rsvd + rsvdmemlist_sz - (caddr_t)current);
1359 1362 }
1360 1363 #endif
1361 1364
1362 1365 /*
1363 1366 * setup page coloring
1364 1367 */
1365 1368 page_coloring_setup(pagecolor_mem);
1366 1369 page_lock_init(); /* currently a no-op */
1367 1370
1368 1371 /*
1369 1372 * free page list counters
1370 1373 */
1371 1374 (void) page_ctrs_alloc(page_ctrs_mem);
1372 1375
1373 1376 /*
1374 1377 * Size the pcf array based on the number of cpus in the box at
1375 1378 * boot time.
1376 1379 */
1377 1380
1378 1381 pcf_init();
1379 1382
1380 1383 /*
1381 1384 * Initialize the page structures from the memory lists.
1382 1385 */
1383 1386 availrmem_initial = availrmem = freemem = 0;
1384 1387 PRM_POINT("Calling kphysm_init()...");
1385 1388 npages = kphysm_init(pp_base, npages);
1386 1389 PRM_POINT("kphysm_init() done");
1387 1390 PRM_DEBUG(npages);
1388 1391
1389 1392 init_debug_info();
1390 1393
1391 1394 /*
1392 1395 * Now that page_t's have been initialized, remove all the
1393 1396 * initial allocation pages from the kernel free page lists.
1394 1397 */
1395 1398 boot_mapin((caddr_t)valloc_base, valloc_sz);
1396 1399 boot_mapin((caddr_t)MISC_VA_BASE, MISC_VA_SIZE);
1397 1400 PRM_POINT("startup_memlist() done");
1398 1401
1399 1402 PRM_DEBUG(valloc_sz);
1400 1403
1401 1404 #if defined(__amd64)
1402 1405 if ((availrmem >> (30 - MMU_PAGESHIFT)) >=
1403 1406 textrepl_min_gb && l2cache_sz <= 2 << 20) {
1404 1407 extern size_t textrepl_size_thresh;
1405 1408 textrepl_size_thresh = (16 << 20) - 1;
1406 1409 }
1407 1410 #endif
1408 1411 }
1409 1412
1410 1413 /*
1411 1414 * Layout the kernel's part of address space and initialize kmem allocator.
1412 1415 */
1413 1416 static void
1414 1417 startup_kmem(void)
1415 1418 {
1416 1419 extern void page_set_colorequiv_arr(void);
1417 1420
1418 1421 PRM_POINT("startup_kmem() starting...");
1419 1422
1420 1423 #if defined(__amd64)
1421 1424 if (eprom_kernelbase && eprom_kernelbase != KERNELBASE)
1422 1425 cmn_err(CE_NOTE, "!kernelbase cannot be changed on 64-bit "
1423 1426 "systems.");
1424 1427 kernelbase = segkpm_base - KERNEL_REDZONE_SIZE;
1425 1428 core_base = (uintptr_t)COREHEAP_BASE;
1426 1429 core_size = (size_t)MISC_VA_BASE - COREHEAP_BASE;
1427 1430 #else /* __i386 */
1428 1431 /*
1429 1432 * We configure kernelbase based on:
1430 1433 *
1431 1434 * 1. A user-specified kernelbase via the eeprom command. The value
1432 1435 *    cannot exceed KERNELBASE_MAX; we large-page align eprom_kernelbase.
1433 1436 *
1434 1437 * 2. Default to KERNELBASE, adjusted down by twice the space needed
1435 1438 *    for page_t's. On large memory systems we must lower kernelbase
1436 1439 *    to allow enough room for page_t's for all of memory.
1437 1440 *
1438 1441 * The value set here might be changed a little later.
1439 1442 */
1440 1443 if (eprom_kernelbase) {
1441 1444 kernelbase = eprom_kernelbase & mmu.level_mask[1];
1442 1445 if (kernelbase > KERNELBASE_MAX)
1443 1446 kernelbase = KERNELBASE_MAX;
1444 1447 } else {
1445 1448 kernelbase = (uintptr_t)KERNELBASE;
1446 1449 kernelbase -= ROUND_UP_4MEG(2 * valloc_sz);
1447 1450 }
1448 1451 ASSERT((kernelbase & mmu.level_offset[1]) == 0);
1449 1452 core_base = valloc_base;
1450 1453 core_size = 0;
1451 1454 #endif /* __i386 */
1452 1455
1453 1456 PRM_DEBUG(core_base);
1454 1457 PRM_DEBUG(core_size);
1455 1458 PRM_DEBUG(kernelbase);
1456 1459
1457 1460 #if defined(__i386)
1458 1461 segkp_fromheap = 1;
1459 1462 #endif /* __i386 */
1460 1463
1461 1464 ekernelheap = (char *)core_base;
1462 1465 PRM_DEBUG(ekernelheap);
1463 1466
1464 1467 /*
1465 1468 * Now that we know the real value of kernelbase,
1466 1469 * update variables that were initialized with a value of
1467 1470 * KERNELBASE (in common/conf/param.c).
1468 1471 *
1469 1472 * XXX The problem with this sort of hackery is that the
1470 1473 * compiler just may feel like putting the const declarations
1471 1474 * (in param.c) into the .text section. Perhaps they should
1472 1475 * just be declared as variables there?
1473 1476 */
1474 1477
1475 1478 *(uintptr_t *)&_kernelbase = kernelbase;
1476 1479 *(uintptr_t *)&_userlimit = kernelbase;
1477 1480 #if defined(__amd64)
1478 1481 *(uintptr_t *)&_userlimit -= KERNELBASE - USERLIMIT;
1479 1482 #else
1480 1483 *(uintptr_t *)&_userlimit32 = _userlimit;
1481 1484 #endif
1482 1485 PRM_DEBUG(_kernelbase);
1483 1486 PRM_DEBUG(_userlimit);
1484 1487 PRM_DEBUG(_userlimit32);
1485 1488
1486 1489 layout_kernel_va();
1487 1490
1488 1491 #if defined(__i386)
1489 1492 /*
1490 1493 * If segmap is too large we can push the bottom of the kernel heap
1491 1494 * higher than the base. Or worse, it could exceed the top of the
1492 1495 * VA space entirely, causing it to wrap around.
1493 1496 */
1494 1497 if (kernelheap >= ekernelheap || (uintptr_t)kernelheap < kernelbase)
1495 1498 panic("too little address space available for kernelheap,"
1496 1499 " use eeprom for lower kernelbase or smaller segmapsize");
1497 1500 #endif /* __i386 */
1498 1501
1499 1502 /*
1500 1503 * Initialize the kernel heap. Note 3rd argument must be > 1st.
1501 1504 */
1502 1505 kernelheap_init(kernelheap, ekernelheap,
1503 1506 kernelheap + MMU_PAGESIZE,
1504 1507 (void *)core_base, (void *)(core_base + core_size));
1505 1508
1506 1509 #if defined(__xpv)
1507 1510 /*
1508 1511 * Link pending events struct into cpu struct
1509 1512 */
1510 1513 CPU->cpu_m.mcpu_evt_pend = &cpu0_evt_data;
1511 1514 #endif
1512 1515 /*
1513 1516 * Initialize kernel memory allocator.
1514 1517 */
1515 1518 kmem_init();
1516 1519
1517 1520 /*
1518 1521 * Factor in colorequiv to check additional 'equivalent' bins
1519 1522 */
1520 1523 page_set_colorequiv_arr();
1521 1524
1522 1525 /*
1523 1526 * print this out early so that we know what's going on
1524 1527 */
1525 1528 print_x86_featureset(x86_featureset);
1526 1529
1527 1530 /*
1528 1531 * Initialize bp_mapin().
1529 1532 */
1530 1533 bp_init(MMU_PAGESIZE, HAT_STORECACHING_OK);
1531 1534
1532 1535 /*
1533 1536 * orig_npages is non-zero if physmem has been configured for less
1534 1537 * than the available memory.
1535 1538 */
1536 1539 if (orig_npages) {
1537 1540 cmn_err(CE_WARN, "!%slimiting physmem to 0x%lx of 0x%lx pages",
1538 1541 (npages == PHYSMEM ? "Due to virtual address space " : ""),
1539 1542 npages, orig_npages);
1540 1543 }
1541 1544 #if defined(__i386)
1542 1545 if (eprom_kernelbase && (eprom_kernelbase != kernelbase))
1543 1546 cmn_err(CE_WARN, "kernelbase value, User specified 0x%lx, "
1544 1547 "System using 0x%lx",
1545 1548 (uintptr_t)eprom_kernelbase, (uintptr_t)kernelbase);
1546 1549 #endif
1547 1550
1548 1551 #ifdef KERNELBASE_ABI_MIN
1549 1552 if (kernelbase < (uintptr_t)KERNELBASE_ABI_MIN) {
1550 1553 cmn_err(CE_NOTE, "!kernelbase set to 0x%lx, system is not "
1551 1554 "i386 ABI compliant.", (uintptr_t)kernelbase);
1552 1555 }
1553 1556 #endif
1554 1557
1555 1558 #ifndef __xpv
1556 1559 if (plat_dr_support_memory()) {
1557 1560 mem_config_init();
1558 1561 }
1559 1562 #else /* __xpv */
1560 1563 /*
1561 1564 * Some of the xen start information has to be relocated up
1562 1565 * into the kernel's permanent address space.
1563 1566 */
1564 1567 PRM_POINT("calling xen_relocate_start_info()");
1565 1568 xen_relocate_start_info();
1566 1569 PRM_POINT("xen_relocate_start_info() done");
1567 1570
1568 1571 /*
1569 1572 * (Update the vcpu pointer in our cpu structure to point into
1570 1573 * the relocated shared info.)
1571 1574 */
1572 1575 CPU->cpu_m.mcpu_vcpu_info =
1573 1576 &HYPERVISOR_shared_info->vcpu_info[CPU->cpu_id];
1574 1577 #endif /* __xpv */
1575 1578
1576 1579 PRM_POINT("startup_kmem() done");
1577 1580 }
1578 1581
1579 1582 #ifndef __xpv
1580 1583 /*
1581 1584 * If we have detected that we are running in an HVM environment, we need
1582 1585 * to prepend the PV driver directory to the module search path.
1583 1586 */
1584 1587 #define HVM_MOD_DIR "/platform/i86hvm/kernel"
1585 1588 static void
1586 1589 update_default_path()
1587 1590 {
1588 1591 char *current, *newpath;
1589 1592 int newlen;
1590 1593
1591 1594 /*
1592 1595 * We are about to resync with krtld. krtld will reset its
1593 1596 * internal module search path iff Solaris has set default_path.
1594 1597 * We want to be sure we're prepending this new directory to the
1595 1598 * right search path.
1596 1599 */
1597 1600 current = (default_path == NULL) ? kobj_module_path : default_path;
1598 1601
1599 1602 newlen = strlen(HVM_MOD_DIR) + strlen(current) + 2;
1600 1603 newpath = kmem_alloc(newlen, KM_SLEEP);
1601 1604 (void) strcpy(newpath, HVM_MOD_DIR);
1602 1605 (void) strcat(newpath, " ");
1603 1606 (void) strcat(newpath, current);
1604 1607
1605 1608 default_path = newpath;
1606 1609 }
1607 1610 #endif
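/*
 * [Editor's sketch] update_default_path() above sizes its buffer as
 * strlen(HVM_MOD_DIR) + strlen(current) + 2: one byte for the separating
 * space and one for the NUL. A minimal user-space equivalent of the
 * prepend, using malloc in place of kmem_alloc:
 */
#if 0
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static char *
prepend_dir(const char *dir, const char *current)
{
	size_t len = strlen(dir) + strlen(current) + 2;
	char *newpath = malloc(len);

	if (newpath != NULL)
		(void) snprintf(newpath, len, "%s %s", dir, current);
	return (newpath);
}
#endif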
1608 1611
1609 1612 static void
1610 1613 startup_modules(void)
1611 1614 {
1612 1615 int cnt;
1613 1616 extern void prom_setup(void);
1614 1617 int32_t v, h;
1615 1618 char d[11];
1616 1619 char *cp;
1617 1620 cmi_hdl_t hdl;
1618 1621
1619 1622 PRM_POINT("startup_modules() starting...");
1620 1623
1621 1624 #ifndef __xpv
1622 1625 /*
1623 1626 * Initialize the ten-microsecond timer so that drivers will
1624 1627 * not get shortchanged in their init phase. This was
1625 1628 * not getting called until clkinit, which on fast CPUs
1626 1629 * caused drv_usecwait() to be way too short.
1627 1630 */
1628 1631 microfind();
1629 1632
1630 1633 if ((get_hwenv() & HW_XEN_HVM) != 0)
1631 1634 update_default_path();
1632 1635 #endif
1633 1636
1634 1637 /*
1635 1638 * Read the GMT lag from /etc/rtc_config.
1636 1639 */
1637 1640 sgmtl(process_rtc_config_file());
1638 1641
1639 1642 /*
1640 1643 * Calculate default settings of system parameters based upon
1641 1644 * maxusers, yet allow them to be overridden via the /etc/system file.
1642 1645 */
1643 1646 param_calc(0);
1644 1647
1645 1648 mod_setup();
1646 1649
1647 1650 /*
1648 1651 * Initialize system parameters.
1649 1652 */
1650 1653 param_init();
1651 1654
1652 1655 /*
1653 1656 * Initialize the default brands
1654 1657 */
1655 1658 brand_init();
1656 1659
1657 1660 /*
1658 1661 * maxmem is the amount of physical memory we're playing with.
1659 1662 */
1660 1663 maxmem = physmem;
1661 1664
1662 1665 /*
1663 1666 * Initialize segment management stuff.
1664 1667 */
1665 1668 seg_init();
1666 1669
1667 1670 if (modload("fs", "specfs") == -1)
1668 1671 halt("Can't load specfs");
1669 1672
1670 1673 if (modload("fs", "devfs") == -1)
1671 1674 halt("Can't load devfs");
1672 1675
1673 1676 if (modload("fs", "dev") == -1)
1674 1677 halt("Can't load dev");
1675 1678
1676 1679 if (modload("fs", "procfs") == -1)
1677 1680 halt("Can't load procfs");
1678 1681
1679 1682 (void) modloadonly("sys", "lbl_edition");
1680 1683
1681 1684 dispinit();
1682 1685
1683 1686 /* Read cluster configuration data. */
1684 1687 clconf_init();
1685 1688
1686 1689 #if defined(__xpv)
1687 1690 (void) ec_init();
1688 1691 gnttab_init();
1689 1692 (void) xs_early_init();
1690 1693 #endif /* __xpv */
1691 1694
1692 1695 /*
1693 1696 * Create a kernel device tree. First, create rootnex and
1694 1697 * then invoke bus specific code to probe devices.
1695 1698 */
1696 1699 setup_ddi();
1697 1700
1698 1701 #ifdef __xpv
1699 1702 if (DOMAIN_IS_INITDOMAIN(xen_info))
1700 1703 #endif
1701 1704 {
1702 1705 id_t smid;
1703 1706 smbios_system_t smsys;
1704 1707 smbios_info_t sminfo;
1705 1708 char *mfg;
1706 1709 /*
1707 1710 * Load the System Management BIOS into the global ksmbios
1708 1711 * handle, if an SMBIOS is present on this system.
1709 1712 * Also set "si-hw-provider" property, if not already set.
1710 1713 */
1711 1714 ksmbios = smbios_open(NULL, SMB_VERSION, ksmbios_flags, NULL);
1712 1715 if (ksmbios != NULL &&
1713 1716 ((smid = smbios_info_system(ksmbios, &smsys)) != SMB_ERR) &&
1714 1717 (smbios_info_common(ksmbios, smid, &sminfo)) != SMB_ERR) {
1715 1718 mfg = (char *)sminfo.smbi_manufacturer;
1716 1719 if (BOP_GETPROPLEN(bootops, "si-hw-provider") < 0) {
1717 1720 extern char hw_provider[];
1718 1721 int i;
1719 1722 for (i = 0; i < SYS_NMLN; i++) {
1720 1723 if (isprint(mfg[i]))
1721 1724 hw_provider[i] = mfg[i];
1722 1725 else {
1723 1726 hw_provider[i] = '\0';
1724 1727 break;
1725 1728 }
1726 1729 }
1727 1730 hw_provider[SYS_NMLN - 1] = '\0';
1728 1731 }
1729 1732 }
1730 1733 }
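/*
 * [Editor's sketch] The loop above copies the SMBIOS manufacturer string
 * into hw_provider, truncating at the first non-printable byte and
 * force-terminating the buffer. A stand-alone equivalent:
 */
#if 0
#include <ctype.h>
#include <stddef.h>

static void
copy_printable(char *dst, const char *src, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (isprint((unsigned char)src[i])) {
			dst[i] = src[i];
		} else {
			dst[i] = '\0';
			break;
		}
	}
	dst[n - 1] = '\0';	/* force termination, as the code above does */
}
#endif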
1731 1734
1732 1735
1733 1736 /*
1734 1737 * Originally clconf_init() apparently needed the hostid. But
1735 1738 * this no longer appears to be true - it uses its own nodeid.
1736 1739 * By placing the hostid logic here, we are able to make use of
1737 1740 * the SMBIOS UUID.
1738 1741 */
1739 1742 if ((h = set_soft_hostid()) == HW_INVALID_HOSTID) {
1740 1743 cmn_err(CE_WARN, "Unable to set hostid");
1741 1744 } else {
1742 1745 for (v = h, cnt = 0; cnt < 10; cnt++) {
1743 1746 d[cnt] = (char)(v % 10);
1744 1747 v /= 10;
1745 1748 if (v == 0)
1746 1749 break;
1747 1750 }
1748 1751 for (cp = hw_serial; cnt >= 0; cnt--)
1749 1752 *cp++ = d[cnt] + '0';
1750 1753 *cp = 0;
1751 1754 }
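/*
 * [Editor's sketch] The conversion above peels decimal digits off the
 * hostid least-significant first into d[] as raw values, then replays
 * them in reverse while adding '0', leaving hw_serial in the usual
 * most-significant-first order. The same idea, self-contained:
 */
#if 0
#include <stdint.h>

static void
hostid_to_decimal(int32_t h, char *buf)
{
	char d[11];
	int32_t v;
	int cnt;

	for (v = h, cnt = 0; cnt < 10; cnt++) {
		d[cnt] = (char)(v % 10);	/* raw digit, not yet ASCII */
		v /= 10;
		if (v == 0)
			break;
	}
	for (; cnt >= 0; cnt--)
		*buf++ = d[cnt] + '0';
	*buf = '\0';
}
#endif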
1752 1755
1753 1756 /*
1754 1757 * Set up the CPU module subsystem for the boot cpu in the native
1755 1758 * case, and all physical cpu resources in the xpv dom0 case.
1756 1759 * Modifies the device tree, so this must be done after
1757 1760 * setup_ddi().
1758 1761 */
1759 1762 #ifdef __xpv
1760 1763 /*
1761 1764 * If paravirtualized and on dom0 then we initialize all physical
1762 1765 * cpu handles now; if paravirtualized on a domU then do not
1763 1766 * initialize.
1764 1767 */
1765 1768 if (DOMAIN_IS_INITDOMAIN(xen_info)) {
1766 1769 xen_mc_lcpu_cookie_t cpi;
1767 1770
1768 1771 for (cpi = xen_physcpu_next(NULL); cpi != NULL;
1769 1772 cpi = xen_physcpu_next(cpi)) {
1770 1773 if ((hdl = cmi_init(CMI_HDL_SOLARIS_xVM_MCA,
1771 1774 xen_physcpu_chipid(cpi), xen_physcpu_coreid(cpi),
1772 1775 xen_physcpu_strandid(cpi))) != NULL &&
1773 1776 is_x86_feature(x86_featureset, X86FSET_MCA))
1774 1777 cmi_mca_init(hdl);
1775 1778 }
1776 1779 }
1777 1780 #else
1778 1781 /*
1779 1782 * Initialize a handle for the boot cpu - others will initialize
1780 1783 * as they startup.
1781 1784 */
1782 1785 if ((hdl = cmi_init(CMI_HDL_NATIVE, cmi_ntv_hwchipid(CPU),
1783 1786 cmi_ntv_hwcoreid(CPU), cmi_ntv_hwstrandid(CPU))) != NULL) {
1784 1787 if (is_x86_feature(x86_featureset, X86FSET_MCA))
1785 1788 cmi_mca_init(hdl);
1786 1789 CPU->cpu_m.mcpu_cmi_hdl = hdl;
1787 1790 }
1788 1791 #endif /* __xpv */
1789 1792
1790 1793 /*
1791 1794 * Fake a prom tree such that /dev/openprom continues to work
1792 1795 */
1793 1796 PRM_POINT("startup_modules: calling prom_setup...");
1794 1797 prom_setup();
1795 1798 PRM_POINT("startup_modules: done");
1796 1799
1797 1800 /*
1798 1801 * Load all platform specific modules
1799 1802 */
1800 1803 PRM_POINT("startup_modules: calling psm_modload...");
1801 1804 psm_modload();
1802 1805
1803 1806 PRM_POINT("startup_modules() done");
1804 1807 }
1805 1808
1806 1809 /*
1807 1810 * claim a "setaside" boot page for use in the kernel
1808 1811 */
1809 1812 page_t *
1810 1813 boot_claim_page(pfn_t pfn)
1811 1814 {
1812 1815 page_t *pp;
1813 1816
1814 1817 pp = page_numtopp_nolock(pfn);
1815 1818 ASSERT(pp != NULL);
1816 1819
1817 1820 if (PP_ISBOOTPAGES(pp)) {
1818 1821 if (pp->p_next != NULL)
1819 1822 pp->p_next->p_prev = pp->p_prev;
1820 1823 if (pp->p_prev == NULL)
1821 1824 bootpages = pp->p_next;
1822 1825 else
1823 1826 pp->p_prev->p_next = pp->p_next;
1824 1827 } else {
1825 1828 /*
1826 1829 * htable_attach() expects a base pagesize page
1827 1830 */
1828 1831 if (pp->p_szc != 0)
1829 1832 page_boot_demote(pp);
1830 1833 pp = page_numtopp(pfn, SE_EXCL);
1831 1834 }
1832 1835 return (pp);
1833 1836 }
1834 1837
1835 1838 /*
1836 1839 * Walk through the pagetables looking for pages mapped in by boot. If the
1837 1840 * setaside flag is set the pages are expected to be returned to the
1838 1841 * kernel later in boot, so we add them to the bootpages list.
1839 1842 */
1840 1843 static void
1841 1844 protect_boot_range(uintptr_t low, uintptr_t high, int setaside)
1842 1845 {
1843 1846 uintptr_t va = low;
1844 1847 size_t len;
1845 1848 uint_t prot;
1846 1849 pfn_t pfn;
1847 1850 page_t *pp;
1848 1851 pgcnt_t boot_protect_cnt = 0;
1849 1852
1850 1853 while (kbm_probe(&va, &len, &pfn, &prot) != 0 && va < high) {
1851 1854 if (va + len >= high)
1852 1855 panic("0x%lx byte mapping at 0x%p exceeds boot's "
1853 1856 "legal range.", len, (void *)va);
1854 1857
1855 1858 while (len > 0) {
1856 1859 pp = page_numtopp_alloc(pfn);
1857 1860 if (pp != NULL) {
1858 1861 if (setaside == 0)
1859 1862 panic("Unexpected mapping by boot. "
1860 1863 "addr=%p pfn=%lx\n",
1861 1864 (void *)va, pfn);
1862 1865
1863 1866 pp->p_next = bootpages;
1864 1867 pp->p_prev = NULL;
1865 1868 PP_SETBOOTPAGES(pp);
1866 1869 if (bootpages != NULL) {
1867 1870 bootpages->p_prev = pp;
1868 1871 }
1869 1872 bootpages = pp;
1870 1873 ++boot_protect_cnt;
1871 1874 }
1872 1875
1873 1876 ++pfn;
1874 1877 len -= MMU_PAGESIZE;
1875 1878 va += MMU_PAGESIZE;
1876 1879 }
1877 1880 }
1878 1881 PRM_DEBUG(boot_protect_cnt);
1879 1882 }
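/*
 * [Editor's sketch] The list manipulation inside the loop above is a
 * plain head-insert into the NULL-terminated, doubly linked bootpages
 * list; reduced to its generic shape:
 */
#if 0
#include <stddef.h>

struct node {
	struct node *next;
	struct node *prev;
};

static struct node *head;

static void
push_head(struct node *n)
{
	n->next = head;
	n->prev = NULL;
	if (head != NULL)
		head->prev = n;
	head = n;
}
#endif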
1880 1883
1881 1884 /*
1882 1885 * Lay out the sizes and positions of the kernel's VA segments.
1883 1886 */
1884 1887 static void
1885 1888 layout_kernel_va(void)
1886 1889 {
1887 1890 PRM_POINT("layout_kernel_va() starting...");
1888 1891 /*
1889 1892 * Establish the final size of the kernel's heap, size of segmap,
1890 1893 * segkp, etc.
1891 1894 */
1892 1895
1893 1896 #if defined(__amd64)
1894 1897
1895 1898 kpm_vbase = (caddr_t)segkpm_base;
1896 1899 if (physmax + 1 < plat_dr_physmax) {
1897 1900 kpm_size = ROUND_UP_LPAGE(mmu_ptob(plat_dr_physmax));
1898 1901 } else {
1899 1902 kpm_size = ROUND_UP_LPAGE(mmu_ptob(physmax + 1));
1900 1903 }
1901 1904 if ((uintptr_t)kpm_vbase + kpm_size > (uintptr_t)valloc_base)
1902 1905 panic("not enough room for kpm!");
1903 1906 PRM_DEBUG(kpm_size);
1904 1907 PRM_DEBUG(kpm_vbase);
1905 1908
1906 1909 /*
1907 1910 * By default we create a seg_kp in 64 bit kernels, it's a little
1908 1911 * faster to access than embedding it in the heap.
1909 1912 */
1910 1913 segkp_base = (caddr_t)valloc_base + valloc_sz;
1911 1914 if (!segkp_fromheap) {
1912 1915 size_t sz = mmu_ptob(segkpsize);
1913 1916
1914 1917 /*
1915 1918 * determine size of segkp
1916 1919 */
1917 1920 if (sz < SEGKPMINSIZE || sz > SEGKPMAXSIZE) {
1918 1921 sz = SEGKPDEFSIZE;
1919 1922 cmn_err(CE_WARN, "!Illegal value for segkpsize. "
1920 1923 "segkpsize has been reset to %ld pages",
1921 1924 mmu_btop(sz));
1922 1925 }
1923 1926 sz = MIN(sz, MAX(SEGKPMINSIZE, mmu_ptob(physmem)));
1924 1927
1925 1928 segkpsize = mmu_btop(ROUND_UP_LPAGE(sz));
1926 1929 }
1927 1930 PRM_DEBUG(segkp_base);
1928 1931 PRM_DEBUG(segkpsize);
1929 1932
1930 1933 /*
1931 1934 * segzio is used for ZFS cached data. It uses a distinct VA
1932 1935 * segment (from kernel heap) so that we can easily tell not to
1933 1936 * include it in kernel crash dumps on 64 bit kernels. The trick is
1934 1937 * to give it lots of VA, but not constrain the kernel heap.
1935 1938 * We can use 1.5x physmem for segzio, leaving approximately
1936 1939 * another 1.5x physmem for heap. See also the comment in
1937 1940 * startup_memlist().
1938 1941 */
1939 1942 segzio_base = segkp_base + mmu_ptob(segkpsize);
1940 1943 if (segzio_fromheap) {
1941 1944 segziosize = 0;
1942 1945 } else {
1943 1946 size_t physmem_size = mmu_ptob(physmem);
1944 1947 size_t size = (segziosize == 0) ?
1945 1948 physmem_size * 3 / 2 : mmu_ptob(segziosize);
1946 1949
1947 1950 if (size < SEGZIOMINSIZE)
1948 1951 size = SEGZIOMINSIZE;
1949 1952 segziosize = mmu_btop(ROUND_UP_LPAGE(size));
1950 1953 }
1951 1954 PRM_DEBUG(segziosize);
1952 1955 PRM_DEBUG(segzio_base);
1953 1956
1954 1957 /*
1955 1958 * Put the range of VA for device mappings next, kmdb knows to not
1956 1959 * grep in this range of addresses.
1957 1960 */
1958 1961 toxic_addr =
1959 1962 ROUND_UP_LPAGE((uintptr_t)segzio_base + mmu_ptob(segziosize));
1960 1963 PRM_DEBUG(toxic_addr);
1961 1964 segmap_start = ROUND_UP_LPAGE(toxic_addr + toxic_size);
1962 1965 #else /* __i386 */
1963 1966 segmap_start = ROUND_UP_LPAGE(kernelbase);
1964 1967 #endif /* __i386 */
1965 1968 PRM_DEBUG(segmap_start);
1966 1969
1967 1970 /*
1968 1971 * Users can change segmapsize through eeprom. If the variable
1969 1972 * is tuned through eeprom, there is no upper bound on the
1970 1973 * size of segmap.
1971 1974 */
1972 1975 segmapsize = MAX(ROUND_UP_LPAGE(segmapsize), SEGMAPDEFAULT);
1973 1976
1974 1977 #if defined(__i386)
1975 1978 /*
1976 1979 * 32-bit systems don't have segkpm or segkp, so segmap appears at
1977 1980 * the bottom of the kernel's address range. Set aside space for a
1978 1981 * small red zone just below the start of segmap.
1979 1982 */
1980 1983 segmap_start += KERNEL_REDZONE_SIZE;
1981 1984 segmapsize -= KERNEL_REDZONE_SIZE;
1982 1985 #endif
1983 1986
1984 1987 PRM_DEBUG(segmap_start);
1985 1988 PRM_DEBUG(segmapsize);
1986 1989 kernelheap = (caddr_t)ROUND_UP_LPAGE(segmap_start + segmapsize);
1987 1990 PRM_DEBUG(kernelheap);
1988 1991 PRM_POINT("layout_kernel_va() done...");
1989 1992 }
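/*
 * [Editor's sketch] The segzio sizing in layout_kernel_va() reduces to:
 * default to 1.5 x physmem, clamp to a minimum, then round up to a
 * large-page multiple. With an assumed 2MB large page (the real code
 * uses ROUND_UP_LPAGE and SEGZIOMINSIZE):
 */
#if 0
#include <stdint.h>

#define	LPAGESIZE	(2ULL << 20)	/* assumption: 2MB large pages */
#define	ROUND_UP_LP(x)	(((x) + LPAGESIZE - 1) & ~(LPAGESIZE - 1))

static uint64_t
size_segzio(uint64_t physmem_bytes, uint64_t min_bytes)
{
	uint64_t size = physmem_bytes * 3 / 2;

	if (size < min_bytes)
		size = min_bytes;
	return (ROUND_UP_LP(size));
}
#endif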
1990 1993
1991 1994 /*
1992 1995 * Finish initializing the VM system, now that we are no longer
1993 1996 * relying on the boot time memory allocators.
1994 1997 */
1995 1998 static void
1996 1999 startup_vm(void)
1997 2000 {
1998 2001 struct segmap_crargs a;
1999 2002
2000 2003 extern int use_brk_lpg, use_stk_lpg;
2001 2004
2002 2005 PRM_POINT("startup_vm() starting...");
2003 2006
2004 2007 /*
2005 2008 * Initialize the hat layer.
2006 2009 */
2007 2010 hat_init();
2008 2011
2009 2012 /*
2010 2013 * Do final allocations of HAT data structures that need to
2011 2014 * be allocated before quiescing the boot loader.
2012 2015 */
2013 2016 PRM_POINT("Calling hat_kern_alloc()...");
2014 2017 hat_kern_alloc((caddr_t)segmap_start, segmapsize, ekernelheap);
2015 2018 PRM_POINT("hat_kern_alloc() done");
2016 2019
2017 2020 #ifndef __xpv
2018 2021 /*
2019 2022 * Setup Page Attribute Table
2020 2023 */
2021 2024 pat_sync();
2022 2025 #endif
2023 2026
2024 2027 /*
2025 2028 * The next two loops are done in distinct steps in order
2026 2029 * to be sure that any page that is doubly mapped (both above
2027 2030 * KERNEL_TEXT and below kernelbase) is dealt with correctly.
2028 2031 * Note this may never happen, but it might someday.
2029 2032 */
2030 2033 bootpages = NULL;
2031 2034 PRM_POINT("Protecting boot pages");
2032 2035
2033 2036 /*
2034 2037 * Protect any pages mapped above KERNEL_TEXT that somehow have
2035 2038 * page_t's. This can only happen if something weird allocated
2036 2039 * in this range (like kadb/kmdb).
2037 2040 */
2038 2041 protect_boot_range(KERNEL_TEXT, (uintptr_t)-1, 0);
2039 2042
2040 2043 /*
2041 2044 * Before we can take over memory allocation/mapping from the boot
2042 2045 * loader we must remove from our free page lists any boot allocated
2043 2046 * pages that stay mapped until release_bootstrap().
2044 2047 */
2045 2048 protect_boot_range(0, kernelbase, 1);
2046 2049
2047 -
2048 2050 /*
2049 2051 * Switch to running on regular HAT (not boot_mmu)
2050 2052 */
2051 2053 PRM_POINT("Calling hat_kern_setup()...");
2052 2054 hat_kern_setup();
2053 2055
2054 2056 /*
2055 2057 * It is no longer safe to call BOP_ALLOC(), so make sure we don't.
2056 2058 */
2057 2059 bop_no_more_mem();
2058 2060
2059 2061 PRM_POINT("hat_kern_setup() done");
2060 2062
2061 2063 hat_cpu_online(CPU);
2062 2064
2063 2065 /*
2064 2066 * Initialize VM system
2065 2067 */
2066 2068 PRM_POINT("Calling kvm_init()...");
2067 2069 kvm_init();
2068 2070 PRM_POINT("kvm_init() done");
2069 2071
2070 2072 /*
2071 2073 * Tell kmdb that the VM system is now working
2072 2074 */
2073 2075 if (boothowto & RB_DEBUG)
2074 2076 kdi_dvec_vmready();
2075 2077
2076 2078 #if defined(__xpv)
2077 2079 /*
2078 2080 * Populate the I/O pool on domain 0
2079 2081 */
2080 2082 if (DOMAIN_IS_INITDOMAIN(xen_info)) {
2081 2083 extern long populate_io_pool(void);
2082 2084 long init_io_pool_cnt;
2083 2085
2084 2086 PRM_POINT("Populating reserve I/O page pool");
2085 2087 init_io_pool_cnt = populate_io_pool();
2086 2088 PRM_DEBUG(init_io_pool_cnt);
2087 2089 }
2088 2090 #endif
2089 2091 /*
2090 2092 * Mangle the brand string etc.
2091 2093 */
2092 2094 cpuid_pass3(CPU);
2093 2095
2094 2096 #if defined(__amd64)
2095 2097
2096 2098 /*
2097 2099 * Create the device arena for toxic (to dtrace/kmdb) mappings.
2098 2100 */
2099 2101 device_arena = vmem_create("device", (void *)toxic_addr,
2100 2102 toxic_size, MMU_PAGESIZE, NULL, NULL, NULL, 0, VM_SLEEP);
2101 2103
2102 2104 #else /* __i386 */
2103 2105
2104 2106 /*
2105 2107 * allocate the bit map that tracks toxic pages
2106 2108 */
2107 2109 toxic_bit_map_len = btop((ulong_t)(valloc_base - kernelbase));
2108 2110 PRM_DEBUG(toxic_bit_map_len);
2109 2111 toxic_bit_map =
2110 2112 kmem_zalloc(BT_SIZEOFMAP(toxic_bit_map_len), KM_NOSLEEP);
2111 2113 ASSERT(toxic_bit_map != NULL);
2112 2114 PRM_DEBUG(toxic_bit_map);
2113 2115
2114 2116 #endif /* __i386 */
2115 2117
2116 2118
2117 2119 /*
2118 2120 * Now that we've got more VA, as well as the ability to allocate from
2119 2121 * it, tell the debugger.
2120 2122 */
2121 2123 if (boothowto & RB_DEBUG)
2122 2124 kdi_dvec_memavail();
2123 2125
2124 2126 /*
2125 2127 * The following code installs a special page fault handler (#pf)
2126 2128 * to work around a pentium bug.
2127 2129 */
2128 2130 #if !defined(__amd64) && !defined(__xpv)
2129 2131 if (x86_type == X86_TYPE_P5) {
2130 2132 desctbr_t idtr;
2131 2133 gate_desc_t *newidt;
2132 2134
2133 2135 if ((newidt = kmem_zalloc(MMU_PAGESIZE, KM_NOSLEEP)) == NULL)
2134 2136 panic("failed to install pentium_pftrap");
2135 2137
2136 2138 bcopy(idt0, newidt, NIDT * sizeof (*idt0));
2137 2139 set_gatesegd(&newidt[T_PGFLT], &pentium_pftrap,
2138 2140 KCS_SEL, SDT_SYSIGT, TRP_KPL, 0);
2139 2141
2140 2142 (void) as_setprot(&kas, (caddr_t)newidt, MMU_PAGESIZE,
2141 2143 PROT_READ | PROT_EXEC);
2142 2144
2143 2145 CPU->cpu_idt = newidt;
2144 2146 idtr.dtr_base = (uintptr_t)CPU->cpu_idt;
2145 2147 idtr.dtr_limit = (NIDT * sizeof (*idt0)) - 1;
2146 2148 wr_idtr(&idtr);
2147 2149 }
2148 2150 #endif /* !__amd64 */
2149 2151
2150 2152 #if !defined(__xpv)
2151 2153 /*
2152 2154 * Map page pfn=0 for drivers, such as kd, that need to pick up
2153 2155 * parameters left there by controllers/BIOS.
2154 2156 */
2155 2157 PRM_POINT("setting up p0_va");
2156 2158 p0_va = i86devmap(0, 1, PROT_READ);
2157 2159 PRM_DEBUG(p0_va);
2158 2160 #endif
2159 2161
2160 2162 cmn_err(CE_CONT, "?mem = %luK (0x%lx)\n",
2161 2163 physinstalled << (MMU_PAGESHIFT - 10), ptob(physinstalled));
2162 2164
2163 2165 /*
2164 2166 * disable automatic large pages for small memory systems or
2165 2167 * when the disable flag is set.
2166 2168 *
2167 2169 * Do not yet consider page sizes larger than 2m/4m.
2168 2170 */
2169 2171 if (!auto_lpg_disable && mmu.max_page_level > 0) {
2170 2172 max_uheap_lpsize = LEVEL_SIZE(1);
2171 2173 max_ustack_lpsize = LEVEL_SIZE(1);
2172 2174 max_privmap_lpsize = LEVEL_SIZE(1);
2173 2175 max_uidata_lpsize = LEVEL_SIZE(1);
2174 2176 max_utext_lpsize = LEVEL_SIZE(1);
2175 2177 max_shm_lpsize = LEVEL_SIZE(1);
2176 2178 }
2177 2179 if (physmem < privm_lpg_min_physmem || mmu.max_page_level == 0 ||
2178 2180 auto_lpg_disable) {
2179 2181 use_brk_lpg = 0;
2180 2182 use_stk_lpg = 0;
2181 2183 }
2182 2184 mcntl0_lpsize = LEVEL_SIZE(mmu.umax_page_level);
2183 2185
2184 2186 PRM_POINT("Calling hat_init_finish()...");
2185 2187 hat_init_finish();
2186 2188 PRM_POINT("hat_init_finish() done");
2187 2189
2188 2190 /*
2189 2191 * Initialize the segkp segment type.
2190 2192 */
2191 2193 rw_enter(&kas.a_lock, RW_WRITER);
2192 2194 PRM_POINT("Attaching segkp");
2193 2195 if (segkp_fromheap) {
2194 2196 segkp->s_as = &kas;
2195 2197 } else if (seg_attach(&kas, (caddr_t)segkp_base, mmu_ptob(segkpsize),
2196 2198 segkp) < 0) {
2197 2199 panic("startup: cannot attach segkp");
2198 2200 /*NOTREACHED*/
2199 2201 }
2200 2202 PRM_POINT("Doing segkp_create()");
2201 2203 if (segkp_create(segkp) != 0) {
2202 2204 panic("startup: segkp_create failed");
2203 2205 /*NOTREACHED*/
2204 2206 }
2205 2207 PRM_DEBUG(segkp);
2206 2208 rw_exit(&kas.a_lock);
2207 2209
2208 2210 /*
2209 2211 * kpm segment
2210 2212 */
2211 2213 segmap_kpm = 0;
2212 2214 if (kpm_desired) {
2213 2215 kpm_init();
2214 2216 kpm_enable = 1;
2215 2217 }
2216 2218
2217 2219 /*
2218 2220 * Now create segmap segment.
2219 2221 */
2220 2222 rw_enter(&kas.a_lock, RW_WRITER);
2221 2223 if (seg_attach(&kas, (caddr_t)segmap_start, segmapsize, segmap) < 0) {
2222 2224 panic("cannot attach segmap");
2223 2225 /*NOTREACHED*/
2224 2226 }
2225 2227 PRM_DEBUG(segmap);
2226 2228
2227 2229 a.prot = PROT_READ | PROT_WRITE;
2228 2230 a.shmsize = 0;
2229 2231 a.nfreelist = segmapfreelists;
2230 2232
2231 2233 if (segmap_create(segmap, (caddr_t)&a) != 0)
2232 2234 panic("segmap_create segmap");
2233 2235 rw_exit(&kas.a_lock);
2234 2236
2235 2237 setup_vaddr_for_ppcopy(CPU);
2236 2238
2237 2239 segdev_init();
2238 2240 #if defined(__xpv)
2239 2241 if (DOMAIN_IS_INITDOMAIN(xen_info))
2240 2242 #endif
2241 2243 pmem_init();
2242 2244
2243 2245 PRM_POINT("startup_vm() done");
2244 2246 }
2245 2247
2246 2248 /*
2247 2249 * Load a tod module for the non-standard tod part found on this system.
2248 2250 */
2249 2251 static void
2250 2252 load_tod_module(char *todmod)
2251 2253 {
2252 2254 if (modload("tod", todmod) == -1)
2253 2255 halt("Can't load TOD module");
2254 2256 }
2255 2257
2256 2258 static void
2257 2259 startup_end(void)
2258 2260 {
2259 2261 int i;
2260 2262 extern void setx86isalist(void);
2261 2263 extern void cpu_event_init(void);
2262 2264
2263 2265 PRM_POINT("startup_end() starting...");
2264 2266
2265 2267 /*
2266 2268 * Perform tasks that get done after most of the VM
2267 2269 * initialization has been done but before the clock
2268 2270 * and other devices get started.
2269 2271 */
2270 2272 kern_setup1();
2271 2273
2272 2274 /*
2273 2275 * Perform CPC initialization for this CPU.
2274 2276 */
2275 2277 kcpc_hw_init(CPU);
2276 2278
2277 2279 /*
2278 2280 * Initialize cpu event framework.
2279 2281 */
2280 2282 cpu_event_init();
2281 2283
2282 2284 #if defined(OPTERON_WORKAROUND_6323525)
2283 2285 if (opteron_workaround_6323525)
2284 2286 patch_workaround_6323525();
2285 2287 #endif
2286 2288 /*
2287 2289 * If needed, load TOD module now so that ddi_get_time(9F) etc. work
2288 2290 * (For now, "needed" means tod_module_name is set in /etc/system.)
2289 2291 */
2290 2292 if (tod_module_name != NULL) {
2291 2293 PRM_POINT("load_tod_module()");
2292 2294 load_tod_module(tod_module_name);
2293 2295 }
2294 2296
2295 2297 #if defined(__xpv)
2296 2298 /*
2297 2299 * Forceload interposing TOD module for the hypervisor.
2298 2300 */
2299 2301 PRM_POINT("load_tod_module()");
2300 2302 load_tod_module("xpvtod");
2301 2303 #endif
2302 2304
2303 2305 /*
2304 2306 * Configure the system.
2305 2307 */
2306 2308 PRM_POINT("Calling configure()...");
2307 2309 configure(); /* set up devices */
2308 2310 PRM_POINT("configure() done");
2309 2311
2310 2312 /*
2311 2313 * We can now setup for XSAVE because fpu_probe is done in configure().
2312 2314 */
2313 2315 if (fp_save_mech == FP_XSAVE) {
2314 2316 xsave_setup_msr(CPU);
2315 2317 }
2316 2318
2317 2319 /*
2318 2320 * Set the isa_list string to the defined instruction sets we
2319 2321 * support.
2320 2322 */
2321 2323 setx86isalist();
2322 2324 cpu_intr_alloc(CPU, NINTR_THREADS);
2323 2325 psm_install();
2324 2326
2325 2327 /*
2326 2328 * We're done with bootops. We don't unmap the bootstrap yet because
2327 2329 * we're still using bootsvcs.
2328 2330 */
2329 2331 PRM_POINT("NULLing out bootops");
2330 2332 *bootopsp = (struct bootops *)NULL;
2331 2333 bootops = (struct bootops *)NULL;
2332 2334
2333 2335 #if defined(__xpv)
2334 2336 ec_init_debug_irq();
2335 2337 xs_domu_init();
2336 2338 #endif
2337 2339
2338 2340 #if defined(__amd64) && !defined(__xpv)
2339 2341 /*
2340 2342 * Intel IOMMU has been setup/initialized in ddi_impl.c
2341 2343 * Start it up now.
2342 2344 */
2343 2345 immu_startup();
2344 2346 #endif
2345 2347
2346 2348 PRM_POINT("Enabling interrupts");
2347 2349 (*picinitf)();
2348 2350 sti();
2349 2351 #if defined(__xpv)
2350 2352 ASSERT(CPU->cpu_m.mcpu_vcpu_info->evtchn_upcall_mask == 0);
2351 2353 xen_late_startup();
2352 2354 #endif
2353 2355
2354 2356 (void) add_avsoftintr((void *)&softlevel1_hdl, 1, softlevel1,
2355 2357 "softlevel1", NULL, NULL); /* XXX to be moved later */
2356 2358
2357 2359 /*
2358 2360 * Register software interrupt handlers for ddi_periodic_add(9F).
2359 2361 * Software interrupts up to the level 10 are supported.
2360 2362 */
2361 2363 for (i = DDI_IPL_1; i <= DDI_IPL_10; i++) {
2362 2364 (void) add_avsoftintr((void *)&softlevel_hdl[i-1], i,
2363 2365 (avfunc)ddi_periodic_softintr, "ddi_periodic",
2364 2366 (caddr_t)(uintptr_t)i, NULL);
2365 2367 }
2366 2368
2367 2369 #if !defined(__xpv)
2368 2370 if (modload("drv", "amd_iommu") < 0) {
2369 2371 PRM_POINT("No AMD IOMMU present\n");
2370 2372 } else if (ddi_hold_installed_driver(ddi_name_to_major(
2371 2373 "amd_iommu")) == NULL) {
2372 2374 prom_printf("ERROR: failed to attach AMD IOMMU\n");
2373 2375 }
2374 2376 #endif
2375 2377 post_startup_cpu_fixups();
2376 2378
2377 2379 PRM_POINT("startup_end() done");
2378 2380 }
2379 2381
2380 2382 /*
2381 2383 * Don't remove the following 2 variables. They are necessary
2382 2384 * for reading the hostid from the legacy file (/kernel/misc/sysinit).
2383 2385 */
2384 2386 char *_hs1107 = hw_serial;
2385 2387 ulong_t _bdhs34;
2386 2388
2387 2389 void
2388 2390 post_startup(void)
2389 2391 {
2390 2392 extern void cpupm_init(cpu_t *);
2391 2393 extern void cpu_event_init_cpu(cpu_t *);
2392 2394
2393 2395 /*
2394 2396 * Set the system wide, processor-specific flags to be passed
2395 2397 * to userland via the aux vector for performance hints and
2396 2398 * instruction set extensions.
2397 2399 */
2398 2400 bind_hwcap();
2399 2401
2400 2402 #ifdef __xpv
2401 2403 if (DOMAIN_IS_INITDOMAIN(xen_info))
2402 2404 #endif
2403 2405 {
2404 2406 #if defined(__xpv)
2405 2407 xpv_panic_init();
2406 2408 #else
2407 2409 /*
2408 2410 * Startup the memory scrubber.
2409 2411 * XXPV This should be running somewhere ..
2410 2412 */
2411 2413 if ((get_hwenv() & HW_VIRTUAL) == 0)
2412 2414 memscrub_init();
2413 2415 #endif
2414 2416 }
2415 2417
2416 2418 /*
2417 2419 * Complete CPU module initialization
2418 2420 */
2419 2421 cmi_post_startup();
2420 2422
2421 2423 /*
2422 2424 * Perform forceloading tasks for /etc/system.
2423 2425 */
2424 2426 (void) mod_sysctl(SYS_FORCELOAD, NULL);
2425 2427
2426 2428 /*
2427 2429 * ON4.0: Force /proc module in until clock interrupt handling is fixed
2428 2430 * ON4.0: This must be fixed or restated in /etc/system.
2429 2431 */
2430 2432 (void) modload("fs", "procfs");
2431 2433
2432 2434 (void) i_ddi_attach_hw_nodes("pit_beep");
2433 2435
2434 2436 #if defined(__i386)
2435 2437 /*
2436 2438 * Check for required functional Floating Point hardware,
2437 2439 * unless FP hardware explicitly disabled.
2438 2440 */
2439 2441 if (fpu_exists && (fpu_pentium_fdivbug || fp_kind == FP_NO))
2440 2442 halt("No working FP hardware found");
2441 2443 #endif
2442 2444
2443 2445 maxmem = freemem;
2444 2446
2445 2447 cpu_event_init_cpu(CPU);
2446 2448 cpupm_init(CPU);
2447 2449 (void) mach_cpu_create_device_node(CPU, NULL);
2448 2450
2449 2451 pg_init();
2450 2452 }
2451 2453
2452 2454 static int
2453 2455 pp_in_range(page_t *pp, uint64_t low_addr, uint64_t high_addr)
2454 2456 {
2455 2457 return ((pp->p_pagenum >= btop(low_addr)) &&
2456 2458 (pp->p_pagenum < btopr(high_addr)));
2457 2459 }
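/*
 * [Editor's note] pp_in_range() uses btop() (round down) for the low
 * bound but btopr() (round up) for the high bound, so a range ending
 * mid-page still covers its final partial page. With 4K pages,
 * (0, 0x40000) covers pfns 0 through 0x3f, while (0, 0x40001) would
 * also pull in pfn 0x40.
 */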
2458 2460
2459 2461 static int
2460 2462 pp_in_module(page_t *pp, const rd_existing_t *modranges)
2461 2463 {
2462 2464 uint_t i;
2463 2465
2464 2466 for (i = 0; modranges[i].phys != 0; i++) {
2465 2467 if (pp_in_range(pp, modranges[i].phys,
2466 2468 modranges[i].phys + modranges[i].size))
2467 2469 return (1);
2468 2470 }
2469 2471
2470 2472 return (0);
2471 2473 }
2472 2474
2473 2475 void
2474 2476 release_bootstrap(void)
2475 2477 {
2476 2478 int root_is_ramdisk;
2477 2479 page_t *pp;
2478 2480 extern void kobj_boot_unmountroot(void);
2479 2481 extern dev_t rootdev;
2480 2482 uint_t i;
2481 2483 char propname[32];
2482 2484 rd_existing_t *modranges;
2483 2485 #if !defined(__xpv)
2484 2486 pfn_t pfn;
2485 2487 #endif
2486 2488
2487 2489 /*
2488 2490 * Save the bootfs module ranges so that we can reserve them below
2489 2491 * for the real bootfs.
2490 2492 */
2491 2493 modranges = kmem_alloc(sizeof (rd_existing_t) * MAX_BOOT_MODULES,
2492 2494 KM_SLEEP);
2493 2495 for (i = 0; ; i++) {
2494 2496 uint64_t start, size;
2495 2497
2496 2498 modranges[i].phys = 0;
2497 2499
2498 2500 (void) snprintf(propname, sizeof (propname),
2499 2501 "module-addr-%u", i);
2500 2502 if (do_bsys_getproplen(NULL, propname) <= 0)
2501 2503 break;
2502 2504 (void) do_bsys_getprop(NULL, propname, &start);
2503 2505
2504 2506 (void) snprintf(propname, sizeof (propname),
2505 2507 "module-size-%u", i);
2506 2508 if (do_bsys_getproplen(NULL, propname) <= 0)
2507 2509 break;
2508 2510 (void) do_bsys_getprop(NULL, propname, &size);
2509 2511
2510 2512 modranges[i].phys = start;
2511 2513 modranges[i].size = size;
2512 2514 }
2513 2515
2514 2516 /* unmount boot ramdisk and release kmem usage */
2515 2517 kobj_boot_unmountroot();
2516 2518
2517 2519 /*
2518 2520 * We're finished using the boot loader so free its pages.
2519 2521 */
2520 2522 PRM_POINT("Unmapping lower boot pages");
2521 2523
2522 2524 clear_boot_mappings(0, _userlimit);
2523 2525
2526 +#if 0
2527 + if (fb_info.paddr != 0 && fb_info.fb_type != FB_TYPE_EGA_TEXT) {
2528 + clear_boot_mappings(fb_info.paddr,
2529 + P2ROUNDUP(fb_info.paddr + fb_info.fb_size, MMU_PAGESIZE));
2530 + clear_boot_mappings((uintptr_t)fb_info.fb,
2531 + P2ROUNDUP((uintptr_t)fb_info.fb + fb_info.fb_size,
2532 + MMU_PAGESIZE));
2533 + }
2534 +#endif
2535 +
2524 2536 postbootkernelbase = kernelbase;
2525 2537
2526 2538 /*
2527 2539 * If root isn't on ramdisk, destroy the hardcoded
2528 2540 * ramdisk node now and release the memory. Else,
2529 2541 * ramdisk memory is kept in rd_pages.
2530 2542 */
2531 2543 root_is_ramdisk = (getmajor(rootdev) == ddi_name_to_major("ramdisk"));
2532 2544 if (!root_is_ramdisk) {
2533 2545 dev_info_t *dip = ddi_find_devinfo("ramdisk", -1, 0);
2534 2546 ASSERT(dip && ddi_get_parent(dip) == ddi_root_node());
2535 2547 ndi_rele_devi(dip); /* held from ddi_find_devinfo */
2536 2548 (void) ddi_remove_child(dip, 0);
2537 2549 }
2538 2550
2539 2551 PRM_POINT("Releasing boot pages");
2540 2552 while (bootpages) {
2541 2553 extern uint64_t ramdisk_start, ramdisk_end;
2542 2554 pp = bootpages;
2543 2555 bootpages = pp->p_next;
2544 2556
2545 2557
2546 2558 /* Keep pages for the lower 256K (addresses below 0x40000) */
2547 2559 if (pp_in_range(pp, 0, 0x40000)) {
2548 2560 pp->p_next = lower_pages;
2549 2561 lower_pages = pp;
2550 2562 lower_pages_count++;
2551 2563 continue;
2552 2564 }
2553 2565
2554 2566 if ((root_is_ramdisk && pp_in_range(pp, ramdisk_start,
2555 2567 ramdisk_end)) || pp_in_module(pp, modranges)) {
2556 2568 pp->p_next = rd_pages;
2557 2569 rd_pages = pp;
2558 2570 continue;
2559 2571 }
2560 2572 pp->p_next = (struct page *)0;
2561 2573 pp->p_prev = (struct page *)0;
2562 2574 PP_CLRBOOTPAGES(pp);
2563 2575 page_free(pp, 1);
2564 2576 }
2565 2577 PRM_POINT("Boot pages released");
2566 2578
2567 2579 kmem_free(modranges, sizeof (rd_existing_t) * MAX_BOOT_MODULES);
2568 2580
2569 2581 #if !defined(__xpv)
2570 2582 /* XXPV -- note this following bunch of code needs to be revisited in Xen 3.0 */
2571 2583 /*
2572 2584 * Find 1 page below 1 MB so that other processors can boot up or
2573 2585 * so that any processor can resume.
2574 2586 * Make sure it has a kernel VA as well as a 1:1 mapping.
2575 2587 * We should have just free'd one up.
2576 2588 */
2577 2589
2578 2590 /*
2579 2591 * 0x10 pages is 64K. Leave the bottom 64K alone
2580 2592 * for BIOS.
2581 2593 */
2582 2594 for (pfn = 0x10; pfn < btop(1*1024*1024); pfn++) {
2583 2595 if (page_numtopp_alloc(pfn) == NULL)
2584 2596 continue;
2585 2597 rm_platter_va = i86devmap(pfn, 1,
2586 2598 PROT_READ | PROT_WRITE | PROT_EXEC);
2587 2599 rm_platter_pa = ptob(pfn);
2588 2600 break;
2589 2601 }
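	/*
	 * [Editor's note] The check below relies on pfn retaining its
	 * final loop value: the scan breaks out only after mapping a
	 * platter page, so pfn == btop(1M) means nothing was found.
	 */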
2590 2602 if (pfn == btop(1*1024*1024) && use_mp)
2591 2603 panic("No page below 1M available for starting "
2592 2604 "other processors or for resuming from system-suspend");
2593 2605 #endif /* !__xpv */
2594 2606 }
2595 2607
2596 2608 /*
2597 2609 * Initialize the platform-specific parts of a page_t.
2598 2610 */
2599 2611 void
2600 2612 add_physmem_cb(page_t *pp, pfn_t pnum)
2601 2613 {
2602 2614 pp->p_pagenum = pnum;
2603 2615 pp->p_mapping = NULL;
2604 2616 pp->p_embed = 0;
2605 2617 pp->p_share = 0;
2606 2618 pp->p_mlentry = 0;
2607 2619 }
2608 2620
2609 2621 /*
2610 2622 * kphysm_init() initializes physical memory.
2611 2623 */
2612 2624 static pgcnt_t
2613 2625 kphysm_init(
2614 2626 page_t *pp,
2615 2627 pgcnt_t npages)
2616 2628 {
2617 2629 struct memlist *pmem;
2618 2630 struct memseg *cur_memseg;
2619 2631 pfn_t base_pfn;
2620 2632 pfn_t end_pfn;
2621 2633 pgcnt_t num;
2622 2634 pgcnt_t pages_done = 0;
2623 2635 uint64_t addr;
2624 2636 uint64_t size;
2625 2637 extern pfn_t ddiphysmin;
2626 2638 extern int mnode_xwa;
2627 2639 int ms = 0, me = 0;
2628 2640
2629 2641 ASSERT(page_hash != NULL && page_hashsz != 0);
2630 2642
2631 2643 cur_memseg = memseg_base;
2632 2644 for (pmem = phys_avail; pmem && npages; pmem = pmem->ml_next) {
2633 2645 /*
2634 2646 * A 32-bit kernel can't use higher memory if we're
2635 2647 * not booting in PAE mode. This check takes care of that.
2636 2648 */
2637 2649 addr = pmem->ml_address;
2638 2650 size = pmem->ml_size;
2639 2651 if (btop(addr) > physmax)
2640 2652 continue;
2641 2653
2642 2654 /*
2643 2655 * align addr and size - they may not be at page boundaries
2644 2656 */
2645 2657 if ((addr & MMU_PAGEOFFSET) != 0) {
2646 2658 addr += MMU_PAGEOFFSET;
2647 2659 addr &= ~(uint64_t)MMU_PAGEOFFSET;
2648 2660 size -= addr - pmem->ml_address;
2649 2661 }
2650 2662
2651 2663 /* only process pages below or equal to physmax */
2652 2664 if ((btop(addr + size) - 1) > physmax)
2653 2665 size = ptob(physmax - btop(addr) + 1);
2654 2666
2655 2667 num = btop(size);
2656 2668 if (num == 0)
2657 2669 continue;
2658 2670
2659 2671 if (num > npages)
2660 2672 num = npages;
2661 2673
2662 2674 npages -= num;
2663 2675 pages_done += num;
2664 2676 base_pfn = btop(addr);
2665 2677
2666 2678 if (prom_debug)
2667 2679 prom_printf("MEMSEG addr=0x%" PRIx64
2668 2680 " pgs=0x%lx pfn 0x%lx-0x%lx\n",
2669 2681 addr, num, base_pfn, base_pfn + num);
2670 2682
2671 2683 /*
2672 2684 * Ignore pages below ddiphysmin to simplify ddi memory
2673 2685 * allocation with non-zero addr_lo requests.
2674 2686 */
2675 2687 if (base_pfn < ddiphysmin) {
2676 2688 if (base_pfn + num <= ddiphysmin)
2677 2689 continue;
2678 2690 pp += (ddiphysmin - base_pfn);
2679 2691 num -= (ddiphysmin - base_pfn);
2680 2692 base_pfn = ddiphysmin;
2681 2693 }
2682 2694
2683 2695 /*
2684 2696 * mnode_xwa is greater than 1 when large pages regions can
2685 2697 * cross memory node boundaries. To prevent the formation
2686 2698 * of these large pages, configure the memsegs based on the
2687 2699 * memory node ranges which had been made non-contiguous.
2688 2700 */
2689 2701 if (mnode_xwa > 1) {
2690 2702
2691 2703 end_pfn = base_pfn + num - 1;
2692 2704 ms = PFN_2_MEM_NODE(base_pfn);
2693 2705 me = PFN_2_MEM_NODE(end_pfn);
2694 2706
2695 2707 if (ms != me) {
2696 2708 /*
2697 2709 * current range spans more than 1 memory node.
2698 2710 * Set num to only the pfn range in the start
2699 2711 * memory node.
2700 2712 */
2701 2713 num = mem_node_config[ms].physmax - base_pfn
2702 2714 + 1;
2703 2715 ASSERT(end_pfn > mem_node_config[ms].physmax);
2704 2716 }
2705 2717 }
2706 2718
2707 2719 for (;;) {
2708 2720 /*
2709 2721 * Build the memsegs entry
2710 2722 */
2711 2723 cur_memseg->pages = pp;
2712 2724 cur_memseg->epages = pp + num;
2713 2725 cur_memseg->pages_base = base_pfn;
2714 2726 cur_memseg->pages_end = base_pfn + num;
2715 2727
2716 2728 /*
2717 2729 * Insert into memseg list in decreasing pfn range
2718 2730 * order. Low memory is typically more fragmented such
2719 2731 * that this ordering keeps the larger ranges at the
2720 2732 * front of the list for code that searches memseg.
2721 2733 * This ASSERTS that the memsegs coming in from boot
2722 2734 * are in increasing physical address order and not
2723 2735 * contiguous.
2724 2736 */
2725 2737 if (memsegs != NULL) {
2726 2738 ASSERT(cur_memseg->pages_base >=
2727 2739 memsegs->pages_end);
2728 2740 cur_memseg->next = memsegs;
2729 2741 }
2730 2742 memsegs = cur_memseg;
2731 2743
2732 2744 /*
2733 2745 * add_physmem() initializes the PSM part of the page
2734 2746 * struct by calling the PSM back with add_physmem_cb().
2735 2747 * In addition it coalesces pages into larger pages as
2736 2748 * it initializes them.
2737 2749 */
2738 2750 add_physmem(pp, num, base_pfn);
2739 2751 cur_memseg++;
2740 2752 availrmem_initial += num;
2741 2753 availrmem += num;
2742 2754
2743 2755 pp += num;
2744 2756 if (ms >= me)
2745 2757 break;
2746 2758
2747 2759 /* process next memory node range */
2748 2760 ms++;
2749 2761 base_pfn = mem_node_config[ms].physbase;
2750 2762 num = MIN(mem_node_config[ms].physmax,
2751 2763 end_pfn) - base_pfn + 1;
2752 2764 }
2753 2765 }
2754 2766
2755 2767 PRM_DEBUG(availrmem_initial);
2756 2768 PRM_DEBUG(availrmem);
2757 2769 PRM_DEBUG(freemem);
2758 2770 build_pfn_hash();
2759 2771 return (pages_done);
2760 2772 }
2761 2773
2762 2774 /*
2763 2775 * Kernel VM initialization.
2764 2776 */
2765 2777 static void
2766 2778 kvm_init(void)
2767 2779 {
2768 2780 ASSERT((((uintptr_t)s_text) & MMU_PAGEOFFSET) == 0);
2769 2781
2770 2782 /*
2771 2783 * Put the kernel segments in kernel address space.
2772 2784 */
2773 2785 rw_enter(&kas.a_lock, RW_WRITER);
2774 2786 as_avlinit(&kas);
2775 2787
2776 2788 (void) seg_attach(&kas, s_text, e_moddata - s_text, &ktextseg);
2777 2789 (void) segkmem_create(&ktextseg);
2778 2790
2779 2791 (void) seg_attach(&kas, (caddr_t)valloc_base, valloc_sz, &kvalloc);
2780 2792 (void) segkmem_create(&kvalloc);
2781 2793
2782 2794 (void) seg_attach(&kas, kernelheap,
2783 2795 ekernelheap - kernelheap, &kvseg);
2784 2796 (void) segkmem_create(&kvseg);
2785 2797
2786 2798 if (core_size > 0) {
2787 2799 PRM_POINT("attaching kvseg_core");
2788 2800 (void) seg_attach(&kas, (caddr_t)core_base, core_size,
2789 2801 &kvseg_core);
2790 2802 (void) segkmem_create(&kvseg_core);
2791 2803 }
2792 2804
2793 2805 if (segziosize > 0) {
2794 2806 PRM_POINT("attaching segzio");
2795 2807 (void) seg_attach(&kas, segzio_base, mmu_ptob(segziosize),
2796 2808 &kzioseg);
2797 2809 (void) segkmem_zio_create(&kzioseg);
2798 2810
2799 2811 /* create zio area covering new segment */
2800 2812 segkmem_zio_init(segzio_base, mmu_ptob(segziosize));
2801 2813 }
2802 2814
2803 2815 (void) seg_attach(&kas, kdi_segdebugbase, kdi_segdebugsize, &kdebugseg);
2804 2816 (void) segkmem_create(&kdebugseg);
2805 2817
2806 2818 rw_exit(&kas.a_lock);
2807 2819
2808 2820 /*
2809 2821 * Ensure that the red zone at kernelbase is never accessible.
2810 2822 */
2811 2823 PRM_POINT("protecting redzone");
2812 2824 (void) as_setprot(&kas, (caddr_t)kernelbase, KERNEL_REDZONE_SIZE, 0);
2813 2825
2814 2826 /*
2815 2827 * Make the text writable so that it can be hot patched by DTrace.
2816 2828 */
2817 2829 (void) as_setprot(&kas, s_text, e_modtext - s_text,
2818 2830 PROT_READ | PROT_WRITE | PROT_EXEC);
2819 2831
2820 2832 /*
2821 2833 * Make data writable until end.
2822 2834 */
2823 2835 (void) as_setprot(&kas, s_data, e_moddata - s_data,
2824 2836 PROT_READ | PROT_WRITE | PROT_EXEC);
2825 2837 }
2826 2838
2827 2839 #ifndef __xpv
2828 2840 /*
2829 2841 * Solaris adds an entry for Write Combining caching to the PAT
2830 2842 */
2831 2843 static uint64_t pat_attr_reg = PAT_DEFAULT_ATTRIBUTE;
2832 2844
2833 2845 void
2834 2846 pat_sync(void)
2835 2847 {
2836 2848 ulong_t cr0, cr0_orig, cr4;
2837 2849
2838 2850 if (!is_x86_feature(x86_featureset, X86FSET_PAT))
2839 2851 return;
2840 2852 cr0_orig = cr0 = getcr0();
2841 2853 cr4 = getcr4();
2842 2854
2843 2855 /* disable caching and flush all caches and TLBs */
2844 2856 cr0 |= CR0_CD;
2845 2857 cr0 &= ~CR0_NW;
2846 2858 setcr0(cr0);
2847 2859 invalidate_cache();
2848 2860 if (cr4 & CR4_PGE) {
2849 2861 setcr4(cr4 & ~(ulong_t)CR4_PGE);
2850 2862 setcr4(cr4);
2851 2863 } else {
2852 2864 reload_cr3();
2853 2865 }
2854 2866
2855 2867 /* add our entry to the PAT */
2856 2868 wrmsr(REG_PAT, pat_attr_reg);
2857 2869
2858 2870 /* flush TLBs and cache again, then reenable cr0 caching */
2859 2871 if (cr4 & CR4_PGE) {
2860 2872 setcr4(cr4 & ~(ulong_t)CR4_PGE);
2861 2873 setcr4(cr4);
2862 2874 } else {
2863 2875 reload_cr3();
2864 2876 }
2865 2877 invalidate_cache();
2866 2878 setcr0(cr0_orig);
2867 2879 }
2868 2880
2869 2881 #endif /* !__xpv */
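/*
 * [Editor's sketch] The PAT MSR written by pat_sync() holds eight
 * one-byte entries (PA0-PA7) using the architectural type encodings
 * UC=0x00, WC=0x01, WT=0x04, WP=0x05, WB=0x06, UC-=0x07. One way to
 * splice a Write Combining entry into the power-on default value; the
 * slot chosen is an illustration, not necessarily the one
 * PAT_DEFAULT_ATTRIBUTE uses:
 */
#if 0
#include <stdint.h>

#define	PAT_POWERON_DEFAULT	0x0007040600070406ULL
#define	PAT_WC			0x01ULL

static uint64_t
pat_with_wc(int slot)
{
	uint64_t pat = PAT_POWERON_DEFAULT;

	pat &= ~(0xffULL << (slot * 8));	/* clear the old entry */
	pat |= PAT_WC << (slot * 8);		/* install write combining */
	return (pat);
}
#endif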
2870 2882
2871 2883 #if defined(_SOFT_HOSTID)
2872 2884 /*
2873 2885 * On platforms that do not have a hardware serial number, attempt
2874 2886 * to set one based on the contents of /etc/hostid. If this file does
2875 2887 * not exist, assume that we are to generate a new hostid and set
2876 2888 * it in the kernel, for subsequent saving by a userland process
2877 2889 * once the system is up and the root filesystem is mounted r/w.
2878 2890 *
2879 2891 * In order to gracefully support upgrade on OpenSolaris, if
2880 2892 * /etc/hostid does not exist, we will attempt to get a serial number
2881 2893 * using the legacy method (/kernel/misc/sysinit).
2882 2894 *
2883 2895 * If that isn't present, we attempt to use an SMBIOS UUID, which is
2884 2896 * a hardware serial number. Note that we don't automatically trust
2885 2897 * all SMBIOS UUIDs (some older platforms are defective and ship duplicate
2886 2898 * UUIDs in violation of the standard), so we check against a blacklist.
2887 2899 *
2888 2900 * In an attempt to make the hostid less prone to abuse
2889 2901 * (for license circumvention, etc), we store it in /etc/hostid
2890 2902 * in rot47 format.
2891 2903 */
2892 2904 extern volatile unsigned long tenmicrodata;
2893 2905 static int atoi(char *);
2894 2906
2895 2907 /*
2896 2908 * Set this to non-zero in /etc/system if you think your SMBIOS returns a
2897 2909 * UUID that is not unique. (Also report it so that the smbios_uuid_blacklist
2898 2910 * array can be updated.)
2899 2911 */
2900 2912 int smbios_broken_uuid = 0;
2901 2913
2902 2914 /*
2903 2915 * List of known bad UUIDs. This is just the lower 32-bit values, since
2904 2916 * that's what we use for the host id. If your hostid falls here, you need
2905 2917 * to contact your hardware OEM for a fix for your BIOS.
2906 2918 */
2907 2919 static unsigned char
2908 2920 smbios_uuid_blacklist[][16] = {
2909 2921
2910 2922 { /* Reported bad UUID (Google search) */
2911 2923 0x00, 0x02, 0x00, 0x03, 0x00, 0x04, 0x00, 0x05,
2912 2924 0x00, 0x06, 0x00, 0x07, 0x00, 0x08, 0x00, 0x09,
2913 2925 },
2914 2926 { /* Known bad DELL UUID */
2915 2927 0x4C, 0x4C, 0x45, 0x44, 0x00, 0x00, 0x20, 0x10,
2916 2928 0x80, 0x20, 0x80, 0xC0, 0x4F, 0x20, 0x20, 0x20,
2917 2929 },
2918 2930 { /* Uninitialized flash */
2919 2931 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
2920 2932 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff
2921 2933 },
2922 2934 { /* All zeros */
2923 2935 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
2924 2936 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
2925 2937 },
2926 2938 };
2927 2939
2928 2940 static int32_t
2929 2941 uuid_to_hostid(const uint8_t *uuid)
2930 2942 {
2931 2943 /*
2932 2944 * Although the UUIDs are 128-bits, they may not distribute entropy
2933 2945 * evenly. We would like to use SHA or MD5, but those are located
2934 2946 * in loadable modules and not available this early in boot. As we
2935 2947 * don't need the values to be cryptographically strong, we just
2936 2948 * generate a 32-bit value by xor'ing the various sequences together,
2937 2949 * which ensures that the entire UUID contributes to the hostid.
2938 2950 */
2939 2951 uint32_t id = 0;
2940 2952
2941 2953 /* first check against the blacklist */
2942 2954 for (int i = 0; i < (sizeof (smbios_uuid_blacklist) / 16); i++) {
2943 2955 if (bcmp(smbios_uuid_blacklist[i], uuid, 16) == 0) {
2944 2956 cmn_err(CE_CONT, "?Broken SMBIOS UUID. "
2945 2957 "Contact BIOS manufacturer for repair.\n");
2946 2958 return ((int32_t)HW_INVALID_HOSTID);
2947 2959 }
2948 2960 }
2949 2961
2950 2962 for (int i = 0; i < 16; i++)
2951 2963 id ^= ((uuid[i]) << (8 * (i % sizeof (id))));
2952 2964
2953 2965 /* Make sure return value is positive */
2954 2966 return (id & 0x7fffffff);
2955 2967 }
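/*
 * [Editor's sketch] The xor loop above folds each of the 16 UUID bytes
 * into one of the four byte positions of the id (bit offset 8 * (i % 4)),
 * so every byte contributes. Exercised stand-alone against the
 * "All zeros" blacklist entry:
 */
#if 0
#include <stdio.h>
#include <stdint.h>

int
main(void)
{
	uint8_t uuid[16] = { 0 };	/* mirrors the all-zeros entry */
	uint32_t id = 0;
	int i;

	for (i = 0; i < 16; i++)
		id ^= (uint32_t)uuid[i] << (8 * (i % sizeof (id)));
	(void) printf("hostid candidate = 0x%x\n", id & 0x7fffffff);
	return (0);
}
#endif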
2956 2968
2957 2969 static int32_t
2958 2970 set_soft_hostid(void)
2959 2971 {
2960 2972 struct _buf *file;
2961 2973 char tokbuf[MAXNAMELEN];
2962 2974 token_t token;
2963 2975 int done = 0;
2964 2976 u_longlong_t tmp;
2965 2977 int i;
2966 2978 int32_t hostid = (int32_t)HW_INVALID_HOSTID;
2967 2979 unsigned char *c;
2968 2980 hrtime_t tsc;
2969 2981 smbios_system_t smsys;
2970 2982
2971 2983 /*
2972 2984 * If /etc/hostid file not found, we'd like to get a pseudo
2973 2985 * random number to use as the hostid. A nice way to do this
2974 2986 * is to read the real time clock. To remain xen-compatible,
2975 2987 * we can't poke the real hardware, so we use tsc_read() to
2976 2988 * read the real time clock. However, there is an ominous
2977 2989 * warning in tsc_read that says it can return zero, so we
2978 2990 * deal with that possibility by falling back to using the
2979 2991 * (hopefully random enough) value in tenmicrodata.
2980 2992 */
2981 2993
2982 2994 if ((file = kobj_open_file(hostid_file)) == (struct _buf *)-1) {
2983 2995 /*
2984 2996 * hostid file not found - try to load sysinit module
2985 2997 * and see if it has a nonzero hostid value...use that
2986 2998 * instead of generating a new hostid here if so.
2987 2999 */
2988 3000 if ((i = modload("misc", "sysinit")) != -1) {
2989 3001 if (strlen(hw_serial) > 0)
2990 3002 hostid = (int32_t)atoi(hw_serial);
2991 3003 (void) modunload(i);
2992 3004 }
2993 3005
2994 3006 /*
2995 3007 * We try to use the SMBIOS UUID. But not if it is blacklisted
2996 3008 * in /etc/system.
2997 3009 */
2998 3010 if ((hostid == HW_INVALID_HOSTID) &&
2999 3011 (smbios_broken_uuid == 0) &&
3000 3012 (ksmbios != NULL) &&
3001 3013 (smbios_info_system(ksmbios, &smsys) != SMB_ERR) &&
3002 3014 (smsys.smbs_uuidlen >= 16)) {
3003 3015 hostid = uuid_to_hostid(smsys.smbs_uuid);
3004 3016 }
3005 3017
3006 3018 /*
3007 3019 * Generate a "random" hostid using the clock. These
3008 3020 * hostids will change on each boot if the value is not
3009 3021 * saved to a persistent /etc/hostid file.
3010 3022 */
3011 3023 if (hostid == HW_INVALID_HOSTID) {
3012 3024 tsc = tsc_read();
3013 3025 if (tsc == 0) /* tsc_read can return zero sometimes */
3014 3026 hostid = (int32_t)tenmicrodata & 0x0CFFFFF;
3015 3027 else
3016 3028 hostid = (int32_t)tsc & 0x0CFFFFF;
3017 3029 }
3018 3030 } else {
3019 3031 /* hostid file found */
3020 3032 while (!done) {
3021 3033 token = kobj_lex(file, tokbuf, sizeof (tokbuf));
3022 3034
3023 3035 switch (token) {
3024 3036 case POUND:
3025 3037 /*
3026 3038 * skip comments
3027 3039 */
3028 3040 kobj_find_eol(file);
3029 3041 break;
3030 3042 case STRING:
3031 3043 /*
3032 3044 * un-rot47 - obviously this
3033 3045 * nonsense is ascii-specific
3034 3046 */
3035 3047 for (c = (unsigned char *)tokbuf;
3036 3048 *c != '\0'; c++) {
3037 3049 *c += 47;
3038 3050 if (*c > '~')
3039 3051 *c -= 94;
3040 3052 else if (*c < '!')
3041 3053 *c += 94;
3042 3054 }
3043 3055 /*
3044 3056 * now we should have a real number
3045 3057 */
3046 3058
3047 3059 if (kobj_getvalue(tokbuf, &tmp) != 0)
3048 3060 kobj_file_err(CE_WARN, file,
3049 3061 "Bad value %s for hostid",
3050 3062 tokbuf);
3051 3063 else
3052 3064 hostid = (int32_t)tmp;
3053 3065
3054 3066 break;
3055 3067 case EOF:
3056 3068 done = 1;
3057 3069 /* FALLTHROUGH */
3058 3070 case NEWLINE:
3059 3071 kobj_newline(file);
3060 3072 break;
3061 3073 default:
3062 3074 break;
3063 3075
3064 3076 }
3065 3077 }
3066 3078 if (hostid == HW_INVALID_HOSTID) /* didn't find a hostid */
3067 3079 kobj_file_err(CE_WARN, file,
3068 3080 "hostid missing or corrupt");
3069 3081
3070 3082 kobj_close_file(file);
3071 3083 }
3072 3084 /*
3073 3085 * hostid is now either the value read from /etc/hostid, the new
3074 3086 * hostid we generated in this routine, or HW_INVALID_HOSTID if
3075 3087 * neither was set.
3076 3088 */
3077 3089 return (hostid);
3078 3090 }
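/*
 * A standalone sketch of the rot47 transform decoded in the STRING
 * case above (hypothetical; not part of this file).  rot47 is its own
 * inverse, so the same routine both encodes a hostid string for
 * /etc/hostid and decodes it back.
 */
#include <stdio.h>

static void
rot47(char *s)
{
	unsigned char *c;

	for (c = (unsigned char *)s; *c != '\0'; c++) {
		*c += 47;
		if (*c > '~')
			*c -= 94;
		else if (*c < '!')
			*c += 94;
	}
}

int
main(void)
{
	char buf[] = "1234567890";

	rot47(buf);	/* encodes to "`abcdefgh_" */
	(void) printf("encoded: %s\n", buf);
	rot47(buf);	/* decodes back to "1234567890" */
	(void) printf("decoded: %s\n", buf);
	return (0);
}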
3079 3091
3080 3092 static int
3081 3093 atoi(char *p)
3082 3094 {
3083 3095 int i = 0;
3084 3096
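	/* assumes a well-formed, non-negative decimal string; no overflow checks */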
3085 3097 while (*p != '\0')
3086 3098 i = 10 * i + (*p++ - '0');
3087 3099
3088 3100 return (i);
3089 3101 }
3090 3102
3091 3103 #endif /* _SOFT_HOSTID */
3092 3104
3093 3105 void
3094 3106 get_system_configuration(void)
3095 3107 {
3096 3108 char prop[32];
3097 3109 u_longlong_t nodes_ll, cpus_pernode_ll, lvalue;
3098 3110
3099 3111 if (BOP_GETPROPLEN(bootops, "nodes") > sizeof (prop) ||
3100 3112 BOP_GETPROP(bootops, "nodes", prop) < 0 ||
3101 3113 kobj_getvalue(prop, &nodes_ll) == -1 ||
3102 3114 nodes_ll > MAXNODES ||
3103 3115 BOP_GETPROPLEN(bootops, "cpus_pernode") > sizeof (prop) ||
3104 3116 BOP_GETPROP(bootops, "cpus_pernode", prop) < 0 ||
3105 3117 kobj_getvalue(prop, &cpus_pernode_ll) == -1) {
3106 3118 system_hardware.hd_nodes = 1;
3107 3119 system_hardware.hd_cpus_per_node = 0;
3108 3120 } else {
3109 3121 system_hardware.hd_nodes = (int)nodes_ll;
3110 3122 system_hardware.hd_cpus_per_node = (int)cpus_pernode_ll;
3111 3123 }
3112 3124
3113 3125 if (BOP_GETPROPLEN(bootops, "kernelbase") > sizeof (prop) ||
3114 3126 BOP_GETPROP(bootops, "kernelbase", prop) < 0 ||
3115 3127 kobj_getvalue(prop, &lvalue) == -1)
3116 3128 eprom_kernelbase = 0;
3117 3129 else
3118 3130 eprom_kernelbase = (uintptr_t)lvalue;
3119 3131
3120 3132 if (BOP_GETPROPLEN(bootops, "segmapsize") > sizeof (prop) ||
3121 3133 BOP_GETPROP(bootops, "segmapsize", prop) < 0 ||
3122 3134 kobj_getvalue(prop, &lvalue) == -1)
3123 3135 segmapsize = SEGMAPDEFAULT;
3124 3136 else
3125 3137 segmapsize = (uintptr_t)lvalue;
3126 3138
3127 3139 if (BOP_GETPROPLEN(bootops, "segmapfreelists") > sizeof (prop) ||
3128 3140 BOP_GETPROP(bootops, "segmapfreelists", prop) < 0 ||
3129 3141 kobj_getvalue(prop, &lvalue) == -1)
3130 3142 segmapfreelists = 0; /* use segmap driver default */
3131 3143 else
3132 3144 segmapfreelists = (int)lvalue;
3133 3145
3134 3146 /* physmem used to be here, but moved much earlier to fakebop.c */
3135 3147 }
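/*
 * The three property lookups above share one pattern: bounds-check the
 * property length, fetch it as a string, parse it, and fall back to a
 * default on any failure.  A hypothetical helper capturing that pattern
 * (illustrative only; not part of this file) could read:
 */
static u_longlong_t
bop_get_value(char *name, u_longlong_t defval)
{
	char prop[32];
	u_longlong_t v;

	if (BOP_GETPROPLEN(bootops, name) > sizeof (prop) ||
	    BOP_GETPROP(bootops, name, prop) < 0 ||
	    kobj_getvalue(prop, &v) == -1)
		return (defval);	/* missing, oversized, or unparsable */
	return (v);
}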
3136 3148
3137 3149 /*
3138 3150 * Add to a memory list.
3139 3151 * start = start of new memory segment
3140 3152 * len = length of new memory segment in bytes
3141 3153 * new = pointer to a new struct memlist
3142 3154 * memlistp = memory list to which to add segment.
3143 3155 */
3144 3156 void
3145 3157 memlist_add(
3146 3158 uint64_t start,
3147 3159 uint64_t len,
3148 3160 struct memlist *new,
3149 3161 struct memlist **memlistp)
3150 3162 {
3151 3163 struct memlist *cur;
3152 3164 uint64_t end = start + len;
3153 3165
3154 3166 new->ml_address = start;
3155 3167 new->ml_size = len;
3156 3168
3157 3169 cur = *memlistp;
3158 3170
3159 3171 while (cur) {
3160 3172 if (cur->ml_address >= end) {
3161 3173 new->ml_next = cur;
3162 3174 *memlistp = new;
3163 3175 new->ml_prev = cur->ml_prev;
3164 3176 cur->ml_prev = new;
3165 3177 return;
3166 3178 }
3167 3179 ASSERT(cur->ml_address + cur->ml_size <= start);
3168 3180 if (cur->ml_next == NULL) {
3169 3181 cur->ml_next = new;
3170 3182 new->ml_prev = cur;
3171 3183 new->ml_next = NULL;
3172 3184 return;
3173 3185 }
3174 3186 memlistp = &cur->ml_next;
3175 3187 cur = cur->ml_next;
3176 3188 }
3177 3189 }
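/*
 * Usage sketch (hypothetical): inserting into an existing, ascending
 * list.  Note that the loop above never links into an empty list, so
 * the caller must seed the head entry itself.
 */
	struct memlist head, seg;
	struct memlist *list = &head;

	head.ml_address = 0x100000;	/* existing 1 MiB segment at 1 MiB */
	head.ml_size = 0x100000;
	head.ml_prev = head.ml_next = NULL;

	memlist_add(0x400000, 0x200000, &seg, &list);	/* 2 MiB at 4 MiB */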
3178 3190
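/*
 * Create the vmem arenas backing kernel module text and data
 * allocations.  Each arena is seeded with whatever room remains in the
 * regions set aside at boot (modtext..e_modtext, moddata..e_moddata)
 * and grows from its parent arena via segkmem_alloc() once that space
 * is exhausted.
 */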
3179 3191 void
3180 3192 kobj_vmem_init(vmem_t **text_arena, vmem_t **data_arena)
3181 3193 {
3182 3194 size_t tsize = e_modtext - modtext;
3183 3195 size_t dsize = e_moddata - moddata;
3184 3196
3185 3197 *text_arena = vmem_create("module_text", tsize ? modtext : NULL, tsize,
3186 3198 1, segkmem_alloc, segkmem_free, heaptext_arena, 0, VM_SLEEP);
3187 3199 *data_arena = vmem_create("module_data", dsize ? moddata : NULL, dsize,
3188 3200 1, segkmem_alloc, segkmem_free, heap32_arena, 0, VM_SLEEP);
3189 3201 }
3190 3202
3191 3203 caddr_t
3192 3204 kobj_text_alloc(vmem_t *arena, size_t size)
3193 3205 {
3194 3206 return (vmem_alloc(arena, size, VM_SLEEP | VM_BESTFIT));
3195 3207 }
3196 3208
3197 3209 /*ARGSUSED*/
3198 3210 caddr_t
3199 3211 kobj_texthole_alloc(caddr_t addr, size_t size)
3200 3212 {
3201 3213 panic("unexpected call to kobj_texthole_alloc()");
3202 3214 /*NOTREACHED*/
3203 3215 return (0);
3204 3216 }
3205 3217
3206 3218 /*ARGSUSED*/
3207 3219 void
3208 3220 kobj_texthole_free(caddr_t addr, size_t size)
3209 3221 {
3210 3222 panic("unexpected call to kobj_texthole_free()");
3211 3223 }
3212 3224
3213 3225 /*
3214 3226 * This is called just after configure() in startup().
3215 3227 *
3216 3228 * The ISALIST concept is a bit hopeless on Intel, because
3217 3229 * there's no guarantee of an ever-more-capable processor
3218 3230 * given that various parts of the instruction set may appear
3219 3231 * and disappear between different implementations.
3220 3232 *
3221 3233 * While it would be possible to correct it and even enhance
3222 3234 * it somewhat, the explicit hardware capability bitmask allows
3223 3235 * more flexibility.
3224 3236 *
3225 3237 * So, we just leave this alone.
3226 3238 */
3227 3239 void
3228 3240 setx86isalist(void)
3229 3241 {
3230 3242 char *tp;
3231 3243 size_t len;
3232 3244 extern char *isa_list;
3233 3245
3234 3246 #define TBUFSIZE 1024
3235 3247
3236 3248 tp = kmem_alloc(TBUFSIZE, KM_SLEEP);
3237 3249 *tp = '\0';
3238 3250
3239 3251 #if defined(__amd64)
3240 3252 (void) strcpy(tp, "amd64 ");
3241 3253 #endif
3242 3254
3243 3255 switch (x86_vendor) {
3244 3256 case X86_VENDOR_Intel:
3245 3257 case X86_VENDOR_AMD:
3246 3258 case X86_VENDOR_TM:
3247 3259 if (is_x86_feature(x86_featureset, X86FSET_CMOV)) {
3248 3260 /*
3249 3261 * Pentium Pro or later
3250 3262 */
3251 3263 (void) strcat(tp, "pentium_pro");
3252 3264 (void) strcat(tp,
3253 3265 is_x86_feature(x86_featureset, X86FSET_MMX) ?
3254 3266 "+mmx pentium_pro " : " ");
3255 3267 }
3256 3268 /*FALLTHROUGH*/
3257 3269 case X86_VENDOR_Cyrix:
3258 3270 /*
3259 3271 * The Cyrix 6x86 does not have any Pentium features
3260 3272 * accessible while not at privilege level 0.
3261 3273 */
3262 3274 if (is_x86_feature(x86_featureset, X86FSET_CPUID)) {
3263 3275 (void) strcat(tp, "pentium");
3264 3276 (void) strcat(tp,
3265 3277 is_x86_feature(x86_featureset, X86FSET_MMX) ?
3266 3278 "+mmx pentium " : " ");
3267 3279 }
3268 3280 break;
3269 3281 default:
3270 3282 break;
3271 3283 }
3272 3284 (void) strcat(tp, "i486 i386 i86");
3273 3285 len = strlen(tp) + 1; /* account for NUL at end of string */
3274 3286 isa_list = strcpy(kmem_alloc(len, KM_SLEEP), tp);
3275 3287 kmem_free(tp, TBUFSIZE);
3276 3288
3277 3289 #undef TBUFSIZE
3278 3290 }
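/*
 * For example (illustrative), a 64-bit Intel or AMD CPU with CMOV and
 * MMX ends up with:
 *
 *	"amd64 pentium_pro+mmx pentium_pro pentium+mmx pentium i486 i386 i86"
 */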
3279 3291
3280 3292
3281 3293 #ifdef __amd64
3282 3294
3283 3295 void *
3284 3296 device_arena_alloc(size_t size, int vm_flag)
3285 3297 {
3286 3298 return (vmem_alloc(device_arena, size, vm_flag));
3287 3299 }
3288 3300
3289 3301 void
3290 3302 device_arena_free(void *vaddr, size_t size)
3291 3303 {
3292 3304 vmem_free(device_arena, vaddr, size);
3293 3305 }
3294 3306
3295 3307 #else /* __i386 */
3296 3308
3297 3309 void *
3298 3310 device_arena_alloc(size_t size, int vm_flag)
3299 3311 {
3300 3312 caddr_t vaddr;
3301 3313 uintptr_t v;
3302 3314 size_t start;
3303 3315 size_t end;
3304 3316
3305 3317 vaddr = vmem_alloc(heap_arena, size, vm_flag);
3306 3318 if (vaddr == NULL)
3307 3319 return (NULL);
3308 3320
3309 3321 v = (uintptr_t)vaddr;
3310 3322 ASSERT(v >= kernelbase);
3311 3323 ASSERT(v + size <= valloc_base);
3312 3324
3313 3325 start = btop(v - kernelbase);
3314 3326 end = btop(v + size - 1 - kernelbase);
3315 3327 ASSERT(start < toxic_bit_map_len);
3316 3328 ASSERT(end < toxic_bit_map_len);
3317 3329
3318 3330 while (start <= end) {
3319 3331 BT_ATOMIC_SET(toxic_bit_map, start);
3320 3332 ++start;
3321 3333 }
3322 3334 return (vaddr);
3323 3335 }
3324 3336
3325 3337 void
3326 3338 device_arena_free(void *vaddr, size_t size)
3327 3339 {
3328 3340 uintptr_t v = (uintptr_t)vaddr;
3329 3341 size_t start;
3330 3342 size_t end;
3331 3343
3332 3344 ASSERT(v >= kernelbase);
3333 3345 ASSERT(v + size <= valloc_base);
3334 3346
3335 3347 start = btop(v - kernelbase);
3336 3348 end = btop(v + size - 1 - kernelbase);
3337 3349 ASSERT(start < toxic_bit_map_len);
3338 3350 ASSERT(end < toxic_bit_map_len);
3339 3351
3340 3352 while (start <= end) {
3341 3353 ASSERT(BT_TEST(toxic_bit_map, start) != 0);
3342 3354 BT_ATOMIC_CLEAR(toxic_bit_map, start);
3343 3355 ++start;
3344 3356 }
3345 3357 vmem_free(heap_arena, vaddr, size);
3346 3358 }
3347 3359
3348 3360 /*
3349 3361 * Returns the first address in the range that is in the device arena,
3350 3362 * or NULL. If len is not NULL, it is set to the length of the toxic range.
3351 3363 */
3352 3364 void *
3353 3365 device_arena_contains(void *vaddr, size_t size, size_t *len)
3354 3366 {
3355 3367 uintptr_t v = (uintptr_t)vaddr;
3356 3368 uintptr_t eaddr = v + size;
3357 3369 size_t start;
3358 3370 size_t end;
3359 3371
3360 3372 /*
3361 3373 * if called very early by kmdb, just return NULL
3362 3374 */
3363 3375 if (toxic_bit_map == NULL)
3364 3376 return (NULL);
3365 3377
3366 3378 /*
3367 3379 * First check if we're completely outside the bitmap range.
3368 3380 */
3369 3381 if (v >= valloc_base || eaddr < kernelbase)
3370 3382 return (NULL);
3371 3383
3372 3384 /*
3373 3385 * Trim ends of search to look at only what the bitmap covers.
3374 3386 */
3375 3387 if (v < kernelbase)
3376 3388 v = kernelbase;
3377 3389 start = btop(v - kernelbase);
3378 3390 end = btop(eaddr - kernelbase);
3379 3391 if (end >= toxic_bit_map_len)
3380 3392 end = toxic_bit_map_len;
3381 3393
3382 3394 if (bt_range(toxic_bit_map, &start, &end, end) == 0)
3383 3395 return (NULL);
3384 3396
3385 3397 v = kernelbase + ptob(start);
3386 3398 if (len != NULL)
3387 3399 *len = ptob(end - start);
3388 3400 return ((void *)v);
3389 3401 }
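/*
 * Usage sketch (hypothetical; "buf" and "bufsize" stand for a candidate
 * mapping): step over any toxic sub-ranges that device_arena_contains()
 * reports within the buffer.
 */
	caddr_t va = buf;
	size_t remain = bufsize;
	size_t tlen;
	void *toxic;

	while (remain != 0 &&
	    (toxic = device_arena_contains(va, remain, &tlen)) != NULL) {
		size_t skip = ((caddr_t)toxic - va) + tlen;

		/* [toxic, toxic + tlen) is device-mapped; don't touch it */
		if (skip >= remain)
			break;
		va += skip;
		remain -= skip;
	}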
3390 3402
3391 3403 #endif /* __i386 */