<html lang="en">
<head>
<title>i386 and x86-64 Options - Using the GNU Compiler Collection (GCC)</title>
<meta http-equiv="Content-Type" content="text/html">
<meta name="description" content="Using the GNU Compiler Collection (GCC)">
<meta name="generator" content="makeinfo 4.13">
<link title="Top" rel="start" href="index.html#Top">
<link rel="up" href="Submodel-Options.html#Submodel-Options" title="Submodel Options">
<link rel="prev" href="HPPA-Options.html#HPPA-Options" title="HPPA Options">
<link rel="next" href="i386-and-x86_002d64-Windows-Options.html#i386-and-x86_002d64-Windows-Options" title="i386 and x86-64 Windows Options">
<link href="http://www.gnu.org/software/texinfo/" rel="generator-home" title="Texinfo Homepage">
<!--
Copyright (C) 1988, 1989, 1992, 1993, 1994, 1995, 1996, 1997,
1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009,
2010 Free Software Foundation, Inc.

Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with the
Invariant Sections being ``Funding Free Software'', the Front-Cover
Texts being (a) (see below), and with the Back-Cover Texts being (b)
(see below).  A copy of the license is included in the section entitled
``GNU Free Documentation License''.

(a) The FSF's Front-Cover Text is:

     A GNU Manual

(b) The FSF's Back-Cover Text is:

     You have freedom to copy and modify this GNU Manual, like GNU
     software.  Copies published by the Free Software Foundation raise
     funds for GNU development.-->
<meta http-equiv="Content-Style-Type" content="text/css">
<style type="text/css"><!--
  pre.display { font-family:inherit }
  pre.format  { font-family:inherit }
  pre.smalldisplay { font-family:inherit; font-size:smaller }
  pre.smallformat  { font-family:inherit; font-size:smaller }
  pre.smallexample { font-size:smaller }
  pre.smalllisp    { font-size:smaller }
  span.sc    { font-variant:small-caps }
  span.roman { font-family:serif; font-weight:normal; } 
  span.sansserif { font-family:sans-serif; font-weight:normal; } 
--></style>
</head>
<body>
<div class="node">
<a name="i386-and-x86-64-Options"></a>
<a name="i386-and-x86_002d64-Options"></a>
<p>
Next:&nbsp;<a rel="next" accesskey="n" href="i386-and-x86_002d64-Windows-Options.html#i386-and-x86_002d64-Windows-Options">i386 and x86-64 Windows Options</a>,
Previous:&nbsp;<a rel="previous" accesskey="p" href="HPPA-Options.html#HPPA-Options">HPPA Options</a>,
Up:&nbsp;<a rel="up" accesskey="u" href="Submodel-Options.html#Submodel-Options">Submodel Options</a>
<hr>
</div>

<h4 class="subsection">3.17.17 Intel 386 and AMD x86-64 Options</h4>

<p><a name="index-i386-Options-1409"></a><a name="index-x86_002d64-Options-1410"></a><a name="index-Intel-386-Options-1411"></a><a name="index-AMD-x86_002d64-Options-1412"></a>
These &lsquo;<samp><span class="samp">-m</span></samp>&rsquo; options are defined for the i386 and x86-64 family of
computers:

     <dl>
<dt><code>-mtune=</code><var>cpu-type</var><dd><a name="index-mtune-1413"></a>Tune to <var>cpu-type</var> everything applicable about the generated code, except
for the ABI and the set of available instructions.  The choices for
<var>cpu-type</var> are:
          <dl>
<dt><em>generic</em><dd>Produce code optimized for the most common IA32/AMD64/EM64T processors. 
If you know the CPU on which your code will run, then you should use
the corresponding <samp><span class="option">-mtune</span></samp> option instead of
<samp><span class="option">-mtune=generic</span></samp>.  But, if you do not know exactly what CPU users
of your application will have, then you should use this option.

          <p>As new processors are deployed in the marketplace, the behavior of this
option will change.  Therefore, if you upgrade to a newer version of
GCC, the code generated by this option will change to reflect the processors
that were most common when that version of GCC was released.

          <p>There is no <samp><span class="option">-march=generic</span></samp> option because <samp><span class="option">-march</span></samp>
indicates the instruction set the compiler can use, and there is no
generic instruction set applicable to all processors.  In contrast,
<samp><span class="option">-mtune</span></samp> indicates the processor (or, in this case, collection of
processors) for which the code is optimized. 
<br><dt><em>native</em><dd>This selects the CPU to tune for at compilation time by determining
the processor type of the compiling machine.  Using <samp><span class="option">-mtune=native</span></samp>
will produce code optimized for the local machine under the constraints
of the selected instruction set.  Using <samp><span class="option">-march=native</span></samp> will
enable all instruction subsets supported by the local machine (hence
the result might not run on different machines). 
<br><dt><em>i386</em><dd>Original Intel i386 CPU. 
<br><dt><em>i486</em><dd>Intel i486 CPU.  (No scheduling is implemented for this chip.) 
<br><dt><em>i586, pentium</em><dd>Intel Pentium CPU with no MMX support. 
<br><dt><em>pentium-mmx</em><dd>Intel PentiumMMX CPU based on Pentium core with MMX instruction set support. 
<br><dt><em>pentiumpro</em><dd>Intel PentiumPro CPU. 
<br><dt><em>i686</em><dd>Same as <code>generic</code>, but when used as the <samp><span class="option">-march</span></samp> option, the PentiumPro
instruction set is used, so the code runs on all i686 family chips. 
<br><dt><em>pentium2</em><dd>Intel Pentium2 CPU based on PentiumPro core with MMX instruction set support. 
<br><dt><em>pentium3, pentium3m</em><dd>Intel Pentium3 CPU based on PentiumPro core with MMX and SSE instruction set
support. 
<br><dt><em>pentium-m</em><dd>Low power version of Intel Pentium3 CPU with MMX, SSE and SSE2 instruction set
support.  Used by Centrino notebooks. 
<br><dt><em>pentium4, pentium4m</em><dd>Intel Pentium4 CPU with MMX, SSE and SSE2 instruction set support. 
<br><dt><em>prescott</em><dd>Improved version of Intel Pentium4 CPU with MMX, SSE, SSE2 and SSE3 instruction
set support. 
<br><dt><em>nocona</em><dd>Improved version of Intel Pentium4 CPU with 64-bit extensions, MMX, SSE,
SSE2 and SSE3 instruction set support. 
<br><dt><em>core2</em><dd>Intel Core2 CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3 and SSSE3
instruction set support. 
<br><dt><em>corei7</em><dd>Intel Core i7 CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1
and SSE4.2 instruction set support. 
<br><dt><em>corei7-avx</em><dd>Intel Core i7 CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3,
SSE4.1, SSE4.2, AVX, AES and PCLMUL instruction set support. 
<br><dt><em>core-avx-i</em><dd>Intel Core CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3,
SSE4.1, SSE4.2, AVX, AES, PCLMUL, FSGSBASE, RDRND and F16C instruction
set support. 
<br><dt><em>atom</em><dd>Intel Atom CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3 and SSSE3
instruction set support. 
<br><dt><em>k6</em><dd>AMD K6 CPU with MMX instruction set support. 
<br><dt><em>k6-2, k6-3</em><dd>Improved versions of AMD K6 CPU with MMX and 3DNow! instruction set support. 
<br><dt><em>athlon, athlon-tbird</em><dd>AMD Athlon CPU with MMX, 3DNow!, enhanced 3DNow! and SSE prefetch instruction
support. 
<br><dt><em>athlon-4, athlon-xp, athlon-mp</em><dd>Improved AMD Athlon CPU with MMX, 3DNow!, enhanced 3DNow! and full SSE
instruction set support. 
<br><dt><em>k8, opteron, athlon64, athlon-fx</em><dd>AMD K8 core based CPUs with x86-64 instruction set support.  (This supersets
MMX, SSE, SSE2, 3DNow!, enhanced 3DNow! and 64-bit instruction set extensions.) 
<br><dt><em>k8-sse3, opteron-sse3, athlon64-sse3</em><dd>Improved versions of k8, opteron and athlon64 with SSE3 instruction set support. 
<br><dt><em>amdfam10, barcelona</em><dd>AMD Family 10h core based CPUs with x86-64 instruction set support.  (This
supersets MMX, SSE, SSE2, SSE3, SSE4A, 3DNow!, enhanced 3DNow!, ABM and 64-bit
instruction set extensions.) 
<br><dt><em>bdver1</em><dd>AMD Family 15h core based CPUs with x86-64 instruction set support.  (This
supersets FMA4, AVX, XOP, LWP, AES, PCLMUL, CX16, MMX, SSE, SSE2, SSE3, SSE4A,
SSSE3, SSE4.1, SSE4.2, ABM and 64-bit instruction set extensions.) 
<br><dt><em>bdver2</em><dd>AMD Family 15h core based CPUs with x86-64 instruction set support.  (This
supersets BMI, TBM, F16C, FMA, AVX, XOP, LWP, AES, PCLMUL, CX16, MMX, SSE,
SSE2, SSE3, SSE4A, SSSE3, SSE4.1, SSE4.2, ABM and 64-bit instruction set
extensions.) 
<br><dt><em>btver1</em><dd>AMD Family 14h core based CPUs with x86-64 instruction set support.  (This
supersets MMX, SSE, SSE2, SSE3, SSSE3, SSE4A, CX16, ABM and 64-bit
instruction set extensions.) 
<br><dt><em>winchip-c6</em><dd>IDT Winchip C6 CPU, handled in the same way as i486 with additional MMX instruction
set support. 
<br><dt><em>winchip2</em><dd>IDT Winchip2 CPU, handled in the same way as i486 with additional MMX and 3DNow! 
instruction set support. 
<br><dt><em>c3</em><dd>Via C3 CPU with MMX and 3DNow! instruction set support.  (No scheduling is
implemented for this chip.) 
<br><dt><em>c3-2</em><dd>Via C3-2 CPU with MMX and SSE instruction set support.  (No scheduling is
implemented for this chip.) 
<br><dt><em>geode</em><dd>Embedded AMD CPU with MMX and 3DNow! instruction set support. 
</dl>

     <p>While picking a specific <var>cpu-type</var> will schedule things appropriately
for that particular chip, the compiler will not generate any code that
does not run on the default machine type without the <samp><span class="option">-march=</span><var>cpu-type</var></samp>
option being used. For example, if GCC is configured for i686-pc-linux-gnu
then <samp><span class="option">-mtune=pentium4</span></samp> will generate code that is tuned for Pentium4
but will still run on i686 machines.

     <br><dt><code>-march=</code><var>cpu-type</var><dd><a name="index-march-1414"></a>Generate instructions for the machine type <var>cpu-type</var>.  The choices
for <var>cpu-type</var> are the same as for <samp><span class="option">-mtune</span></samp>.  Moreover,
specifying <samp><span class="option">-march=</span><var>cpu-type</var></samp> implies <samp><span class="option">-mtune=</span><var>cpu-type</var></samp>.

     <br><dt><code>-mcpu=</code><var>cpu-type</var><dd><a name="index-mcpu-1415"></a>A deprecated synonym for <samp><span class="option">-mtune</span></samp>.

     <br><dt><code>-mfpmath=</code><var>unit</var><dd><a name="index-mfpmath-1416"></a>Generate floating-point arithmetic for selected unit <var>unit</var>.  The choices
for <var>unit</var> are:

          <dl>
<dt>&lsquo;<samp><span class="samp">387</span></samp>&rsquo;<dd>Use the standard 387 floating-point coprocessor present on the majority of chips and
emulated otherwise.  Code compiled with this option runs almost everywhere. 
The temporary results are computed in 80-bit precision instead of the precision
specified by the type, resulting in slightly different results compared to most
other chips.  See <samp><span class="option">-ffloat-store</span></samp> for a more detailed description.

          <p>This is the default choice for the i386 compiler.

          <br><dt>&lsquo;<samp><span class="samp">sse</span></samp>&rsquo;<dd>Use scalar floating-point instructions present in the SSE instruction set. 
This instruction set is supported by Pentium3 and newer chips, in the AMD line
by Athlon-4, Athlon-xp and Athlon-mp chips.  The earlier version of SSE
instruction set supports only single-precision arithmetic, thus double- and
extended-precision arithmetic is still done using 387.  A later version (SSE2), present
in Pentium4 and AMD x86-64 chips, supports double-precision
arithmetic too.

          <p>For the i386 compiler, you need to use <samp><span class="option">-march=</span><var>cpu-type</var></samp>, <samp><span class="option">-msse</span></samp>
or <samp><span class="option">-msse2</span></samp> switches to enable SSE extensions and make this option
effective.  For the x86-64 compiler, these extensions are enabled by default.

          <p>The resulting code should be considerably faster in the majority of cases and avoid
the numerical instability problems of 387 code, but may break some existing
code that expects temporaries to be 80 bits.

          <p>This is the default choice for the x86-64 compiler.

          <br><dt>&lsquo;<samp><span class="samp">sse,387</span></samp>&rsquo;<dt>&lsquo;<samp><span class="samp">sse+387</span></samp>&rsquo;<dt>&lsquo;<samp><span class="samp">both</span></samp>&rsquo;<dd>Attempt to utilize both instruction sets at once.  This effectively doubles the
number of available registers and, on chips with separate execution units for
387 and SSE, the execution resources too.  Use this option with care, as it is
still experimental: the GCC register allocator does not model separate
functional units well, resulting in unstable performance. 
</dl>

     <br><dt><code>-masm=</code><var>dialect</var><dd><a name="index-masm_003d_0040var_007bdialect_007d-1417"></a>Output asm instructions using selected <var>dialect</var>.  Supported
choices are &lsquo;<samp><span class="samp">intel</span></samp>&rsquo; or &lsquo;<samp><span class="samp">att</span></samp>&rsquo; (the default one).  Darwin does
not support &lsquo;<samp><span class="samp">intel</span></samp>&rsquo;.

     <br><dt><code>-mieee-fp</code><dt><code>-mno-ieee-fp</code><dd><a name="index-mieee_002dfp-1418"></a><a name="index-mno_002dieee_002dfp-1419"></a>Control whether or not the compiler uses IEEE floating-point
comparisons.  These handle correctly the case where the result of a
comparison is unordered.

     <br><dt><code>-msoft-float</code><dd><a name="index-msoft_002dfloat-1420"></a>Generate output containing library calls for floating point. 
<strong>Warning:</strong> the requisite libraries are not part of GCC. 
Normally the facilities of the machine's usual C compiler are used, but
this can't be done directly in cross-compilation.  You must make your
own arrangements to provide suitable library functions for
cross-compilation.

     <p>On machines where a function returns floating-point results in the 80387
register stack, some floating-point opcodes may be emitted even if
<samp><span class="option">-msoft-float</span></samp> is used.

     <br><dt><code>-mno-fp-ret-in-387</code><dd><a name="index-mno_002dfp_002dret_002din_002d387-1421"></a>Do not use the FPU registers for return values of functions.

     <p>The usual calling convention has functions return values of types
<code>float</code> and <code>double</code> in an FPU register, even if there
is no FPU.  The idea is that the operating system should emulate
an FPU.

     <p>The option <samp><span class="option">-mno-fp-ret-in-387</span></samp> causes such values to be returned
in ordinary CPU registers instead.

     <br><dt><code>-mno-fancy-math-387</code><dd><a name="index-mno_002dfancy_002dmath_002d387-1422"></a>Some 387 emulators do not support the <code>sin</code>, <code>cos</code> and
<code>sqrt</code> instructions for the 387.  Specify this option to avoid
generating those instructions.  This option is the default on FreeBSD,
OpenBSD and NetBSD.  This option is overridden when <samp><span class="option">-march</span></samp>
indicates that the target CPU will always have an FPU and so the
instruction will not need emulation.  As of revision 2.6.1, these
instructions are not generated unless you also use the
<samp><span class="option">-funsafe-math-optimizations</span></samp> switch.

     <br><dt><code>-malign-double</code><dt><code>-mno-align-double</code><dd><a name="index-malign_002ddouble-1423"></a><a name="index-mno_002dalign_002ddouble-1424"></a>Control whether GCC aligns <code>double</code>, <code>long double</code>, and
<code>long long</code> variables on a two-word boundary or a one-word
boundary.  Aligning <code>double</code> variables on a two-word boundary
produces code that runs somewhat faster on a &lsquo;<samp><span class="samp">Pentium</span></samp>&rsquo; at the
expense of more memory.

     <p>On x86-64, <samp><span class="option">-malign-double</span></samp> is enabled by default.

     <p><strong>Warning:</strong> if you use the <samp><span class="option">-malign-double</span></samp> switch,
structures containing the above types will be aligned differently than
the published application binary interface specifications for the 386
and will not be binary compatible with structures in code compiled
without that switch.

     <br><dt><code>-m96bit-long-double</code><dt><code>-m128bit-long-double</code><dd><a name="index-m96bit_002dlong_002ddouble-1425"></a><a name="index-m128bit_002dlong_002ddouble-1426"></a>These switches control the size of <code>long double</code> type.  The i386
application binary interface specifies the size to be 96 bits,
so <samp><span class="option">-m96bit-long-double</span></samp> is the default in 32-bit mode.

     <p>Modern architectures (Pentium and newer) prefer <code>long double</code>
to be aligned to an 8- or 16-byte boundary.  In arrays or structures
conforming to the ABI, this is not possible.  So specifying
<samp><span class="option">-m128bit-long-double</span></samp> aligns <code>long double</code>
to a 16-byte boundary by padding the <code>long double</code> with an additional
32-bit zero.

     <p>In the x86-64 compiler, <samp><span class="option">-m128bit-long-double</span></samp> is the default choice as
its ABI specifies that <code>long double</code> is to be aligned on 16-byte boundary.

     <p>Notice that neither of these options enables any extra precision over the x87
standard of 80 bits for a <code>long double</code>.

     <p><strong>Warning:</strong> if you override the default value for your target ABI,
structures and arrays containing <code>long double</code> variables will change
their size, and the calling convention for functions taking
<code>long double</code> will be modified.  Hence they will not be binary
compatible with arrays or structures in code compiled without that switch.

     <br><dt><code>-mlarge-data-threshold=</code><var>number</var><dd><a name="index-mlarge_002ddata_002dthreshold_003d_0040var_007bnumber_007d-1427"></a>When <samp><span class="option">-mcmodel=medium</span></samp> is specified, data objects larger than
<var>number</var> are placed in the large data section.  This value must be the
same across all objects linked into the binary and defaults to 65535.

     <br><dt><code>-mrtd</code><dd><a name="index-mrtd-1428"></a>Use a different function-calling convention, in which functions that
take a fixed number of arguments return with the <code>ret</code> <var>num</var>
instruction, which pops their arguments while returning.  This saves one
instruction in the caller since there is no need to pop the arguments
there.

     <p>You can specify that an individual function is called with this calling
sequence with the function attribute &lsquo;<samp><span class="samp">stdcall</span></samp>&rsquo;.  You can also
override the <samp><span class="option">-mrtd</span></samp> option by using the function attribute
&lsquo;<samp><span class="samp">cdecl</span></samp>&rsquo;.  See <a href="Function-Attributes.html#Function-Attributes">Function Attributes</a>.

     <p><strong>Warning:</strong> this calling convention is incompatible with the one
normally used on Unix, so you cannot use it if you need to call
libraries compiled with the Unix compiler.

     <p>Also, you must provide function prototypes for all functions that
take variable numbers of arguments (including <code>printf</code>);
otherwise incorrect code will be generated for calls to those
functions.

     <p>In addition, seriously incorrect code will result if you call a
function with too many arguments.  (Normally, extra arguments are
harmlessly ignored.)

     <br><dt><code>-mregparm=</code><var>num</var><dd><a name="index-mregparm-1429"></a>Control how many registers are used to pass integer arguments.  By
default, no registers are used to pass arguments, and at most 3
registers can be used.  You can control this behavior for a specific
function by using the function attribute &lsquo;<samp><span class="samp">regparm</span></samp>&rsquo;. 
See <a href="Function-Attributes.html#Function-Attributes">Function Attributes</a>.

     <p><strong>Warning:</strong> if you use this switch, and
<var>num</var> is nonzero, then you must build all modules with the same
value, including any libraries.  This includes the system libraries and
startup modules.

     <br><dt><code>-msseregparm</code><dd><a name="index-msseregparm-1430"></a>Use SSE register passing conventions for float and double arguments
and return values.  You can control this behavior for a specific
function by using the function attribute &lsquo;<samp><span class="samp">sseregparm</span></samp>&rsquo;. 
See <a href="Function-Attributes.html#Function-Attributes">Function Attributes</a>.

     <p><strong>Warning:</strong> if you use this switch then you must build all
modules with the same value, including any libraries.  This includes
the system libraries and startup modules.

     <br><dt><code>-mvect8-ret-in-mem</code><dd><a name="index-mvect8_002dret_002din_002dmem-1431"></a>Return 8-byte vectors in memory instead of MMX registers.  This is the
default on Solaris&nbsp;8 and 9 and VxWorks to match the ABI of the Sun
Studio compilers until version 12.  Later compiler versions (starting
with Studio 12 Update&nbsp;1) follow the ABI used by other x86 targets, which
is the default on Solaris&nbsp;10 and later.  <em>Only</em> use this option if
you need to remain compatible with existing code produced by those
previous compiler versions or older versions of GCC.

     <br><dt><code>-mpc32</code><dt><code>-mpc64</code><dt><code>-mpc80</code><dd><a name="index-mpc32-1432"></a><a name="index-mpc64-1433"></a><a name="index-mpc80-1434"></a>
Set 80387 floating-point precision to 32, 64 or 80 bits.  When <samp><span class="option">-mpc32</span></samp>
is specified, the significands of results of floating-point operations are
rounded to 24 bits (single precision); <samp><span class="option">-mpc64</span></samp> rounds the
significands of results of floating-point operations to 53 bits (double
precision) and <samp><span class="option">-mpc80</span></samp> rounds the significands of results of
floating-point operations to 64 bits (extended double precision), which is
the default.  When this option is used, floating-point operations in higher
precisions are not available to the programmer without setting the FPU
control word explicitly.

     <p>Setting the rounding of floating-point operations to less than the default
80 bits can speed some programs by 2% or more.  Note that some mathematical
libraries assume that extended-precision (80-bit) floating-point operations
are enabled by default; routines in such libraries could suffer significant
loss of accuracy, typically through so-called "catastrophic cancellation",
when this option is used to set the precision to less than extended precision.

     <br><dt><code>-mstackrealign</code><dd><a name="index-mstackrealign-1435"></a>Realign the stack at entry.  On the Intel x86, the <samp><span class="option">-mstackrealign</span></samp>
option will generate an alternate prologue and epilogue that realigns the
run-time stack if necessary.  This supports mixing legacy code that keeps
a 4-byte aligned stack with modern code that keeps a 16-byte aligned stack for
SSE compatibility.  See also the attribute <code>force_align_arg_pointer</code>,
applicable to individual functions.

     <br><dt><code>-mpreferred-stack-boundary=</code><var>num</var><dd><a name="index-mpreferred_002dstack_002dboundary-1436"></a>Attempt to keep the stack boundary aligned to a 2 raised to <var>num</var>
byte boundary.  If <samp><span class="option">-mpreferred-stack-boundary</span></samp> is not specified,
the default is 4 (16 bytes or 128 bits).

     <br><dt><code>-mincoming-stack-boundary=</code><var>num</var><dd><a name="index-mincoming_002dstack_002dboundary-1437"></a>Assume the incoming stack is aligned to a 2 raised to <var>num</var> byte
boundary.  If <samp><span class="option">-mincoming-stack-boundary</span></samp> is not specified,
the one specified by <samp><span class="option">-mpreferred-stack-boundary</span></samp> will be used.

     <p>On Pentium and PentiumPro, <code>double</code> and <code>long double</code> values
should be aligned to an 8-byte boundary (see <samp><span class="option">-malign-double</span></samp>) or
suffer significant run time performance penalties.  On Pentium III, the
Streaming SIMD Extension (SSE) data type <code>__m128</code> may not work
properly if it is not 16-byte aligned.

     <p>To ensure proper alignment of these values on the stack, the stack boundary
must be as aligned as that required by any value stored on the stack. 
Further, every function must be generated such that it keeps the stack
aligned.  Thus calling a function compiled with a higher preferred
stack boundary from a function compiled with a lower preferred stack
boundary will most likely misalign the stack.  It is recommended that
libraries that use callbacks always use the default setting.

     <p>This extra alignment does consume extra stack space, and generally
increases code size.  Code that is sensitive to stack space usage, such
as embedded systems and operating system kernels, may want to reduce the
preferred alignment to <samp><span class="option">-mpreferred-stack-boundary=2</span></samp>.

     <br><dt><code>-mmmx</code><dt><code>-mno-mmx</code><dt><code>-msse</code><dt><code>-mno-sse</code><dt><code>-msse2</code><dt><code>-mno-sse2</code><dt><code>-msse3</code><dt><code>-mno-sse3</code><dt><code>-mssse3</code><dt><code>-mno-ssse3</code><dt><code>-msse4.1</code><dt><code>-mno-sse4.1</code><dt><code>-msse4.2</code><dt><code>-mno-sse4.2</code><dt><code>-msse4</code><dt><code>-mno-sse4</code><dt><code>-mavx</code><dt><code>-mno-avx</code><dt><code>-mavx2</code><dt><code>-mno-avx2</code><dt><code>-maes</code><dt><code>-mno-aes</code><dt><code>-mpclmul</code><dt><code>-mno-pclmul</code><dt><code>-mfsgsbase</code><dt><code>-mno-fsgsbase</code><dt><code>-mrdrnd</code><dt><code>-mno-rdrnd</code><dt><code>-mf16c</code><dt><code>-mno-f16c</code><dt><code>-mfma</code><dt><code>-mno-fma</code><dt><code>-msse4a</code><dt><code>-mno-sse4a</code><dt><code>-mfma4</code><dt><code>-mno-fma4</code><dt><code>-mxop</code><dt><code>-mno-xop</code><dt><code>-mlwp</code><dt><code>-mno-lwp</code><dt><code>-m3dnow</code><dt><code>-mno-3dnow</code><dt><code>-mpopcnt</code><dt><code>-mno-popcnt</code><dt><code>-mabm</code><dt><code>-mno-abm</code><dt><code>-mbmi</code><dt><code>-mbmi2</code><dt><code>-mno-bmi</code><dt><code>-mno-bmi2</code><dt><code>-mlzcnt</code><dt><code>-mno-lzcnt</code><dt><code>-mtbm</code><dt><code>-mno-tbm</code><dd><a name="index-mmmx-1438"></a><a name="index-mno_002dmmx-1439"></a><a name="index-msse-1440"></a><a name="index-mno_002dsse-1441"></a><a name="index-m3dnow-1442"></a><a name="index-mno_002d3dnow-1443"></a>These switches enable or disable the use of instructions in the MMX, SSE,
SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, F16C,
FMA, SSE4A, FMA4, XOP, LWP, ABM, POPCNT, BMI, BMI2, LZCNT, TBM or 3DNow! 
extended instruction sets. 
These extensions are also available as built-in functions: see
<a href="X86-Built_002din-Functions.html#X86-Built_002din-Functions">X86 Built-in Functions</a>, for details of the functions enabled and
disabled by these switches.

     <p>To have SSE/SSE2 instructions generated automatically from floating-point
code (as opposed to 387 instructions), see <samp><span class="option">-mfpmath=sse</span></samp>.

     <p>GCC suppresses SSEx instructions when <samp><span class="option">-mavx</span></samp> is used. Instead, it
generates new AVX instructions or AVX equivalents for all SSEx instructions
when needed.

     <p>These options will enable GCC to use these extended instructions in
generated code, even without <samp><span class="option">-mfpmath=sse</span></samp>.  Applications that
perform run-time CPU detection must compile separate files for each
supported architecture, using the appropriate flags.  In particular,
the file containing the CPU detection code should be compiled without
these options.

     <br><dt><code>-mcld</code><dd><a name="index-mcld-1444"></a>This option instructs GCC to emit a <code>cld</code> instruction in the prologue
of functions that use string instructions.  String instructions depend on
the DF flag to select between autoincrement or autodecrement mode.  While the
ABI specifies the DF flag to be cleared on function entry, some operating
systems violate this specification by not clearing the DF flag in their
exception dispatchers.  The exception handler can be invoked with the DF flag
set, which leads to wrong direction mode when string instructions are used. 
This option can be enabled by default on 32-bit x86 targets by configuring
GCC with the <samp><span class="option">--enable-cld</span></samp> configure option.  Generation of <code>cld</code>
instructions can be suppressed with the <samp><span class="option">-mno-cld</span></samp> compiler option
in this case.

     <br><dt><code>-mvzeroupper</code><dd><a name="index-mvzeroupper-1445"></a>This option instructs GCC to emit a <code>vzeroupper</code> instruction
before a transfer of control flow out of the function, to minimize the
AVX to SSE transition penalty and to remove unnecessary zeroupper
intrinsics.

     <br><dt><code>-mcx16</code><dd><a name="index-mcx16-1446"></a>This option will enable GCC to use the CMPXCHG16B instruction in generated code. 
CMPXCHG16B allows for atomic operations on 128-bit double quadword (or oword)
data types.  This is useful for high resolution counters that could be updated
by multiple processors (or cores).  This instruction is generated as part of
atomic built-in functions: see <a href="_005f_005fsync-Builtins.html#g_t_005f_005fsync-Builtins">__sync Builtins</a> or
<a href="_005f_005fatomic-Builtins.html#g_t_005f_005fatomic-Builtins">__atomic Builtins</a> for details.

     <br><dt><code>-msahf</code><dd><a name="index-msahf-1447"></a>This option will enable GCC to use the SAHF instruction in generated 64-bit code. 
Early Intel CPUs with Intel 64 lacked the LAHF and SAHF instructions supported
by AMD64 until the introduction of the Pentium 4 G1 step in December 2005.  LAHF and
SAHF are load and store instructions, respectively, for certain status flags. 
In 64-bit mode, the SAHF instruction is used to optimize <code>fmod</code>, <code>drem</code>
or <code>remainder</code> built-in functions: see <a href="Other-Builtins.html#Other-Builtins">Other Builtins</a> for details.

     <br><dt><code>-mmovbe</code><dd><a name="index-mmovbe-1448"></a>This option will enable GCC to use the movbe instruction to implement
<code>__builtin_bswap32</code> and <code>__builtin_bswap64</code>.
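
     <p>As an illustration, a big-endian load such as the one sketched below
combines a memory access with <code>__builtin_bswap32</code>, which makes it a
candidate for this option.  The helper name <code>load_be32</code> is
hypothetical, and the sketch assumes a little-endian host, which holds for
x86; whether a single movbe is actually emitted depends on the compiler
and optimization level.

<pre class="smallexample">#include &lt;stdint.h&gt;
#include &lt;string.h&gt;

/* Read a 32-bit big-endian value from a byte buffer.
   Assumes a little-endian host, so one byte swap is needed.  */
static uint32_t
load_be32 (const unsigned char *p)
{
  uint32_t v;
  memcpy (&amp;v, p, sizeof v);      /* alignment-safe load */
  return __builtin_bswap32 (v);  /* reverse the byte order */
}
</pre>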

     <br><dt><code>-mcrc32</code><dd><a name="index-mcrc32-1449"></a>This option will enable the built-in functions <code>__builtin_ia32_crc32qi</code>,
<code>__builtin_ia32_crc32hi</code>, <code>__builtin_ia32_crc32si</code> and
<code>__builtin_ia32_crc32di</code> to generate the crc32 machine instruction.

     <br><dt><code>-mrecip</code><dd><a name="index-mrecip-1450"></a>This option will enable GCC to use RCPSS and RSQRTSS instructions (and their
vectorized variants RCPPS and RSQRTPS) with an additional Newton-Raphson step
to increase precision instead of DIVSS and SQRTSS (and their vectorized
variants) for single-precision floating-point arguments.  These instructions
are generated only when <samp><span class="option">-funsafe-math-optimizations</span></samp> is enabled
together with <samp><span class="option">-ffinite-math-only</span></samp> and <samp><span class="option">-fno-trapping-math</span></samp>. 
Note that while the throughput of the sequence is higher than the throughput
of the non-reciprocal instruction, the precision of the sequence can be
decreased by up to 2 ulp (i.e. the inverse of 1.0 equals 0.99999994).

     <p>Note that GCC implements <code>1.0f/sqrtf(</code><var>x</var><code>)</code> in terms of RSQRTSS
(or RSQRTPS) already with <samp><span class="option">-ffast-math</span></samp> (or the above option
combination), and doesn't need <samp><span class="option">-mrecip</span></samp>.

     <p>Also note that GCC emits the above sequence with additional Newton-Raphson step
for vectorized single-float division and vectorized <code>sqrtf(</code><var>x</var><code>)</code>
already with <samp><span class="option">-ffast-math</span></samp> (or the above option combination), and
doesn't need <samp><span class="option">-mrecip</span></samp>.
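
     <p>The refinement step mentioned for <samp><span class="option">-mrecip</span></samp> is the standard
Newton-Raphson iteration for a reciprocal, x1 = x0 * (2 - a * x0).  The
sketch below models one such step in plain C.  The function name
<code>refine_recip</code> and the starting estimate are assumptions made for
illustration; the instruction sequence GCC actually emits may differ.

<pre class="smallexample">/* One Newton-Raphson step towards 1/a, starting from the
   estimate x0.  In generated code the estimate would come
   from RCPSS or RCPPS; here the caller supplies it.  */
static float
refine_recip (float a, float x0)
{
  return x0 * (2.0f - a * x0);
}
</pre>

     <p>Starting from the rough estimate 0.24 for 1/4, one step gives about
0.2496 and a second step gives about 0.249999, which shows how quickly the
iteration converges.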

     <br><dt><code>-mrecip=</code><var>opt</var><dd><a name="index-mrecip_003dopt-1451"></a>This option controls which reciprocal estimate instructions
may be used.  <var>opt</var> is a comma-separated list of options, each of which may
be preceded by a <code>!</code> to invert the option:
<code>all</code>: enable all estimate instructions,
<code>default</code>: enable the default instructions, equivalent to <samp><span class="option">-mrecip</span></samp>,
<code>none</code>: disable all estimate instructions, equivalent to <samp><span class="option">-mno-recip</span></samp>,
<code>div</code>: enable the approximation for scalar division,
<code>vec-div</code>: enable the approximation for vectorized division,
<code>sqrt</code>: enable the approximation for scalar square root,
<code>vec-sqrt</code>: enable the approximation for vectorized square root.

     <p>So for example, <samp><span class="option">-mrecip=all,!sqrt</span></samp> would enable
all of the reciprocal approximations, except for square root.

     <br><dt><code>-mveclibabi=</code><var>type</var><dd><a name="index-mveclibabi-1452"></a>Specifies the ABI type to use for vectorizing intrinsics using an
external library.  Supported types are <code>svml</code> for the Intel short
vector math library and <code>acml</code> for the AMD math core library style
of interfacing.  GCC will currently emit calls to <code>vmldExp2</code>,
<code>vmldLn2</code>, <code>vmldLog102</code>, <code>vmldPow2</code>,
<code>vmldTanh2</code>, <code>vmldTan2</code>, <code>vmldAtan2</code>, <code>vmldAtanh2</code>,
<code>vmldCbrt2</code>, <code>vmldSinh2</code>, <code>vmldSin2</code>, <code>vmldAsinh2</code>,
<code>vmldAsin2</code>, <code>vmldCosh2</code>, <code>vmldCos2</code>, <code>vmldAcosh2</code>,
<code>vmldAcos2</code>, <code>vmlsExp4</code>, <code>vmlsLn4</code>, <code>vmlsLog104</code>,
<code>vmlsPow4</code>, <code>vmlsTanh4</code>, <code>vmlsTan4</code>,
<code>vmlsAtan4</code>, <code>vmlsAtanh4</code>, <code>vmlsCbrt4</code>, <code>vmlsSinh4</code>,
<code>vmlsSin4</code>, <code>vmlsAsinh4</code>, <code>vmlsAsin4</code>, <code>vmlsCosh4</code>,
<code>vmlsCos4</code>, <code>vmlsAcosh4</code> and <code>vmlsAcos4</code> for corresponding
function type when <samp><span class="option">-mveclibabi=svml</span></samp> is used and <code>__vrd2_sin</code>,
<code>__vrd2_cos</code>, <code>__vrd2_exp</code>, <code>__vrd2_log</code>, <code>__vrd2_log2</code>,
<code>__vrd2_log10</code>, <code>__vrs4_sinf</code>, <code>__vrs4_cosf</code>,
<code>__vrs4_expf</code>, <code>__vrs4_logf</code>, <code>__vrs4_log2f</code>,
<code>__vrs4_log10f</code> and <code>__vrs4_powf</code> for corresponding function type
when <samp><span class="option">-mveclibabi=acml</span></samp> is used. Both <samp><span class="option">-ftree-vectorize</span></samp> and
<samp><span class="option">-funsafe-math-optimizations</span></samp> have to be enabled. An SVML or ACML ABI
compatible library will have to be specified at link time.

     <br><dt><code>-mabi=</code><var>name</var><dd><a name="index-mabi-1453"></a>Generate code for the specified calling convention.  Permissible values
are: &lsquo;<samp><span class="samp">sysv</span></samp>&rsquo; for the ABI used on GNU/Linux and other systems and
&lsquo;<samp><span class="samp">ms</span></samp>&rsquo; for the Microsoft ABI.  The default is to use the Microsoft
ABI when targeting Windows.  On all other systems, the default is the
SYSV ABI.  You can control this behavior for a specific function by
using the function attribute &lsquo;<samp><span class="samp">ms_abi</span></samp>&rsquo;/&lsquo;<samp><span class="samp">sysv_abi</span></samp>&rsquo;. 
See <a href="Function-Attributes.html#Function-Attributes">Function Attributes</a>.
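
     <p>A per-function override might look like the sketch below.  The macro
<code>EXAMPLE_MS_ABI</code> and the function <code>add_ms</code> are hypothetical
names; the guard is there because the attribute is only assumed to be
meaningful when compiling for x86-64 with a GNU-compatible compiler.

<pre class="smallexample">/* Expand to the attribute only where it is expected to be
   supported; elsewhere the function uses the default ABI.  */
#if defined (__GNUC__) &amp;&amp; defined (__x86_64__)
# define EXAMPLE_MS_ABI __attribute__ ((ms_abi))
#else
# define EXAMPLE_MS_ABI
#endif

/* This one function uses the Microsoft calling convention,
   whatever -mabi= selects for the rest of the file.  */
static long EXAMPLE_MS_ABI
add_ms (long a, long b)
{
  return a + b;
}
</pre>

     <p>Callers that see this declaration need no special handling; the compiler
arranges the call according to the attribute.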

     <br><dt><code>-mtls-dialect=</code><var>type</var><dd><a name="index-mtls_002ddialect-1454"></a>Generate code to access thread-local storage using the &lsquo;<samp><span class="samp">gnu</span></samp>&rsquo; or
&lsquo;<samp><span class="samp">gnu2</span></samp>&rsquo; conventions.  &lsquo;<samp><span class="samp">gnu</span></samp>&rsquo; is the conservative default;
&lsquo;<samp><span class="samp">gnu2</span></samp>&rsquo; is more efficient, but it may add compile- and run-time
requirements that cannot be satisfied on all systems.

     <br><dt><code>-mpush-args</code><dt><code>-mno-push-args</code><dd><a name="index-mpush_002dargs-1455"></a><a name="index-mno_002dpush_002dargs-1456"></a>Use PUSH operations to store outgoing parameters.  This method is shorter
and usually as fast as the method using SUB/MOV operations, and is enabled
by default.  In some cases disabling it may improve performance because of
improved scheduling and reduced dependencies.

     <br><dt><code>-maccumulate-outgoing-args</code><dd><a name="index-maccumulate_002doutgoing_002dargs-1457"></a>If enabled, the maximum amount of space required for outgoing arguments will be
computed in the function prologue.  This is faster on most modern CPUs
because of reduced dependencies, improved scheduling and reduced stack usage
when the preferred stack boundary is not equal to 2.  The drawback is a notable
increase in code size.  This switch implies <samp><span class="option">-mno-push-args</span></samp>.

     <br><dt><code>-mthreads</code><dd><a name="index-mthreads-1458"></a>Support thread-safe exception handling on &lsquo;<samp><span class="samp">Mingw32</span></samp>&rsquo;.  Code that relies
on thread-safe exception handling must compile and link all code with the
<samp><span class="option">-mthreads</span></samp> option.  When compiling, <samp><span class="option">-mthreads</span></samp> defines
<samp><span class="option">-D_MT</span></samp>; when linking, it links in a special thread helper library
<samp><span class="option">-lmingwthrd</span></samp> which cleans up per thread exception handling data.

     <br><dt><code>-mno-align-stringops</code><dd><a name="index-mno_002dalign_002dstringops-1459"></a>Do not align destination of inlined string operations.  This switch reduces
code size and improves performance in case the destination is already aligned,
but GCC doesn't know about it.

     <br><dt><code>-minline-all-stringops</code><dd><a name="index-minline_002dall_002dstringops-1460"></a>By default GCC inlines string operations only when the destination is
known to be aligned to at least a 4-byte boundary. 
This option enables more inlining and increases code
size, but may improve performance of code that depends on fast memcpy, strlen
and memset for short lengths.

     <br><dt><code>-minline-stringops-dynamically</code><dd><a name="index-minline_002dstringops_002ddynamically-1461"></a>For string operations of unknown size, use run-time checks with
inline code for small blocks and a library call for large blocks.

     <br><dt><code>-mstringop-strategy=</code><var>alg</var><dd><a name="index-mstringop_002dstrategy_003d_0040var_007balg_007d-1462"></a>Override the internal decision heuristic for the algorithm used to inline
string operations.  The allowed values are <code>rep_byte</code>,
<code>rep_4byte</code>, <code>rep_8byte</code> for expanding using the i386 <code>rep</code> prefix
of the specified size; <code>byte_loop</code>, <code>loop</code>, <code>unrolled_loop</code> for
expanding an inline loop; and <code>libcall</code> for always expanding a library call.

     <br><dt><code>-momit-leaf-frame-pointer</code><dd><a name="index-momit_002dleaf_002dframe_002dpointer-1463"></a>Don't keep the frame pointer in a register for leaf functions.  This
avoids the instructions to save, set up and restore frame pointers and
makes an extra register available in leaf functions.  The option
<samp><span class="option">-fomit-frame-pointer</span></samp> removes the frame pointer for all functions,
which might make debugging harder.

     <br><dt><code>-mtls-direct-seg-refs</code><dt><code>-mno-tls-direct-seg-refs</code><dd><a name="index-mtls_002ddirect_002dseg_002drefs-1464"></a>Controls whether TLS variables may be accessed with offsets from the
TLS segment register (<code>%gs</code> for 32-bit, <code>%fs</code> for 64-bit),
or whether the thread base pointer must be added.  Whether or not this
is legal depends on the operating system, and whether it maps the
segment to cover the entire TLS area.

     <p>For systems that use GNU libc, the default is on.

     <br><dt><code>-msse2avx</code><dt><code>-mno-sse2avx</code><dd><a name="index-msse2avx-1465"></a>Specify that the assembler should encode SSE instructions with VEX
prefix.  The option <samp><span class="option">-mavx</span></samp> turns this on by default.

     <br><dt><code>-mfentry</code><dt><code>-mno-fentry</code><dd><a name="index-mfentry-1466"></a>If profiling is active (<samp><span class="option">-pg</span></samp>), put the profiling
counter call before the prologue. 
Note: On x86 architectures the attribute <code>ms_hook_prologue</code>
cannot currently be used together with <samp><span class="option">-mfentry</span></samp> and <samp><span class="option">-pg</span></samp>.

     <br><dt><code>-m8bit-idiv</code><dt><code>-mno-8bit-idiv</code><dd><a name="index-g_t8bit_002didiv-1467"></a>On some processors, like Intel Atom, 8-bit unsigned integer divide is
much faster than 32-bit/64-bit integer divide.  This option generates a
run-time check.  If both dividend and divisor are within range of 0
to 255, 8-bit unsigned integer divide is used instead of
32-bit/64-bit integer divide.
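
     <p>In C terms, the run-time check behaves roughly like the sketch below.
This is only a source-level model of the idea; the function name
<code>div_with_8bit_fast_path</code> is hypothetical, and the compiler performs
the equivalent test on the generated instructions rather than in source.

<pre class="smallexample">/* Model of the check: if both operands fit in 8 bits, take
   the narrow divide, otherwise use the full-width divide.
   As with any division, the caller must ensure b is not 0.  */
static unsigned int
div_with_8bit_fast_path (unsigned int a, unsigned int b)
{
  if (((a | b) &amp; ~0xFFu) == 0)
    return (unsigned int) ((unsigned char) a / (unsigned char) b);
  return a / b;
}
</pre>

     <p>Both paths return the same quotient; only the cost of the divide differs,
so the option pays off when small operands are common.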

     <br><dt><code>-mavx256-split-unaligned-load</code><br><dt><code>-mavx256-split-unaligned-store</code><dd><a name="index-avx256_002dsplit_002dunaligned_002dload-1468"></a><a name="index-avx256_002dsplit_002dunaligned_002dstore-1469"></a>Split 32-byte AVX unaligned load and store.

 </dl>

 <p>These &lsquo;<samp><span class="samp">-m</span></samp>&rsquo; switches are supported in addition to the above
on AMD x86-64 processors in 64-bit environments.

     <dl>
<dt><code>-m32</code><dt><code>-m64</code><dt><code>-mx32</code><dd><a name="index-m32-1470"></a><a name="index-m64-1471"></a><a name="index-mx32-1472"></a>Generate code for a 32-bit or 64-bit environment. 
The <samp><span class="option">-m32</span></samp> option sets int, long and pointer to 32 bits and
generates code that runs on any i386 system. 
The <samp><span class="option">-m64</span></samp> option sets int to 32 bits and long and pointer
to 64 bits and generates code for AMD's x86-64 architecture. 
The <samp><span class="option">-mx32</span></samp> option sets int, long and pointer to 32 bits and
generates code for AMD's x86-64 architecture. 
On Darwin only, the <samp><span class="option">-m64</span></samp> option turns off the <samp><span class="option">-fno-pic</span></samp>
and <samp><span class="option">-mdynamic-no-pic</span></samp> options.
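
     <p>The type sizes listed above can be observed from within a program.  The
sketch below is one way to do so; the function name <code>data_model</code> and
the returned labels are assumptions chosen for illustration.

<pre class="smallexample">/* Report the data model implied by the sizes of long and
   pointers.  -m32 and -mx32 both yield 32-bit long and
   pointers; -m64 yields 64-bit long and pointers.  */
static const char *
data_model (void)
{
  if (sizeof (long) == 8 &amp;&amp; sizeof (void *) == 8)
    return "LP64";
  if (sizeof (long) == 4 &amp;&amp; sizeof (void *) == 4)
    return "ILP32";
  return "other";
}
</pre>

     <p>Compiling the same file with <samp><span class="option">-m32</span></samp> and then with
<samp><span class="option">-m64</span></samp> should change the reported label, provided the
toolchain has the corresponding libraries installed.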

     <br><dt><code>-mno-red-zone</code><dd><a name="index-mno_002dred_002dzone-1473"></a>Do not use a so-called red zone for x86-64 code.  The red zone is mandated
by the x86-64 ABI; it is a 128-byte area beyond the location of the
stack pointer that will not be modified by signal or interrupt handlers
and therefore can be used for temporary data without adjusting the stack
pointer.  The flag <samp><span class="option">-mno-red-zone</span></samp> disables this red zone.

     <br><dt><code>-mcmodel=small</code><dd><a name="index-mcmodel_003dsmall-1474"></a>Generate code for the small code model: the program and its symbols must
be linked in the lower 2 GB of the address space.  Pointers are 64 bits. 
Programs can be statically or dynamically linked.  This is the default
code model.

     <br><dt><code>-mcmodel=kernel</code><dd><a name="index-mcmodel_003dkernel-1475"></a>Generate code for the kernel code model.  The kernel runs in the
negative 2 GB of the address space. 
This model has to be used for Linux kernel code.

     <br><dt><code>-mcmodel=medium</code><dd><a name="index-mcmodel_003dmedium-1476"></a>Generate code for the medium model: The program is linked in the lower 2
GB of the address space.  Small symbols are also placed there.  Symbols
with sizes larger than <samp><span class="option">-mlarge-data-threshold</span></samp> are put into
large data or bss sections and can be located above 2GB.  Programs can
be statically or dynamically linked.

     <br><dt><code>-mcmodel=large</code><dd><a name="index-mcmodel_003dlarge-1477"></a>Generate code for the large model: This model makes no assumptions
about addresses and sizes of sections. 
</dl>

 </body></html>