Add "Project Denver" CPU support

Denver is NVIDIA's own custom-designed, 64-bit, dual-core CPU which is
fully ARMv8 architecture compatible. Each of the two Denver cores
implements a 7-way superscalar microarchitecture (up to 7 concurrent
micro-ops can be executed per clock), and includes a 128KB 4-way L1
instruction cache, a 64KB 4-way L1 data cache, and a 2MB 16-way L2
cache, which services both cores.

Denver implements an innovative process called Dynamic Code Optimization,
which optimizes frequently used software routines at runtime into dense,
highly tuned microcode-equivalent routines. These are stored in a
dedicated, 128MB main-memory-based optimization cache. After being read
into the instruction cache, the optimized micro-ops are executed,
re-fetched and executed from the instruction cache as long as needed and
capacity allows.

Effectively, this reduces the need to re-optimize the software routines.
Instead of using hardware to extract the instruction-level parallelism
(ILP) inherent in the code, Denver extracts the ILP once via software
techniques, and then executes those routines repeatedly, thus amortizing
the cost of ILP extraction over the many execution instances.

Denver also features new low latency power-state transitions, in addition
to extensive power-gating and dynamic voltage and clock scaling based on
workloads.

Signed-off-by: Varun Wadekar <vwadekar@nvidia.com>
2015-07-14 12:41:20 +01:00

/*
 * Copyright (c) 2015-2018, ARM Limited and Contributors. All rights reserved.
 * Copyright (c) 2020, NVIDIA Corporation. All rights reserved.
 *
 * SPDX-License-Identifier: BSD-3-Clause
 */

#include <arch.h>
#include <asm_macros.S>
#include <assert_macros.S>
#include <context.h>
#include <denver.h>
#include <cpu_macros.S>
#include <plat_macros.S>

	/* -------------------------------------------------
	 * CVE-2017-5715 mitigation
	 *
	 * Flush the indirect branch predictor and RSB on
	 * entry to EL3 by issuing a newly added instruction
	 * for Denver CPUs.
	 *
	 * To achieve this without performing any branch
	 * instruction, a per-cpu vbar is installed which
	 * executes the workaround and then branches off to
	 * the corresponding vector entry in the main vector
	 * table.
	 * -------------------------------------------------
	 */
vector_base workaround_bpflush_runtime_exceptions

	.macro	apply_workaround
	stp	x0, x1, [sp, #CTX_GPREGS_OFFSET + CTX_GPREG_X0]

	/* Disable cycle counter when event counting is prohibited */
	mrs	x1, pmcr_el0
	orr	x0, x1, #PMCR_EL0_DP_BIT
	msr	pmcr_el0, x0
	isb

	/* -------------------------------------------------
	 * A new write-only system register where a write of
	 * 1 to bit 0 will cause the indirect branch predictor
	 * and RSB to be flushed.
	 *
	 * A write of 0 to bit 0 will be ignored. A write of
	 * 1 to any other bit will cause an MCA.
	 * -------------------------------------------------
	 */
	mov	x0, #1
	msr	s3_0_c15_c0_6, x0
	isb

	ldp	x0, x1, [sp, #CTX_GPREGS_OFFSET + CTX_GPREG_X0]
	.endm

	/* ---------------------------------------------------------------------
	 * Current EL with SP_EL0 : 0x0 - 0x200
	 * ---------------------------------------------------------------------
	 */
vector_entry workaround_bpflush_sync_exception_sp_el0
	b	sync_exception_sp_el0
end_vector_entry workaround_bpflush_sync_exception_sp_el0

vector_entry workaround_bpflush_irq_sp_el0
	b	irq_sp_el0
end_vector_entry workaround_bpflush_irq_sp_el0

vector_entry workaround_bpflush_fiq_sp_el0
	b	fiq_sp_el0
end_vector_entry workaround_bpflush_fiq_sp_el0

vector_entry workaround_bpflush_serror_sp_el0
	b	serror_sp_el0
end_vector_entry workaround_bpflush_serror_sp_el0

	/* ---------------------------------------------------------------------
	 * Current EL with SP_ELx: 0x200 - 0x400
	 * ---------------------------------------------------------------------
	 */
vector_entry workaround_bpflush_sync_exception_sp_elx
	b	sync_exception_sp_elx
end_vector_entry workaround_bpflush_sync_exception_sp_elx

vector_entry workaround_bpflush_irq_sp_elx
	b	irq_sp_elx
end_vector_entry workaround_bpflush_irq_sp_elx

vector_entry workaround_bpflush_fiq_sp_elx
	b	fiq_sp_elx
end_vector_entry workaround_bpflush_fiq_sp_elx

vector_entry workaround_bpflush_serror_sp_elx
	b	serror_sp_elx
end_vector_entry workaround_bpflush_serror_sp_elx

	/* ---------------------------------------------------------------------
	 * Lower EL using AArch64 : 0x400 - 0x600
	 * ---------------------------------------------------------------------
	 */
vector_entry workaround_bpflush_sync_exception_aarch64
	apply_workaround
	b	sync_exception_aarch64
end_vector_entry workaround_bpflush_sync_exception_aarch64

vector_entry workaround_bpflush_irq_aarch64
	apply_workaround
	b	irq_aarch64
end_vector_entry workaround_bpflush_irq_aarch64

vector_entry workaround_bpflush_fiq_aarch64
	apply_workaround
	b	fiq_aarch64
end_vector_entry workaround_bpflush_fiq_aarch64

vector_entry workaround_bpflush_serror_aarch64
	apply_workaround
	b	serror_aarch64
end_vector_entry workaround_bpflush_serror_aarch64

	/* ---------------------------------------------------------------------
	 * Lower EL using AArch32 : 0x600 - 0x800
	 * ---------------------------------------------------------------------
	 */
vector_entry workaround_bpflush_sync_exception_aarch32
	apply_workaround
	b	sync_exception_aarch32
end_vector_entry workaround_bpflush_sync_exception_aarch32

vector_entry workaround_bpflush_irq_aarch32
	apply_workaround
	b	irq_aarch32
end_vector_entry workaround_bpflush_irq_aarch32

vector_entry workaround_bpflush_fiq_aarch32
	apply_workaround
	b	fiq_aarch32
end_vector_entry workaround_bpflush_fiq_aarch32

vector_entry workaround_bpflush_serror_aarch32
	apply_workaround
	b	serror_aarch32
end_vector_entry workaround_bpflush_serror_aarch32

	.global	denver_disable_dco

	/* ---------------------------------------------
	 * Disable debug interfaces
	 * ---------------------------------------------
	 */
func denver_disable_ext_debug
	mov	x0, #1
	msr	osdlr_el1, x0
	isb
	dsb	sy
	ret
endfunc denver_disable_ext_debug

	/* ----------------------------------------------------
	 * Enable dynamic code optimizer (DCO)
	 * ----------------------------------------------------
	 */
func denver_enable_dco
	/* DCO is not supported on PN5 and later */
	mrs	x1, midr_el1
	mov_imm	x2, DENVER_MIDR_PN4
	cmp	x1, x2
	b.hi	1f

	mov	x18, x30
	bl	plat_my_core_pos
	mov	x1, #1
	lsl	x1, x1, x0
	msr	s3_0_c15_c0_2, x1
	mov	x30, x18
1:	ret
endfunc denver_enable_dco

	/* ----------------------------------------------------
	 * Disable dynamic code optimizer (DCO)
	 * ----------------------------------------------------
	 */
func denver_disable_dco
	/* DCO is not supported on PN5 and later */
	mrs	x1, midr_el1
	mov_imm	x2, DENVER_MIDR_PN4
	cmp	x1, x2
	b.hi	2f

	/* turn off background work */
	mov	x18, x30
	bl	plat_my_core_pos
	mov	x1, #1
	lsl	x1, x1, x0
	lsl	x2, x1, #16
	msr	s3_0_c15_c0_2, x2
	isb

	/* wait till the background work turns off */
1:	mrs	x2, s3_0_c15_c0_2
	lsr	x2, x2, #32
	and	w2, w2, 0xFFFF
	and	x2, x2, x1
	cbnz	x2, 1b

	mov	x30, x18
2:	ret
endfunc denver_disable_dco
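
/*
Illustrative note, kept inside a block comment so the file still assembles.
The two DCO helpers above appear to use a per-core bit layout in the
implementation-defined register s3_0_c15_c0_2: the enable path writes bit
core_pos, the disable path writes the same bit shifted left by 16, and the
disable loop polls bits [47:32] until this core's bit reads as zero. The C
sketch below only models that bit arithmetic. The function names are
hypothetical, and reading bits [47:32] as a "background work busy" field is
an inference from the assembly, not something the source states.

```c
#include <stdbool.h>
#include <stdint.h>

// Hypothetical helper: value written by the enable path (bit core_pos).
static inline uint64_t dco_enable_value(unsigned int core_pos)
{
    return UINT64_C(1) << core_pos;
}

// Hypothetical helper: value written by the disable path
// (the per-core bit shifted left by 16).
static inline uint64_t dco_disable_value(unsigned int core_pos)
{
    return dco_enable_value(core_pos) << 16;
}

// Hypothetical helper mirroring the polling loop: shift right by 32,
// keep the low 16 bits, then test this core's bit.
static inline bool dco_background_busy(uint64_t reg, unsigned int core_pos)
{
    uint64_t status = (reg >> 32) & UINT64_C(0xFFFF);

    return (status & dco_enable_value(core_pos)) != 0;
}
```

With core_pos equal to 1, the enable value is 0x2 and the disable value is
0x20000; the loop would keep spinning while bit 33 of the register is set.
*/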

func check_errata_cve_2017_5715
	mov	x0, #ERRATA_MISSING
#if WORKAROUND_CVE_2017_5715
	/*
	 * Check if the CPU supports the special instruction
	 * required to flush the indirect branch predictor and
	 * RSB. Support for this operation can be determined by
	 * comparing bits 19:16 of ID_AFR0_EL1 with 0b0001.
	 */
	mrs	x1, id_afr0_el1
	mov	x2, #0x10000
	and	x1, x1, x2
	cbz	x1, 1f
	mov	x0, #ERRATA_APPLIES
1:
#endif
	ret
endfunc check_errata_cve_2017_5715
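
/*
Illustrative note, kept inside a block comment so the file still assembles.
The comment in the function above describes comparing bits 19:16 of
ID_AFR0_EL1 with 0b0001, while the instructions test only bit 16 through the
0x10000 mask. The C sketch below shows both forms side by side; they agree
whenever the field holds 0 or 1. The macro and function names are
hypothetical and exist only for this example.

```c
#include <stdbool.h>
#include <stdint.h>

// Field position taken from the comment: bits 19:16 of ID_AFR0_EL1.
#define BPFLUSH_FIELD_SHIFT 16u
#define BPFLUSH_FIELD_MASK  UINT64_C(0xF)

// Hypothetical helper: full comparison of the 4-bit field with 0b0001.
static inline bool bpflush_supported_field(uint64_t id_afr0)
{
    return ((id_afr0 >> BPFLUSH_FIELD_SHIFT) & BPFLUSH_FIELD_MASK) == 1u;
}

// Hypothetical helper: the test the assembly performs, bit 16 only.
static inline bool bpflush_supported_bit16(uint64_t id_afr0)
{
    return (id_afr0 & UINT64_C(0x10000)) != 0;
}
```

The two helpers would differ only for field values with bit 16 set and other
bits of the field also set, for example 0b0011.
*/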

func check_errata_cve_2018_3639
#if WORKAROUND_CVE_2018_3639
	mov	x0, #ERRATA_APPLIES
#else
	mov	x0, #ERRATA_MISSING
#endif
	ret
endfunc check_errata_cve_2018_3639

	/* -------------------------------------------------
	 * The CPU Ops reset function for Denver.
	 * -------------------------------------------------
	 */
func denver_reset_func

	mov	x19, x30

#if IMAGE_BL31 && WORKAROUND_CVE_2017_5715
	/*
	 * Check if the CPU supports the special instruction
	 * required to flush the indirect branch predictor and
	 * RSB. Support for this operation can be determined by
	 * comparing bits 19:16 of ID_AFR0_EL1 with 0b0001.
	 */
	mrs	x0, id_afr0_el1
	mov	x1, #0x10000
	and	x0, x0, x1
	cmp	x0, #0
	adr	x1, workaround_bpflush_runtime_exceptions
	mrs	x2, vbar_el3
	csel	x0, x1, x2, ne
	msr	vbar_el3, x0
#endif

#if WORKAROUND_CVE_2018_3639
	/*
	 * Denver CPUs with DENVER_MIDR_PN3 or earlier use different
	 * bits in the ACTLR_EL3 register to disable speculative
	 * store buffer and memory disambiguation.
	 */
	mrs	x0, midr_el1
	mov_imm	x1, DENVER_MIDR_PN4
	cmp	x0, x1
	mrs	x0, actlr_el3
	mov	x1, #(DENVER_CPU_DIS_MD_EL3 | DENVER_CPU_DIS_SSB_EL3)
	mov	x2, #(DENVER_PN4_CPU_DIS_MD_EL3 | DENVER_PN4_CPU_DIS_SSB_EL3)
	csel	x3, x1, x2, ne
	orr	x0, x0, x3
	msr	actlr_el3, x0
	isb
	dsb	sy
#endif

	/* ----------------------------------------------------
	 * Reset ACTLR.PMSTATE to C1 state
	 * ----------------------------------------------------
	 */
	mrs	x0, actlr_el1
	bic	x0, x0, #DENVER_CPU_PMSTATE_MASK
	orr	x0, x0, #DENVER_CPU_PMSTATE_C1
	msr	actlr_el1, x0

	/* ----------------------------------------------------
	 * Enable dynamic code optimizer (DCO)
	 * ----------------------------------------------------
	 */
	bl	denver_enable_dco

	ret	x19
endfunc denver_reset_func
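
/*
Illustrative note, kept inside a block comment so the file still assembles.
The PMSTATE reset in the function above is a read-modify-write: bic clears
the PMSTATE field of ACTLR_EL1 and orr then sets the C1 encoding, leaving
all other bits untouched. The C sketch below models that pair. The two
EXAMPLE_ constants are placeholder values chosen for this example; the real
DENVER_CPU_PMSTATE_MASK and DENVER_CPU_PMSTATE_C1 are defined in denver.h
and may differ. The function name is hypothetical.

```c
#include <stdint.h>

// Placeholder values for illustration only; see denver.h for the real ones.
#define EXAMPLE_PMSTATE_MASK UINT64_C(0xF)
#define EXAMPLE_PMSTATE_C1   UINT64_C(0x1)

// Hypothetical helper mirroring the bic/orr pair:
// clear the PMSTATE field, then set the C1 encoding.
static inline uint64_t actlr_set_pmstate_c1(uint64_t actlr)
{
    actlr &= ~EXAMPLE_PMSTATE_MASK;
    actlr |= EXAMPLE_PMSTATE_C1;
    return actlr;
}
```

Under these placeholder values, an input of 0xABCD would become 0xABC1: only
the low four bits change.
*/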

	/* ----------------------------------------------------
	 * The CPU Ops core power down function for Denver.
	 * ----------------------------------------------------
	 */
func denver_core_pwr_dwn

	mov	x19, x30

	/* ---------------------------------------------
	 * Force the debug interfaces to be quiescent
	 * ---------------------------------------------
	 */
	bl	denver_disable_ext_debug

	ret	x19
endfunc denver_core_pwr_dwn

	/* -------------------------------------------------------
	 * The CPU Ops cluster power down function for Denver.
	 * -------------------------------------------------------
	 */
func denver_cluster_pwr_dwn
	ret
endfunc denver_cluster_pwr_dwn

#if REPORT_ERRATA
	/*
	 * Errata printing function for Denver. Must follow AAPCS.
	 */
func denver_errata_report
	stp	x8, x30, [sp, #-16]!

	bl	cpu_get_rev_var
	mov	x8, x0

	/*
	 * Report all errata. The revision-variant information is passed to
	 * checking functions of each errata.
	 */
	report_errata WORKAROUND_CVE_2017_5715, denver, cve_2017_5715
	report_errata WORKAROUND_CVE_2018_3639, denver, cve_2018_3639

	ldp	x8, x30, [sp], #16
	ret
endfunc denver_errata_report
#endif

	/* ---------------------------------------------
	 * This function provides Denver specific
	 * register information for crash reporting.
	 * It needs to return with x6 pointing to
	 * a list of register names in ascii and
	 * x8 - x15 having values of registers to be
	 * reported.
	 * ---------------------------------------------
	 */
.section .rodata.denver_regs, "aS"
denver_regs:  /* The ascii list of register names to be reported */
	.asciz	"actlr_el1", ""

func denver_cpu_reg_dump
	adr	x6, denver_regs
	mrs	x8, ACTLR_EL1
	ret
endfunc denver_cpu_reg_dump
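
/*
Illustrative note, kept inside a block comment so the file still assembles.
The `.asciz "actlr_el1", ""` directive above emits NUL-terminated names back
to back, followed by an empty string that marks the end of the list. The C
sketch below shows how such a list can be walked. It assumes the crash
reporting code stops at the empty string; example_regs and count_reg_names
are hypothetical names used only here.

```c
#include <stddef.h>
#include <string.h>

// Same byte layout as `.asciz "actlr_el1", ""`: one NUL-terminated name
// followed by an empty string that ends the list.
static const char example_regs[] = "actlr_el1\0";

// Hypothetical helper: count the names in such a list.
static inline size_t count_reg_names(const char *list)
{
    size_t n = 0;

    while (*list != '\0') {
        list += strlen(list) + 1; // skip the name and its terminator
        n++;
    }
    return n;
}
```

Each name in the list is expected to pair with one of the registers x8 to
x15 in order, so a list with one name pairs with x8 only.
*/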

/* macro to declare cpu_ops for Denver SKUs */
.macro	denver_cpu_ops_wa midr
	declare_cpu_ops_wa denver, \midr, \
		denver_reset_func, \
		check_errata_cve_2017_5715, \
		CPU_NO_EXTRA2_FUNC, \
		denver_core_pwr_dwn, \
		denver_cluster_pwr_dwn
.endm

denver_cpu_ops_wa DENVER_MIDR_PN0
denver_cpu_ops_wa DENVER_MIDR_PN1
denver_cpu_ops_wa DENVER_MIDR_PN2
denver_cpu_ops_wa DENVER_MIDR_PN3
denver_cpu_ops_wa DENVER_MIDR_PN4
denver_cpu_ops_wa DENVER_MIDR_PN5
denver_cpu_ops_wa DENVER_MIDR_PN6
denver_cpu_ops_wa DENVER_MIDR_PN7
denver_cpu_ops_wa DENVER_MIDR_PN8
denver_cpu_ops_wa DENVER_MIDR_PN9