[Buildroot] [PATCH v2] configs/kontron_bl_imx8mm_defconfig: new defconfig

Tue Jan 18 20:40:09 UTC 2022

Hi Heiko, Thomas,

> On 18 Jan 2022, at 20:58, Heiko Thiery <heiko.thiery at gmail.com> wrote:
> 
> Hi Giulio, Hi Thomas,
> 
> 
>> On Tue, 18 Jan 2022 at 07:58, Giulio Benetti
>> <giulio.benetti at benettiengineering.com> wrote:
>> 
>> Hi Thomas,
>> 
>>> On 18/01/22 00:04, Thomas Petazzoni wrote:
>>> On Mon, 17 Jan 2022 20:58:52 +0100
>>> Giulio Benetti <giulio.benetti at benettiengineering.com> wrote:
>>> 
>>>>> diff --git a/configs/kontron_bl_imx8mm_defconfig b/configs/kontron_bl_imx8mm_defconfig
>>>>> new file mode 100644
>>>>> index 0000000000..5b5648cc14
>>>>> --- /dev/null
>>>>> +++ b/configs/kontron_bl_imx8mm_defconfig
>>>>> @@ -0,0 +1,59 @@
>>>>> +# Architecture
>>>>> +BR2_aarch64=y
>>>>> +BR2_ARM_FPU_VFPV3=y
>>>> 
>>>> i.MX8MM supports VFPv4-D16, so I would substitute this ^^^ with:
>>>> BR2_ARM_FPU_VFPV4D16
>>>> 
>>>> This is to achieve the maximum performance.
>>> 
>>> Nope, that's not really how it works. VFPv3 is better than VFPv3-D16.
>>> Indeed VFPv3 means that the FPU has 32 double precision registers,
>>> while VFPv3-D16 means that it has "only" 16 double precision registers.
>> 
>> I'm a bit confused. The datasheet[1] states at 1.4.1:
>> ```
>> • Media Processing Engine (MPE) with NEON technology supporting the
>> Advanced Single Instruction Multiple Data architecture
>> • Floating Point Unit (FPU) with support of the VFPv4-D16 architecture
>> ```
>> [1]: https://www.nxp.com/docs/en/data-sheet/IMX8MMCEC.pdf
>> 
>> So I expect it to only have VFPv4-D16. It also has NEON, but as far as
>> I know the two can't be combined on AArch64.
>> 
>>> So, if the i.MX8MM has only the VFPv3-D16, then indeed it should be
>>> chosen, because code compiled with VFPv3 may not work, as it might use
>>> too many double precision registers.
>>> 
>>> On the other hand, if the i.MX8MM has the full VFPv3, then
>>> BR2_ARM_FPU_VFPV3=y should be used.
>> 
>> It only has VFPv4-D16. I think I explained myself badly. This is not to
>> achieve maximum performance; the datasheet simply states that it only
>> supports VFPv4-D16[1], and judging from this:
>> https://developer.arm.com/documentation/dui0472/h/CJADDCIF#:~:text=VFPv3%20has%2032%20double%2Dprecision,VFPv3%20with%20half%2Dprecision%20extensions.&text=VFPv4%20has%2032%20double%2Dprecision,to%20the%20features%20of%20VFPv3.
>> 
>> VFPv4 is backward compatible with VFPv3, and I'd expect VFPv4-D16 to be
>> backward compatible too. Is it possible that so far it has worked only
>> by chance, because the code built for VFPv3 never used more than 16
>> registers and so stayed compatible with VFPv4-D16 (maybe)?
>> 
>>> That being said, the gcc man page only documents vfpv3, vfpv3-d16-fp16,
>>> vfpv3-fp16 as extension for armv7-a. Interesting, would need to look
>>> into this.
>> 
>> Yes, but i.MX8MM is a cortex-A53, so armv8-a:
>> https://developer.arm.com/ip-products/processors/cortex-a/cortex-a53
>> 
>> which is backward compatible with armv7-a when running in AArch32.
>> But here we use it as BR2_aarch64, so armv8-a:
>> https://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html
>> 
>> and it lists:
>> ```
>> -mfpu=name
>> This specifies what floating-point hardware (or hardware emulation) is
>> available on the target. Permissible names are: ‘auto’, ‘vfpv2’,
>> ‘vfpv3’, ‘vfpv3-fp16’, ‘vfpv3-d16’, ‘vfpv3-d16-fp16’, ‘vfpv3xd’,
>> ‘vfpv3xd-fp16’, ‘neon-vfpv3’, ‘neon-fp16’, ‘vfpv4’, ‘vfpv4-d16’,
>> ‘fpv4-sp-d16’, ‘neon-vfpv4’, ‘fpv5-d16’, ‘fpv5-sp-d16’, ‘fp-armv8’,
>> ‘neon-fp-armv8’ and ‘crypto-neon-fp-armv8’. Note that ‘neon’ is an alias
>> for ‘neon-vfpv3’ and ‘vfp’ is an alias for ‘vfpv2’.
>> ```
>> 
>> So "vfpv4-d16" is possible.
>> 
>> Here I think we need vfpv4-d16 and not vfpv3, because that is what both
>> the datasheet (DS) and the reference manual (RM) of the i.MX8MM state.
>> 
>> Does it sound good to you?
> 
> After following the discussion on IRC between Yann and Michael, I am
> sure that this setting cannot be used for the aarch64/cortex-A53 CPU.
> The settings BR2_ARM_CPU_HAS_FPU, BR2_ARM_CPU_HAS_VFPV2,
> BR2_ARM_CPU_HAS_VFPV3, BR2_ARM_CPU_HAS_VFPV4 and
> BR2_ARM_CPU_HAS_FP_ARMV8 are set implicitly.

Yes, I was wrong. As you point out, the FPU strategy falls back to BR2_ARM_FPU_FP_ARMV8, and that’s fine.
But the i.MX8MM supports NEON too, so I would set BR2_ARM_FPU_NEON_FP_ARMV8,
because, as pointed out here[1], ARMv8 NEON now supports the IEEE 754 floating-point standard.

BUT I’ve found that the gcc documentation[2] states:
```
If the selected floating-point hardware includes the NEON extension (e.g. -mfpu=neon), note that floating-point operations are not generated by GCC’s auto-vectorization pass unless -funsafe-math-optimizations is also specified. This is because NEON hardware does not fully implement the IEEE 754 standard for floating-point arithmetic (in particular denormal values are treated as zero), so the use of NEON instructions may lead to a loss of precision.
```

So I still can’t tell whether this limitation comes from the core or from gcc. Does anyone have a more in-depth explanation?
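
One way to narrow this down, as a minimal sketch only: ask the compiler itself which floating-point and SIMD feature macros it predefines for the target. The helper name `fp_macros` is made up for illustration, and the compiler passed to it is an assumption (use whatever cross-gcc the build actually produces). `-dM -E` is the usual gcc way to dump predefined macros, and the `__ARM_*` names are the ACLE feature macros.

```shell
# Hypothetical helper: print the FP/SIMD-related macros a compiler predefines.
# $1 is the compiler to query (assumed name; defaults to the native gcc).
fp_macros() {
    cc="${1:-gcc}"
    # -dM -E on empty input dumps all predefined macros; keep only the
    # architecture and floating-point/SIMD related ones.
    "$cc" -dM -E - </dev/null |
        grep -E '__aarch64__|__ARM_NEON|__ARM_FP|__ARM_FEATURE_FMA' || true
}
```

Running it as, for example, `fp_macros aarch64-linux-gnu-gcc` (compiler name assumed) would show what the toolchain believes about the target, which at least separates the gcc side of the question from the core side.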

Anyway, I agree with Heiko that the FPU strategy should be removed from this defconfig.
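
To find the other aarch64 defconfigs that would need the same cleanup, a stricter variant of the grep count quoted below could look like this. It is only a sketch: the function name is made up, and it assumes it is run from the top of a Buildroot tree where the defconfigs are named configs/*_defconfig.

```shell
# Hypothetical helper: list explicit FPU selections in aarch64 defconfigs.
# $1 is the directory holding the defconfigs (defaults to ./configs).
list_aarch64_fpu_defconfigs() {
    dir="${1:-configs}"
    # Pick only defconfigs that really select a 64-bit ARM architecture...
    grep -lE '^BR2_aarch64(_be)?=y' "$dir"/*_defconfig 2>/dev/null |
    while IFS= read -r f; do
        # ...and print "file:line" for each explicit BR2_ARM_FPU_* setting.
        grep -H '^BR2_ARM_FPU_' "$f" || true
    done
}
```

Piping the output through `wc -l` gives a count comparable to the one quoted below, but it only looks at files that actually set BR2_aarch64 (or BR2_aarch64_be) rather than any line that mentions "aarch64".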

Best regards
-- 
Giulio Benetti
Benetti Engineering sas

[1]: https://developer.arm.com/architectures/instruction-sets/floating-point
[2]: https://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html

> 
> I probably took the setting from the freescale_imx8mmevk_defconfig,
> which is then also not correct.
> 
> I then looked further at the other aarch64 defconfigs and saw that
> there are more that set the FPU settings.
> 
>  #  grep aarch64 -A50 configs/* | grep FPU | wc -l
>  28
> 
> I will remove this setting from the defconfig in this patch and also
> for the other kontron defconfig.
> 
> -- 
> Heiko