How do you load 3 floats using NEON intrinsics?

arm neon intrinsics

486 views

2 answers

I'm trying to convert this neon code to intrinsics:

vld1.32                {d0}, [%[pInVertex1]]
flds                   s2, [%[pInVertex1], #8]

This loads three 32-bit floats from the address in pInVertex1 into the d0 and d1 registers (s2 is the low half of d1). I can't find an equivalent in intrinsics. There is vld1q_f32, but that only works for 4 floats. Does anyone know of an efficient way of doing this (I mean without extra copying)?

Author: user3259383 Posted: 29.10.2019 06:32

Answers (2)

0 upvotes

The only instruction that loads exactly three 32-bit floats in AArch32 is a multiple-load instruction:

r0 holds the address of the structure
FLDMIAS r0, {s0-s2}

This can be used either in VFP or Neon code.

I do not know about the corresponding intrinsic.

Author: Dric512 Posted: 10.04.2016 03:02

0 upvotes

In DirectXMath I implemented the ARM-NEON version of XMLoadFloat3 as:

float32x2_t x = vld1_f32( reinterpret_cast<const float*>(pSource) );
float32x2_t zero = vdup_n_f32(0);
float32x2_t y = vld1_lane_f32( reinterpret_cast<const float*>(pSource)+2, zero, 0 );
return vcombine_f32( x, y );
Author: Chuck Walbourn Posted: 11.04.2016 04:17
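
For readers who want to try the two-load approach from the answer above outside DirectXMath, here is a minimal self-contained sketch. The helper name load3_padded, the macro LOAD3_HAVE_NEON, and the scalar fallback are assumptions made for illustration and are not part of DirectXMath or of either answer. The point it shows is that only 12 bytes are read from the source: a 64-bit load for x and y, then a single-lane load for z into a zeroed register, so the fourth lane ends up as 0.

```cpp
#include <cassert>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define LOAD3_HAVE_NEON 1
#else
#define LOAD3_HAVE_NEON 0
#endif

// Hypothetical helper (name chosen for this sketch): reads exactly three
// floats from src and writes {x, y, z, 0} to out[0..3].
inline void load3_padded(const float* src, float out[4]) {
    assert(src != nullptr && out != nullptr);
#if LOAD3_HAVE_NEON
    // 64-bit load: lanes become {x, y}.
    float32x2_t xy = vld1_f32(src);
    // Load z into lane 0 of a zeroed vector: lanes become {z, 0}.
    float32x2_t z0 = vld1_lane_f32(src + 2, vdup_n_f32(0.0f), 0);
    // Combine low and high halves into {x, y, z, 0} and store the result.
    vst1q_f32(out, vcombine_f32(xy, z0));
#else
    // Scalar fallback so the sketch also builds on non-ARM hosts.
    out[0] = src[0];
    out[1] = src[1];
    out[2] = src[2];
    out[3] = 0.0f;
#endif
}
```

In real code you would normally return the float32x4_t directly instead of storing it to an array; the store is here only so the result can be inspected on any platform.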