Precision might be one reason. This Intel article indicates that there can be a loss in precision in some situations: [software.intel.com] (see the “Precision Considerations” section)
Of course, that precision concern only seems to matter when the value is already on the x87 FPU register stack, which is not the case in your straw man function above. Then again, if you need to convert a floating point value to an integer, you have probably just done at least one floating point calculation. In that case the “flds” instruction in your last code example goes away, since “fisttpl” would already have its operand on the FPU stack. Also, the SSE conversion consumes an XMM register, while “fisttpl” consumes the x87 register stack, which is shared with the MMX registers.
Benchmarking is probably the only real way to know how one performs compared to the other.
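For what it's worth, here is a rough sketch of the kind of timing loop I have in mind. The function names (sum_truncated, time_conversions) are made up for illustration, and it assumes the compiler lowers the (int) cast to whichever instruction your flags select; I believe GCC picks fisttp with something like -m32 -mfpmath=387 -msse3 and an SSE conversion by default on x86-64, but check the disassembly before trusting any numbers.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: sums the truncated values of an array.
   The (int) cast is the float-to-int conversion being measured;
   which instruction it becomes depends on the compiler flags. */
long long sum_truncated(const float *values, size_t count)
{
    long long total = 0;
    for (size_t i = 0; i < count; ++i)
        total += (int)values[i];   /* truncates toward zero */
    return total;
}

/* Hypothetical helper: returns the CPU seconds spent on `repeats`
   passes over the array. The accumulated sum is written to *sink so
   the result is used and the work is less likely to be discarded. */
double time_conversions(const float *values, size_t count,
                        int repeats, long long *sink)
{
    long long total = 0;
    clock_t start = clock();
    for (int r = 0; r < repeats; ++r)
        total += sum_truncated(values, count);
    clock_t stop = clock();
    *sink = total;
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Build the same file twice with different flags, call time_conversions on a large array with enough repeats to run for a second or so, and compare. An optimizer may still vectorize or hoist the loop, so a quick look at the generated assembly is the only way to be sure you are timing the instruction you think you are.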
-Dave