The power of vector processors comes from their ability to process several elements at once, applying the same operation to every element of a short vector. Combined with two standard processor techniques, independent processing units (load/store, integer, float, vector) running in parallel and pipelined units that keep several instructions in flight at once (a new instruction can start while earlier ones are still completing, up to the instruction latency), this makes it possible to process multiple data elements per clock cycle. Achieving this in practice is far from easy: it requires in-depth knowledge of the processor and a detailed analysis and understanding of the algorithm. For large datasets, memory management is also an issue: a processor waiting for data is not producing results.
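As a minimal sketch of why the structure of the algorithm matters, consider a dot product. Written with a single running sum, every addition must wait for the previous one to finish. Splitting the sum into several independent partial sums gives pipelined or vector units work they can overlap. The function name dot4 and the choice of four partial sums are assumptions made for illustration only; the best factor depends on the processor.

```c
#include <assert.h>
#include <stddef.h>

/* Dot product using four independent partial sums (illustrative sketch).
 * The four accumulators do not depend on one another, so a pipelined
 * or vector unit is free to overlap their multiply-adds. */
float dot4(const float *x, const float *y, size_t n)
{
    assert(n == 0 || (x != NULL && y != NULL));

    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    size_t i = 0;

    /* Main loop: four elements per iteration, one per accumulator. */
    for (; i + 4 <= n; i += 4) {
        s0 += x[i]     * y[i];
        s1 += x[i + 1] * y[i + 1];
        s2 += x[i + 2] * y[i + 2];
        s3 += x[i + 3] * y[i + 3];
    }

    /* Combine the partial sums, then handle the 0-3 leftover elements. */
    float sum = (s0 + s1) + (s2 + s3);
    for (; i < n; ++i)
        sum += x[i] * y[i];

    return sum;
}
```

Because floating-point addition is not associative, reordering the sum this way can change the result in the last few bits compared with a single running sum. Whether that is acceptable is part of the algorithm analysis mentioned above.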
Multicore Processors
Today's processors increasingly provide multiple cores on-chip, but most DSP and scientific/engineering codes cannot make effective use of the increased power they offer. We have many years of experience with multicore and multiprocessor systems.
We can:
- provide multithreaded libraries which yield transparent access to multicore facilities with minimal code restructuring
- multithread your existing applications for increased performance
- provide hand-crafted assembler routines for optimal performance where speed is crucial
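To give a flavour of what multithreading with minimal code restructuring can look like, here is a small sketch using an OpenMP directive on an existing loop. The routine name saxpy_mt is hypothetical, and OpenMP is only one possible approach, chosen here for brevity. It is not a description of any particular library.

```c
#include <assert.h>
#include <stddef.h>

/* y[i] = a * x[i] + y[i] for i in [0, n) (hypothetical example routine).
 * Each iteration is independent, so the loop can be divided among cores.
 * When the compiler has OpenMP enabled, the directive below does this;
 * otherwise it is skipped and the loop simply runs on one core. */
void saxpy_mt(float a, const float *x, float *y, long n)
{
    assert(n <= 0 || (x != NULL && y != NULL));

#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (long i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

The caller's code is unchanged: it calls saxpy_mt exactly as it would call a single-threaded version. For short vectors the cost of starting threads can outweigh the gain, so a production routine would normally fall back to a serial loop below some size threshold.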
Contact us to discuss your requirements.





