US-8144156

Sequencer with async SIMD array

Published: March 27, 2012

Assignee: not available in the USPTO data we have

Inventors: not available in the USPTO data we have

Technical Abstract

A 3D graphics architecture in which a buffer is placed between the sequencer and the processing element (PE) array. The sequencer and PE array are not designed to run in lock step: instead the sequencer and PE array are decoupled to allow the PEs to run at 100% efficiency even when the sequencer is switching between threads and performing other flow control operations. Thus, the rate of instruction processing in the PE array is not coupled to the rate of instruction processing in the sequencer.
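The decoupling described in the abstract can be illustrated with a small cycle-level simulation. This is only a sketch under stated assumptions, not the patented design: the function names `simulate` and `lock_step_cycles`, the `(issue_cost, exec_cost)` command model, and the cycle counts are all hypothetical and chosen for illustration. Each command costs the sequencer some cycles to issue (for example, extra cycles during a thread switch) and costs the PE array some cycles to execute (for example, a vector command expanded into scalar steps).

```python
from collections import deque


def lock_step_cycles(commands):
    """Baseline (assumed): the PE array waits while the sequencer issues,
    and the sequencer waits while the PE array executes."""
    return sum(issue + execute for issue, execute in commands)


def simulate(commands, fifo_depth):
    """Decoupled model: the sequencer deposits commands into a bounded FIFO
    and the PE array drains it at its own rate.

    commands: list of (issue_cost, exec_cost) pairs, with exec_cost >= 1.
    Returns (total_cycles, pe_busy_cycles).
    """
    fifo = deque()
    next_cmd = 0                      # index of the command being issued
    seq_remaining = commands[0][0] if commands else 0
    pe_remaining = 0                  # cycles left on the PE's current command
    busy = 0
    done = 0
    cycle = 0
    while done < len(commands):
        cycle += 1
        # PE array side: fetch from the buffer when idle, then do one cycle of work.
        if pe_remaining == 0 and fifo:
            pe_remaining = fifo.popleft()
        if pe_remaining > 0:
            pe_remaining -= 1
            busy += 1
            if pe_remaining == 0:
                done += 1
        # Sequencer side: keep preparing commands regardless of what the PEs do.
        if next_cmd < len(commands):
            if seq_remaining > 0:
                seq_remaining -= 1
            # Deposit only when the command is ready and the buffer has room.
            if seq_remaining == 0 and len(fifo) < fifo_depth:
                fifo.append(commands[next_cmd][1])
                next_cmd += 1
                if next_cmd < len(commands):
                    seq_remaining = commands[next_cmd][0]
    return cycle, busy


if __name__ == "__main__":
    # Third command models a slow issue, e.g. a thread switch in the sequencer.
    cmds = [(1, 4), (1, 4), (3, 4), (1, 4)]
    print(lock_step_cycles(cmds))        # 22
    print(simulate(cmds, fifo_depth=2))  # (17, 16)
```

In this toy workload the PE array is busy for 16 of 17 cycles with the buffer in place, compared with 16 of 22 cycles in the assumed lock-step baseline. The slow issue of the third command is hidden because the sequencer had already queued work ahead of the PE array.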

Patent Claims

20 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A parallel rendering architecture in computer graphics, comprising: a sequencer which decodes rendering commands for a plurality of pixels and generates fragment-processing commands therefrom; a SIMD processor operatively connected to said sequencer, the processor comprising a plurality of processing elements that are configured to receive and operate on said fragment-processing commands and also separately connected to send and receive pixel data, and automatically jump to rendering another pixel when rendering on any one pixel stalls; and a buffer operatively interposed between said sequencer and said processor, into which the sequencer deposits said fragment-processing commands and from which said processor receives said fragment-processing commands, that decouples operations between the sequencer and the processor; wherein said sequencer is logically asynchronous to said processing elements and does not have to work on the same instructions at the same time as said processing elements, and wherein said sequencer and said processing elements are physically synchronous, and said sequencer and said processing elements operate at different rates, whereby the likelihood of stalling in said processor is reduced.

2. The architecture of claim 1, wherein said fragment-processing commands include vector commands.

3. The architecture of claim 1, wherein said processing elements expand vector commands to sequences of scalar commands.

4. The architecture of claim 1, wherein said processing elements run in parallel.

5. The architecture of claim 1, wherein said processing elements are one of a group consisting of pixel processors, fragment processors, and vertex processors.

6. A parallel rendering architecture in 3D graphics, comprising: at least one sequencer which decodes 3D rendering commands for a first plurality of data items and generates fragment-processing commands therefrom; a plurality of SIMD processing elements, each operating asynchronously to and configured to receive processing instructions from said sequencer and also separately connected to send and receive pixel data, the processing elements automatically transferring processing to another plurality of data items when processing on said first plurality of data items stalls; and at least one buffer operatively interposed between said sequencer and said plurality of processing elements, into which the sequencer deposits said fragment-processing commands and from which said processing elements receive said fragment-processing commands, that decouples operations between the sequencer and the processing elements, wherein said sequencer is logically asynchronous to said processing elements and does not have to work on the same instructions at the same time, and wherein said sequencer and said processing elements are physically synchronous, and said sequencer and said processing elements operate at different rates, whereby the likelihood of stalling in said sequencer is reduced.

7. The architecture of claim 6, wherein said processing commands include vector commands.

8. The architecture of claim 6, wherein said processing elements expand vector commands to sequences of scalar commands.

9. The architecture of claim 6, wherein said processing elements run in parallel.

10. The architecture of claim 6, wherein said processing elements are one of a group consisting of pixel processors, fragment processors, and vertex processors.

11. A parallel architecture for 3D graphics processing, comprising: at least one sequencer which decodes 3D rendering commands for a plurality of 3D graphics data items and generates fragment-processing commands therefrom, and automatically transfers processing to another plurality of 3D graphics data items when processing on said first plurality of 3D graphics data items stalls or is predicted to stall; a plurality of SIMD processing elements, each configured to receive processing commands from said sequencer and also separately connected to send and receive pixel data; and a buffer operatively interposed between said sequencer and said plurality of processing elements, into which the sequencer deposits said fragment-processing commands and from which said processing elements receive said fragment-processing commands, that enables asynchronous operations thereof; wherein said sequencer is logically asynchronous to said processing elements and does not have to work on the same instructions at the same time, and wherein said sequencer and said processing elements are physically synchronous, and said sequencer and said processing elements operate at different rates, whereby the likelihood of stalling in said sequencer is reduced.

12. The architecture of claim 11, wherein said processing commands include vector commands.

13. The architecture of claim 11, wherein said processing elements expand vector commands to sequences of scalar commands.

14. The architecture of claim 11, wherein said processing elements run in parallel.

15. The architecture of claim 11, wherein said processing elements are one of a group consisting of pixel processors, fragment processors, and vertex processors.

16. A method for rendering 3D graphics data, comprising the steps of: running a rendering program on graphics data multi-threaded, such that if a thread stalls because of a memory read for one set of data, then said respective thread is automatically switched to another thread for another set of data; queuing 3D instructions, including vector instructions from a sequencer into an FIFO buffer; receiving and executing said instructions locally by processing elements; and decoupling said processing elements from said sequencer by placing said buffer between said processing elements and said sequencer to thereby allow asynchronous operation, whereby the operation of said sequencer is not stalled as said processing elements are executing vector operations; wherein said sequencer is logically asynchronous to said processing elements and said sequencer and said processing elements do not have to work on the same instructions at the same time, wherein said sequencer and said processing elements are physically synchronous, and said sequencer and said processing elements operate at different rates.

17. The method of claim 16, wherein said instructions include vector commands.

18. The method of claim 16, wherein said processing elements expand vector commands to sequences of scalar commands.

19. The method of claim 16, wherein said processing elements run in parallel.

20. The method of claim 16, wherein said processing elements are one of a group consisting of pixel processors, fragment processors, and vertex processors.
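
Claims 3, 8, 13, and 18 recite processing elements that expand vector commands into sequences of scalar commands. The claims do not specify a command format, so the following is only a hypothetical sketch of what such an expansion might look like. The `VectorCommand` and `ScalarCommand` types, the register naming, and the `x`/`y`/`z`/`w` component suffixes are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List

COMPONENTS = "xyzw"  # assumed component names for up to 4-wide vectors


@dataclass(frozen=True)
class VectorCommand:
    op: str     # operation name, e.g. "mul" or "add" (hypothetical)
    dst: str    # destination register name
    src_a: str  # first source register name
    src_b: str  # second source register name
    width: int  # number of vector components to process


@dataclass(frozen=True)
class ScalarCommand:
    op: str
    dst: str
    src_a: str
    src_b: str


def expand(cmd: VectorCommand) -> List[ScalarCommand]:
    """Turn one vector command into one scalar command per component."""
    if not 1 <= cmd.width <= len(COMPONENTS):
        raise ValueError(f"unsupported vector width: {cmd.width}")
    return [
        ScalarCommand(cmd.op, f"{cmd.dst}.{c}", f"{cmd.src_a}.{c}", f"{cmd.src_b}.{c}")
        for c in COMPONENTS[: cmd.width]
    ]


if __name__ == "__main__":
    for scalar in expand(VectorCommand("mul", "r0", "r1", "r2", 3)):
        print(scalar)
```

Under this model, a single three-component multiply queued by the sequencer occupies the processing elements for three scalar steps, which is one reason the two sides can run at different rates.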

Classification Codes (CPC)

Cooperative Patent Classification codes for this invention.

G06T, G06F

Patent Metadata

Filing Date: September 28, 2004

Publication Date: March 27, 2012
