Legal claims defining the scope of protection, as filed with the USPTO.
1. A stream data processing method for causing a computer to execute a processing of receiving stream data input to the computer as primary information and of generating secondary information by acquiring primary information within a predetermined period from among the received primary information, comprising the steps of: receiving the stream data input to the computer as the primary information; determining whether or not the received primary information includes delay information indicating that there is primary information to arrive with a delay; and performing a first real-time processing when a result of the determination includes delay information, wherein: the step of performing the first real-time processing includes the steps of: extracting primary information corresponding to a predetermined lifetime from among the received primary information as secondary information, and generating a real-time processing result from the extracted secondary information; receiving primary information corresponding to the delay information; and recalculating the real-time processing result after receiving the primary information that arrives with a delay; the step of generating the real-time processing result from the extracted secondary information includes the steps of: obtaining secondary information based on the lifetime from primary information excluding primary information that is to arrive with a delay when the delay information is received, and outputting the secondary information as a real-time output result that needs to be recalculated; and retaining result restore information for obtaining the real-time output result as information that needs to be recalculated after receiving the primary information corresponding to the delay information, along with the lifetime; and the step of recalculating the real-time processing result after receiving the primary information that arrives with a delay includes the step of recalculating the secondary information from the primary information that arrives with a delay and the result restore information, and outputting a result of the recalculation as a delay output result.
2. The stream data processing method according to claim 1 , wherein: the step of outputting the secondary information as the real-time output result that needs to be recalculated when the delay information is received includes the step of adding an unconfirmed flag indicating that the recalculation is necessary to the secondary information before the outputting; and the step of recalculating the secondary information from the primary information that arrives with a delay and the result restore information, and outputting the result of the recalculation as the delay output result includes the step of adding a confirmed flag indicating that the delay output result represents the result of the recalculation to the delay output result before the outputting.
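The flow recited in claims 1 and 2 can be illustrated with a minimal sketch. It is not the patented implementation: the class name `DelayAwareWindowSum`, the fixed-length window standing in for the "lifetime", the use of a sum as the secondary information, and the string flags are all assumptions made for illustration. For simplicity the sketch retains restore information for every window, not only while a delay is pending.

```python
from collections import defaultdict


class DelayAwareWindowSum:
    """Sums values per fixed-length window and corrects sums when late data arrives."""

    def __init__(self, window_length):
        self.window_length = window_length
        # Result restore information: window index -> values seen so far.
        self.restore_info = defaultdict(list)
        self.delay_pending = False

    def _window(self, timestamp):
        # Map a timestamp to the index of the window (lifetime) it belongs to.
        return int(timestamp // self.window_length)

    def on_delay_information(self):
        """Record that some primary information is going to arrive late."""
        self.delay_pending = True

    def on_tuple(self, timestamp, value):
        """Real-time path: emit a result, flagged if a delay is pending."""
        window = self._window(timestamp)
        self.restore_info[window].append(value)
        flag = "unconfirmed" if self.delay_pending else None
        return window, sum(self.restore_info[window]), flag

    def on_delayed_tuple(self, timestamp, value):
        """Delay path: recalculate from restore information plus the late tuple."""
        window = self._window(timestamp)
        self.restore_info[window].append(value)
        return window, sum(self.restore_info[window]), "confirmed"
```

In this sketch a consumer that sees an "unconfirmed" result for a window can expect a later "confirmed" result for the same window once the delayed tuple has been folded in.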
3. The stream data processing method according to claim 1 , further comprising the steps of: determining whether or not the received primary information includes delay dissolution information indicating that all of primary information corresponding to the delay information have arrived; and performing a second real-time processing when a result of the determination includes the delay dissolution information, wherein: the step of performing the second real-time processing includes the step of extracting primary information corresponding to a predetermined lifetime from among the received primary information as secondary information, and generating a real-time processing result from the extracted secondary information; and the step of performing the first real-time processing when the result of the determination includes delay information is executed after the delay information is received until the delay dissolution information is received.
4. The stream data processing method according to claim 1 , wherein the step of extracting the primary information corresponding to the predetermined lifetime from among the received primary information as the secondary information, and generating the real-time processing result from the extracted secondary information includes the steps of: receiving a processing request for acquiring the primary information; determining whether or not the processing request includes a request for a processing of executing a window operation in which a data row to be cut out from the primary information is specified and the primary information is converted into secondary information; determining whether or not the number of data rows is specified by the processing request if it is determined that the processing request includes the request for the processing of executing the window operation; adding the unconfirmed flag to the secondary information if it is determined that the number of data rows is specified, and storing the cut-out data rows along with the lifetime in a result restore information area for storing result restore information serving as a midway processing result necessary for the recalculation; determining, if it is determined that the number of data rows is not specified, whether or not the processing request is made for a partitioned window in which the number of data rows is specified on a group basis and a partition key for classifying the secondary information on a group basis is the same as a key included in the delay information; and adding the unconfirmed flag to the secondary information if it is determined that the processing request is made for the partitioned window and the partition key is the same as the key included in the delay information.
5. The stream data processing method according to claim 2 , wherein the step of extracting the primary information corresponding to the predetermined lifetime from among the received primary information as the secondary information, and generating the real-time processing result from the extracted secondary information includes the steps of: receiving a processing request for acquiring the primary information; determining whether or not the processing request includes a request for a join operation for generating secondary information by joining a plurality of pieces of the primary information under a predetermined condition; referencing a temporary storage area for temporarily storing data if it is determined that the processing request includes the request for the join operation, and acquiring join-purpose temporary storage information that has been temporarily stored and is to be joined; determining whether or not the primary information is to be joined with the join-purpose temporary storage information to which the unconfirmed flag is added; determining whether or not the primary information is information to which the unconfirmed flag is added; adding the unconfirmed flag to a join result if it is determined one of that the primary information is to be joined with the join-purpose temporary storage information to which the unconfirmed flag is added and that the primary information is the information to which the unconfirmed flag is added; and storing the join-purpose temporary storage information into a result restore information area as join-purpose result restore information along with the lifetime.
6. The stream data processing method according to claim 5 , wherein the step of recalculating the real-time processing result after receiving the primary information that arrives with a delay includes the steps of: determining whether or not the processing request includes the request for the join operation; acquiring, if it is determined that the processing request includes the request for the join operation, the join-purpose result restore information having the lifetime corresponding to a timestamp of the primary information that arrives with a delay from the result restore information area; joining the primary information that arrives with a delay with the join-purpose result restore information to be joined; adding the confirmed flag indicating that the processing result is confirmed by performing a recalculation on the join result based on the primary information that arrives with a delay before the outputting; and deleting from the result restore information area the result restore information whose lifetime expires earlier than the timestamp of the primary information that arrives with a delay.
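The join handling in claims 5 and 6 can be sketched as follows, under stated assumptions: tuples and restore entries are plain dictionaries, the join condition is key equality, and a lifetime is a `start`/`end` pair. The function names `join_realtime` and `join_delayed` are hypothetical and do not come from the claims.

```python
def join_realtime(incoming, buffered):
    """Join one incoming tuple with buffered tuples that share its key.

    The join result is unconfirmed if either input was unconfirmed.
    """
    return [
        {
            "key": incoming["key"],
            "pair": (incoming["value"], other["value"]),
            "unconfirmed": incoming["unconfirmed"] or other["unconfirmed"],
        }
        for other in buffered
        if other["key"] == incoming["key"]
    ]


def join_delayed(late, restore_area):
    """Join a late tuple with restore entries whose lifetime covers its timestamp.

    Afterwards, delete entries whose lifetime expired before that timestamp.
    """
    confirmed = [
        {"key": late["key"], "pair": (late["value"], e["value"]), "flag": "confirmed"}
        for e in restore_area
        if e["key"] == late["key"] and e["start"] <= late["ts"] < e["end"]
    ]
    # Modify the list in place so the caller's restore area shrinks.
    restore_area[:] = [e for e in restore_area if e["end"] >= late["ts"]]
    return confirmed
```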
7. The stream data processing method according to claim 2, wherein the step of extracting the primary information corresponding to the predetermined lifetime from among the received primary information as the secondary information, and generating the real-time processing result from the extracted secondary information includes the steps of: receiving a processing request for acquiring the primary information; determining whether or not the processing request includes a request for executing aggregation processing on the primary information; referencing a temporary storage area for temporarily storing data if it is determined that the processing request includes the request for the aggregation processing, and acquiring aggregation-purpose temporary storage information to be aggregated; determining whether or not the aggregation-purpose temporary storage information is information to which the unconfirmed flag is added; determining whether or not the primary information is information to which the unconfirmed flag is added; adding the unconfirmed flag to an aggregation result if it is determined one of that the aggregation-purpose temporary storage information is the information to which the unconfirmed flag is added and that the primary information is information to which the unconfirmed flag is added; and storing the aggregation-purpose temporary storage information into a result restore information area for storing result restore information serving as a midway processing result necessary for the recalculation as aggregation-purpose result restore information along with the lifetime.
8. The stream data processing method according to claim 7 , wherein the step of recalculating the real-time processing result after receiving the primary information that arrives with a delay includes the steps of: determining whether or not the processing request includes an aggregation operator; referencing the result restore information area if it is determined that the processing request includes the aggregation operator, and acquiring the aggregation-purpose result restore information whose lifetime corresponds to the timestamp of the primary information that arrives with a delay; recalculating the aggregation result from the primary information that arrives with a delay and the aggregation-purpose result restore information to be aggregated; adding the confirmed flag indicating that the processing result is confirmed by performing the recalculation on the aggregation result based on the primary information that arrives with a delay before the outputting; and deleting from the result restore information area the result restore information whose lifetime expires earlier than the timestamp of the primary information that arrives with a delay.
9. The stream data processing method according to claim 1 , further comprising the steps of: acquiring a system time of the computer; acquiring a timestamp of stream data that is transmitted last by a node that has transmitted the input stream data, as application time information; determining, in comparison between the application time information and the system time, whether or not a difference therebetween is within a delay information generation threshold value that is time information preset for detecting a timeout of the stream data; and generating delay information if it is determined that the difference between the application time information and the system time exceeds the delay information generation threshold value, wherein the step of determining whether or not the received primary information includes the delay information indicating that there is the primary information to arrive with a delay includes setting the generated delay information as an input.
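The timeout check in claim 9 amounts to comparing each node's last application timestamp with the system time. A minimal sketch, assuming timestamps are numbers in the same unit as the threshold and that delay information is represented as a dictionary (both are illustrative choices, as is the function name `generate_delay_info`):

```python
def generate_delay_info(last_seen, system_time, threshold):
    """Return delay information for every node whose last tuple is too old.

    last_seen maps a node name to the timestamp of the last stream data
    that node transmitted (the application time information).
    """
    return [
        {"node": node, "last_timestamp": ts}
        for node, ts in sorted(last_seen.items())
        if system_time - ts > threshold
    ]
```

The returned entries would then be fed back into the stream as input, so that the delay determination step of claim 1 sees them.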
10. The stream data processing method according to claim 9 , wherein the step of generating the delay information includes the steps of: determining, in information stored in a node stream management table for managing a relationship between each node that transmits the primary information and the primary information, whether or not information on the stream data transmitted from the node is included in a preset schema restriction condition; and adding the schema restriction condition to the delay information if it is determined that the information on the stream data transmitted from the node is included in the schema restriction condition.
11. The stream data processing method according to claim 1 , further comprising deciding by a preset selection one of to output both of the real-time processing result and the delay output result from the computer to another computer operated by a user and to output only the real-time processing result without outputting the delay output result.
12. The stream data processing method according to claim 1 , further comprising the step of deciding by a preset selection whether or not a correct processing result including the delay output result is to be stored into an external storage medium, wherein if the correct processing result including the delay output result is to be stored into the external storage medium, without storing a real-time output result that needs to be recalculated into the external storage medium, a real-time output result that has been generated from primary information that does not include the delay information and a real-time output result that has been recalculated based on the primary information that arrives with a delay is stored into the external storage medium.
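The archive rule in claim 12 can be read as a filter over output results. The sketch below borrows the unconfirmed and confirmed flags of claim 2 to tell the three kinds of result apart; that representation, the `flag` field, and the function name `select_for_archive` are assumptions, not part of claim 12.

```python
def select_for_archive(results):
    """Keep results that are final: never flagged, or confirmed by recalculation.

    Results still flagged "unconfirmed" are skipped, because a recalculated
    version is expected to replace them.
    """
    return [r for r in results if r["flag"] != "unconfirmed"]
```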
13. The stream data processing method according to claim 1 , wherein if a data size for retaining result restore information necessary for the recalculation along with the lifetime exceeds a preset memory size upper limit value, a procedure for one of deleting data having an old timestamp from among the result restore information, temporarily suspending inputting of the stream data, and processing the stream data by deleting a portion thereof is executed.
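Of the three procedures listed in claim 13, the first (deleting data having an old timestamp) can be sketched as below. The entry count stands in for the data size, which is a simplification, and the function name `enforce_limit` is hypothetical.

```python
def enforce_limit(restore_area, max_entries):
    """Drop the oldest restore entries until at most max_entries remain.

    restore_area is a list of (timestamp, payload) tuples and is modified
    in place. Returns the entries that were removed.
    """
    restore_area.sort(key=lambda entry: entry[0])
    excess = max(0, len(restore_area) - max_entries)
    removed = restore_area[:excess]
    del restore_area[:excess]
    return removed
```

Dropping old entries trades correctness for bounded memory: a delayed tuple that arrives after its restore entry was removed can no longer trigger an exact recalculation.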
14. A computer system, which is provided with a processor, a storage system, and an interface and set in the storage system, and in which stream data input through the interface is acquired as primary information, and secondary information is generated for the acquired primary information based on a window for defining a lifetime during which the primary information is to be processed, the computer system comprising: a first processing module for outputting, as a real-time processing result, a processing result excluding primary information that arrives with a delay based on delay information indicating that a portion of the primary information arrives with a delay; a result restore information retention module for retaining, along with the lifetime, result restore information necessary for a recalculation performed when primary information corresponding to the delay information arrives; and a delay tuple recalculation module for recalculating, when the primary information corresponding to the delay information arrives, the secondary information from the result restore information and the primary information corresponding to the delay information, and outputting a result of the recalculation as a delay output result.
15. The computer system according to claim 14 , further comprising: a timeout detection module for acquiring a timestamp of stream data that is transmitted last by a node that has transmitted the input stream data, as application time information, and determining, in comparison between the application time information and a system time of the computer, whether or not a difference therebetween is within a delay information generation threshold value that is time information preset for detecting a timeout of the stream data; and a delay processing heartbeat tuple generation module for generating the delay information if the timeout detection module determines that the difference exceeds the delay information generation threshold value.
16. The computer system according to claim 14 , wherein: the first processing module adds an unconfirmed flag indicating that the real-time processing result includes secondary information that needs to be recalculated to the real-time processing result; and the delay tuple recalculation module adds a confirmed flag indicating that the delay output result represents the result of the recalculation to the delay output result.
17. The computer system according to claim 16 , further comprising: an output method setting module for setting, with respect to the real-time output result and the delay output result, whether or not a correct processing result including the delay output result is stored into an external storage medium without outputting the delay output result as a result to be output; an archive execution module for storing, if the correct processing result including the delay output result is stored into the external storage medium without outputting the delay output result as the result to be output, a processing result to which the confirmed flag is added into the external storage medium without storing a processing result to which the unconfirmed flag is added into the external storage medium, the confirmed flag indicating that the processing result is confirmed by recalculating a processing based on primary information to which the unconfirmed flag is not added and which corresponds to the delay information; and an output result control module for outputting only the real-time output result.
18. A machine-readable medium for storing a program for causing a computer to execute a stream data processing of acquiring stream data input to the computer as primary information, and of generating secondary information for the acquired primary information based on a window for defining a lifetime during which the primary information is to be processed, the program causing the computer to execute the procedures of: extracting, based on delay information indicating that a portion of the primary information arrives with a delay, a processing result excluding primary information that arrives with a delay as the secondary information, and outputting the secondary information as a real-time processing result; retaining, along with the lifetime, a midway processing result necessary for a recalculation performed when primary information corresponding to the delay information arrives; and recalculating, when the primary information corresponding to the delay information arrives, the secondary information from the midway processing result and the primary information corresponding to the delay information, and outputting a recalculation result as the delay output result.
19. The machine-readable medium for storing the program according to claim 18 , wherein: the procedure of outputting the secondary information as the real-time processing result includes adding an unconfirmed flag indicating that the real-time processing result includes secondary information that needs to be recalculated to the real-time processing result; and the procedure of outputting the recalculation result as the delay output result includes adding a confirmed flag indicating that the delay output result represents the recalculation result to the delay output result.
December 7, 2010