Legal claims defining the scope of protection, as filed with the USPTO.
1. A method for data deduplication in a travel and transportation data processing system, the method comprising: loading into memory for comparison from a database of a multiplicity of multi-field records, a pair of two different multi-field records; submitting the pair to a similarity model, the similarity model correlating values for different fields of the multiplicity of multi-field records with a single person and producing a probability of duplication responsive to the submission by correlating a companion passenger with a specified individual so as to indicate a probability of duplication whenever the companion passenger appears in a pair of records submitted to the model for comparison; on condition that the similarity model produces a high probability, automatically merging the pair into a single record without manual intervention, but otherwise on condition that the similarity model produces a medium probability, placing the pair in a queue pending manual intervention and manual merging, but otherwise on condition that the similarity model produces a low probability, omitting the pair from consideration of merging; and, repeating the submitting, and one of the automatic merging, placing and omitting for each other pair of different multi-field records in the database.
2. The method of claim 1, wherein the model is a machine learning model trained on different correlated pairs of records.
3. The method of claim 1, further comprising training the model by feeding back into the model, each pair of records processed manually from the queue.
4. A travel and transportation data processing system configured for data deduplication, the system comprising: a host computer with memory and at least one processor; a fixed storage medium hosting a database comprising a multiplicity of multi-field records; a similarity model disposed in the memory, the similarity model correlating values for different fields of the multiplicity of multi-field records with a single person and producing a probability of duplication responsive to the submission by correlating a companion passenger with a specified individual so as to indicate a probability of duplication whenever the companion passenger appears in a pair of records submitted to the model for comparison; and, a data deduplication module comprising computer program instructions enabled upon execution in the memory of the host computer to perform: loading into the memory for comparison from the database, a pair of two different multi-field records; submitting the pair to the model; on condition that the similarity model produces a high probability, automatically merging the pair into a single record without manual intervention, but otherwise on condition that the similarity model produces a medium probability, placing the pair in a queue pending manual intervention and manual merging, but otherwise on condition that the similarity model produces a low probability, omitting the pair from consideration of merging; and, repeating the submitting, and one of the automatic merging, placing and omitting for each other pair of different multi-field records in the database.
5. The system of claim 4, wherein the model is a machine learning model trained on different correlated pairs of records.
6. The system of claim 4, wherein the program instructions during execution are enabled to further perform training the model by feeding back into the model, each pair of records processed manually from the queue.
7. A computer program product for data deduplication in a travel and transportation data processing system, the computer program product including a non-transitory computer readable storage medium having program instructions embodied therewith, the program instructions executable by a device to cause the device to perform: loading into memory for comparison from a database of a multiplicity of multi-field records, a pair of two different multi-field records; submitting the pair to a similarity model, the similarity model correlating values for different fields of the multiplicity of multi-field records with a single person and producing a probability of duplication responsive to the submission by correlating a companion passenger with a specified individual so as to indicate a probability of duplication whenever the companion passenger appears in a pair of records submitted to the model for comparison; on condition that the similarity model produces a high probability, automatically merging the pair into a single record without manual intervention, but otherwise on condition that the similarity model produces a medium probability, placing the pair in a queue pending manual intervention and manual merging, but otherwise on condition that the similarity model produces a low probability, omitting the pair from consideration of merging; and, repeating the submitting, and one of the automatic merging, placing and omitting for each other pair of different multi-field records in the database.
8. The computer program product of claim 7, wherein the model is a machine learning model trained on different correlated pairs of records.
9. The computer program product of claim 7, wherein the program instructions executable by the device cause the device to further perform training the model by feeding back into the model, each pair of records processed manually from the queue.
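For illustration only, the three-way handling recited in claims 1, 4 and 7 (automatic merge on a high probability, manual-review queue on a medium probability, omission on a low probability) and the feedback step of claims 3, 6 and 9 might be sketched in Python as follows. This is a minimal sketch, not the claimed implementation. The claims give no numeric thresholds, field names, or model internals, so the `HIGH` and `LOW` cutoffs, the `name`, `companion` and `email` fields, and the `toy_similarity` scoring rule are all assumptions made up for this example. In particular, `toy_similarity` is a hand-written stand-in for the similarity model, which the dependent claims describe as a trained machine learning model.

```python
from itertools import combinations
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]
Pair = Tuple[int, int]

# Hypothetical cutoffs; the claims only say "high", "medium" and "low".
HIGH = 0.9
LOW = 0.5


def toy_similarity(a: Record, b: Record) -> float:
    """Stand-in for the similarity model (assumption, not the patented model).

    A shared companion passenger raises the duplication score, loosely
    mirroring the companion-passenger correlation recited in the claims.
    """
    score = 0.0
    if a.get("name", "").lower() == b.get("name", "").lower():
        score += 0.6
    if a.get("companion") and a.get("companion") == b.get("companion"):
        score += 0.35
    if a.get("email") and a.get("email") == b.get("email"):
        score += 0.05
    return min(score, 1.0)


def triage(probability: float, high: float = HIGH, low: float = LOW) -> str:
    """Map a duplication probability to one of the three claimed outcomes."""
    if probability >= high:
        return "merge"    # high: merge automatically
    if probability >= low:
        return "review"   # medium: queue for manual intervention
    return "skip"         # low: omit from consideration


def deduplicate(
    records: List[Record],
    model: Callable[[Record, Record], float],
) -> Tuple[List[Pair], List[Pair]]:
    """Submit every pair of different records to the model and triage it.

    Returns index pairs to merge automatically and index pairs queued for
    manual review. The merge itself is left out of this sketch.
    """
    auto_merge: List[Pair] = []
    review_queue: List[Pair] = []
    for i, j in combinations(range(len(records)), 2):
        action = triage(model(records[i], records[j]))
        if action == "merge":
            auto_merge.append((i, j))
        elif action == "review":
            review_queue.append((i, j))
    return auto_merge, review_queue


def record_review(
    training_set: List[Tuple[Record, Record, bool]],
    a: Record,
    b: Record,
    is_duplicate: bool,
) -> None:
    """Keep a manually reviewed pair as a labeled example for retraining."""
    training_set.append((a, b, is_duplicate))


records = [
    {"name": "Ann Lee", "companion": "Bo Lee", "email": "ann@example.com"},
    {"name": "ann lee", "companion": "Bo Lee", "email": ""},
    {"name": "Ann Lee", "companion": "", "email": ""},
    {"name": "Carl Roe", "companion": "Dee Roe", "email": ""},
]
auto_merge, review_queue = deduplicate(records, toy_similarity)

# A reviewer's decision on the first queued pair becomes training data.
training_set: List[Tuple[Record, Record, bool]] = []
first, second = review_queue[0]
record_review(training_set, records[first], records[second], True)
```

Comparing every pair is quadratic in the number of records, which matches the "each other pair" wording of the claims but would likely need a blocking or indexing step at database scale. That step is outside what the claims recite and is not shown here.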
June 15, 2021