Discussed herein are methods and systems to train customized machine learning models in a more efficient manner (e.g., using fewer labeled data points). In one example, a method may include using a first machine learning model to generate likelihoods of fraudulent activity for an aggregated series of data associated with a series of computing systems. Based on the calculated likelihoods, a server can generate a training dataset that includes fraudulent data associated with a first computing system, fraudulent data associated with any computing system within the series other than the first computing system, non-fraudulent data associated with the first computing system, and non-fraudulent data associated with any computing system within the series other than the first computing system. The server may then train a second machine learning model using the training dataset, e.g., using a contrastive learning method.
Legal claims defining the scope of protection, as filed with the USPTO.
. A method of accelerating training time for training a first machine learning model using a training dataset generated via a second machine learning model, the method comprising:
. The method of, wherein the first machine learning model is trained using a quadruplet training technique.
. The method of, further comprising:
. The method of, wherein at least one network operation within the training dataset includes a lineage labeling attribute.
. The method of, further comprising:
. The method of, wherein the new network operation is a pending network operation for an amount less than a price threshold.
. The method of, wherein the first, second, third, and fourth subsets of the set of network operation data are selected further based on a similar attribute with the new network operation.
. The method of, wherein the first machine learning model is trained using an unsupervised method and without using any labeling data associated with the training dataset.
. A system for accelerating training time for machine learning models, the system comprising:
. The system of, wherein the first machine learning model is trained using a quadruplet training technique.
. The system of, wherein the one or more processors are further configured to:
. The system of, wherein at least one network operation within the training dataset includes a lineage labeling attribute.
. The system of, wherein the one or more processors are further configured to:
. The system of, wherein the new network operation is a pending network operation for an amount less than a price threshold.
. The system of, wherein the first, second, third, and fourth subsets of the set of network operation data are selected further based on a similar attribute with the new network operation.
. The system of, wherein the first machine learning model is trained using an unsupervised method and without using any labeling data associated with the training dataset.
. A non-transitory machine-readable storage medium for accelerating training time for machine learning models, the storage medium having computer-executable instructions stored thereon that, when executed by one or more processors, cause the one or more processors to:
. The non-transitory machine-readable storage medium of, wherein the first machine learning model is trained using a quadruplet training technique.
. The non-transitory machine-readable storage medium of, wherein the computer-executable instructions further cause the one or more processors to:
. The non-transitory machine-readable storage medium of, wherein the computer-executable instructions further cause the one or more processors to:
Complete technical specification and implementation details from the patent document.
This application relates generally to methods and systems for customized training of machine learning models using contrastive learning techniques to predict fraudulent electronic transactions.
The rapid proliferation of advanced computing and digital technologies has led to the development of complex systems that involve interactions between various machine learning models, applications, and data sources. Such systems can be utilized in the field of fraud detection, particularly in monitoring electronic transactions. For example, a system may use a machine learning model to analyze transactions across different electronic platforms and computing systems. The machine learning models may be trained to identify patterns indicative of fraudulent activity by evaluating various transaction attributes such as amount, location, and timing. Given the evolving tactics of fraud and the limited adaptability of machine learning approaches, machine learning models may not always directly detect new or sophisticated fraud schemes. Accordingly, the system may incorporate additional data sources or analytical methods to improve the detection capabilities of the machine learning models.
In order to identify fraudulent activity, many existing computing infrastructures use machine learning models. However, machine learning models trained using conventional methods face numerous technical challenges. Conventional systems and methods that use machine learning models to detect fraudulent transactions rely on pre-defined workflows to access data from a single electronic data source and generate reports on potentially fraudulent activity within computing systems. These machine learning models lack flexibility and may not adapt well to evolving fraud patterns or specific user needs. For example, a static workflow may not dynamically select the most relevant data sources or features based on the specific characteristics of a suspicious transaction. This approach, while providing a general assessment, fails to capture the complexities and evolving patterns of such attacks. For instance, by focusing solely on individual transactions, the machine learning models miss contextual insights that can be gleaned from analyzing sequences of transactions associated with a specific computing system. This limitation can result in inefficiencies, as the machine learning models may not be optimized to gather the most relevant information for every potential fraud scenario.
Additionally, the challenges of using supervised learning techniques to train machine learning models can be exacerbated by the difficulty of acquiring labeled data. For example, unlike supervised learning tasks where labeled examples are readily available, fraudulent transactions often lack explicit labels signifying fraud. This makes it challenging to build a sufficiently large and diverse dataset with accurately labeled fraudulent transaction instances. Additionally, the dynamic and evolving nature of fraud patterns may require constant updates to the labeled dataset, which can be time-consuming and resource-intensive. Moreover, the lack of labels for new or emerging fraud patterns further complicates the use of traditional supervised learning techniques. Therefore, conventional supervised training methods face technical challenges and are not desirable.
Furthermore, unsupervised learning techniques also encounter technical challenges when applied to detecting fraudulent activity due to the inherent nature of unsupervised learning algorithms. For example, unlike supervised learning techniques, where labeled examples guide the machine learning model, unsupervised learning techniques rely on the inherent structure of the input data to identify patterns and anomalies. In the case of fraudulent transaction detection, the lack of labeled data makes it difficult for unsupervised learning algorithms to accurately differentiate between legitimate and fraudulent transactions. Without explicit labels, the unsupervised learning algorithms may struggle to distinguish between normal variations in transaction behavior and fraudulent patterns.
Moreover, unsupervised learning algorithms often require a vast amount of unlabeled data to effectively capture the underlying data structure. In the context of fraudulent transaction detection, acquiring a sufficiently large and diverse dataset of unlabeled transactions can be challenging, especially considering the dynamic nature of fraud patterns. Additionally, unsupervised learning algorithms may struggle to generalize well to unseen fraudulent patterns. Since unsupervised learning algorithms rely solely on the input data's distribution to identify anomalies, they may not adapt well to new or evolving fraud techniques. Therefore, it is desirable to have a more dynamic configuration that can perform fraudulent transaction detection using contrastive learning sequence models.
The technical solutions described herein can incorporate a sequence-based contrastive learning model within a fraudulent transaction detection system to dynamically manage and process transaction data across multiple computing systems. In this regard, the fraudulent transaction detection system can integrate multiple computer models, where a first computer model can be trained on transaction data collected from various computing systems. The training of the first computer model can help identify fraud patterns without relying on labeled data indicating fraudulence. By using contrastive learning techniques (such as Triplet Loss, Quadruplet Loss, or InfoNCE loss functions), the first computer model can be trained to identify potential fraud patterns based on positive and negative examples. The technical solution can enable the first computer model to differentiate between legitimate and fraudulent patterns based on the sequence and context of transactions. Using the methods and systems discussed herein, a machine learning model can be trained using a smaller dataset than required by conventional methods. Therefore, the methods and systems discussed herein allow for more efficient training of machine learning models (e.g., using less data, fewer computing resources, and less time).
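As a non-limiting illustration of the contrastive loss functions named above, the following sketch computes a triplet loss, a quadruplet loss, and an InfoNCE loss on plain embedding vectors. The function names, the margin values, the temperature, and the choice of squared Euclidean distance and cosine similarity are assumptions made for illustration only; they reflect common formulations of these losses and are not taken from this disclosure.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def sq_dist(u: Vector, v: Vector) -> float:
    """Squared Euclidean distance between two embeddings."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; assumes both vectors are non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def triplet_loss(anchor: Vector, positive: Vector, negative: Vector,
                 margin: float = 1.0) -> float:
    """Penalize the anchor being closer to the negative than to the positive."""
    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)


def quadruplet_loss(anchor: Vector, positive: Vector,
                    negative1: Vector, negative2: Vector,
                    margin1: float = 1.0, margin2: float = 0.5) -> float:
    """Triplet term plus a term that also pushes the two negatives apart."""
    d_ap = sq_dist(anchor, positive)
    first = max(0.0, d_ap - sq_dist(anchor, negative1) + margin1)
    second = max(0.0, d_ap - sq_dist(negative1, negative2) + margin2)
    return first + second


def info_nce_loss(anchor: Vector, positive: Vector,
                  negatives: Sequence[Vector], temperature: float = 0.1) -> float:
    """Cross-entropy of picking the positive among the positive and the negatives."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    peak = max(logits)  # subtract the maximum for numerical stability
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum - logits[0]
```

In each case the loss is small when the anchor embedding is close to the positive example and far from the negative examples, which is the property a contrastive training procedure would exploit.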
Additionally, a second computer model (e.g., an existing fraud model) can be used to further enhance the training process of the first computer model. The second computer model can be trained on a separate, smaller set of transactions. The output of the second computer model can then be fed to the first computer model for training purposes. For example, the second computer model can prioritize informative transaction sequences for the training of the first computer model. By focusing on sequences exhibiting characteristics commonly associated with fraudulent and legitimate transactions, the first computer model can improve its learning process and become more adept at identifying fraudulent patterns. Therefore, the methods and systems discussed herein can use one machine learning model (e.g., an existing fraud model) to train a new model. This allows for computational efficiency, as the existing model may already be trained and does not need to be revised. Moreover, the methods and systems discussed herein allow for retrofitting existing computing infrastructure without the need to revise existing models, a revision that would be highly undesirable.
A machine learning model trained using the methods and systems discussed herein can be trained and customized to detect a specific type of fraud, as opposed to generic or conventional fraud. As fraudsters improve their knowledge and skill in cyber security attacks, they invent new fraud schemes that are hard to detect using conventional fraud models. For instance, many fraudsters use card testing schemes. Card testing is an emerging type of fraud where fraudsters use stolen or randomly generated credit card numbers to make small transactions, typically for a very low amount (e.g., a dollar), to verify if the card is active and can be used for larger transactions. If the small charge is successful, indicating that the card is valid, the fraudster will proceed to make larger fraudulent purchases. This process helps fraudsters identify working card numbers from a bulk list of stolen or generated ones. Card testing often involves automated scripts or bots that can quickly test large volumes of card numbers through online payment gateways.
These new fraud schemes are hard to detect using conventional fraud models because each transaction is usually for an amount that goes unnoticed by the user and therefore usually goes unreported (hence, there will be fewer labeled data). Moreover, small transactions can blend in with legitimate activity, leading to insufficient labeled data for training. Fraudsters also adapt their methods, varying transaction patterns to evade detection, and high-volume, low-frequency testing further complicates identification. Traditional models, which rely on static rules and delayed data processing, struggle with these subtle and sporadic patterns, especially across different computing systems.
Using the methods and systems discussed herein, the machine learning model can be trained specifically for card testing activity. Therefore, “fraud,” as used herein, may refer to card testing activity and not traditional fraudulent activity (e.g., overcharging a user).
In some embodiments, a method may include executing, by a processor, the second machine learning model using a set of transaction data to predict a likelihood indicating whether each transaction within the set of transaction data is fraudulent; adding, by the processor, a first subset and a second subset of the set of transaction data to the training dataset, wherein each transaction within the first subset and the second subset of the set of transaction data has a likelihood of being fraudulent by satisfying a first threshold, wherein each transaction within the first subset is associated with a first computing system of a plurality of computing systems, each transaction within the second subset is associated with any computing system of the plurality of computing systems other than the first computing system, and the first subset and the second subset of the set of transactions are not labeled for training; adding, by the processor, a third subset and a fourth subset of the set of transaction data to the training dataset, wherein each transaction within the third subset and the fourth subset of the set of transaction data has a likelihood of being fraudulent by satisfying a second threshold, wherein each transaction within the third subset is associated with the first computing system of the plurality of computing systems, each transaction within the fourth subset is associated with any computing system of the plurality of computing systems other than the first computing system and the third subset and the fourth subset of the set of transactions are not labeled for training; and training, by the processor, the first machine learning model using the training dataset, such that the first machine learning model is configured to receive a new transaction and predict a likelihood of the new transaction being fraudulent.
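As a non-limiting illustration, the assembly of the four unlabeled subsets described above can be sketched as follows. The `Transaction` fields, the function name, the threshold values, and the reading of "satisfying a first threshold" as a score at or above that threshold (and "satisfying a second threshold" as a score at or below it) are assumptions made for illustration only. The `score_fn` callable stands in for the second machine learning model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class Transaction:
    tx_id: str
    system_id: str  # computing system (e.g., merchant) that handled the transaction
    amount: float


def build_training_dataset(
    transactions: Iterable[Transaction],
    score_fn: Callable[[Transaction], float],  # stand-in for the second model
    first_system: str,
    first_threshold: float = 0.8,   # assumed: score >= this counts as likely fraudulent
    second_threshold: float = 0.2,  # assumed: score <= this counts as likely legitimate
) -> Dict[str, List[Transaction]]:
    """Split scored transactions into the four subsets; no labels are attached."""
    subsets: Dict[str, List[Transaction]] = {
        "first": [],   # likely fraudulent, first computing system
        "second": [],  # likely fraudulent, any other computing system
        "third": [],   # likely legitimate, first computing system
        "fourth": [],  # likely legitimate, any other computing system
    }
    for tx in transactions:
        score = score_fn(tx)
        same_system = tx.system_id == first_system
        if score >= first_threshold:
            subsets["first" if same_system else "second"].append(tx)
        elif score <= second_threshold:
            subsets["third" if same_system else "fourth"].append(tx)
        # Transactions with scores between the two thresholds are left out.
    return subsets
```

Under these assumptions, transactions whose scores fall between the two thresholds are excluded, so the training dataset contains only the examples the second model scores most decisively.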
The first machine learning model may be trained using a quadruplet training technique. The method may include adding, by the processor to the training dataset, a fifth subset of the set of transaction data that includes a label indicating whether any transaction within the fifth subset of the set of transaction data is fraudulent. At least one transaction within the training dataset may include a lineage labeling attribute. The method may include eliminating, by the processor from the training dataset, any transaction data with a confidence score that does not satisfy a third threshold. The new transaction may be a pending transaction for an amount less than a price threshold. The first, second, third, and fourth subsets of the set of transaction data may be selected further based on a similar attribute with the new transaction. The first machine learning model may be trained using an unsupervised method and without using any labeling data associated with the training dataset.
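As a non-limiting illustration of two of the optional features above, the following sketch removes transaction data whose confidence score does not satisfy a third threshold and checks whether a new transaction is a pending transaction below a price threshold. The dictionary keys `confidence`, `status`, and `amount`, the function names, and the reading of "satisfy" as greater than or equal to the threshold are assumptions made for illustration only.

```python
from typing import Dict, List


def drop_low_confidence(records: List[Dict], third_threshold: float) -> List[Dict]:
    """Keep only records whose confidence score meets the third threshold."""
    return [r for r in records if r["confidence"] >= third_threshold]


def is_low_value_pending(tx: Dict, price_threshold: float) -> bool:
    """True when the transaction is pending and below the price threshold."""
    return tx["status"] == "pending" and tx["amount"] < price_threshold
```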
In some embodiments, a system may comprise one or more processors configured to cause a second machine learning model, using a set of transaction data, to predict a likelihood indicating whether each transaction within the set of transaction data is fraudulent; add a first subset and a second subset of the set of transaction data to the training dataset, wherein each transaction within the first subset and the second subset of the set of transaction data has a likelihood of being fraudulent by satisfying a first threshold, wherein each transaction within the first subset is associated with a first computing system of a plurality of computing systems, each transaction within the second subset is associated with any computing system of the plurality of computing systems other than the first computing system, and the first subset and the second subset of the set of transactions are not labeled for training; add a third subset and a fourth subset of the set of transaction data to the training dataset, wherein each transaction within the third subset and the fourth subset of the set of transaction data has a likelihood of being fraudulent by satisfying a second threshold, wherein each transaction within the third subset is associated with the first computing system of the plurality of computing systems, each transaction within the fourth subset is associated with any computing system of the plurality of computing systems other than the first computing system and the third subset and the fourth subset of the set of transactions are not labeled for training; and train the first machine learning model using the training dataset, such that the first machine learning model is configured to receive a new transaction and predict a likelihood of the new transaction being fraudulent.
The first machine learning model may be trained using a quadruplet training technique. The one or more processors may be further configured to add, to the training dataset, a fifth subset of the set of transaction data that includes a label indicating whether any transaction within the fifth subset of the set of transaction data is fraudulent. At least one transaction within the training dataset may include a lineage labeling attribute. The one or more processors may be further configured to eliminate, from the training dataset, any transaction data with a confidence score that does not satisfy a third threshold. The new transaction may be a pending transaction for an amount less than a price threshold. The first, second, third, and fourth subsets of the set of transaction data may be selected further based on a similar attribute with the new transaction. The first machine learning model may be trained using an unsupervised method and without using any labeling data associated with the training dataset.
In yet another embodiment, a non-transitory machine-readable storage medium may have computer-executable instructions stored thereon that, when executed by one or more processors, cause the one or more processors to cause a second machine learning model, using a set of transaction data, to predict a likelihood indicating whether each transaction within the set of transaction data is fraudulent; add a first subset and a second subset of the set of transaction data to the training dataset, wherein each transaction within the first subset and the second subset of the set of transaction data has a likelihood of being fraudulent by satisfying a first threshold, wherein each transaction within the first subset is associated with a first computing system of a plurality of computing systems, each transaction within the second subset is associated with any computing system of the plurality of computing systems other than the first computing system, and the first subset and the second subset of the set of transactions are not labeled for training; add a third subset and a fourth subset of the set of transaction data to the training dataset, wherein each transaction within the third subset and the fourth subset of the set of transaction data has a likelihood of being fraudulent by satisfying a second threshold, wherein each transaction within the third subset is associated with the first computing system of the plurality of computing systems, each transaction within the fourth subset is associated with any computing system of the plurality of computing systems other than the first computing system, and the third subset and the fourth subset of the set of transactions are not labeled for training; and train the first machine learning model using the training dataset, such that the first machine learning model is configured to receive a new transaction and predict a likelihood of the new transaction being fraudulent.
The first machine learning model may be trained using a quadruplet training technique. The computer-executable instructions may further cause the one or more processors to add, to the training dataset, a fifth subset of the set of transaction data that includes a label indicating whether any transaction within the fifth subset of the set of transaction data is fraudulent. The computer-executable instructions may further cause the one or more processors to eliminate, from the training dataset, any transaction data with a confidence score that does not satisfy a third threshold.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are intended to provide further explanation of the invention as claimed.
Reference will now be made to the illustrative embodiments illustrated in the drawings, and specific language will be used here to describe the same. It will nevertheless be understood that no limitation of the scope of the claims or this disclosure is thereby intended. Alterations and further modifications of the inventive features illustrated herein, and additional applications of the principles of the subject matter illustrated herein, which would occur to one ordinarily skilled in the relevant art and having possession of this disclosure, are to be considered within the scope of the subject matter disclosed herein. The present disclosure is here described in detail with reference to embodiments illustrated in the drawings, which form a part hereof. Other embodiments may be used and/or other changes may be made without departing from the spirit or scope of the present disclosure. The illustrative embodiments described in the detailed description are not meant to be limiting of the subject matter presented here.
The drawings include a non-limiting example of components of a fraudulent transaction detection system in which an analytics server operates. The analytics server may utilize the features described herein to process transaction data and predict the likelihood of a transaction being fraudulent.
The analytics server may be communicatively coupled to a system database, user devices, and an administrator computing device. The analytics server may also use various computer models to analyze the data. The computer models can include one or more machine learning models. For example, a first computing model can include a first machine learning model that can be trained using the data analyzed via the second computing model.
In some embodiments, for convenience, the first computing model can be referred to as the first machine learning model, and the second computing model can be referred to as the second machine learning model. Moreover, even though the computer model is depicted as a single model, it can be a collection of models itself.
The system is not confined to the components described herein and may include additional or other components not shown for brevity, which are to be considered within the scope of the embodiments described herein. The system may also include other servers (not depicted), which serve to allow or block future transactions responsive to predictions generated by the first computer model using the training dataset generated by the second computer model.
The above-mentioned components may be connected to each other through a network. Examples of the network may include, but are not limited to, private or public LAN, WLAN, MAN, WAN, and the Internet. The network may include both wired and wireless communications according to one or more standards and/or via one or more transport mediums.
The communication over the network may be performed in accordance with various communication protocols such as Transmission Control Protocol and Internet Protocol (TCP/IP), User Datagram Protocol (UDP), and IEEE communication protocols. In one example, the network may include wireless communications according to Bluetooth specification sets or another standard or proprietary wireless communication protocol. In another example, the network may also include communications over a cellular network, including, e.g., a GSM (Global System for Mobile Communications), CDMA (Code Division Multiple Access), and/or EDGE (Enhanced Data rates for GSM Evolution) network.
The analytics server may be configured to receive data (e.g., data associated with a transaction) from various sources and process the associated transaction data using a machine-learning model (e.g., the first computer model) to predict the likelihood of fraud. The analytics server may receive the data directly from a user (e.g., the user subscribed to the subscription service performing the transaction), from an entity (e.g., a bank, credit card company, or credit bureau, among others), or from another processor (not shown) associated with an electronic payment system. In some embodiments, a user or a computing system (e.g., a merchant) and/or a system administrator (operating the administrator computing device) may use a platform (hosted by the analytics server or a third party) to transmit the request to the analytics server. The platform may include one or more graphical user interfaces (GUIs) displayed on the user device and/or the administrator computing device. For instance, the platform may include various GUIs that depict trends and statistical information regarding different computing systems and their respective fraudulent activities. For example, a GUI may depict, for each merchant, the number of fraudulent activities and the associated trends.
An example of the platform generated and hosted by the analytics server may be a web-based application or a website configured to be displayed on various electronic devices, such as mobile devices, tablets, personal computers, and the like. The platform may include various input elements configured to receive requests related to the transaction or the subscription service. For instance, a user may access the platform to initiate a transaction. Using the platform, the user may select the transaction to be processed and may provide a means of payment for the transaction.
The analytics server may be any computing device comprising a processor and non-transitory, machine-readable storage capable of executing the various tasks and processes described herein. The analytics server may employ various processors, such as a central processing unit (CPU) and graphics processing unit (GPU), among others. Non-limiting examples of such computing devices may include workstation computers, laptop computers, server computers, and the like. While the system includes a single analytics server, the analytics server may include any number of computing devices operating in a distributed computing environment, such as a cloud environment.
The computer models may represent a collection of various machine learning models or computer models that use algorithmic and/or artificial intelligence modeling techniques to process different transactions, predict the likelihood of fraud, and train each other. In some embodiments, different computer models may be configured to determine different scores or thresholds using different methods and/or may be trained differently. For instance, one computer model may be trained and calibrated to predict the likelihood of a transaction being fraudulent (e.g., card testing), and another computer model may be calibrated to determine a first score threshold corresponding to a score exceeding a predefined threshold or score.
In some embodiments, the computer model can include one or more models, including, but not limited to, a real-time card testing day model (RTCTDM), a card testing day model (CTDM), a validation to payment (VTP) model, a decline model (DM), and a card testing transaction-level (CTTX) model, among others. The RTCTDM can be used to identify and predict card testing transactions in real time. The CTDM can be used to determine whether a computing system (e.g., merchant) is going through card testing on a specific day. The VTP model can be used to predict whether a transaction will result in a charge within 35 days (or any other time window). The DM can predict transactions that should be blocked to prevent fraudulent activity. The CTTX model can identify card-testing transactions and predict whether a transaction is part of a fraudulent (e.g., card testing) attack.
In some embodiments, the second machine learning model can curate transaction data to be included within a training dataset by identifying relevant transaction data points and/or assigning scores or thresholds. The second machine learning model can include one or more of the aforementioned machine learning models (e.g., RTCTDM, CTDM, VTP, CTTX, and/or DM).
In some implementations, the first machine learning model can be trained using the data curated by the second machine learning model to predict the likelihood of new transactions being fraudulent.
In some embodiments, a group of the computer models may belong to the same model. That is, in some embodiments, a single model may include various sub-models. Segmenting a single machine-learning model into different sub-models can be a powerful approach to tackling complex tasks, such as detecting fraud and determining metrics for the likelihood of a future transaction's success.
The data used to generate the training dataset may be retrieved from various electronic data repositories (referred to herein as the electronic data sources). The electronic data sources may include various merchant and electronic sources that store transaction data (including fraudulent transactions).
Computing systems may be any computing device comprising a processor and a non-transitory, machine-readable storage medium capable of performing the various tasks and processes described herein. Non-limiting examples of the computing system are a workstation computer, point-of-sale system, laptop computer, phone, tablet computer, and server computer. During operation, various users may use the computing systems to conduct a transaction. Even though referred to herein as “user” devices, these devices may be operated by any party associated with a transaction, such as a merchant. For instance, a tablet may be used on behalf of a merchant to conduct a sale. In another example, the computing systems may include a point-of-sale terminal or a card reader.
The administrator computing device may represent a computing device operated by a system administrator. The administrator computing device may be configured to monitor various attributes generated by the analytics server (e.g., a suitable service provider or various analytic metrics (e.g., the scores or thresholds) determined during training of one or more machine-learning models and/or systems); monitor one or more computer models utilized by the analytics server and/or user devices; review feedback; and/or oversee the electronic data sources with which the analytics server communicates.
In operation, the analytics server may receive data associated with a future or new transaction, including a user identifier, a transaction amount, and a payment identifier. Using the methods discussed herein, the analytics server may use the first machine learning model to predict whether the new transaction is fraudulent. Based on the predictions, the analytics server may train the first computer model accordingly. The analytics server may, upon determining that the transaction is fraudulent, instruct a second server to reject the transaction.
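As a non-limiting illustration, the operation described above can be sketched as a function that receives a new transaction, obtains a fraud likelihood from the first machine learning model, and returns an allow or reject decision. The `NewTransaction` class, the `decide` function, the `predict` callable standing in for the trained first model, and the rejection threshold are hypothetical names and values chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class NewTransaction:
    user_id: str
    amount: float
    payment_id: str


def decide(
    tx: NewTransaction,
    predict: Callable[[NewTransaction], float],  # stand-in for the first model
    reject_threshold: float = 0.5,               # assumed cut-off for rejection
) -> str:
    """Return "reject" when the predicted fraud likelihood meets the threshold."""
    likelihood = predict(tx)
    return "reject" if likelihood >= reject_threshold else "allow"
```

In a deployment, the "reject" outcome would correspond to the analytics server instructing a second server to reject the transaction.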
The accompanying flow diagram illustrates a process executed by a fraudulent transaction detection system. The process includes a sequence of operations. However, other embodiments can include additional or alternative operations or can omit one or more operations altogether. The process is described as being executed by a fraudulent transaction detection system that is the same as, or similar to, the fraudulent transaction detection system described herein. However, one or more operations of the process can also be executed by any number of computing devices operating in the distributed computing system described herein. For instance, one or more computing devices (e.g., computing devices that can be the same as, or similar to, the analytics server) can perform some or all of the operations alone or in cooperation with one or more other computing devices. Using the methods and systems described herein, such as the process, the fraudulent transaction detection system can identify fraudulent or legitimate transactions, which can involve building sequential features from the transaction data that are then analyzed by machine learning models to determine fraud likelihood.
At an initial step, the analytics server can retrieve a set of transaction data, which can come from various sources, such as a database of past transactions or a real-time feed of new transactions. Each transaction record can include details such as anonymized or hashed card details, computing system information including name and location, the transaction amount, date and time of the transaction, billing address, and other relevant details. In some implementations, the second machine learning model can be trained on a historical dataset, including labeled transactions (e.g., fraudulent and/or legitimate). The second machine learning model can identify fraudulent patterns within transaction data. In some implementations, the second machine learning model can process each transaction individually or in batches. The second machine learning model can process the transaction details, considering various factors that can be indicative of fraud, such as the transaction amount, location of the computing system, time of the transaction, billing address, and past transaction history associated with the card and computing system.
In some implementations, the second machine learning model can assign a score (e.g., a likelihood score) to each transaction. The likelihood score can represent the second machine learning model's prediction regarding the probability of the transaction being fraudulent. The scoring format can be a percentage value, a value on a specific scale, or any other meaningful measure where a higher score indicates a greater likelihood of fraud, according to the second machine learning model's assessment. In some implementations, the second machine learning model can generate a list of processed transactions, each with its corresponding likelihood score predicted by the second machine learning model.
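The batch scoring described above can be sketched as follows. This is only an illustration under stated assumptions: the record fields (`id`, `system`, `amount`), the helper names, and `toy_score_batch` are hypothetical, and the toy scorer merely stands in for the second machine learning model so that the example runs on its own.

```python
from typing import Callable, Dict, Iterator, List, Sequence, Tuple

Transaction = Dict[str, object]  # hypothetical record layout


def batched(items: Sequence[Transaction], size: int) -> Iterator[Sequence[Transaction]]:
    """Yield consecutive slices of at most `size` transactions."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def score_transactions(
    transactions: Sequence[Transaction],
    score_batch: Callable[[Sequence[Transaction]], List[float]],
    batch_size: int = 2,
) -> List[Tuple[Transaction, float]]:
    """Pair every transaction with its likelihood score (higher = more likely fraud)."""
    scored: List[Tuple[Transaction, float]] = []
    for batch in batched(transactions, batch_size):
        scored.extend(zip(batch, score_batch(batch)))
    return scored


def toy_score_batch(batch: Sequence[Transaction]) -> List[float]:
    # Toy stand-in for the second model: smaller amounts receive higher scores.
    return [round(1.0 / (1.0 + float(txn["amount"])), 3) for txn in batch]


transactions = [
    {"id": 1, "system": "A", "amount": 0.0},
    {"id": 2, "system": "B", "amount": 1.0},
    {"id": 3, "system": "A", "amount": 3.0},
]
scored = score_transactions(transactions, toy_score_batch)
scores = [score for _, score in scored]
```

The result is the list of processed transactions with their corresponding scores that the later subset selection operates on.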
At a subsequent step, the analytics server may add a first subset and a second subset of the set of transaction data to the training dataset, wherein each transaction within the first subset and the second subset has a likelihood of being fraudulent that satisfies a first threshold, each transaction within the first subset is associated with a first computing system of a plurality of computing systems, each transaction within the second subset is associated with any computing system of the plurality of computing systems other than the first computing system, and the first subset and the second subset are not labeled for training.
In some implementations, the first subset can be understood as a specific selection of transactions extracted from the larger dataset used in the preceding step. In some implementations, the training dataset can serve as a collection of transaction data to train the first machine-learning model. In some implementations, the training dataset can serve as a collection of transaction data to train the second machine-learning model. Each record (e.g., individual transaction) within the first subset may not be labeled to indicate whether the transaction is fraudulent or legitimate. In some implementations, each record within the first subset may be labeled to indicate whether the transaction is fraudulent or legitimate.
In some implementations, each transaction within the first subset can indicate a likelihood of being fraudulent that satisfies a first threshold. For example, the score (likelihood score) of a transaction that satisfies the first threshold exceeds a predefined threshold. The predefined threshold can be indicative of potential fraud. The first threshold can be a specific score (e.g., percentage or value on a scale). In some implementations, the first threshold can be predefined. In some implementations, the first threshold score can be generated by the second machine learning model. Additionally, each transaction within the first subset can be associated with a first computing system out of a plurality of computing systems. In the context of card testing fraud, this association can be interpreted as the transactions having originated from the same computing system (e.g., merchant).
Additionally, the analytics server may add a second subset of the set of transaction data to the training dataset. In some implementations, the second subset can be understood as a specific selection of a group of transactions for inclusion in the training data. Each record (e.g., individual transaction) within the second subset may not be labeled to indicate whether the transaction is fraudulent or legitimate. In some implementations, each record within the second subset may be labeled to indicate whether the transaction is fraudulent or legitimate.
The second machine learning model can select transactions for the second subset based on the likelihood of fraud. For example, in this instance, the score (likelihood score) can satisfy the first threshold, indicating that the score of each transaction exceeds the predefined threshold. Each transaction within the second subset can be associated with a different computing system (merchant) compared to the first computing system. This means the transactions can originate from any computing system other than the first computing system.
Effectively, in this step, the analytics server may add a first subset of the dataset analyzed in the preceding step that includes fraudulent transactions associated with the merchant and a second subset of that dataset that includes fraudulent transactions associated with different merchants.
At a further step, the analytics server may add a third subset and a fourth subset of the set of transaction data to the training dataset, wherein each transaction within the third subset and the fourth subset has a likelihood of being fraudulent that satisfies a second threshold, each transaction within the third subset is associated with the first computing system of the plurality of computing systems, each transaction within the fourth subset is associated with any computing system of the plurality of computing systems other than the first computing system, and the third subset and the fourth subset are not labeled for training.
In some implementations, the third subset can be understood as a specific selection of a group of transactions for inclusion in the training data. Each record (e.g., individual transaction) within the third subset may not be labeled to indicate whether the transaction is fraudulent or legitimate. In some implementations, each record within the third subset may be labeled to indicate whether the transaction is fraudulent or legitimate.
In some implementations, each transaction within the third subset can indicate a likelihood of being legitimate that satisfies a second threshold. For example, the score (likelihood score) of transactions that satisfy the second threshold falls below the predefined threshold. The second threshold can be a specific score (e.g., percentage or value on a scale). In some implementations, the second threshold can be predefined. In some implementations, the second threshold score can be generated by the second machine learning model. Additionally, each transaction within the third subset can be associated with the same computing system, meaning all transactions originated from the same computing system.
Additionally, the analytics server may add a fourth subset of the set of transaction data to the training dataset. In some implementations, the fourth subset can be understood as a specific selection of a group of transactions for inclusion in the training data. Each record (e.g., individual transaction) within the fourth subset may not be labeled to indicate whether the transaction is fraudulent or legitimate. In some implementations, each record within the fourth subset may be labeled to indicate whether the transaction is fraudulent or legitimate.
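The four-way selection described in the preceding steps can be sketched as follows. This is a minimal illustration rather than the claimed method: the record layout, the function name `build_training_dataset`, and the example threshold values are hypothetical. The sketch also assumes that "exceeds" and "falls below" are strict comparisons and that transactions scoring between the two thresholds are left out, which the text does not specify.

```python
from typing import Dict, List, Sequence, Tuple

Transaction = Dict[str, object]          # hypothetical record layout
Scored = Tuple[Transaction, float]       # (transaction, likelihood score)


def build_training_dataset(
    scored: Sequence[Scored],
    first_system: str,
    first_threshold: float,
    second_threshold: float,
) -> Dict[str, List[Transaction]]:
    """Split scored transactions into the four subsets of the training dataset."""
    subsets: Dict[str, List[Transaction]] = {
        "first": [],   # likely fraudulent, first computing system
        "second": [],  # likely fraudulent, any other computing system
        "third": [],   # likely legitimate, first computing system
        "fourth": [],  # likely legitimate, any other computing system
    }
    for txn, score in scored:
        same_system = txn["system"] == first_system
        if score > first_threshold:
            subsets["first" if same_system else "second"].append(txn)
        elif score < second_threshold:
            subsets["third" if same_system else "fourth"].append(txn)
        # Assumption: scores between the thresholds are not added.
    return subsets


scored_example: List[Scored] = [
    ({"id": 1, "system": "A"}, 0.95),
    ({"id": 2, "system": "B"}, 0.91),
    ({"id": 3, "system": "A"}, 0.05),
    ({"id": 4, "system": "C"}, 0.02),
    ({"id": 5, "system": "A"}, 0.50),
]
dataset = build_training_dataset(scored_example, "A", 0.9, 0.1)
subset_ids = {name: [txn["id"] for txn in txns] for name, txns in dataset.items()}
```

Only the transaction records are stored in each subset, so no fraudulent or legitimate label is attached to an individual record; subset membership alone reflects the score and the originating computing system.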
December 4, 2025