Methods for automatically generating a wrapper for extracting web data and corresponding computer systems are disclosed. In one arrangement, a first wrapper is used to generate a second wrapper. The first wrapper extracts target data from one or more target web pages hosted by one or more target web servers. The second wrapper is capable of extracting the same target data from the same one or more target web pages without using a web browser engine to perform a) sending requests to the one or more target web servers, and/or b) processing replies from the one or more target web servers. The generation of the second wrapper comprises analysing one or both of the following: (i) code defining the first wrapper, (ii) interactions between the first wrapper and the one or more target web servers that occur during execution of the first wrapper.
Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.
1. A method for automatically transforming a wrapper for extracting web data, comprising: transforming a first wrapper into a second wrapper, wherein: the first wrapper is configured to extract target data from one or more target web pages hosted by one or more target web servers; the second wrapper is configured to extract the same target data from the same one or more target web pages without using a web browser engine to perform a) sending requests to the one or more target web servers, and/or b) processing replies from the one or more target web servers; and the transforming comprises analyzing one or both of the following: (i) code defining the first wrapper, (ii) interactions between the first wrapper and the one or more target web servers that occur during execution of the first wrapper.
This invention relates to web data extraction, specifically transforming a wrapper used to extract data from web pages into a more efficient version that does not rely on a web browser engine. The problem addressed is the inefficiency of wrappers that depend on browser engines to send requests and process responses, which consumes significant computational resources and slows down data extraction. The method converts a first wrapper, designed to extract specific data from target web pages hosted by one or more servers, into a second wrapper that retrieves the same data from the same pages. The second wrapper still communicates with the target servers, but it sends requests, processes replies, or both without a browser engine, improving performance and reducing resource usage. The transformation analyzes the code defining the first wrapper, the interactions between the first wrapper and the target servers during execution, or both. By examining these elements, the method identifies the dependencies that allow the second wrapper to function independently of a browser engine while maintaining the same data extraction capabilities. This approach enhances efficiency in web scraping and data extraction tasks, particularly in large-scale or automated systems.
2. The method of claim 1 , wherein the first wrapper is configured to extract the target data using a web browser engine.
This claim narrows the method to the case where the first wrapper itself relies on a web browser engine to extract the target data. Such a wrapper loads the target web page into the browser engine, which renders the page and executes dynamic content such as JavaScript so that the target data becomes accessible. The wrapper then locates and extracts the target data from the rendered page. This approach reaches data that may be absent from the raw HTML source because it is generated on the client side, but it is resource-intensive. A browser-based first wrapper is therefore a natural input to the transformation of claim 1, which produces a second wrapper that obtains the same data without a browser engine.
3. The method of claim 1 , wherein the first wrapper is configured to simulate, where necessary, user input to the one or more target web pages.
This claim covers first wrappers that simulate user input to the target web pages where necessary. Instead of requiring a person to operate the page, the first wrapper reproduces actions such as filling out forms, submitting them, and triggering other interactive elements, so that it can navigate to the pages that contain the target data. The requests that result from this simulated input are part of the interactions between the first wrapper and the target web servers, which the transformation of claim 1 may analyze when producing the second wrapper.
4. The method of claim 1 , wherein the analysis of the interactions comprises identifying data transformations that the one or more target web servers apply to parameters sent to them by the first wrapper.
This claim concerns how the interactions between the first wrapper and the target web servers are analyzed. The analysis identifies data transformations that the target web servers apply to parameters sent to them by the first wrapper. For example, a server might return a submitted value in a changed form, such as a different encoding or format. Identifying these transformations lets the method relate parameters in requests to the values that appear in subsequent replies, so that the second wrapper can reproduce the same exchanges without a browser engine.
5. The method of claim 1 , wherein the analysis of interactions comprises construction and use of a dependency graph comprising nodes and arcs, each node representing an interaction or set of interactions and each arc representing propagation of parameters from one interaction or set of interactions to another interaction or set of interactions.
This invention relates to analyzing interactions within a system, particularly for identifying dependencies and parameter propagation between interactions. The method involves constructing and using a dependency graph to model these relationships. The graph consists of nodes and arcs, where each node represents an individual interaction or a group of interactions, and each arc represents the flow of parameters from one interaction or set of interactions to another. This approach helps visualize and analyze how parameters are passed and dependencies are formed between different interactions in the system. The dependency graph can be used to identify critical paths, potential bottlenecks, or areas where changes in one interaction may impact others. This method is useful in software systems, network protocols, or any domain where interactions between components or processes need to be analyzed for dependencies and parameter flow. The graph structure allows for systematic analysis, optimization, and debugging of complex interaction patterns.
6. The method of claim 5 , wherein the dependency graph is constructed by first including the interactions conveying the target data and then iteratively including other interactions which convey parameters necessary for already included interactions.
This invention relates to constructing dependency graphs for data processing systems, particularly for analyzing interactions that convey target data and their dependencies. The problem addressed is efficiently mapping out the relationships between data interactions in a system to understand how target data is derived or processed, including all necessary parameters and dependencies. The method constructs a dependency graph by first identifying and including interactions that directly convey the target data. Then, it iteratively adds other interactions that provide parameters required for the already included interactions. This iterative process continues until all necessary dependencies are accounted for, ensuring the graph fully represents the data flow and relationships within the system. The approach helps in analyzing, debugging, or optimizing data processing workflows by clearly visualizing how data and parameters propagate through the system. The method is particularly useful in complex systems where interactions are interdependent, ensuring no critical dependencies are overlooked.
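The backward construction described in claim 6 can be sketched as follows. This is a minimal illustration, not the patented implementation: the dictionary layout (`needs`, `provides`), the function name `build_dependency_graph`, and the example interaction names are all assumptions made for the sketch. It starts from the interactions that convey the target data and repeatedly adds any interaction whose reply supplies a parameter needed by an interaction already in the graph.

```python
from collections import deque


def build_dependency_graph(interactions, target_ids):
    """Return (nodes, arcs) for the interactions needed to reach the targets.

    interactions: dict mapping an interaction id to a dict with two sets,
        "needs" (parameter names its request requires) and
        "provides" (parameter names its reply makes available).
    target_ids: ids of the interactions that convey the target data.
    Each arc is (source_id, destination_id, parameter_name).
    """
    # Index every parameter by the interactions whose replies provide it.
    providers = {}
    for iid, info in interactions.items():
        for param in info["provides"]:
            providers.setdefault(param, []).append(iid)

    nodes = set(target_ids)      # step 1: interactions conveying target data
    arcs = set()
    queue = deque(target_ids)
    while queue:                 # step 2: iteratively add what they depend on
        current = queue.popleft()
        for param in interactions[current]["needs"]:
            for source in providers.get(param, []):
                if source == current:
                    continue
                arcs.add((source, current, param))
                if source not in nodes:
                    nodes.add(source)
                    queue.append(source)
    return nodes, arcs


# Hypothetical recorded interactions; "ads" is unrelated to the target data.
example = {
    "login":   {"needs": set(),                    "provides": {"session"}},
    "search":  {"needs": {"session"},              "provides": {"result_id"}},
    "ads":     {"needs": set(),                    "provides": {"ad_token"}},
    "details": {"needs": {"session", "result_id"}, "provides": {"price"}},
}
nodes, arcs = build_dependency_graph(example, ["details"])
```

In this example the `ads` interaction never enters the graph, because nothing that leads to the target data needs a parameter from it. A fuller treatment would also take the order of interactions within a trace into account when several replies provide the same parameter.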
7. The method of claim 5 , wherein the dependency graph consists exclusively of nodes corresponding to interactions conveying the target data and nodes and arcs necessary for providing parameters for the interactions conveying the target data.
This invention relates to data processing systems that use dependency graphs to represent and analyze data interactions. The problem addressed is the complexity and inefficiency of traditional dependency graphs, which often include unnecessary nodes and arcs, leading to increased computational overhead and reduced clarity in data analysis. The invention provides a method for constructing a dependency graph that is optimized for efficiency and clarity. The dependency graph consists exclusively of nodes representing interactions that convey target data and additional nodes and arcs that are strictly necessary for providing parameters required by those interactions. This ensures that the graph remains focused on the essential elements needed for data processing, eliminating redundant or extraneous components. The method involves identifying the target data interactions and determining the minimal set of nodes and arcs required to support those interactions. This includes nodes representing data sources, transformations, and dependencies, as well as arcs defining the relationships between them. By restricting the graph to only these essential elements, the system reduces computational complexity and improves performance in data analysis tasks. This approach is particularly useful in systems where data processing efficiency is critical, such as real-time analytics, large-scale data processing, or applications requiring low-latency responses. The optimized dependency graph simplifies the representation of data flows, making it easier to analyze, debug, and maintain.
8. The method of claim 5 , further comprising constructing one or more data-selectors configured to extract selected data from replies sent by the one or more target servers, wherein the selected data comprises one or more of the following: a portion of the target data, all of the target data, and one or more parameters required by one or more of the interactions.
This invention relates to systems for interacting with target servers to retrieve specific data. The problem addressed is the need to efficiently extract and process relevant data from server responses, particularly in scenarios where only portions of the data or specific parameters are required for subsequent interactions. The method involves constructing one or more data-selectors designed to extract selected data from replies sent by target servers. These data-selectors can be configured to retrieve either a portion of the target data, all of the target data, or specific parameters needed for further interactions. The data-selectors ensure that only the necessary information is extracted, improving efficiency and reducing unnecessary processing. This approach is particularly useful in automated systems where responses from multiple servers must be parsed and processed in real-time. The method may also involve generating requests to the target servers and receiving replies, which are then analyzed to determine the presence of the target data. The data-selectors can be dynamically adjusted based on the requirements of the interactions, allowing for flexible and adaptive data extraction. This invention enhances the precision and speed of data retrieval in server interactions, making it suitable for applications such as web scraping, API integrations, and automated data processing systems.
9. The method of claim 8 , wherein the data-selectors are synthetized via generalization from examples.
The invention relates to a method for generating data-selectors through a generalization process using examples. Data-selectors are mechanisms that filter, categorize, or extract specific data from larger datasets. The problem addressed is the need for efficient and accurate data-selectors that can adapt to different data structures without requiring manual programming or extensive rule-setting. The method involves training a system to recognize patterns in example data, then applying those patterns to new, unseen data. The generalization process allows the system to create data-selectors that can handle variations in data while maintaining accuracy. This approach reduces the time and effort required to develop custom data-selectors for different applications, such as data mining, natural language processing, or database management. The method may include preprocessing the example data to highlight relevant features, applying machine learning techniques to identify patterns, and refining the selectors based on feedback. The resulting data-selectors can be deployed in various systems to automate data extraction, classification, or filtering tasks. This technique improves efficiency and scalability in data processing workflows.
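One simple way to picture generalization from examples is to learn the text that consistently surrounds the wanted value and turn it into a regular expression. The sketch below is an assumption-laden illustration rather than the method the patent prescribes: the function name `synthesize_selector`, the example HTML snippets, and the choice of common left and right context as the generalization strategy are all hypothetical.

```python
import os
import re


def synthesize_selector(examples):
    """Build a regex selector from (reply_text, wanted_value) examples.

    The selector keeps the longest text that precedes the value in every
    example and the longest text that follows it in every example, and
    captures whatever appears between the two.
    Raises ValueError if a wanted value does not occur in its reply.
    """
    lefts, rights = [], []
    for text, value in examples:
        start = text.index(value)
        lefts.append(text[:start])
        rights.append(text[start + len(value):])

    # Common suffix of the left contexts (computed on reversed strings).
    left = os.path.commonprefix([s[::-1] for s in lefts])[::-1]
    # Common prefix of the right contexts.
    right = os.path.commonprefix(rights)

    # Without a right delimiter a lazy group would match nothing, so be greedy.
    middle = "(.*?)" if right else "(.*)"
    return re.compile(re.escape(left) + middle + re.escape(right), re.DOTALL)


# Hypothetical replies from two executions, with the value to extract.
examples = [
    ('<div id="a1"><span class="price">12.50</span></div>', "12.50"),
    ('<div id="b2"><span class="price">7.99</span></div>', "7.99"),
]
selector = synthesize_selector(examples)
unseen = '<div id="zz"><span class="price">103.00</span></div>'
extracted = selector.search(unseen).group(1)
```

The learned selector ignores the `id` attribute, which differed between the examples, and keys on the markup that stayed the same. Real pages usually need more robust generalization, for instance over the document tree rather than raw text.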
10. The method of claim 8 , wherein each data-selector comprises one or more of the following: (i) regular expressions, (ii) generalized regular expressions, (iii) XPath expressions, (iv) XPath-like expressions that apply to HTML, (v) expressions in some appropriate tree-grammar, tree automata, finite-state automata, procedures, and programs.
This invention relates to data extraction and processing systems, specifically methods for selecting and extracting data from structured or semi-structured data sources. The problem addressed is the need for flexible and efficient mechanisms to identify and retrieve specific data elements from diverse data formats, such as text, HTML, or XML, using various pattern-matching techniques. The method involves using data-selectors to identify and extract data from a data source. Each data-selector is configured to apply one or more of several pattern-matching techniques, including regular expressions, generalized regular expressions, XPath expressions, XPath-like expressions for HTML, and expressions based on tree grammars, tree automata, finite-state automata, or procedural logic. These selectors allow the system to parse and extract data from structured or semi-structured content by defining rules that match specific patterns or hierarchical structures within the data. The approach enables precise data extraction by leveraging different pattern-matching methods, depending on the data format and the complexity of the extraction task. For example, regular expressions may be used for simple text patterns, while XPath expressions are applied to navigate and extract data from XML or HTML documents. The use of tree-based grammars or automata allows for more complex hierarchical data processing, ensuring accurate extraction from nested or deeply structured data. This method enhances data processing efficiency by providing a versatile set of tools for data selection, making it adaptable to various data formats and extraction requirements. The flexibility of the selectors ensures compatibility with different data structures, improving the system's ability to handle diverse data s
11. The method of claim 8 , wherein the data-selectors are used to propagate data according to the arcs of the dependency graph.
This claim ties the data-selectors of claim 8 to the dependency graph of claim 5. Each arc of the graph represents the propagation of parameters from one interaction or set of interactions to another. The data-selectors extract the required values from the replies of earlier interactions, and those values are passed along the arcs to become parameters of the later interactions that depend on them. In this way values are carried from one interaction to the next, ending with the interactions that convey the target data.
12. The method of claim 1 , wherein the analysis of the interactions uses recorded traces representing sequences of messages exchanged between the first wrapper and the one or more target web servers.
This claim specifies the input to the interaction analysis of claim 1. The analysis uses recorded traces, each representing a sequence of messages exchanged between the first wrapper and the one or more target web servers. The traces capture what the first wrapper actually sent and received while it ran, so the method can study the exchanges after the fact and determine which requests and replies the second wrapper must reproduce to extract the same target data.
13. The method of claim 12 , wherein multiple traces are obtained that correspond to multiple executions of the first wrapper, each execution being performed with different input data to the first wrapper.
This claim builds on the recorded traces of claim 12. Multiple traces are obtained, each corresponding to a separate execution of the first wrapper, and each execution is performed with different input data to the first wrapper. Comparing traces produced from different inputs shows which parts of the exchanged messages stay the same and which change with the input, which supports the grouping and parameter analysis described in the dependent claims.
14. The method of claim 13 , further comprising grouping interactions from the multiple traces that satisfy an equivalence relation and generating from the group a generalized HTTP request for the group.
This invention relates to analyzing and processing HTTP interactions from multiple traces to identify and generalize common request patterns. The problem addressed is the difficulty in efficiently analyzing large volumes of HTTP traffic data to extract meaningful, reusable request templates. The method involves collecting multiple traces of HTTP interactions, where each trace contains one or more HTTP requests and responses. The system processes these traces to identify interactions that share similar characteristics, such as request methods, headers, or payload structures. These similar interactions are grouped based on an equivalence relation, which defines criteria for determining when two requests are functionally equivalent. Once grouped, the system generates a generalized HTTP request that represents the common structure of all requests in the group. This generalized request can be used for testing, debugging, or further analysis, reducing redundancy and improving efficiency in handling HTTP traffic data. The method may also include filtering or transforming the interactions before grouping to ensure accuracy and relevance. The approach helps streamline the analysis of HTTP interactions by abstracting repetitive patterns into generalized templates.
15. The method of claim 14 , wherein the equivalence relation requires HTTP requests in the interactions to comprise one or more of the following: the same parametrized URLs, names of parameters and structures of response bodies.
This invention relates to a method for analyzing interactions in a networked system, particularly for identifying equivalent interactions based on specific criteria. The method involves comparing interactions to determine if they meet an equivalence relation, which is defined by certain characteristics of HTTP requests. Specifically, the equivalence relation requires that the HTTP requests in the interactions share one or more of the following: the same parametrized URLs, the same names of parameters, or the same structures of response bodies. This allows for the grouping of interactions that are functionally similar, even if they differ in minor details. The method is useful for tasks such as identifying redundant interactions, optimizing network traffic, or improving system performance by recognizing and handling equivalent requests in a consistent manner. The approach ensures that interactions are compared based on their core functional aspects, rather than superficial differences, enabling more accurate analysis and processing.
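A possible equivalence relation over recorded requests can be sketched as below. The sketch assumes each trace is a list of `(method, url)` pairs and treats two requests as equivalent when they share the HTTP method, the URL without its query values, and the set of parameter names. These representation choices, the function names, and the example URLs are assumptions for illustration. The claim also allows the structure of response bodies to be compared, which is left out here.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlsplit


def equivalence_key(method, url):
    """Key shared by requests considered equivalent in this sketch."""
    parts = urlsplit(url)
    names = tuple(sorted(
        name for name, _ in parse_qsl(parts.query, keep_blank_values=True)
    ))
    return (method.upper(), parts.scheme, parts.netloc, parts.path, names)


def group_interactions(traces):
    """Group the requests of several traces by their equivalence key."""
    groups = defaultdict(list)
    for trace in traces:
        for method, url in trace:
            groups[equivalence_key(method, url)].append(url)
    return dict(groups)


# Two hypothetical traces from executions with different input data.
traces = [
    [("GET", "https://example.com/search?q=tv&page=1"),
     ("GET", "https://example.com/item?id=10")],
    [("GET", "https://example.com/search?page=2&q=radio"),
     ("GET", "https://example.com/item?id=42")],
]
groups = group_interactions(traces)
```

Each resulting group collects the requests that play the same role across executions, which is the starting point for deriving one generalized HTTP request per group.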
16. The method of claim 13 , wherein similar interactions from the multiple traces are grouped together based on analysis of parameters of the interactions.
This claim describes how interactions from the multiple traces of claim 13 are organized. Similar interactions from different traces are grouped together based on analysis of the parameters of the interactions. Because each trace comes from an execution of the first wrapper with different input data, interactions that play the same role appear in several traces with different parameter values. Grouping them lets the method treat them as instances of a single kind of interaction when building the second wrapper.
17. The method of claim 16 , wherein parameters for each group of similar interactions are identified by analysis of differences in values of the parameters.
This claim explains how the parameters of each group of similar interactions from claim 16 are identified. The method analyzes the differences in values of the parameters across the interactions in a group. Under one plausible reading, a value that differs between interactions recorded with different input data is a candidate parameter of the group, while a value that is identical in every interaction of the group can be treated as fixed.
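The difference analysis of claim 17 can be illustrated with a short sketch that compares query-string values across one group of similar requests. It rests on an assumption that is not stated in the claim: a value that changes between executions is treated as a true parameter, and a value that never changes is treated as a constant. The function name and URLs are hypothetical.

```python
from urllib.parse import parse_qsl, urlsplit


def split_constant_and_variable(urls):
    """Separate query parameters whose values never change from those that do.

    urls: requests belonging to one group of similar interactions.
    Returns (constant, variable): a dict of fixed name/value pairs and a
    sorted list of the names whose values differ across the group.
    """
    observed = {}
    for url in urls:
        query = urlsplit(url).query
        for name, value in parse_qsl(query, keep_blank_values=True):
            observed.setdefault(name, set()).add(value)

    constant = {n: next(iter(v)) for n, v in observed.items() if len(v) == 1}
    variable = sorted(n for n, v in observed.items() if len(v) > 1)
    return constant, variable


# Hypothetical group: the same search request recorded with two inputs.
group = [
    "https://example.com/search?q=tv&lang=en",
    "https://example.com/search?q=radio&lang=en",
]
constant, variable = split_constant_and_variable(group)
```

Here `q` is identified as a parameter of the group and `lang` as a constant. With only a few traces a parameter may look constant by coincidence, so more executions with varied input give a more reliable split.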
18. The method of claim 17 , wherein the analysis of differences in values of the parameters can identify composite parameters of each group of similar interactions, where composite parameters are parameters consisting of other parameters.
This claim refines the difference analysis of claim 17. The analysis of differences in values of the parameters can also identify composite parameters of each group of similar interactions. A composite parameter is a parameter that consists of other parameters, for example a single request value assembled from several simpler values. Recognizing composite parameters allows the second wrapper to construct such values from their components rather than treating them as opaque strings.
19. The method of claim 12 , wherein the recorded traces comprise sequences of HTTP request-reply pairs between the first wrapper and the one or more target web servers.
This claim specifies the form of the recorded traces of claim 12. The traces comprise sequences of HTTP request-reply pairs between the first wrapper and the one or more target web servers. Each pair consists of an HTTP request sent by the first wrapper and the corresponding reply from a target web server, and the sequence of pairs records the exchanges that took place during execution of the first wrapper.
20. The method of claim 1 , wherein the analysis of the interactions includes identification of parameters of requests which can be removed without changing replies to the requests by more than a predetermined extent.
This invention relates to optimizing network communication by analyzing interactions between a client and a server to identify redundant or unnecessary parameters in requests. The problem addressed is inefficient data transmission, where requests may contain parameters that do not significantly impact the server's responses, leading to unnecessary bandwidth usage and processing overhead. The method involves monitoring interactions between a client and a server to detect patterns in request parameters. By analyzing these interactions, the system identifies parameters that can be removed or modified without altering the server's replies beyond a predefined threshold. This threshold ensures that the changes do not degrade functionality or accuracy. The analysis may involve statistical methods, machine learning, or rule-based approaches to determine which parameters are non-essential. Once identified, these parameters are flagged for removal or modification in future requests, improving efficiency without compromising performance. The invention may also include generating recommendations for parameter optimization, allowing developers to refine their applications. The system can adapt dynamically, learning from ongoing interactions to refine its parameter selection criteria. This approach is applicable to various networked systems, including web applications, APIs, and distributed computing environments, where reducing redundant data transmission is beneficial. The goal is to streamline communication while maintaining the integrity of the client-server interaction.
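The removal test in claim 20 can be sketched by replaying a request with one parameter left out and measuring how much the reply changes. Everything specific here is an assumption for illustration: the `fetch` callable stands in for whatever sends the request, the similarity measure is `difflib.SequenceMatcher`, and the 5% threshold is an arbitrary stand-in for the "predetermined extent".

```python
from difflib import SequenceMatcher


def removable_parameters(fetch, params, max_change=0.05):
    """Return names of parameters whose removal barely changes the reply.

    fetch: callable taking a dict of parameters and returning the reply text.
    params: the full parameter dict of the recorded request.
    max_change: largest tolerated change, as 1 minus the similarity ratio.
    """
    baseline = fetch(params)
    removable = []
    for name in params:
        reduced = {k: v for k, v in params.items() if k != name}
        reply = fetch(reduced)
        change = 1.0 - SequenceMatcher(None, baseline, reply).ratio()
        if change <= max_change:
            removable.append(name)
    return removable


def fake_fetch(params):
    """Stand-in server: the reply depends on "q" and ignores "tracking"."""
    return f"results for {params.get('q')}"


removable = removable_parameters(fake_fetch, {"q": "tv", "tracking": "abc123"})
```

The sketch tests each parameter on its own. Parameters that are individually removable are not guaranteed to be removable together, so a fuller version would re-check the reply after dropping the whole candidate set.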
21. The method of claim 1 , wherein the analysis of the interactions comprises identifying requests originating from the first wrapper that are not necessary for extracting the target data and omitting those requests during transformation of the first wrapper into the second wrapper.
This invention relates to optimizing data extraction from web applications by analyzing and transforming wrapper code used to interact with web services. The problem addressed is inefficient data extraction due to unnecessary requests made by wrapper code, which can slow down performance and increase resource usage. The solution involves analyzing interactions between a first wrapper and a web service to identify and eliminate redundant or unnecessary requests while preserving those essential for extracting target data. The transformed wrapper, now optimized, retains only the necessary requests, improving efficiency. The method ensures that the transformed wrapper maintains the same functionality as the original but operates more efficiently by reducing unnecessary network traffic and processing overhead. This approach is particularly useful in automated data extraction systems where performance and resource management are critical. The invention focuses on dynamically analyzing request patterns, filtering out non-essential interactions, and generating an optimized wrapper that extracts the same data with fewer requests. This reduces latency and computational load, making the data extraction process faster and more scalable.
22. The method of claim 1, wherein the analysis of the interactions comprises identifying data transformations that the first wrapper applies to replies and to parameters of the interactions.
This invention relates to web data extraction, specifically to analyzing how a first wrapper handles data as it communicates with the target web servers. Building on claim 1, the analysis of the interactions between the first wrapper and the target web servers identifies the data transformations that the first wrapper applies to the replies it receives and to the parameters of the interactions. Identifying these transformations shows how values flow through the first wrapper, so that the second wrapper can apply equivalent transformations and extract the same target data without using a web browser engine to send requests and/or process replies.
23. The method of claim 22, wherein the data transformation comprises transforming each reply to a parameter of another interaction or to the target data.
This invention relates to web data extraction, specifically to the data transformations identified when a first wrapper is transformed into a second wrapper. Building on claim 22, each reply received from the target web servers is transformed either into a parameter of another interaction or into the target data. For example, a value contained in one reply might be reused as a parameter of a later request, while another reply might contain the data that the wrapper is meant to extract. Tracing replies in this way links the interactions into a chain that ends in the target data.
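The reply-to-parameter relationship in claim 23 can be illustrated with a small trace analysis that looks for request parameter values that already appeared in an earlier reply. This is a sketch under assumptions: the `Interaction` record, the substring match, and the sample trace are hypothetical and stand in for whatever analysis an implementation actually performs.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Interaction:
    url: str
    params: Dict[str, str]
    reply: str


def find_reply_to_param_links(trace: List[Interaction]) -> List[Tuple[int, int, str]]:
    """Return (earlier index, later index, parameter name) for values copied from a reply."""
    links = []
    for later_idx, later in enumerate(trace):
        for name, value in later.params.items():
            for earlier_idx in range(later_idx):
                # A non-empty value found verbatim in an earlier reply counts as a link.
                if value and value in trace[earlier_idx].reply:
                    links.append((earlier_idx, later_idx, name))
                    break
    return links


trace = [
    Interaction("/form", {}, '<input name="token" value="abc123">'),
    Interaction("/results", {"token": "abc123", "q": "widget"}, "<td>42</td>"),
]
links = find_reply_to_param_links(trace)
```

Here the `token` parameter of the second interaction is traced back to the first reply, while `q` is not, because it comes from the user input rather than from a server reply.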
24. The method of claim 22, wherein the analysis of the interactions comprises analysis of JavaScript programs that the first wrapper executes.
This invention relates to web data extraction, specifically to analyzing the JavaScript programs that a first wrapper executes while extracting target data. Building on claim 22, the analysis of the interactions between the first wrapper and the target web servers includes analysis of those JavaScript programs. Examining them can help identify the data transformations of claim 22, for instance how a script derives the parameters of a request from an earlier reply. The second wrapper can then obtain the same target data without using a web browser engine to send requests and/or process replies.
25. The method of claim 1, comprising: executing the first wrapper at a client to extract the target data from the one or more target web pages, the first wrapper extracting the target data by simulating, where necessary, user input to the one or more target web pages, the simulated user input specifying the target data to be extracted; analyzing interactions between the first wrapper and the one or more target web servers that occurred during execution of the first wrapper; and using the analysis of the interactions to transform the first wrapper into the second wrapper.
This invention relates to web data extraction, specifically to transforming a first wrapper into a second wrapper based on interactions observed while the first wrapper runs. Building on claim 1, the method has three steps. First, the first wrapper is executed at a client to extract the target data from the target web pages. Where necessary, it simulates user input to those pages, and the simulated input specifies the target data to be extracted. Second, the interactions between the first wrapper and the target web servers that occurred during that execution are analyzed. Third, the analysis is used to transform the first wrapper into the second wrapper, which, per claim 1, extracts the same target data without using a web browser engine to send requests and/or process replies.
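The execute-then-analyze flow of claim 25 presupposes some record of the interactions that occurred during execution. A minimal sketch of such a record is shown below, assuming the wrapper's traffic can be routed through a callable. `RecordingTransport`, `fake_server`, and `first_wrapper` are hypothetical stand-ins, not components named in the patent.

```python
from typing import Callable, Dict, List, Optional

Send = Callable[[str, str, Dict[str, str]], str]


class RecordingTransport:
    """Wraps a send function and logs every request/reply pair that passes through."""

    def __init__(self, send: Send) -> None:
        self._send = send
        self.log: List[dict] = []

    def __call__(self, method: str, url: str,
                 params: Optional[Dict[str, str]] = None) -> str:
        sent = dict(params or {})
        reply = self._send(method, url, sent)
        self.log.append({"method": method, "url": url, "params": sent, "reply": reply})
        return reply


def fake_server(method: str, url: str, params: Dict[str, str]) -> str:
    # Hypothetical stand-in for the target web servers.
    if url == "/search":
        return "<td class='price'>42</td>"
    return "<html>home</html>"


def first_wrapper(transport: RecordingTransport) -> str:
    # Stand-in for a browser-driven wrapper: load the home page, then submit a search.
    transport("GET", "/")
    return transport("POST", "/search", {"q": "widget"})


transport = RecordingTransport(fake_server)
target_html = first_wrapper(transport)
observed = [(entry["method"], entry["url"]) for entry in transport.log]
```

The resulting `transport.log` is the kind of trace that a later analysis step could inspect to decide which requests and parameters the second wrapper needs.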
26. The method of claim 25, wherein the simulation of user input by the first wrapper comprises rendering a web page using a web browser engine.
This invention relates to web data extraction, specifically to how the first wrapper simulates user input. Building on claim 25, the first wrapper's simulation of user input includes rendering a web page using a web browser engine. Rendering the page lets the first wrapper interact with it much as a user would. This browser-based simulation belongs to the first wrapper only: the second wrapper produced by the method extracts the same target data without using a web browser engine to send requests and/or process replies.
27. The method of claim 26, wherein the user input comprises one or more interactions that a user can make with a displayed web page.
This invention relates to web data extraction, specifically to the kind of user input that the first wrapper simulates. Building on claim 26, the simulated user input comprises one or more interactions that a user can make with a displayed web page. Examples might include clicking a link or button, typing into a form field, or submitting a form. The first wrapper simulates such interactions to specify the target data to be extracted from the target web pages.
28. The method of claim 1, wherein the transforming comprises direct compilation of the first wrapper into the second wrapper.
This invention relates to web data extraction, specifically to how a first wrapper is transformed into a second wrapper. Building on claim 1, the transformation comprises direct compilation of the first wrapper into the second wrapper. As in claim 1, the transformation analyzes the code defining the first wrapper, the interactions between the first wrapper and the target web servers, or both, and the resulting second wrapper extracts the same target data without using a web browser engine to send requests and/or process replies. The claim does not specify how the compilation is carried out.
29. A non transitory computer readable storage medium having recorded thereon program code that, when executed on a computer system, instructs the computer system to carry out the method of claim 1.
This invention relates to web data extraction, specifically to a storage medium carrying software that performs the wrapper transformation. The claim covers a non-transitory computer-readable storage medium on which program code is recorded. When executed on a computer system, the program code instructs the computer system to carry out the method of claim 1: transforming a first wrapper into a second wrapper that extracts the same target data from the same target web pages without using a web browser engine to send requests and/or process replies, by analyzing the code defining the first wrapper, its interactions with the target web servers, or both.
30. A computer system configured to automatically transform a wrapper for extracting web data, the computer system being configured to perform the following steps: transforming a first wrapper into a second wrapper, wherein: the first wrapper is configured to extract target data from one or more target web pages hosted by one or more target web servers; the second wrapper is configured to extract the same target data from the same one or more target web pages without using a web browser engine to perform a) sending requests to the one or more target web servers, and/or b) processing replies from the one or more target web servers; and the transforming comprises analyzing one or both of the following: (i) code defining the first wrapper, (ii) interactions between the first wrapper and the one or more target web servers that occur during execution of the first wrapper.
This invention relates to a computer system designed to automatically convert a web data extraction tool, called a wrapper, into a more efficient version. The original wrapper extracts specific data from target web pages hosted by web servers, typically using a web browser engine to send requests and process responses. The system transforms this wrapper into a new version that extracts the same data from the same web pages but eliminates the need for a web browser engine. This means the new wrapper does not rely on sending requests or processing replies through a browser, making the extraction process faster and more resource-efficient. The transformation process involves analyzing either the code that defines the original wrapper or the interactions between the wrapper and the web servers during execution. By understanding how the wrapper operates, the system can generate a streamlined version that achieves the same data extraction without the overhead of browser-based operations. This approach improves performance and reduces computational costs associated with web scraping tasks.
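To make the idea of a browser-free second wrapper concrete, the sketch below sends a plain HTTP request and parses the reply with Python's standard library instead of a web browser engine. Everything specific here is an assumption for illustration: the `example.com` URL, the `price` cell class, and the injectable `fetch` parameter are hypothetical, and the patent does not prescribe this form for the generated wrapper.

```python
import urllib.parse
import urllib.request
from html.parser import HTMLParser
from typing import Callable, List


class PriceCellCollector(HTMLParser):
    """Collects the text of <td class="price"> cells (hypothetical target data)."""

    def __init__(self) -> None:
        super().__init__()
        self.in_cell = False
        self.cells: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "td" and ("class", "price") in attrs:
            self.in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.cells.append(data.strip())


def http_fetch(url: str) -> str:
    # Plain HTTP request: no page rendering and no script execution.
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")


def second_wrapper(query: str, fetch: Callable[[str], str] = http_fetch) -> List[str]:
    # Hypothetical endpoint; a real generated wrapper would use the analyzed requests.
    url = "https://example.com/search?" + urllib.parse.urlencode({"q": query})
    parser = PriceCellCollector()
    parser.feed(fetch(url))
    return parser.cells


def fake_fetch(url: str) -> str:
    # Canned reply so the example runs without network access.
    return '<table><tr><td class="price">42</td><td>in stock</td></tr></table>'


result = second_wrapper("widget", fetch=fake_fetch)
```

Passing `fetch` as a parameter keeps the example runnable offline; with the default `http_fetch`, the same function would issue a real request.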
31. A method for automatically generating a wrapper for extracting web data, comprising: executing a first wrapper to extract target data from one or more target web pages hosted by one or more target web servers; analyzing one or both of the following: (i) code defining the first wrapper, and (ii) interactions between the first wrapper and the one or more target web servers that occur during the execution of the first wrapper; and using the analysis to generate an output wrapper that, when executed, extracts the same target data from the same one or more target web pages without using a web browser engine to perform a) sending requests to the one or more target web servers, and/or b) processing replies from the one or more target web servers.
This invention relates to web data extraction, specifically automating the generation of wrappers that extract data from web pages without relying on a web browser engine. The problem addressed is that wrappers which depend on a browser engine to send requests and process replies are resource-intensive and slow. The method involves executing a first wrapper to extract target data from web pages hosted by target web servers. The system analyzes the code defining the first wrapper, the interactions between the first wrapper and the servers that occur during that execution, or both. This analysis is then used to generate an output wrapper that, when executed, extracts the same target data from the same web pages. The key point is that the output wrapper does not use a web browser engine to send requests to the target web servers and/or to process their replies. This reduces computational overhead, which is particularly useful for large-scale data extraction tasks where performance is critical.
July 12, 2018
March 22, 2022