Legal claims defining the scope of protection, as filed with the USPTO.
1. A method, comprising: accessing a data file stored in memory of a computing device, the data file including a series of characters corresponding to a series of code points, each code point in the series of code points being encoded in memory using a variable-width encoding scheme, and each code point in the series of code points corresponding to one or more encoded bytes in memory; receiving a request to skip ahead a desired number of code points in the data file from a reference code point in the series of code points; initializing a counter indicating the desired number of code points to be skipped; initializing a pointer, the pointer pointing to a location in the memory corresponding to the reference code point; performing, for each iteration in one or more iterations, until the counter satisfies a threshold value: loading a multi-byte chunk of data in the memory, starting from the location of the pointer, into at least one register; determining a number of lead bytes in the multi-byte chunk by performing one or more operations on the multi-byte chunk; and decrementing the counter using the number of lead bytes and advancing the pointer to a different location in the memory based on a number of bytes in the multi-byte chunk; responsive to the counter satisfying the threshold value, individually processing at least one code point in the series of code points to advance the pointer until the counter satisfies a minimum value; and responsive to the counter satisfying the minimum value, outputting data corresponding to a destination code point in the series of code points, the destination code point being identified by a current position of the pointer.
2. The method of claim 1, wherein determining the number of lead bytes in the multi-byte chunk comprises: generating a complemented shifted chunk by performing a right shift operation to the multi-byte chunk; performing an OR operation using the multi-byte chunk and the complemented shifted chunk; generating an output sequence by performing an AND operation using a result of the OR operation and a comparison sequence; and counting a number of 1-bits in the output sequence and setting the number of 1-bits as the number of lead bytes for the multi-byte chunk.
3. The method of claim 2, wherein the comparison sequence is a repeated sequence of 0x40.
4. The method of claim 1, wherein processing the at least one code point in the series of code points comprises: identifying a byte based on a location of the pointer; loading the byte into the at least one register; in response to determining that the byte is a lead byte, determining a number of trailing bytes following the byte; and moving the pointer based on the number of trailing bytes.
5. The method of claim 1, further comprising determining whether the multi-byte chunk includes a NULL byte before determining the number of lead bytes in the multi-byte chunk.
6. The method of claim 5, wherein determining whether the multi-byte chunk includes a NULL byte comprises: generating a first shifted sequence by performing a right shift operation of one bit to the multi-byte chunk; generating a first resulting chunk by performing an OR operation using the multi-byte chunk and the first shifted sequence; designating the first resulting chunk as the multi-byte chunk; generating a second shifted chunk by performing a right shift operation of two bits to the multi-byte chunk; generating a second resulting chunk by performing an OR operation using the multi-byte chunk and the second shifted chunk; designating the second resulting chunk as the multi-byte chunk; generating a third shifted chunk by performing a right shift operation of four bits to the multi-byte chunk; generating a third resulting chunk by performing an OR operation using the multi-byte chunk and the third shifted chunk; designating the third resulting chunk as the multi-byte chunk; performing an AND operation using the multi-byte chunk and a comparison sequence; determining that a NULL byte is included in the multi-byte chunk based on the AND operation; and providing an indication to the computing device indicating an error in the data file.
7. The method of claim 1, wherein the variable-width encoding scheme is UTF-8 and the at least one register comprises at least one of a 32-bit register, a 64-bit register, or a 128-bit register.
8. The method of claim 1, further comprising: determining that a memory address of the reference code point is not aligned; responsive to determining that the memory address of the reference code point is not aligned, identifying a number of bytes in the data file until a next chunk of the data file is memory aligned; loading an initial chunk of memory including one or more padded bytes and the number of bytes to a register; and moving the pointer to a location in memory that corresponds to the next chunk.
9. A non-transitory computer-readable medium storing instructions that are executable by at least one processing device to perform operations comprising: accessing a data file stored in memory of a computing device, the data file including a series of characters corresponding to a series of code points, each code point in the series of code points being encoded in memory using a variable-width encoding scheme, and each code point in the series of code points corresponding to one or more encoded bytes in memory; receiving a request to skip ahead a desired number of code points in the data file from a reference code point in the series of code points; initializing a counter indicating the desired number of code points to be skipped; initializing a pointer, the pointer pointing to a location in the memory corresponding to the reference code point; performing, for each iteration in one or more iterations, until the counter satisfies a threshold value: loading a multi-byte chunk of data in the memory, starting from the location of the pointer, into at least one register; determining a number of lead bytes in the multi-byte chunk by performing one or more operations on the multi-byte chunk; and decrementing the counter using the number of lead bytes and advancing the pointer to a different location in the memory based on a number of bytes in the multi-byte chunk; responsive to the counter satisfying the threshold value, individually processing at least one code point in the series of code points to advance the pointer until the counter satisfies a minimum value; and responsive to the counter satisfying the minimum value, outputting data corresponding to a destination code point in the series of code points, the destination code point being identified by a current position of the pointer.
10. The non-transitory computer-readable medium of claim 9, wherein determining the number of lead bytes in the multi-byte chunk comprises: generating a complemented shifted chunk by performing a right shift operation to the multi-byte chunk; performing an OR operation using the multi-byte chunk and the complemented shifted chunk; generating an output sequence by performing an AND operation using a result of the OR operation and a comparison sequence; and counting a number of 1-bits in the output sequence and setting the number of 1-bits as the number of lead bytes for the multi-byte chunk.
11. The non-transitory computer-readable medium of claim 10, wherein the comparison sequence is a repeated sequence of 0x40.
12. The non-transitory computer-readable medium of claim 9, wherein processing the at least one code point in the series of code points comprises: identifying a byte based on a location of the pointer; loading the byte into the at least one register; in response to determining that the byte is a lead byte, determining a number of trailing bytes following the byte; and moving the pointer based on the number of trailing bytes.
13. The non-transitory computer-readable medium of claim 9, the operations further comprising determining whether the multi-byte chunk includes a NULL byte before determining the number of lead bytes in the multi-byte chunk.
14. The non-transitory computer-readable medium of claim 13, wherein determining whether the multi-byte chunk includes a NULL byte comprises: generating a first shifted sequence by performing a right shift operation of one bit to the multi-byte chunk; generating a first resulting chunk by performing an OR operation using the multi-byte chunk and the first shifted sequence; designating the first resulting chunk as the multi-byte chunk; generating a second shifted chunk by performing a right shift operation of two bits to the multi-byte chunk; generating a second resulting chunk by performing an OR operation using the multi-byte chunk and the second shifted chunk; designating the second resulting chunk as the multi-byte chunk; generating a third shifted chunk by performing a right shift operation of four bits to the multi-byte chunk; generating a third resulting chunk by performing an OR operation using the multi-byte chunk and the third shifted chunk; designating the third resulting chunk as the multi-byte chunk; performing an AND operation using the multi-byte chunk and a comparison sequence; determining that a NULL byte is included in the multi-byte chunk based on the AND operation; and providing an indication to the computing device indicating an error in the data file.
15. The non-transitory computer-readable medium of claim 9, wherein the variable-width encoding scheme is UTF-8 and the at least one register comprises at least one of a 32-bit register, a 64-bit register, or a 128-bit register.
16. The non-transitory computer-readable medium of claim 9, the operations further comprising: determining that a memory address of the reference code point is not aligned; responsive to determining that the memory address of the reference code point is not aligned, identifying a number of bytes in the data file until a next chunk of the data file is memory aligned; loading an initial chunk of memory including one or more padded bytes and the number of bytes to a register; and moving the pointer to a location in memory that corresponds to the next chunk.
17. A system comprising: one or more processors; and a computer-readable storage medium storing instructions that are executable by the one or more processors to perform operations comprising: accessing a data file stored in memory of a computing device, the data file including a series of characters corresponding to a series of code points, each code point in the series of code points being encoded in memory using a variable-width encoding scheme, and each code point in the series of code points corresponding to one or more encoded bytes in memory; receiving a request to skip ahead a desired number of code points in the data file from a reference code point in the series of code points; initializing a counter indicating the desired number of code points to be skipped; initializing a pointer, the pointer pointing to a location in the memory corresponding to the reference code point; performing, for each iteration in one or more iterations, until the counter satisfies a threshold value: loading a multi-byte chunk of data in the memory, starting from the location of the pointer, into at least one register; determining a number of lead bytes in the multi-byte chunk by performing one or more operations on the multi-byte chunk; and decrementing the counter using the number of lead bytes and advancing the pointer to a different location in the memory based on a number of bytes in the multi-byte chunk; responsive to the counter satisfying the threshold value, individually processing at least one code point in the series of code points to advance the pointer until the counter satisfies a minimum value; and responsive to the counter satisfying the minimum value, outputting data corresponding to a destination code point in the series of code points, the destination code point being identified by a current position of the pointer.
18. The system of claim 17, wherein determining the number of lead bytes in the multi-byte chunk comprises: generating a complemented shifted chunk by performing a right shift operation to the multi-byte chunk; performing an OR operation using the multi-byte chunk and the complemented shifted chunk; generating an output sequence by performing an AND operation using a result of the OR operation and a comparison sequence; and counting a number of 1-bits in the output sequence and setting the number of 1-bits as the number of lead bytes for the multi-byte chunk.
19. The system of claim 17, wherein processing the at least one code point in the series of code points comprises: identifying a byte based on a location of the pointer; loading the byte into the at least one register; in response to determining that the byte is a lead byte, determining a number of trailing bytes following the byte; and moving the pointer based on the number of trailing bytes.
20. The system of claim 17, the operations further comprising determining whether the multi-byte chunk includes a NULL byte before determining the number of lead bytes in the multi-byte chunk by: generating a first shifted sequence by performing a right shift operation of one bit to the multi-byte chunk; generating a first resulting chunk by performing an OR operation using the multi-byte chunk and the first shifted sequence; designating the first resulting chunk as the multi-byte chunk; generating a second shifted chunk by performing a right shift operation of two bits to the multi-byte chunk; generating a second resulting chunk by performing an OR operation using the multi-byte chunk and the second shifted chunk; designating the second resulting chunk as the multi-byte chunk; generating a third shifted chunk by performing a right shift operation of four bits to the multi-byte chunk; generating a third resulting chunk by performing an OR operation using the multi-byte chunk and the third shifted chunk; designating the third resulting chunk as the multi-byte chunk; performing an AND operation using the multi-byte chunk and a comparison sequence; determining that a NULL byte is included in the multi-byte chunk based on the AND operation; and providing an indication to the computing device indicating an error in the data file.
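The steps recited in claims 1 through 6 can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation. A 64-bit register is simulated with a Python integer loaded little-endian. The threshold value is assumed to equal the chunk size and the minimum value is assumed to be zero. The comparison sequence for the NULL test is assumed to be a repeated 0x01, which the claims do not specify. The function names, the bounds checks, and the step that moves the pointer past continuation bytes after the bulk loop are all hypothetical additions made so the example runs on its own.

```python
CHUNK = 8                           # bytes per simulated 64-bit register load (assumption)
MASK64 = (1 << 64) - 1              # keeps the complement within 64 bits
LEAD_MASK = 0x4040404040404040      # repeated 0x40, as recited in claim 3
LOW_BITS = 0x0101010101010101       # assumed comparison sequence for the NULL test


def count_lead_bytes(chunk: int) -> int:
    """Count bytes in a 64-bit chunk that are not continuation bytes (10xxxxxx)."""
    # Bit 6 of each byte ends up as (NOT bit 7) OR (bit 6), which is 0 only
    # for continuation bytes.
    complemented_shifted = (~chunk & MASK64) >> 1
    return bin((chunk | complemented_shifted) & LEAD_MASK).count("1")


def has_null_byte(chunk: int) -> bool:
    """OR-fold each byte into its lowest bit; a cleared low bit marks a zero byte."""
    folded = chunk
    for shift in (1, 2, 4):
        folded |= folded >> shift
    return (folded & LOW_BITS) != LOW_BITS


def trailing_bytes(lead: int) -> int:
    """Number of continuation bytes that follow a UTF-8 lead byte."""
    if lead < 0x80:
        return 0
    if lead >> 5 == 0b110:
        return 1
    if lead >> 4 == 0b1110:
        return 2
    if lead >> 3 == 0b11110:
        return 3
    raise ValueError(f"not a UTF-8 lead byte: {lead:#04x}")


def skip_code_points(data: bytes, start: int, n: int) -> int:
    """Return the byte offset reached after skipping n code points from start."""
    counter, pos = n, start
    # Bulk phase: a chunk holds at most CHUNK lead bytes, so the counter
    # cannot go negative while it is still >= CHUNK.
    while counter >= CHUNK and pos + CHUNK <= len(data):
        chunk = int.from_bytes(data[pos:pos + CHUNK], "little")
        if has_null_byte(chunk):
            raise ValueError("NULL byte found in chunk")
        counter -= count_lead_bytes(chunk)
        pos += CHUNK
    # Added step (assumption): the chunk boundary may split a code point,
    # so move to the next lead byte before processing code points one by one.
    while pos < len(data) and (data[pos] & 0xC0) == 0x80:
        pos += 1
    # Individual phase: advance one code point at a time until the counter is 0.
    while counter > 0 and pos < len(data):
        pos += 1 + trailing_bytes(data[pos])
        counter -= 1
    return pos
```

For example, `skip_code_points("héllo wörld".encode("utf-8"), 0, 3)` returns 4, because "h", "é", and "l" occupy 1, 2, and 1 bytes respectively.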
July 1, 2025