U.S. Patent 12346563

Code Point Skipping with Variable-Width Encoding for Data Analytics System

Published: July 1, 2025
Assignee: not available in USPTO data
Patent Claims
20 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A method, comprising:
  accessing a data file stored in memory of a computing device, the data file including a series of characters corresponding to a series of code points, each code point in the series of code points being encoded in memory using a variable-width encoding scheme, and each code point in the series of code points corresponding to one or more encoded bytes in memory;
  receiving a request to skip ahead a desired number of code points in the data file from a reference code point in the series of code points;
  initializing a counter indicating the desired number of code points to be skipped;
  initializing a pointer, the pointer pointing to a location in the memory corresponding to the reference code point;
  performing, for each iteration in one or more iterations, until the counter satisfies a threshold value:
    loading a multi-byte chunk of data in the memory, starting from the location of the pointer, into at least one register;
    determining a number of lead bytes in the multi-byte chunk by performing one or more operations on the multi-byte chunk; and
    decrementing the counter using the number of lead bytes and advancing the pointer to a different location in the memory based on a number of bytes in the multi-byte chunk;
  responsive to the counter satisfying the threshold value, individually processing at least one code point in the series of code points to advance the pointer until the counter satisfies a minimum value; and
  responsive to the counter satisfying the minimum value, outputting data corresponding to a destination code point in the series of code points, the destination code point being identified by a current position of the pointer.
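As a rough illustration of the skip procedure in claim 1, the Python sketch below models the claimed register with a 64-bit integer holding an 8-byte chunk. It is not the patented implementation, and it rests on several assumptions. The input is valid UTF-8. The threshold value is the chunk width (8), because an 8-byte chunk contains at most 8 lead bytes and so cannot overshoot while the counter is at least 8. The minimum value is 0. A "lead byte" is any byte that begins a code point (an ASCII byte or a multi-byte lead), i.e. any byte not of the form 10xxxxxx. The "output" is simply the byte offset of the destination code point. The names `skip_code_points`, `count_lead_bytes`, and `trailing_bytes` are hypothetical.

```python
CHUNK = 8                          # bytes per load: models one 64-bit register
MASK64 = (1 << 64) - 1
LEAD_MASK = 0x4040404040404040     # 0x40 repeated in every byte


def count_lead_bytes(word: int) -> int:
    """Count bytes in a 64-bit word that begin a code point (not 10xxxxxx)."""
    complemented_shifted = ((~word) & MASK64) >> 1
    return bin((word | complemented_shifted) & LEAD_MASK).count("1")


def trailing_bytes(lead: int) -> int:
    """Number of continuation bytes that follow a UTF-8 lead byte."""
    if lead < 0x80:
        return 0
    if lead >= 0xF0:
        return 3
    if lead >= 0xE0:
        return 2
    return 1


def skip_code_points(data: bytes, start: int, n: int) -> int:
    """Return the byte offset of the code point n positions after `start`."""
    counter, ptr = n, start
    # Chunked phase: an 8-byte chunk holds at most 8 lead bytes, so while
    # counter >= CHUNK a whole chunk can never skip past the destination.
    while counter >= CHUNK and ptr + CHUNK <= len(data):
        word = int.from_bytes(data[ptr:ptr + CHUNK], "little")
        counter -= count_lead_bytes(word)
        ptr += CHUNK
    # The last chunk may end inside a code point whose lead byte was already
    # counted; step over its remaining continuation bytes.
    while ptr < len(data) and (data[ptr] & 0xC0) == 0x80:
        ptr += 1
    # Per-code-point phase: skip one whole code point per iteration.
    while counter > 0:
        if ptr >= len(data):
            raise IndexError("requested skip runs past the end of the data")
        ptr += 1 + trailing_bytes(data[ptr])
        counter -= 1
    return ptr
```

For example, with `data = "héllo".encode("utf-8")`, `skip_code_points(data, 0, 2)` returns 3, the byte offset of the first `l`. Because the chunk loop advances by a fixed 8 bytes, it can stop inside a multi-byte code point whose lead byte it has already counted, which is why the per-code-point phase first steps over those continuation bytes.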

2. The method of claim 1, wherein determining the number of lead bytes in the multi-byte chunk comprises: generating a complemented shifted chunk by performing a right shift operation to the multi-byte chunk; performing an OR operation using the multi-byte chunk and the complemented shifted chunk; generating an output sequence by performing an AND operation using a result of the OR operation and a comparison sequence; and counting a number of 1-bits in the output sequence and setting the number of 1-bits as the number of lead bytes for the multi-byte chunk.

3. The method of claim 2, wherein the comparison sequence is a repeated sequence of 0x40.
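The per-byte logic behind claims 2 and 3 can be checked directly. In UTF-8, only continuation bytes have the top two bits 10. Shifting the complemented chunk right by one bit places NOT(bit 7) of each byte in bit 6 of the same byte, so after the OR, bit 6 is 0 exactly for continuation bytes. Masking with repeated 0x40 keeps only that bit; bits that cross from one byte into its neighbor land in bit 7 and are discarded. The sketch below (hypothetical names, 64-bit chunk assumed) verifies the rule against all 256 byte values and counts the lead bytes in an 8-byte string.

```python
MASK64 = (1 << 64) - 1
LEAD_MASK = 0x40 * 0x0101010101010101   # 0x40 repeated in each of 8 bytes


def count_lead_bytes(word: int) -> int:
    # NOT(bit 7) of each byte lands in bit 6 of the same byte after the shift.
    complemented_shifted = ((~word) & MASK64) >> 1
    # Bit 6 of each byte is now NOT(bit 7) OR bit 6; it is 0 only for 10xxxxxx.
    # Bits that crossed a byte boundary sit in bit 7 and are masked away.
    output = (word | complemented_shifted) & LEAD_MASK
    return bin(output).count("1")


def is_lead_byte(b: int) -> bool:
    # Reference rule: every byte except a continuation byte starts a code point.
    return (b & 0xC0) != 0x80


# Exhaustive check: a chunk of 8 copies of b has either 8 lead bytes or none.
for b in range(256):
    word = int.from_bytes(bytes([b]) * 8, "little")
    assert count_lead_bytes(word) == 8 * is_lead_byte(b)

# "aé✓bc" is 8 bytes (61 C3 A9 E2 9C 93 62 63) containing 5 lead bytes.
print(count_lead_bytes(int.from_bytes("aé✓bc".encode("utf-8"), "little")))  # prints 5
```

In a compiled language the final step would typically be a single population-count operation (for example, `__builtin_popcountll` in GCC and Clang); `bin(...).count("1")` stands in for it here.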

4. The method of claim 1, wherein processing the at least one code point in the series of code points comprises: identifying a byte based on a location of the pointer; loading the byte into the at least one register; in response to determining that the byte is a lead byte, determining a number of trailing bytes following the byte; and moving the pointer based on the number of trailing bytes.
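The individual processing of claim 4 needs the number of trailing bytes that follow a lead byte. One way to get it, sketched below under the assumption of valid UTF-8, is to count the lead byte's leading 1-bits (110xxxxx has two, 1110xxxx three, 11110xxx four) and subtract one; ASCII bytes have no leading 1-bits and no trailing bytes. The sketch also steps over continuation bytes without decrementing the counter, on the assumption that they belong to a code point counted earlier. `trailing_byte_count` and `process_individually` are hypothetical names.

```python
def trailing_byte_count(lead: int) -> int:
    """Trailing bytes after a UTF-8 lead byte, from its run of leading 1-bits."""
    leading_ones = 8 - ((~lead) & 0xFF).bit_length()
    # 0xxxxxxx -> 0 ones; 110xxxxx -> 2; 1110xxxx -> 3; 11110xxx -> 4.
    return max(leading_ones - 1, 0)


def process_individually(data: bytes, ptr: int, counter: int) -> int:
    """Advance ptr one code point at a time until counter reaches 0."""
    while ptr < len(data) and (data[ptr] & 0xC0) == 0x80:
        ptr += 1  # not a lead byte: tail of a code point already counted
    while counter > 0:
        if ptr >= len(data):
            raise IndexError("ran past the end of the data")
        # Move past the lead byte and all of its trailing bytes.
        ptr += 1 + trailing_byte_count(data[ptr])
        counter -= 1
    return ptr
```

Here `process_individually("é✓a".encode("utf-8"), 0, 2)` returns 5, the byte offset of `a`.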

5. The method of claim 1, further comprising determining whether the multi-byte chunk includes a NULL byte before determining the number of lead bytes in the multi-byte chunk.

6. The method of claim 5, wherein determining whether the multi-byte chunk includes a NULL byte comprises:
  generating a first shifted sequence by performing a right shift operation of one bit to the multi-byte chunk;
  generating a first resulting chunk by performing an OR operation using the multi-byte chunk and the first shifted sequence;
  designating the first resulting chunk as the multi-byte chunk;
  generating a second shifted chunk by performing a right shift operation of two bits to the multi-byte chunk;
  generating a second resulting chunk by performing an OR operation using the multi-byte chunk and the second shifted chunk;
  designating the second resulting chunk as the multi-byte chunk;
  generating a third shifted chunk by performing a right shift operation of four bits to the multi-byte chunk;
  generating a third resulting chunk by performing an OR operation using the multi-byte chunk and the third shifted chunk;
  designating the third resulting chunk as the multi-byte chunk;
  performing an AND operation using the multi-byte chunk and a comparison sequence;
  determining that a NULL byte is included in the multi-byte chunk based on the AND operation; and
  providing an indication to the computing device indicating an error in the data file.
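Claim 6 does not state the comparison sequence. A natural reading, assumed in the sketch below, is 0x01 repeated in every byte: after the right shifts by 1, 2, and 4 bits with ORs, bit 0 of each byte holds the OR of all eight bits of that byte, so it is 0 only when the byte is NULL. Higher bits of each byte can absorb bits from the next byte, which is why only the low bits are compared. The names are hypothetical, the chunk is assumed to be exactly 8 bytes, and raising `ValueError` stands in for the claimed error indication.

```python
LOW_BITS = 0x0101010101010101   # assumed comparison sequence: 0x01 in each byte


def has_null_byte(word: int) -> bool:
    """OR-fold each byte into its lowest bit, then test those bits."""
    x = word
    x |= x >> 1   # bit i = OR of bits i..i+1
    x |= x >> 2   # bit i = OR of bits i..i+3
    x |= x >> 4   # bit i = OR of bits i..i+7
    # Bit 0 of each byte now summarizes exactly that byte; higher bits may have
    # absorbed bits from the next byte, so only the low bits are compared.
    return (x & LOW_BITS) != LOW_BITS


def check_chunk(chunk: bytes) -> int:
    """Return the chunk as a 64-bit word, or raise if it contains a NULL byte."""
    if len(chunk) != 8:
        raise ValueError("expected an 8-byte chunk")
    word = int.from_bytes(chunk, "little")
    if has_null_byte(word):
        raise ValueError("NULL byte in data file")  # stands in for the error indication
    return word
```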

7. The method of claim 1, wherein the variable-width encoding scheme is UTF-8 and the at least one register comprises at least one of a 32-bit register, a 64-bit register, or a 128-bit register.

8. The method of claim 1, further comprising: determining that a memory address of the reference code point is not aligned; responsive to determining that the memory address of the reference code point is not aligned, identifying a number of bytes in the data file until a next chunk of the data file is memory aligned; loading an initial chunk of memory including one or more padded bytes and the number of bytes to a register; and moving the pointer to a location in memory that corresponds to the next chunk.
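Claim 8 leaves open how the padded initial chunk is built and what the padding bytes contain. The sketch below assumes that offsets into `data` stand in for memory addresses (so `data[0]` is treated as aligned), an 8-byte chunk size, and padding with 0x80. A continuation byte is never counted as a lead byte and is not NULL, so this padding affects neither the lead-byte count nor a NULL check. `initial_chunk` and `PAD` are hypothetical.

```python
CHUNK = 8
PAD = 0x80  # assumed padding: a continuation byte, never a lead byte and never NULL


def initial_chunk(data: bytes, ptr: int) -> tuple[bytes, int]:
    """Pad the bytes before the next CHUNK boundary into one full chunk.

    Offsets stand in for addresses here, i.e. data[0] is assumed aligned.
    Returns the padded chunk and the offset of the next aligned chunk.
    """
    head = (-ptr) % CHUNK              # bytes until the next aligned boundary
    if head == 0:
        return b"", ptr                # already aligned: no initial chunk needed
    real = data[ptr:ptr + head]
    # Padding occupies the positions before ptr in the aligned chunk.
    padded = bytes([PAD]) * (CHUNK - len(real)) + real
    return padded, ptr + len(real)
```

For `data = "aé✓bcdefgh".encode("utf-8")` and a start at offset 3, the initial chunk is three 0x80 bytes followed by `data[3:8]`, and chunked processing continues at the aligned offset 8.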

9. A non-transitory computer-readable medium storing instructions that are executable by at least one processing device to perform operations comprising:
  accessing a data file stored in memory of a computing device, the data file including a series of characters corresponding to a series of code points, each code point in the series of code points being encoded in memory using a variable-width encoding scheme, and each code point in the series of code points corresponding to one or more encoded bytes in memory;
  receiving a request to skip ahead a desired number of code points in the data file from a reference code point in the series of code points;
  initializing a counter indicating the desired number of code points to be skipped;
  initializing a pointer, the pointer pointing to a location in the memory corresponding to the reference code point;
  performing, for each iteration in one or more iterations, until the counter satisfies a threshold value:
    loading a multi-byte chunk of data in the memory, starting from the location of the pointer, into at least one register;
    determining a number of lead bytes in the multi-byte chunk by performing one or more operations on the multi-byte chunk; and
    decrementing the counter using the number of lead bytes and advancing the pointer to a different location in the memory based on a number of bytes in the multi-byte chunk;
  responsive to the counter satisfying the threshold value, individually processing at least one code point in the series of code points to advance the pointer until the counter satisfies a minimum value; and
  responsive to the counter satisfying the minimum value, outputting data corresponding to a destination code point in the series of code points, the destination code point being identified by a current position of the pointer.

10. The non-transitory computer-readable medium of claim 9, wherein determining the number of lead bytes in the multi-byte chunk comprises: generating a complemented shifted chunk by performing a right shift operation to the multi-byte chunk; performing an OR operation using the multi-byte chunk and the complemented shifted chunk; generating an output sequence by performing an AND operation using a result of the OR operation and a comparison sequence; and counting a number of 1-bits in the output sequence and setting the number of 1-bits as the number of lead bytes for the multi-byte chunk.

11. The non-transitory computer-readable medium of claim 10, wherein the comparison sequence is a repeated sequence of 0x40.

12. The non-transitory computer-readable medium of claim 9, wherein processing the at least one code point in the series of code points comprises: identifying a byte based on a location of the pointer; loading the byte into the at least one register; in response to determining that the byte is a lead byte, determining a number of trailing bytes following the byte; and moving the pointer based on the number of trailing bytes.

13. The non-transitory computer-readable medium of claim 9, the operations further comprising determining whether the multi-byte chunk includes a NULL byte before determining the number of lead bytes in the multi-byte chunk.

14. The non-transitory computer-readable medium of claim 13, wherein determining whether the multi-byte chunk includes a NULL byte comprises:
  generating a first shifted sequence by performing a right shift operation of one bit to the multi-byte chunk;
  generating a first resulting chunk by performing an OR operation using the multi-byte chunk and the first shifted sequence;
  designating the first resulting chunk as the multi-byte chunk;
  generating a second shifted chunk by performing a right shift operation of two bits to the multi-byte chunk;
  generating a second resulting chunk by performing an OR operation using the multi-byte chunk and the second shifted chunk;
  designating the second resulting chunk as the multi-byte chunk;
  generating a third shifted chunk by performing a right shift operation of four bits to the multi-byte chunk;
  generating a third resulting chunk by performing an OR operation using the multi-byte chunk and the third shifted chunk;
  designating the third resulting chunk as the multi-byte chunk;
  performing an AND operation using the multi-byte chunk and a comparison sequence;
  determining that a NULL byte is included in the multi-byte chunk based on the AND operation; and
  providing an indication to the computing device indicating an error in the data file.

15. The non-transitory computer-readable medium of claim 9, wherein the variable-width encoding scheme is UTF-8 and the at least one register comprises at least one of a 32-bit register, a 64-bit register, or a 128-bit register.

16. The non-transitory computer-readable medium of claim 9, the operations further comprising: determining that a memory address of the reference code point is not aligned; responsive to determining that the memory address of the reference code point is not aligned, identifying a number of bytes in the data file until a next chunk of the data file is memory aligned; loading an initial chunk of memory including one or more padded bytes and the number of bytes to a register; and moving the pointer to a location in memory that corresponds to the next chunk.

17. A system comprising:
  one or more processors; and
  a computer-readable storage medium storing instructions that are executable by the one or more processors to perform operations comprising:
    accessing a data file stored in memory of a computing device, the data file including a series of characters corresponding to a series of code points, each code point in the series of code points being encoded in memory using a variable-width encoding scheme, and each code point in the series of code points corresponding to one or more encoded bytes in memory;
    receiving a request to skip ahead a desired number of code points in the data file from a reference code point in the series of code points;
    initializing a counter indicating the desired number of code points to be skipped;
    initializing a pointer, the pointer pointing to a location in the memory corresponding to the reference code point;
    performing, for each iteration in one or more iterations, until the counter satisfies a threshold value:
      loading a multi-byte chunk of data in the memory, starting from the location of the pointer, into at least one register;
      determining a number of lead bytes in the multi-byte chunk by performing one or more operations on the multi-byte chunk; and
      decrementing the counter using the number of lead bytes and advancing the pointer to a different location in the memory based on a number of bytes in the multi-byte chunk;
    responsive to the counter satisfying the threshold value, individually processing at least one code point in the series of code points to advance the pointer until the counter satisfies a minimum value; and
    responsive to the counter satisfying the minimum value, outputting data corresponding to a destination code point in the series of code points, the destination code point being identified by a current position of the pointer.

18. The system of claim 17, wherein determining the number of lead bytes in the multi-byte chunk comprises: generating a complemented shifted chunk by performing a right shift operation to the multi-byte chunk; performing an OR operation using the multi-byte chunk and the complemented shifted chunk; generating an output sequence by performing an AND operation using a result of the OR operation and a comparison sequence; and counting a number of 1-bits in the output sequence and setting the number of 1-bits as the number of lead bytes for the multi-byte chunk.

19. The system of claim 17, wherein processing the at least one code point in the series of code points comprises: identifying a byte based on a location of the pointer; loading the byte into the at least one register; in response to determining that the byte is a lead byte, determining a number of trailing bytes following the byte; and moving the pointer based on the number of trailing bytes.

20. The system of claim 17, the operations further comprising determining whether the multi-byte chunk includes a NULL byte before determining the number of lead bytes in the multi-byte chunk by:
  generating a first shifted sequence by performing a right shift operation of one bit to the multi-byte chunk;
  generating a first resulting chunk by performing an OR operation using the multi-byte chunk and the first shifted sequence;
  designating the first resulting chunk as the multi-byte chunk;
  generating a second shifted chunk by performing a right shift operation of two bits to the multi-byte chunk;
  generating a second resulting chunk by performing an OR operation using the multi-byte chunk and the second shifted chunk;
  designating the second resulting chunk as the multi-byte chunk;
  generating a third shifted chunk by performing a right shift operation of four bits to the multi-byte chunk;
  generating a third resulting chunk by performing an OR operation using the multi-byte chunk and the third shifted chunk;
  designating the third resulting chunk as the multi-byte chunk;
  performing an AND operation using the multi-byte chunk and a comparison sequence;
  determining that a NULL byte is included in the multi-byte chunk based on the AND operation; and
  providing an indication to the computing device indicating an error in the data file.

Patent Metadata

Filing Date

Unknown

Publication Date

July 1, 2025

Inventors

Christopher H. Kingsley

Citation & reuse

Analysis on this page is generated by Patentable — an AI-powered patent intelligence platform. AI-generated summaries, explanations, and analysis may be reused with attribution and a visible link back to the canonical URL below. Patent abstracts and claims are USPTO public domain.

Cite as: Patentable. “CODE POINT SKIPPING WITH VARIABLE-WIDTH ENCODING FOR DATA ANALYTICS SYSTEM” (12346563). https://patentable.app/patents/12346563
