US-11593625

Method and apparatus with neural network parameter quantization

Published: February 28, 2023
Assignee: not available in USPTO data we have
Inventors: not available in USPTO data we have
Technical Abstract

Provided is a processor-implemented method that includes performing training or an inference operation with a neural network by obtaining a parameter for the neural network in a floating-point format, applying a fractional length of a fixed-point format to the parameter in the floating-point format, performing an operation with an integer arithmetic logic unit (ALU) to determine whether to round off a fixed point based on a most significant bit among bit values to be discarded after a quantization process, and performing an operation of quantizing the parameter in the floating-point format to a parameter in the fixed-point format, based on a result of the operation with the ALU.
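
The steps in the abstract can be illustrated with a minimal Python sketch. It is one plausible reading of the abstract, not the patented implementation: the function name `quantize_float32`, the `frac_len` parameter, the handling of subnormals, and the choice to round the magnitude and then reapply the sign are all assumptions made for illustration. Only integer shifts, masks, and additions touch the float's bit pattern, and the rounding decision comes from the most significant bit among the bits that are discarded.

```python
import struct

def quantize_float32(x: float, frac_len: int) -> int:
    """Quantize x (as float32) to a fixed-point integer with frac_len fractional bits.

    Hypothetical helper: uses only integer operations on the IEEE 754 bit pattern.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    sign = bits >> 31
    exp = (bits >> 23) & 0xFF       # biased exponent (bias 127)
    man = bits & 0x7FFFFF           # 23-bit mantissa field

    if exp == 0xFF:
        raise ValueError("NaN and infinity cannot be quantized")
    if exp == 0:
        sig = man                   # zero or subnormal: no implicit leading 1
        exp = 1
    else:
        sig = man | (1 << 23)       # restore the implicit leading 1

    # The value is sig * 2**(exp - 127 - 23); applying the fractional length
    # means multiplying by 2**frac_len, i.e. adding frac_len to the exponent.
    shift = exp - 127 - 23 + frac_len
    if shift >= 0:
        mag = sig << shift          # nothing is discarded
    else:
        drop = -shift               # number of low bits to discard
        mag = sig >> drop
        # Round up when the most significant discarded bit is 1.
        mag += (sig >> (drop - 1)) & 1
    return -mag if sign else mag

print(quantize_float32(0.75, 4))      # 0.75 * 16 = 12
print(quantize_float32(-1.53125, 4))  # -24.5 rounds to -25
```

Under these assumptions the result is the integer code of the fixed-point value; dividing it by `2**frac_len` recovers the quantized real number. Saturation to a fixed bit width is left out because the abstract does not describe it.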

Patent Claims
3 claims

Legal claims defining the scope of protection, as filed with the USPTO.

10. A non-transitory computer-readable recording medium having recorded thereon a computer program, which, when executed by a computer, performs the method of claim 1.

18. The neural network apparatus of claim 15, wherein when the floating-point format is a single-precision floating-point format, the bias constant is a decimal number of 127, the number of bits of the first mantissa value is a decimal number of 23, and the predetermined number is a decimal number of 22, and when the floating-point format is a double-precision floating-point format, the bias constant is a decimal number of 1023, the number of bits of the first mantissa value is a decimal number of 52, and the predetermined number is a decimal number of 51.
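
The constants in claim 18 follow the standard IEEE 754 layouts, which the short sketch below makes explicit. The class `FloatFormat` and its property names are hypothetical, and the relation "predetermined number = mantissa bits minus 1" is an observation about the numbers quoted in the claim (22 = 23 - 1, 51 = 52 - 1), not something this page states.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FloatFormat:
    """Hypothetical container for the per-format constants named in claim 18."""
    exponent_bits: int
    mantissa_bits: int

    @property
    def bias(self) -> int:
        # IEEE 754 exponent bias: 2**(exponent_bits - 1) - 1
        return (1 << (self.exponent_bits - 1)) - 1

    @property
    def predetermined_number(self) -> int:
        # Assumption: one less than the mantissa width, matching 22 and 51.
        return self.mantissa_bits - 1

SINGLE = FloatFormat(exponent_bits=8, mantissa_bits=23)
DOUBLE = FloatFormat(exponent_bits=11, mantissa_bits=52)

for name, fmt in (("single", SINGLE), ("double", DOUBLE)):
    print(name, fmt.bias, fmt.mantissa_bits, fmt.predetermined_number)
```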

20. The neural network apparatus of claim 11, further comprising a memory storing instructions, which when executed by the processor, configure the processor to perform the obtaining of the parameter, the applying of the fractional length to the floating-point format, the determining, and the quantizing of the parameter.

Patent Metadata

Filing Date: October 15, 2018
Publication Date: February 28, 2023

Citation & reuse

Analysis on this page is generated by Patentable — an AI-powered patent intelligence platform. AI-generated summaries, explanations, and analysis may be reused with attribution and a visible link back to the canonical URL below. Patent abstracts and claims are USPTO public domain.

Cite as: Patentable. “Method and apparatus with neural network parameter quantization” (US-11593625). https://patentable.app/patents/US-11593625