Yesterday I happened to read a post by a friend where he had mentioned a problem in a certain program. He had mentioned that his code worked correctly only up till a certain number of digits(it was 5). The compiler he was on was “Turbo C”, which is a very old compiler (to be precise was introduced in 1987) and should not be used by anyone right now. When I ran the same code on my laptop, on “Microsoft Visual C++ Express 2010(MVCE)”, I noticed that the code worked perfectly for numbers containing 11 digits. So, then I began to wonder what exactly is causing this problem. After searching for quite a while , and reading up a lot of stuff about how memory allocation is done in C, how information is stored in general on the computer memory, I was able to understand why this was happening.
I knew that the problem he was facing was because of the declaration of his variable of the type “int”, which like all other data types has a certain range of permissible values. But I was not able to figure out why it was working for a larger range of values in MVCE !
This search led to me knowing the meaning of processor architecture, and how it affects the various applications. The basics of memory allocation lie in the meaning of processor architecture.
In this post , I will try to explain how memory is allotted to different data not just in C, but in general. I will explain what exactly is the meaning of processor architecture.I will not go into too much detail like how a computer stores/fetches memory on/from the different storage devices like caches, RAMs, ROMs , hard disks, etc. For that, I’ll include some links which you can look into, if you are interested. After understanding this, its evident why this problem was occurring.
16,32,64-Bit Processors and Memory Allocation:
All information is stored on the computer in the form of 0’s and 1’s. This, as we know is the binary number system and a digit in this system is called Binary Digit . Thus information is stored in the form of bits. So all letters, symbols and numbers that we type are first converted into a pattern of bits and are then stored in memory as a computer can understand only bits.
So if we enter a number say 18(base 10) , it will be converted to base 2, which would be :
18 = 16 + 2
18 = 1*(2^4) + 1*(2^1)
18 = 1*(2^4) + 0*(2^3) + 0*(2^2) + 1*(2^1) + 0*(2^0)
Thus 18 = 10010
Hence 18, will be stored as 10010. Similarly, the letters and symbols on the keyboard are assigned certain numerical values which are based on the ASCII(American Standard Code for Information Exchange) or UTF schemes. In ASCII, A is assigned the value 65. Thus when “A” is typed on the keyboard, the computer converts 65 to binary and stores its value in memory. All symbols have such values(including space bar).
So how is this binary value stored?
For this we need to understand, what is called a “word size” of a computer. The word size is the computer’s preferred size for moving units of information around; technically it’s the width of your processor’s “registers”, which are the holding areas your processor uses to do arithmetic and logical calculations. Basically “word size” means the size of the maximum binary value the computer is able to store in a single register(register can be visualized as a box where each character/letter/number is stored in its binary form) .
This word size depends on your processor. This is what is meant by 16-bit/32-bit / 64-bit processors. The bit part refers to the width of the processors register, how many binary digits it can store . Hence, a 16-bit processor can store values ranging from 00000..(16 0’s) to 11111…(16 1’s). Thus the number of combinations of 0’s and 1’s in this arrangement would be (2^16). This is called the “Dynamic Range” of a system. Thus the highest integer value that can be stored in a register would be 1111…(16 times 1). In base 10 this number would be
1*(2^15) +1*(2^14) + ……+ 1*(2^1) + 1*(2^0)
= 1 + 2 + 4 + 8 + 16….. + 2^15
This sum calculated by using the GP sum formula gives (2^16) – 1 = 65,535
Hence an int data type would store the maximum number to the base 10 as 65,535. Integer values are stored in “GPR’s – General Purpose Registers”. The floating point limit is much higher as they are stored in “FPR’s – Floating Point Registers”. These work on a different principle , which at this point I haven’t looked into.
Thus now we know what the max size an integer value can take on a certain processor. We discussed that registers store this value . Registers are a storage unit, they cannot perform any operations on the numbers. For operations, these values are passed to the ALU(Arithmetic Logic Unit). The register is the fastest accessable memory storage for an ALU. But registers store values only temporarily as their size is limited due to their high cost. Whenever the value is not needed, it is passed on to the RAM. Data is stored on the ram on “addresses” . Addresses are basically numbers. An example would be: Address #1, address #2,etc (In actuality only bits(#0, #1, #10, #11,etc.) will be allowed not #1, #2..). Thus the number of addresses possible on the ram is the number of integer values possible. Thus on a 32-Bit processor , number of addresses would be 2^32 = 4.3 billion . Each address stores 1 byte of data, thus this means 4.3 Billions bytes = 4 GigaBytes . Hence a 32-bit processor supports a maximum RAM of 4GB. What this means is it can store 4.3 billion addresses , each address might contain numbers(data) or operations(+, -, *, /)(code).
This is what defines a computer at the basic level : Manipulation of data through the use of code, which is stored in the computer’s memory and fetched from the computer’s memory.
The above statement defines all tasks that a computer does!
Now coming back to the main problem:
Not all application can take advantage of the higher bit processors. Each application is coded for a certain type of processor. The Windows OS is available in 32-bit and 64-bit editions. The latter is only useful for 64-bit processor. A 32-bit OS would treat a 64-bit processor as a 32-bit one , and would not take the extra advantage of the extra “dynamic range” that we saw.
Now, Turbo C is an application that was built for the 16-bit DOS operating system. Thus it is a 16-bit application. This means that its dynamic range is limited to 2^16 values. Hence the max value of an integer would be 1111…(16 times 1), which in base 10 will be 65,535 as we saw before. Hence the 5 digit limit on integers .
MVCE, on the other hand is a 32-bit application . Thus while it wont use a 64-bit processor’s full capability, it will give us the “dynamic range” of 2^32 numbers. Max value of integer would be 1111…(32 times 1), which in base 10 would be 2^32 – 1 = 4.3 * 10^8 . Thus the extra number of digits possible in MVCE.
I have only mentioned some basic stuff here, this topic is vast and quite interesting. So here I will quote my references which are really good and informative reads:
Classic.Ars: Understanding the Microprocessor : http://archive.arstechnica.com/paedia/c/cpu/part-1/cpu1-1.html
Classic.Ars: An Introduction to 64-bit Computing and x86-64: http://arstechnica.com/features/2008/09/x86-64/
Bits and bytes , how they work: http://computer.howstuffworks.com/bytes.htm
How does my computer store things in memory? : http://en.tldp.org/HOWTO/Unix-and-Internet-Fundamentals-HOWTO/core-formats.html
How computer memory works : http://www.howstuffworks.com/computer-memory.htm
Clarification : The int values I have given will be possible only if we declare an “unsigned int”, as for an int variable , 1bit is reserved for the number sign. Hence max value of a normal int would be 2^15 – 1 = 32,767 on 16-bit compiler and 2^31 – 1 for a 32-bit compiler.