2006-07-20, 14:38 | Link #1 |
エリック
|
Determine Length of a Variable
I've got a quick programming question. The language is C.
I need to find the length of a string variable declared as such:
Code:
char *STR;
Here's an example of the string in hex:
Code:
0x10 0x91 0x20 0xFC 0x00 0x10 0x00 0x00
Until now I have just kept track of the length with a counter, but that isn't going to cut it now. The string itself should always be null-terminated, but I cannot guarantee this. Thanks in advance for any replies.
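To illustrate why the terminator can't be trusted for data like this, here is a minimal sketch of a bounded scan using the standard `memchr`. The names `bounded_len` and `example` are made up for illustration, and the sketch assumes the eight bytes above are the whole buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the number of bytes before the first 0x00 in buf, or cap if no
 * 0x00 occurs within the first cap bytes. Never reads past cap bytes. */
size_t bounded_len(const void *buf, size_t cap)
{
    assert(buf != NULL);
    const unsigned char *start = buf;
    const unsigned char *end = memchr(start, 0, cap);
    return end ? (size_t)(end - start) : cap;
}

/* The example bytes from the post (assumed to be the complete buffer). */
const unsigned char example[8] = {
    0x10, 0x91, 0x20, 0xFC, 0x00, 0x10, 0x00, 0x00
};
```

For the example bytes, `bounded_len(example, sizeof example)` gives 4, because the fifth byte is 0x00 even though more data follows it. For binary data with embedded zero bytes, a scan for the terminator will therefore cut the data short.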
__________________
|
2006-07-20, 17:24 | Link #2 |
Administrator
Administrator
Join Date: Jan 2001
Location: Netherlands
Age: 45
|
Well the problem seems to be that the variable itself is variable-length (a pointer).
Without a character indicating the end of the string, there probably isn't a way of checking how long the string is. The only thing you might be able to do (but I'm not that familiar with C) is check the size of the buffer allocated to the pointer. Anyway... without a character clearly indicating the end of the string I think using a length counter variable is inevitable. |
2006-07-20, 18:42 | Link #3 |
Asuki-tan Kairin ↓
Join Date: Feb 2004
Location: Fürth (GER)
Age: 43
|
That could be a problem: ANSI C defines strings as null-terminated, so all the standard string functions rely on the terminator when determining the string length.
You could possibly get the size of the memory block the pointer refers to, but I don't know how to achieve this with plain ANSI C. Since a pointer is a mere address (without any explicit length), it is up to the runtime environment how it manages the memory behind it. Most environments hand out the address from a block on the heap that is exactly the requested size or bigger. That usually means two blocks allocated directly one after another may sit next to each other in memory. For example, if you write char *temp = malloc(8); and char *temp2 = malloc(8); and then access temp[8] = 'x', the write lands outside temp's block, possibly in temp2's data or in the allocator's bookkeeping. This is undefined behavior, but usually no error is raised, because the address is still valid memory; it is just wrong in context, and C doesn't check. The programmer should avoid such mistakes in the coding phase.
So one cannot say how long the memory assigned to a pointer actually is, since C doesn't track it, and the environment just returns the start address of a big enough block. The environment (typically the allocator in the C library or the operating system) keeps track of these assignments; it has to, because it needs to know where each block ends so it can safely hand out other addresses. Since this is up to the environment, there might be some environments where you can access this information (I don't know about such functionality in Windows/Linux but maybe it exists).
Another way is to use a so-called fat pointer. Fat pointers contain the address and the boundaries. This is a concept for a safer pointer, not a pointer type actually built into C (at least I do not know of such an implementation... maybe some compiler-specific fat pointer implementations exist).
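The fat pointer idea can be sketched by hand in plain C. This is only an illustration under stated assumptions, not a compiler feature: `fat_ptr`, `fat_alloc`, `fat_set` and `fat_free` are hypothetical names invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical "fat pointer": the address plus the number of bytes it owns. */
typedef struct {
    unsigned char *data;
    size_t         len;
} fat_ptr;

/* Allocates len zeroed bytes; data is NULL and len is 0 on failure. */
fat_ptr fat_alloc(size_t len)
{
    fat_ptr p = { calloc(len, 1), len };
    if (p.data == NULL)
        p.len = 0;
    return p;
}

/* Stores value at index i; returns 0 on success, -1 if i is out of bounds. */
int fat_set(fat_ptr p, size_t i, unsigned char value)
{
    assert(p.data != NULL || p.len == 0);
    if (i >= p.len)
        return -1;
    p.data[i] = value;
    return 0;
}

/* Releases the block and resets the fat pointer to the empty state. */
void fat_free(fat_ptr *p)
{
    assert(p != NULL);
    free(p->data);
    p->data = NULL;
    p->len = 0;
}
```

With an 8-byte block, `fat_set(p, 8, 'x')` is rejected with -1 instead of writing past the end, which is the out-of-bounds access described above.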
__________________
|
2006-07-20, 19:39 | Link #4 |
エリック
|
The data that I need to read through should not be greater than 512 bytes. I suppose that I could just store an entire char[512] and clip off the 0x00's on the end. I'm making an asm compiler for an 8-bit chipset. I have everything made except for writing the final binary to a file. Now the problem is rewriting the program to use this. I just got out of surgery a couple of days ago and the meds really help for programming haha. Thanks again.
<edit> I decided to use a counter and it actually turned out to be more efficient than any other way could have been. Because I have been working with harder problems than this, I overlooked the simple answer for this. I guess every once in a while, programmers should go back to "Hello World!" to remind themselves that hard problems don't require harder answers. </edit>
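The counter approach might look roughly like the sketch below. It is not the poster's actual code: `out_buf`, `out_emit`, `out_write` and `OUT_MAX` are hypothetical names, and the 512-byte limit is taken from the post.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define OUT_MAX 512  /* upper bound mentioned in the post */

/* Hypothetical output buffer: raw bytes plus a running count. */
typedef struct {
    unsigned char bytes[OUT_MAX];
    size_t        count;
} out_buf;

/* Appends one byte; returns 0 on success, -1 if the buffer is full. */
int out_emit(out_buf *out, unsigned char b)
{
    assert(out != NULL);
    if (out->count >= OUT_MAX)
        return -1;
    out->bytes[out->count++] = b;
    return 0;
}

/* Writes exactly count bytes to fp; returns 0 if all were written. */
int out_write(const out_buf *out, FILE *fp)
{
    assert(out != NULL && fp != NULL);
    return fwrite(out->bytes, 1, out->count, fp) == out->count ? 0 : -1;
}
```

Because the count travels with the buffer, 0x00 bytes inside the data are written out like any other byte, and nothing has to be clipped afterwards.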
__________________
Last edited by RavenChild; 2006-07-20 at 20:08. |
2006-07-20, 22:23 | Link #5 |
Needs more sleep~
Join Date: Jun 2003
Location: #animesuki
|
Looks like you already solved it, but I was going to say that you either have to keep track of the length of the "string" yourself (like with a counter) or use some sentinel value to terminate the string (some unique sequence of bytes that, hopefully, does not exist in the string itself). For example, 0xDEADBEEF. Unfortunately that isn't foolproof, since the chance of that byte sequence occurring in your data is not 0.
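A sentinel search could be sketched like this. `SENTINEL` and `sentinel_len` are hypothetical names, and the sketch assumes the sentinel is stored as the four bytes DE AD BE EF in that order.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The 4-byte sentinel from the post, stored byte by byte (assumed order). */
static const unsigned char SENTINEL[4] = { 0xDE, 0xAD, 0xBE, 0xEF };

/* Returns the number of bytes before the first sentinel in buf[0..cap),
 * or cap if the sentinel does not occur within those bytes. */
size_t sentinel_len(const unsigned char *buf, size_t cap)
{
    assert(buf != NULL);
    for (size_t i = 0; i + sizeof SENTINEL <= cap; i++) {
        if (memcmp(buf + i, SENTINEL, sizeof SENTINEL) == 0)
            return i;
    }
    return cap;
}
```

Note that the scan still needs an upper bound (`cap`), so the sentinel does not remove the need to know the buffer size; it only marks where the payload ends inside it.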
Jinto's way, like he mentioned, is not reliable across architectures and environments or even compiler versions unless your problem has enough restrictions that that will be true. Or use a language that supports "better" arrays (where length is tracked). (Yeah I know, not really an option in your case) |
2006-07-21, 03:37 | Link #6 | |
Asuki-tan Kairin ↓
Join Date: Feb 2004
Location: Fürth (GER)
Age: 43
|
Quote:
But if I knew the max. size is only 512 bytes, I'd assign its size as MAXSIZE and I'd go step by step through the array and copy every byte that's not NULL into the final array, terminating it with NULL. But I need to make sure the assigned array pointer points to blank (NULLed) memory, which is not done by normal allocation routines afaik (malloc, for example, doesn't care if there is any data in the assigned memory). So I'd have to set each array element to NULL before using it with a function. It's still the simplest thing one could do (and simple is better in programming if performance doesn't count).
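The copy step described above could be sketched as follows. `copy_nonzero` is a hypothetical name; `MAXSIZE` comes from the post. The source buffer is assumed to have been zeroed first, for example with `calloc` or `memset`.

```c
#include <assert.h>
#include <stddef.h>

#define MAXSIZE 512  /* name taken from the post */

/* Copies every non-zero byte of src[0..MAXSIZE) into dst and terminates dst
 * with a 0 byte. dst must have room for MAXSIZE + 1 bytes.
 * Returns the number of bytes copied, not counting the terminator. */
size_t copy_nonzero(unsigned char *dst, const unsigned char *src)
{
    assert(dst != NULL && src != NULL);
    size_t n = 0;
    for (size_t i = 0; i < MAXSIZE; i++) {
        if (src[i] != 0)
            dst[n++] = src[i];
    }
    dst[n] = 0;
    return n;
}
```

This drops every zero byte, including ones in the middle of the data, so it only fits the case where zero bytes carry no meaning.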
__________________
|
|
2006-07-21, 12:34 | Link #7 |
エリック
|
Haha, I've not had any programming classes or lectures on compilers but that seems like a good enough explanation. I guess the size really wouldn't matter if it's only 512 bytes that I was storing. I also built an interpreter for the architecture. I started on the project last year but ran into a problem similar to this. The medication made me want to finish it, so I did.
Ohh, I'm starting college in the fall with a comp. engineering major. I'll be learning stuff like this in a year or so. Thanks again.
__________________
|
2006-07-21, 22:59 | Link #8 | |
Needs more sleep~
Join Date: Jun 2003
Location: #animesuki
|
Quote:
Maybe you could clarify your way since I don't think I understood it. ^^; |
|
2006-07-22, 04:08 | Link #9 |
Asuki-tan Kairin ↓
Join Date: Feb 2004
Location: Fürth (GER)
Age: 43
|
It would not work directly, but it would with a little modification. I told you about the problem with the length of assigned pointers (one cannot really determine the length, and unless one keeps track of it in a variable and passes it on to later functions, those functions won't have the ability to determine the length either). Without the length variable there are two options: either one uses a special termination sequence or NULL. If one uses NULL as the terminator, the NULLs inside the string need to be replaced by other symbols (only possible if the symbol set of the compiler does not use all 256 symbols).
But I don't see the point of NULL sequences in something like machine code. I could imagine those NULLs either mean a specific operation (e.g. NOP) or do not have any meaning at all. If they have no meaning, simply skipping these NULLs when copying them to a string and terminating with a NULL seems to be the next best thing to do.
__________________
|
2006-07-22, 16:04 | Link #10 |
Needs more sleep~
Join Date: Jun 2003
Location: #animesuki
|
Ah ok, I get it now. While reading your post, I just thought of something. We could store the length of the string in the first two bytes of the string, so using the code in the first post,
Code:
/* cast to unsigned char so bytes >= 0x80 are not sign-extended */
int length = (unsigned char)STR[0] + ((unsigned char)STR[1] << 8);
On the other hand, unless you have other limitations, a struct would be much clearer:
Code:
typedef struct {
    short int length; /* number of bytes in STR */
    char *STR;        /* the raw bytes */
} stAsmInfo, *pstAsmInfo;
|
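For the two-byte length prefix, a matching pair of helpers could look like this sketch. `put_len` and `get_len` are hypothetical names, and the low-byte-first order is an assumption chosen to match the `STR[0] + (STR[1] << 8)` expression in the post.

```c
#include <assert.h>
#include <stddef.h>

/* Stores len in the first two bytes of buf, low byte first.
 * len must fit in 16 bits. */
void put_len(unsigned char *buf, size_t len)
{
    assert(buf != NULL && len <= 0xFFFF);
    buf[0] = (unsigned char)(len & 0xFF);
    buf[1] = (unsigned char)(len >> 8);
}

/* Reads back the length stored by put_len. */
size_t get_len(const unsigned char *buf)
{
    assert(buf != NULL);
    return (size_t)buf[0] | ((size_t)buf[1] << 8);
}
```

The payload would then start at `buf + 2`. Using `unsigned char` for the buffer avoids the sign-extension problem that a plain `char` can cause for bytes of 0x80 and above.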
2006-07-24, 22:15 | Link #11 |
エリック
|
Haha, I never thought that it would cause this much of a conversation.
The nulls on the end are just a coincidence. I don't have to have it null-terminated; the example string just had some on the end. lwl12 has a really good solution too, but you would still have to count the length to put into the first two bytes. To not incur any real performance bottlenecks with the compiler, I just used a bit of inline ASM to increase the counter. But if you think about it, compiling a file of asm that is around 380 lines takes well less than a second. I really don't think developers care about compile time unless you get into the hundreds of thousands/millions of lines of code.
__________________
|
2006-07-24, 22:39 | Link #12 | ||
Needs more sleep~
Join Date: Jun 2003
Location: #animesuki
|
Quote:
Quote:
|
||
|
|