AnimeSuki Forums

Old 2006-07-20, 14:38   Link #1
RavenChild
エリック
 
 
Join Date: Nov 2003
Location: my closet, deep and dark
Age: 36
Determine Length of a Variable

I've got a quick programming question. The language is C.

I need to find the length of a string variable declared as such:
Code:
char *STR;
This is not a normal string in that it contains null characters (0x00), so its length cannot be determined using strlen() from string.h.

Here's an example of the string in hex:
Code:
0x10
0x91
0x20
0xFC
0x00
0x10
0x00
0x00
The length should return (int) 8.
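
To make the problem concrete, here is a minimal sketch in C using the eight example bytes. The names `example_bytes`, `length_seen_by_strlen` and `example_size` are made up for this illustration and are not from the original program.

```c
#include <stddef.h>
#include <string.h>

/* The eight example bytes from the post (the array name is hypothetical). */
static const unsigned char example_bytes[8] = {
    0x10, 0x91, 0x20, 0xFC, 0x00, 0x10, 0x00, 0x00
};

/* What strlen() reports: it stops counting at the first 0x00 byte. */
size_t length_seen_by_strlen(const unsigned char *bytes)
{
    return strlen((const char *)bytes);
}

/* For a true array (not a pointer), sizeof still knows the full size. */
size_t example_size(void)
{
    return sizeof example_bytes;
}
```

For this data strlen() stops at the fifth byte and reports 4 rather than 8. Once the array is handed around as a plain `char *`, `sizeof` only gives the size of the pointer itself, which is why the length has to travel separately.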

Until now I have just kept track of the length with a counter, but that isn't going to cut it now. The string itself should always be null-terminated, but I cannot guarantee this.

Thanks for any replies in advance.
__________________
************* RavenChild **************
Old 2006-07-20, 17:24   Link #2
GHDpro
Administrator
*Administrator
 
 
Join Date: Jan 2001
Location: Netherlands
Age: 45
Well, the problem seems to be that the variable is just a pointer, so it carries no length
of its own. Without a character indicating the end of the string, there probably isn't a way of
checking how long the string is. The only thing you might be able to do (but I'm not
that familiar with C) is check the size of the buffer allocated to the pointer.

Anyway... without a character clearly indicating the end of the string I think using a
length counter variable is inevitable.
Old 2006-07-20, 18:42   Link #3
Jinto
Asuki-tan Kairin ↓
 
 
Join Date: Feb 2004
Location: Fürth (GER)
Age: 43
Could be a problem, since ANSI C defines strings as null-terminated, so all the string functions rely on the terminator when determining the string length.

You could possibly get the size of the memory block the pointer was allocated with, but I don't know how to achieve this in plain ANSI C.

Since a pointer is merely an address (with no explicit length attached), it is up to the runtime environment how it manages the memory behind it. Most environments will hand out an address inside a block on the heap that is exactly the requested size or bigger. That usually means two allocations made directly one after another may sit close together in memory (e.g. if you write char *temp = malloc(8); and char *temp2 = malloc(8); and then access temp[8] = 'x', you are writing past the end of temp, and the write may land in the allocator's bookkeeping or even in temp2's block. Typically no exception or error is raised, since that address is often valid memory too. It is just wrong in context, and C doesn't check. Formally it is undefined behavior, so the programmer has to avoid such mistakes in the coding phase).

So one cannot say how long the memory assigned to a pointer actually is, since C doesn't care, and the environment just hands out a start address in a big enough memory block. The environment (typically the allocator routines of the C runtime and the operating system) keeps track of these allocations. It has to, because it needs to know where the imaginary bounds of a pointer are so that it can safely hand out other addresses. Since this is up to the environment, there might be some environments where you can access this information (I don't know about such functionality in Windows/Linux but maybe it exists).

Another way is the use of a so-called fat pointer. A fat pointer contains the address and the boundaries. It is a concept for a safer pointer, not a pointer type that standard C or C++ actually provides (at least I do not know of such an implementation... maybe some compiler-specific fat pointer implementations exist).
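
As a rough illustration of the fat pointer idea, the sketch below bundles an address with its size and checks every access against it. This is hand-rolled C, not a language or compiler feature, and the names `fat_ptr`, `fat_get` and `fat_set` are hypothetical.

```c
#include <stddef.h>

/* Hypothetical "fat pointer": an address bundled with the size of its block. */
typedef struct {
    unsigned char *base;  /* start of the buffer */
    size_t size;          /* number of valid bytes at base */
} fat_ptr;

/* Bounds-checked read: returns 1 and stores the byte, or 0 if out of range. */
int fat_get(fat_ptr p, size_t index, unsigned char *out)
{
    if (p.base == NULL || index >= p.size)
        return 0;
    *out = p.base[index];
    return 1;
}

/* Bounds-checked write: returns 1 on success, or 0 if out of range. */
int fat_set(fat_ptr p, size_t index, unsigned char value)
{
    if (p.base == NULL || index >= p.size)
        return 0;
    p.base[index] = value;
    return 1;
}
```

With an 8-byte block, `fat_set(p, 8, 'x')` is rejected instead of silently writing past the end, at the cost of one comparison per access.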
__________________
Folding@Home, Team Animesuki
Old 2006-07-20, 19:39   Link #4
RavenChild
エリック
 
 
Join Date: Nov 2003
Location: my closet, deep and dark
Age: 36
The data that I need to read through should not be greater than 512 bytes. I suppose that I could just store an entire char[512] and clip off the 0x00's on the end. I'm making an asm compiler for an 8-bit chipset. I have everything made except for writing the final binary to a file. Now the problem is rewriting the program to use this. I just got out of surgery a couple of days ago and the meds really help for programming haha. Thanks again.

<edit>

I decided to use a counter and it actually turned out to be more efficient than any other way could have been. Because I have been working with harder problems than this, I overlooked the simple answer. I guess every once in a while, programmers should go back to "Hello World!" to remind themselves that hard problems don't require harder answers.

</edit>
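
The counter approach might look roughly like the sketch below. The actual code was not posted, so everything here is an assumption: the `Output` struct, the `emit_byte` and `write_output` helpers, and the 512-byte limit taken from the post above.

```c
#include <stddef.h>
#include <stdio.h>

#define OUTPUT_MAX 512  /* upper bound mentioned in the post (assumed here) */

/* Hypothetical output buffer: the bytes plus the counter that tracks them. */
typedef struct {
    unsigned char bytes[OUTPUT_MAX];
    size_t count;  /* number of bytes emitted so far */
} Output;

/* Appends one byte and bumps the counter; returns 0 if the buffer is full. */
int emit_byte(Output *out, unsigned char b)
{
    if (out->count >= OUTPUT_MAX)
        return 0;
    out->bytes[out->count++] = b;
    return 1;
}

/* Writes exactly `count` bytes to a stream opened in binary mode ("wb"). */
int write_output(const Output *out, FILE *fp)
{
    return fwrite(out->bytes, 1, out->count, fp) == out->count;
}
```

Because the counter is updated at the moment each byte is produced, 0x00 bytes inside the data need no special treatment, and the final write knows exactly how many bytes to put in the file.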
Last edited by RavenChild; 2006-07-20 at 20:08.
Old 2006-07-20, 22:23   Link #5
Cz
Needs more sleep~
 
 
Join Date: Jun 2003
Location: #animesuki
Looks like you already solved it, but I was going to say that you either have to keep track of the length of the "string" yourself (like with a counter) or use some sentinel value to terminate the string (some unique sequence of bytes that, hopefully, does not occur in the string itself). For example, 0xDEADBEEF. Unfortunately that isn't foolproof, since the chance of that byte sequence occurring is not 0 (i.e. it could appear in your string).

Jinto's way, like he mentioned, is not reliable across architectures, environments, or even compiler versions, unless your problem is restricted enough that it happens to hold.

Or use a language that supports "better" arrays (where length is tracked). (Yeah I know, not really an option in your case)
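
A sentinel search could be sketched as follows, using the 0xDEADBEEF example as a four-byte marker. The byte order of the marker, the function name `length_before_sentinel` and the `capacity` parameter are assumptions made for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical 4-byte sentinel, following the 0xDEADBEEF example. */
static const unsigned char SENTINEL[4] = { 0xDE, 0xAD, 0xBE, 0xEF };

/*
 * Returns the number of bytes before the first sentinel, or -1 if no
 * sentinel is found within the first `capacity` bytes of buf.
 */
long length_before_sentinel(const unsigned char *buf, size_t capacity)
{
    size_t i;

    if (capacity < sizeof SENTINEL)
        return -1;
    for (i = 0; i + sizeof SENTINEL <= capacity; i++) {
        if (memcmp(buf + i, SENTINEL, sizeof SENTINEL) == 0)
            return (long)i;
    }
    return -1;
}
```

The search still needs a known upper bound (`capacity`) so it cannot run off the end of the buffer, and if the data itself happens to contain DE AD BE EF, the reported length is too short. That is the non-zero chance mentioned above.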
Old 2006-07-21, 03:37   Link #6
Jinto
Asuki-tan Kairin ↓
 
 
Join Date: Feb 2004
Location: Fürth (GER)
Age: 43
Quote:
Originally Posted by lwl12
Looks like you already solved it, but I was going to say that you either have to keep track of the length of the "string" yourself (like with a counter) or use some sentinel value to terminate the string (some unique sequence of bytes that, hopefully, does not occur in the string itself). For example, 0xDEADBEEF. Unfortunately that isn't foolproof, since the chance of that byte sequence occurring is not 0 (i.e. it could appear in your string).
I'd say a string termination sequence is possible. RavenChild is writing an ASM compiler. From my lectures about compilers and interpreters I remember that the language should be a subset of all possible symbol sequences, one that can be derived from a Chomsky type 1 grammar. Since that restricts which sequences belong to the language, there will be an infinitely large number of symbol sequences that are not part of the language and thus qualify as a termination sequence.

But if I knew the max. size is only 512 bytes, I'd define its size as MAXSIZE, go step by step through the array, and copy every byte that's not NULL into the final array, terminating it with NULL. But I need to ensure the allocated array points to blank (zeroed) memory, which normal allocation routines don't do afaik (malloc, e.g., doesn't care if there is any data in the allocated memory). So I'd have to set each array element to NULL before using it with a function. It's still the simplest thing one could do (and simple is better in programming if performance doesn't count).
Old 2006-07-21, 12:34   Link #7
RavenChild
エリック
 
 
Join Date: Nov 2003
Location: my closet, deep and dark
Age: 36
Haha, I've not had any programming classes or lectures on compilers, but that seems like a good enough explanation. I guess the size really wouldn't matter if it's only 512 bytes that I was storing. I also built an interpreter for the architecture. I started on the project last year but ran into a problem similar to this. The medication made me want to finish it, so I did.

Ohh, I'm starting college in the fall with a comp. engineering major. I'll be learning stuff like this in a year or so.

Thanks again.
Old 2006-07-21, 22:59   Link #8
Cz
Needs more sleep~
 
 
Join Date: Jun 2003
Location: #animesuki
Quote:
Originally Posted by Jinto Lin
But if I knew the max. size is only 512 bytes, I'd define its size as MAXSIZE, go step by step through the array, and copy every byte that's not NULL into the final array, terminating it with NULL. But I need to ensure the allocated array points to blank (zeroed) memory, which normal allocation routines don't do afaik (malloc, e.g., doesn't care if there is any data in the allocated memory). So I'd have to set each array element to NULL before using it with a function. It's still the simplest thing one could do (and simple is better in programming if performance doesn't count).
You lost me there. Why are you terminating the string with NULL when NULLs are possible within the string itself? I only see a counter and a sentinel value as the two options possible if we're talking about C strings. You can't really pad a string shorter than 512 bytes with NULLs because how would you know that those NULLs are not part of the string? If you know that, say, 5 NULLs in a row is impossible, then maybe you could use that as a terminator, but then this falls into the "sentinel value" solution.

Maybe you could clarify your way since I don't think I understood it. ^^;
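
The ambiguity with NULL padding can be shown with a small sketch, assuming the data sits in a zero-filled 512-byte buffer. The helper name `trimmed_length` is made up for this example.

```c
#include <stddef.h>

/* Length after stripping trailing 0x00 bytes from a fixed-size buffer. */
size_t trimmed_length(const unsigned char *buf, size_t capacity)
{
    size_t n = capacity;

    /* Walk backwards until a non-zero byte (or the start) is reached. */
    while (n > 0 && buf[n - 1] == 0x00)
        n--;
    return n;
}
```

For the example string from the first post (0x10 0x91 0x20 0xFC 0x00 0x10 0x00 0x00) stored in such a buffer, this reports 6 instead of 8, because the two trailing 0x00 bytes that belong to the data cannot be told apart from the padding.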
Old 2006-07-22, 04:08   Link #9
Jinto
Asuki-tan Kairin ↓
 
 
Join Date: Feb 2004
Location: Fürth (GER)
Age: 43
It would not work directly, but it would with a little modification. I told you about the problem with the length of allocated pointers (one cannot really determine the length, and unless one keeps track of it in a variable and passes it on to the functions that follow, those functions won't have the ability to determine the length either). Without the length variable there are 2 options: either one uses a termination sequence or NULL. If one uses NULL as the terminator, the NULLs in the string need to be replaced by other symbols (only possible if the symbol set of the compiler does not use all 256 symbols).
But I don't see the point of NULL sequences in something like machine code. I could imagine those NULLs either mean a specific operation (e.g. NOP) or have no meaning at all. If they have no meaning, simply skipping these NULLs when copying to a string and terminating with a NULL seems to be the next best thing to do.
Old 2006-07-22, 16:04   Link #10
Cz
Needs more sleep~
 
 
Join Date: Jun 2003
Location: #animesuki
Ah ok, I get it now. While reading your post, I just thought of something. We could store the length of the string in the first two bytes of the string, so using the code in the first post,
Code:
/* Cast to unsigned char first: plain char may be signed, and bytes >= 0x80 would go negative. */
int length = (unsigned char)STR[0] + ((unsigned char)STR[1] << 8);
assuming the first byte contains the least-significant byte of the 2-byte integer length and the 2nd byte has the most-significant byte. We need two bytes since we need up to 10 bits to store 512 in length.
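
A fuller sketch of the two-byte length prefix, covering both writing and reading, might look like this. The function names `pack_with_length` and `packed_length` and the `MAX_PAYLOAD` limit are assumptions for illustration. The byte order matches the description above (least-significant byte first).

```c
#include <stddef.h>
#include <string.h>

#define MAX_PAYLOAD 512  /* limit mentioned earlier in the thread */

/*
 * Writes a 2-byte little-endian length followed by the payload.
 * `out` must have room for len + 2 bytes.
 * Returns the total number of bytes written, or 0 if len is too large.
 */
size_t pack_with_length(unsigned char *out, const unsigned char *payload,
                        size_t len)
{
    if (len > MAX_PAYLOAD)
        return 0;
    out[0] = (unsigned char)(len & 0xFF);         /* least-significant byte */
    out[1] = (unsigned char)((len >> 8) & 0xFF);  /* most-significant byte */
    memcpy(out + 2, payload, len);
    return len + 2;
}

/* Reads the length back; the payload itself starts at packed + 2. */
size_t packed_length(const unsigned char *packed)
{
    return (size_t)packed[0] | ((size_t)packed[1] << 8);
}
```

Using `unsigned char` throughout avoids the sign-extension problem that plain `char` can cause for bytes of 0x80 and above. The writer still has to know the length at the moment of packing.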

On the other hand, unless you have other limitations, a struct would be much clearer:
Code:
typedef struct {
   short int length;
   char *STR;
} stAsmInfo, *pstAsmInfo;
Then one could pass a struct pointer instead of a C string into each function.
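
As a sketch of how a function might consume such a struct, the example below counts the 0x00 bytes in a buffer without relying on any terminator. It uses a variant of the struct with unsigned types, which is an assumption of this sketch, and the names `AsmBuffer` and `count_zero_bytes` are hypothetical.

```c
#include <stddef.h>

/* Variant of the struct above with unsigned types (an assumption here). */
typedef struct {
    size_t length;        /* number of valid bytes in data */
    unsigned char *data;  /* may contain 0x00 bytes anywhere */
} AsmBuffer;

/* Example consumer: counts 0x00 bytes using only the stored length. */
size_t count_zero_bytes(const AsmBuffer *buf)
{
    size_t i, zeros = 0;

    for (i = 0; i < buf->length; i++) {
        if (buf->data[i] == 0x00)
            zeros++;
    }
    return zeros;
}
```

Every function that receives the struct pointer gets the length along with the data, so none of them has to guess where the buffer ends.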
Old 2006-07-24, 22:15   Link #11
RavenChild
エリック
 
 
Join Date: Nov 2003
Location: my closet, deep and dark
Age: 36
Haha, I never thought that it would cause this much of a conversation.

The nulls on the end are just a coincidence. I don't have to have it null-terminated; the example string just had some on the end.
lwl12 has a really good solution too, but you would still have to count the length to put into the first two bytes.

To not incur any real performance bottlenecks with the compiler, I just used a bit of inline ASM to increase the counter. But if you think about it, compiling a file of asm that is around 380 lines takes well under a second. I really don't think developers care about compile time unless you get into the hundreds of thousands/millions of lines of code.
Old 2006-07-24, 22:39   Link #12
Cz
Needs more sleep~
 
 
Join Date: Jun 2003
Location: #animesuki
Quote:
Originally Posted by RavenChild
lwl12 has a really good solution too, but you would still have to count the length to put into the first two bytes.
This assumes that you know the length so you can place it there.

Quote:
I really don't think developers care about compile time unless you get into the hundreds of thousands/millions of lines of code.
Like Windows code? On a cluster of lab machines it takes a few hours to compile.