The following two code samples do the same thing, but is one of them more efficient than the other?

```
void foo(char *p)
{
    while (*p != 0)
    {
        *p = 0;
        ++p;
    }
}
```

```
void foo(char *p)
{
    char *pBeginning = p;
    while (*p != 0) { ++p; }
    char *pEnd = p;
    p = pBeginning;
    while (p < pEnd)
    {
        *p = 0;
        ++p;
    }
}
```

I am thinking that the second sample may be more efficient, because the length of the second loop is known before it starts and the CPU can parallelize it. In the first sample parallelization is never possible because the length of the loop cannot be known in advance. Is this thinking correct?

Right, and not only because a known trip count lets the second loop be vectorized: that loop also no longer has to read the destination array before writing to it.
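
To make the split between the read-only pass and the write-only pass explicit, the second sample can be sketched with the standard `strlen` and `memset` calls. The function name `foo_two_pass` is an assumption made for illustration, and whether this is actually faster depends on the compiler and the C library in use.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical name. Same result as the second sample:
   one read-only pass to find the terminator, then one write-only pass. */
void foo_two_pass(char *p)
{
    assert(p != NULL);
    size_t len = strlen(p);  /* pass 1: reads only, stops at the first 0 */
    memset(p, 0, len);       /* pass 2: writes only, length known up front */
}
```

Calling `foo_two_pass(buf)` on a buffer holding `"abc"` zeroes the three characters and leaves everything after the original terminator untouched.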

Knowing the length of the array seems to be a requirement for vectorizing the first loop, but it is not necessarily one.

loop:

read a vector's worth of data

if any lane is zero, exit the loop

zero a vector's worth of data

end loop

zero the lanes of the final vector up to, but not including, the first lane that holds 0
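
The loop outlined above can be sketched in portable C, using one 64-bit word as a stand-in for a vector register (8 byte lanes). This is only an illustration under stated assumptions, not what a compiler would necessarily emit: the names `foo_chunked`, `any_lane_zero`, and `LANES` are hypothetical, and the sketch assumes the buffer's capacity is a multiple of 8 bytes, so that a whole-chunk read that runs past the terminator still stays inside the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { LANES = 8 };  /* one 64-bit word stands in for the vector: 8 byte lanes */

/* Returns nonzero if any byte lane of v is zero (classic bit trick). */
static int any_lane_zero(uint64_t v)
{
    return ((v - 0x0101010101010101ULL) & ~v & 0x8080808080808080ULL) != 0;
}

/* Hypothetical name. Precondition (assumption): counted from p, the buffer's
   capacity is a multiple of LANES bytes, so every chunk read is in bounds. */
void foo_chunked(char *p)
{
    assert(p != NULL);
    for (;;) {
        uint64_t v;
        memcpy(&v, p, LANES);   /* read a vector's worth of data */
        if (any_lane_zero(v))   /* if any lane is zero, exit the loop */
            break;
        memset(p, 0, LANES);    /* zero a vector's worth of data */
        p += LANES;
    }
    while (*p != 0)             /* zero the lanes before the first 0 */
        *p++ = 0;
}
```

The final scalar loop plays the role of the masked store in the outline: it clears only the lanes that precede the terminator. Without the capacity assumption, the chunk read could touch memory beyond the buffer, which is why real implementations typically rely on aligned loads or masked operations instead.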

Jim Dempsey

