## Reading numpy structured arrays from a text file

Numpy has a very nice feature: the structured array, that is, an array whose rows have a fixed structure, so each column (field) can store a different type of data.

For example:

```
>>> import numpy as np
>>> arr = np.zeros(10, dtype=[('id', np.uint16), ('position', np.dtype('3float32')), ('momentum', np.dtype('3float32'))])
```

We have defined a structured array in which each row stores the id of a particle (an unsigned int), its position (three floats), and its momentum (again three floats).

You can easily select from this array:

```
>>> arr['position'] # positions of all particles
>>> arr[0]['position'] # position of first particle
>>> arr[arr['id'] == 1]['position']  # positions of all particles with id equal to 1
```
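To make the mask-based selection concrete, here is a minimal sketch with a few made-up particles. The names `particle_dtype` and `particles` and all the values are assumptions for illustration only; the sub-array fields are written in the equivalent `(name, type, shape)` form.

```python
import numpy as np

# Same layout as above; the values below are made up for illustration.
particle_dtype = np.dtype([
    ("id", np.uint16),
    ("position", np.float32, (3,)),
    ("momentum", np.float32, (3,)),
])

particles = np.zeros(4, dtype=particle_dtype)
particles["id"] = [1, 2, 1, 3]
particles["position"] = np.arange(12, dtype=np.float32).reshape(4, 3)

# Comparing one field gives a boolean mask that selects whole rows.
mask = particles["id"] == 1
selected = particles[mask]["position"]

print(selected.shape)  # (2, 3): two matching particles, three coordinates each
```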

This is a nice format because:

• Your data has structure. No more off-by-one errors: particle position is labeled.
• Very easy to load from binary files
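As a rough illustration of the binary case, the sketch below writes a structured array to a raw file and reads it back in one call. The file name and the values are hypothetical, and it assumes the file contains nothing but packed records in the same dtype.

```python
import os
import tempfile

import numpy as np

record_dtype = np.dtype([
    ("id", np.uint16),
    ("position", np.float32, (3,)),
    ("momentum", np.float32, (3,)),
])

original = np.zeros(5, dtype=record_dtype)
original["id"] = np.arange(5)
original["position"] = 1.5  # broadcast to every coordinate of every row

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "particles.bin")  # hypothetical file name
    original.tofile(path)  # raw records, no header
    loaded = np.fromfile(path, dtype=record_dtype)  # no per-line parsing needed

# The round trip preserves the records byte for byte.
print(loaded.tobytes() == original.tobytes())
```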

Loading from text files is an entirely different matter, because writing to such arrays row by row is kind of a pain.

My requirements were:

• Array structure is the same as source file structure (order of fields is the same)
• Array structure is defined only in a single place: that is the dtype definition

## Solution

The solution is to:

• Read the file line by line, parsing the contents into an array with a flat dtype (one scalar field per value).
• Create a structured view of that array.

It should also be fast, which means no copying of large arrays.

The actual dtype used:

```
URQMD_DATA_DTYPE = [
    ("time", np.float32),
    ("position", np.dtype("3float32")),
    ("energy", np.float32),
    ("momentum", np.dtype("3float32")),
    ("mass", np.float32),
    ("particle_type", np.float32),
]
```

A helper function that takes a structured dtype and turns it into a flat dtype with one scalar field per value (sub-array fields are expanded into separate fields):

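```
def serialize_dtype(dt):
    dt = np.dtype(dt)
    newdt = []
    for item in dt.descr:
        if len(item) == 2:
            # Scalar field: treat it as a sub-array of length 1
            count = (1,)
            name, type_ = item
        else:
            name, type_, count = item
        if len(count) > 1:
            raise ValueError("multi-dimensional sub-array fields are not supported")
        count = count[0]
        # Emit one scalar type per element of the field
        for ii in range(count):
            newdt.append(type_)
    return np.dtype(", ".join(newdt))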
```
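The view trick only works if the flat dtype and the structured dtype describe exactly the same bytes per row. The sketch below checks that by hand for the layout above; it builds the flat dtype directly instead of calling the helper, and the names `structured`, `n_scalars` and `flat` are made up for the example.

```python
import math

import numpy as np

structured = np.dtype([
    ("time", np.float32),
    ("position", np.float32, (3,)),
    ("energy", np.float32),
    ("momentum", np.float32, (3,)),
    ("mass", np.float32),
    ("particle_type", np.float32),
])

# Count scalar slots: a sub-array field contributes one slot per element,
# a plain field has shape () and contributes exactly one.
n_scalars = sum(math.prod(structured[name].shape) for name in structured.names)

# Flat dtype with one float32 field per slot (fields are auto-named f0, f1, ...).
flat = np.dtype(", ".join(["f4"] * n_scalars))

print(n_scalars)                          # 10
print(flat.itemsize, structured.itemsize)  # 40 40
```

Equal item sizes are what allow one row of the flat array to be reinterpreted as one structured record without moving any data.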

Now, frame is a list of lines from the text file:

```
parsed = np.zeros(len(frame), dtype=serialize_dtype(URQMD_DATA_DTYPE))  # Create array with the flat dtype
for ii, line in enumerate(frame):
    # Parse every value as a float, ignoring whether it is a float or an int
    data = [float(x) for x in line.split()]
    parsed[ii] = tuple(data)  # Now numpy will convert a single row to the proper types
parsed = parsed.view(URQMD_DATA_DTYPE)  # Create a structured view (no copy!)
```
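When every field has the same scalar type, as happens to be the case for this dtype, a plain 2-D float32 array can serve as the parse target instead. This is an alternative sketch, not the approach used above: it would not work for mixed field types, and the two sample lines are made up.

```python
import numpy as np

frame_dtype = np.dtype([
    ("time", np.float32),
    ("position", np.float32, (3,)),
    ("energy", np.float32),
    ("momentum", np.float32, (3,)),
    ("mass", np.float32),
    ("particle_type", np.float32),
])

# Made-up sample lines: ten whitespace-separated numbers each.
frame = [
    "0.0 1.0 2.0 3.0 0.5 0.1 0.2 0.3 0.938 1",
    "0.0 4.0 5.0 6.0 0.7 0.4 0.5 0.6 0.138 101",
]

# Number of float32 values per record (only valid because all fields are float32).
n_columns = frame_dtype.itemsize // np.dtype(np.float32).itemsize

flat = np.empty((len(frame), n_columns), dtype=np.float32)
for row, line in enumerate(frame):
    # Raises ValueError if a line does not contain exactly n_columns values.
    flat[row] = [float(x) for x in line.split()]

# Reinterpret each 40-byte row as one record; the view has shape (n, 1),
# so flatten it to (n,). Neither step copies the data.
records = flat.view(frame_dtype).reshape(len(frame))

print(np.shares_memory(flat, records))  # True
print(records["position"][1])
```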

Sounds simple, but it took me some time to get it right.