
Loading My Data In Numpy Genfromtxt Get Errors

My data file contains 7500 lines that look like this: Y1C 1.53 -0.06 0.58 0.52 0.42 0.16 0.79 -0.6 -0.3 -0.78 -0.14 0.38 0.34 0.23 0.26 -1.8

Solution 1:

A short sample bytestring substitute for a file:

In [168]: txt = b"""Y7C\t-0.22\t-0.12\t-0.29\t-0.51\t-0.81 
     ...: Y7C\t-0.22\t-0.12\t-0.29\t-0.51\t-0.81 
     ...: Y7C\t-0.22\t-0.12\t-0.29\t-0.51\t-0.81  
     ...: """

A minimal load with the correct delimiter. Note that the first column is nan, because genfromtxt can't convert the strings to float.

In [169]: np.genfromtxt(txt.splitlines(),delimiter='\t')
Out[169]: 
array([[  nan, -0.22, -0.12, -0.29, -0.51, -0.81],
       [  nan, -0.22, -0.12, -0.29, -0.51, -0.81],
       [  nan, -0.22, -0.12, -0.29, -0.51, -0.81]])

With dtype=None, it infers each column's dtype automatically, creating a structured array:

In [170]: np.genfromtxt(txt.splitlines(),delimiter='\t',dtype=None)
Out[170]: 
array([(b'Y7C', -0.22, -0.12, -0.29, -0.51, -0.81),
       (b'Y7C', -0.22, -0.12, -0.29, -0.51, -0.81),
       (b'Y7C', -0.22, -0.12, -0.29, -0.51, -0.81)], 
      dtype=[('f0', 'S3'), ('f1', '<f8'), ('f2', '<f8'), ('f3', '<f8'), ('f4', '<f8'), ('f5', '<f8')])
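A structured array like this is indexed by field name rather than by column number. A minimal sketch of splitting it into labels and a plain 2-D float array, assuming the default field names (f0, f1, ...) and a NumPy version that supports the encoding argument; the variable names are only illustrative:

```python
import numpy as np

# Three identical tab-separated sample lines, standing in for the file.
lines = ["Y7C\t-0.22\t-0.12\t-0.29\t-0.51\t-0.81"] * 3

# dtype=None infers one dtype per column; encoding="utf-8" yields str labels
# instead of bytes (assumes NumPy >= 1.14).
rec = np.genfromtxt(lines, delimiter="\t", dtype=None, encoding="utf-8")

# Fields are accessed by name.
labels = rec["f0"]

# Stack the numeric fields side by side into an ordinary 2-D array.
values = np.column_stack([rec[name] for name in rec.dtype.names[1:]])
```

Here labels holds the first column as strings and values has one row per line and one column per numeric field.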

Spell out the columns to use, skipping the first:

In [172]: np.genfromtxt(txt.splitlines(),delimiter='\t',usecols=np.arange(1,6))
Out[172]: 
array([[-0.22, -0.12, -0.29, -0.51, -0.81],
       [-0.22, -0.12, -0.29, -0.51, -0.81],
       [-0.22, -0.12, -0.29, -0.51, -0.81]])

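But if I ask for more columns than it finds, I get an error like yours. The message is confusing: it reports the 6 fields found against the 6 columns requested, when the real problem is that column index 6 does not exist.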

In [173]: np.genfromtxt(txt.splitlines(),delimiter='\t',usecols=np.arange(1,7))
---------------------------------------------------------------------------
.... 
ValueError: Some errors were detected !
    Line #1 (got 6 columns instead of 6)
    Line #2 (got 6 columns instead of 6)
    Line #3 (got 6 columns instead of 6)
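One way to avoid asking for columns that are not there is to count the fields in the first line and build usecols from that. This is only a sketch: it assumes every line has the same number of fields as the first one, and n_fields is a name introduced here for illustration.

```python
import numpy as np

# Sample lines standing in for the file contents.
lines = ["Y7C\t-0.22\t-0.12\t-0.29\t-0.51\t-0.81"] * 3

# Count the tab-separated fields in the first line.
n_fields = len(lines[0].split("\t"))

# Skip column 0 (the label) and read every remaining column.
data = np.genfromtxt(lines, delimiter="\t", usecols=range(1, n_fields))
```

If later lines are shorter than the first, genfromtxt would still raise the same ValueError for those lines.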

Your missing_values parameter doesn't help; that's not what it is for.

This is the correct use of missing_values - to detect the string value and replace it with a valid float value:

In [177]: np.genfromtxt(txt.splitlines(),delimiter='\t',missing_values='Y7C',filling_values=0)
Out[177]: 
array([[ 0.  , -0.22, -0.12, -0.29, -0.51, -0.81],
       [ 0.  , -0.22, -0.12, -0.29, -0.51, -0.81],
       [ 0.  , -0.22, -0.12, -0.29, -0.51, -0.81]])

If the file has enough delimiters, genfromtxt treats the empty fields between them as missing values:

In [178]: txt = b"""Y7C\t-0.22\t-0.12\t-0.29\t-0.51\t-0.81\t\t 
     ...: Y7C\t-0.22\t-0.12\t-0.29\t-0.51\t-0.81\t\t 
     ...: Y7C\t-0.22\t-0.12\t-0.29\t-0.51\t-0.81\t\t  
     ...: """

In [179]: np.genfromtxt(txt.splitlines(),delimiter='\t')
Out[179]: 
array([[  nan, -0.22, -0.12, -0.29, -0.51, -0.81,   nan,   nan],
       [  nan, -0.22, -0.12, -0.29, -0.51, -0.81,   nan,   nan],
       [  nan, -0.22, -0.12, -0.29, -0.51, -0.81,   nan,   nan]])
In [180]: np.genfromtxt(txt.splitlines(),delimiter='\t',filling_values=0)
Out[180]: 
array([[ 0.  , -0.22, -0.12, -0.29, -0.51, -0.81,  0.  ,  0.  ],
       [ 0.  , -0.22, -0.12, -0.29, -0.51, -0.81,  0.  ,  0.  ],
       [ 0.  , -0.22, -0.12, -0.29, -0.51, -0.81,  0.  ,  0.  ]])

I believe the pandas csv reader can handle 'ragged' columns and missing values better.
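A minimal sketch of the pandas route, assuming the maximum number of columns is known in advance (n_columns here is a made-up value for the sample; the question's file would need its own count). Passing explicit names lets read_csv pad short rows with NaN:

```python
import io
import pandas as pd

# Ragged sample: the second and third rows are missing trailing fields.
raw = (
    "Y7C\t-0.22\t-0.12\t-0.29\t-0.51\t-0.81\n"
    "Y7C\t-0.22\t-0.12\t-0.29\n"
    "Y7C\t-0.22\n"
)

n_columns = 6  # assumed width of the widest row

# With explicit column names, rows that are too short are filled with NaN.
df = pd.read_csv(
    io.StringIO(raw), sep="\t", header=None, names=list(range(n_columns))
)

# Drop the label column and convert the rest to a NumPy float array.
values = df.iloc[:, 1:].to_numpy()
```

The NaN entries can then be replaced with df.fillna(0) if zeros are preferred.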

Solution 2:

Evidently genfromtxt does not like the missing values, probably because it is building a regular 2-D array and every row needs the same number of columns. Try adding 0s in the places with missing values, or at least the tab delimiters, so that every line registers as having all 174 columns.
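The padding can also be done in memory instead of editing the file. A rough sketch, assuming tab-separated text and a known target width; pad_rows is a hypothetical helper written for this example, not part of NumPy:

```python
import numpy as np

def pad_rows(lines, n_fields, delimiter="\t"):
    """Append empty fields so every line has at least n_fields fields."""
    padded = []
    for line in lines:
        fields = line.rstrip("\r\n").split(delimiter)
        fields += [""] * (n_fields - len(fields))  # no-op if already long enough
        padded.append(delimiter.join(fields))
    return padded

# Ragged sample lines; the real file would be read with open(...).
lines = ["Y7C\t-0.22\t-0.12\t-0.29", "Y7C\t-0.22\t-0.12", "Y7C\t-0.22"]

padded = pad_rows(lines, 4)

# Empty fields count as missing and are filled with 0.
arr = np.genfromtxt(padded, delimiter="\t", usecols=range(1, 4), filling_values=0)
```

For the file in the question, n_fields would be 174 and the lines would come from the file itself.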
