Loading My Data In Numpy Genfromtxt Get Errors
Solution 1:
A short sample bytestring as a substitute for a file:
In [168]: txt = b"""Y7C\t-0.22\t-0.12\t-0.29\t-0.51\t-0.81
...: Y7C\t-0.22\t-0.12\t-0.29\t-0.51\t-0.81
...: Y7C\t-0.22\t-0.12\t-0.29\t-0.51\t-0.81
...: """
Minimal load with the correct delimiter. Note that the first column is nan, because genfromtxt can't convert the strings to float.
In [169]: np.genfromtxt(txt.splitlines(),delimiter='\t')
Out[169]:
array([[ nan, -0.22, -0.12, -0.29, -0.51, -0.81],
[ nan, -0.22, -0.12, -0.29, -0.51, -0.81],
[ nan, -0.22, -0.12, -0.29, -0.51, -0.81]])
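The same minimal load can also be run as a standalone script outside IPython. This is a sketch under two assumptions: io.StringIO stands in for the real file (pass the filename in practice), and `sample` is a name introduced here for the three rows above.

```python
import io

import numpy as np

# Three tab-delimited rows: a text label followed by five numbers.
sample = "Y7C\t-0.22\t-0.12\t-0.29\t-0.51\t-0.81\n" * 3

# io.StringIO behaves like an open text file, so genfromtxt can read it directly.
data = np.genfromtxt(io.StringIO(sample), delimiter="\t")

print(data.shape)            # (3, 6)
print(np.isnan(data[:, 0]))  # [ True  True  True]: 'Y7C' is not a float
```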
With dtype=None, it sets each column's dtype automatically, creating a structured array:
In [170]: np.genfromtxt(txt.splitlines(),delimiter='\t',dtype=None)
Out[170]:
array([(b'Y7C', -0.22, -0.12, -0.29, -0.51, -0.81),
(b'Y7C', -0.22, -0.12, -0.29, -0.51, -0.81),
(b'Y7C', -0.22, -0.12, -0.29, -0.51, -0.81)],
dtype=[('f0', 'S3'), ('f1', '<f8'), ('f2', '<f8'), ('f3', '<f8'), ('f4', '<f8'), ('f5', '<f8')])
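A structured array keeps the label column, but most numeric work wants a plain 2-D float array. A minimal sketch of pulling the numeric fields back out, assuming the labels are always in the first field (`f0`, the default name genfromtxt assigns); `encoding="utf-8"` is added here so the labels come back as str rather than bytes.

```python
import io

import numpy as np

sample = "Y7C\t-0.22\t-0.12\t-0.29\t-0.51\t-0.81\n" * 3

# dtype=None lets genfromtxt guess a type per column; an explicit encoding
# avoids the bytes-vs-str deprecation warning on older NumPy versions.
table = np.genfromtxt(io.StringIO(sample), delimiter="\t", dtype=None, encoding="utf-8")

# Fields get default names f0, f1, ...; f0 holds the text labels.
labels = table["f0"]

# Stack the remaining (numeric) fields into an ordinary (rows, columns) array.
values = np.column_stack([table[name] for name in table.dtype.names[1:]])

print(values.shape)  # (3, 5)
```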
Spell out the columns to use, skipping the first:
In [172]: np.genfromtxt(txt.splitlines(),delimiter='\t',usecols=np.arange(1,6))
Out[172]:
array([[-0.22, -0.12, -0.29, -0.51, -0.81],
[-0.22, -0.12, -0.29, -0.51, -0.81],
[-0.22, -0.12, -0.29, -0.51, -0.81]])
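Rather than hard-coding the column range, it can be derived from the data. A sketch, assuming the first line has the same number of fields as every other line (true for the sample, not guaranteed for a ragged file):

```python
import io

import numpy as np

sample = "Y7C\t-0.22\t-0.12\t-0.29\t-0.51\t-0.81\n" * 3

# Count the fields on the first line so usecols never asks for a column
# past the end (that raises a ValueError).
first_line = sample.splitlines()[0]
n_fields = len(first_line.split("\t"))

# Skip column 0 (the text label) and keep everything after it.
numbers = np.genfromtxt(io.StringIO(sample), delimiter="\t", usecols=range(1, n_fields))

print(n_fields)       # 6
print(numbers.shape)  # (3, 5)
```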
But if I ask for more columns than it finds, I get an error like yours:
In [173]: np.genfromtxt(txt.splitlines(),delimiter='\t',usecols=np.arange(1,7))
---------------------------------------------------------------------------
....
ValueError: Some errors were detected !
Line #1 (got 6 columns instead of 6)
Line #2 (got 6 columns instead of 6)
Line #3 (got 6 columns instead of 6)
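The "got 6 columns instead of 6" wording is confusing; the real cause is a column index that doesn't exist (or, in a ragged file, rows with different field counts). A sketch that reproduces the error and then tallies field counts per line to find the offending rows; `field_counts` is a hypothetical helper, not part of NumPy.

```python
import io

import numpy as np

sample = "Y7C\t-0.22\t-0.12\t-0.29\t-0.51\t-0.81\n" * 3

# Asking for columns 1..6 when only 0..5 exist makes every line invalid.
message = ""
try:
    np.genfromtxt(io.StringIO(sample), delimiter="\t", usecols=range(1, 7))
except ValueError as err:
    message = str(err)
print(message.splitlines()[0])  # Some errors were detected !


def field_counts(lines, delimiter="\t"):
    """Hypothetical helper: map each field count to the 1-based line numbers having it."""
    counts = {}
    for number, line in enumerate(lines, start=1):
        counts.setdefault(len(line.split(delimiter)), []).append(number)
    return counts


# A ragged file: the second line is missing its last two fields.
ragged = ["a\t1\t2\t3", "b\t4", "c\t5\t6\t7"]
print(field_counts(ragged))  # {4: [1, 3], 2: [2]}
```

Any count that differs from the expected number of columns points straight at the lines to fix.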
Your missing_values parameter doesn't help; that's the wrong use for it.
This is the correct use of missing_values: to detect the string value so that filling_values can replace it with a valid float value:
In [177]: np.genfromtxt(txt.splitlines(),delimiter='\t',missing_values='Y7C',filling_values=0)
Out[177]:
array([[ 0. , -0.22, -0.12, -0.29, -0.51, -0.81],
[ 0. , -0.22, -0.12, -0.29, -0.51, -0.81],
[ 0. , -0.22, -0.12, -0.29, -0.51, -0.81]])
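A string passed to missing_values applies to every column. To limit the replacement to the label column, both parameters also accept dictionaries keyed by column index, as in NumPy's genfromtxt guide. A sketch, assuming the label is always in column 0; the other columns keep the default nan fill for genuinely empty fields.

```python
import io

import numpy as np

sample = "Y7C\t-0.22\t-0.12\t-0.29\t-0.51\t-0.81\n" * 3

data = np.genfromtxt(
    io.StringIO(sample),
    delimiter="\t",
    missing_values={0: "Y7C"},  # treat 'Y7C' as missing only in column 0
    filling_values={0: 0},      # and fill that column with 0
)

print(data[:, 0])  # [0. 0. 0.]
```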
If the file has enough delimiters, genfromtxt treats the empty fields as missing values:
In [178]: txt = b"""Y7C\t-0.22\t-0.12\t-0.29\t-0.51\t-0.81\t\t
...: Y7C\t-0.22\t-0.12\t-0.29\t-0.51\t-0.81\t\t
...: Y7C\t-0.22\t-0.12\t-0.29\t-0.51\t-0.81\t\t
...: """
In [179]: np.genfromtxt(txt.splitlines(),delimiter='\t')
Out[179]:
array([[ nan, -0.22, -0.12, -0.29, -0.51, -0.81, nan, nan],
[ nan, -0.22, -0.12, -0.29, -0.51, -0.81, nan, nan],
[ nan, -0.22, -0.12, -0.29, -0.51, -0.81, nan, nan]])
In [180]: np.genfromtxt(txt.splitlines(),delimiter='\t',filling_values=0)
Out[180]:
array([[ 0. , -0.22, -0.12, -0.29, -0.51, -0.81, 0. , 0. ],
[ 0. , -0.22, -0.12, -0.29, -0.51, -0.81, 0. , 0. ],
[ 0. , -0.22, -0.12, -0.29, -0.51, -0.81, 0. , 0. ]])
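With the default fill, the nan from the unparseable label and the nan from an empty field look identical. Passing usemask=True (a genfromtxt option not used above) returns a masked array where only the fields that matched a missing value, here the empty trailing ones, are masked. A sketch using the same padded rows:

```python
import io

import numpy as np

# Same rows as above, each ending in two extra tabs (two empty fields).
sample = "Y7C\t-0.22\t-0.12\t-0.29\t-0.51\t-0.81\t\t\n" * 3

# Empty fields are masked; 'Y7C' fails float conversion and becomes an unmasked nan.
data = np.genfromtxt(io.StringIO(sample), delimiter="\t", usemask=True)

print(data.shape)                # (3, 8)
print(np.ma.count_masked(data))  # 6: two empty fields per row
```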
I believe the pandas CSV reader can handle 'ragged' columns and missing values better.
Solution 2:
Evidently genfromtxt doesn't like the fact that you have missing values, probably because it is building a 2-D array in which every row must have the same number of columns, so it won't fill short rows with NaNs. Try adding 0s in the places with missing values, or at least the tab delimiters, so that every row registers as having all 174 columns.
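Padding each short row with tab delimiters can be done in a few lines before calling genfromtxt. A minimal sketch: `pad_rows` is a hypothetical helper, the tiny `ragged` list stands in for the real file, and `n_columns` would be 174 for the data in the question.

```python
import numpy as np


def pad_rows(lines, n_columns, delimiter="\t"):
    """Hypothetical helper: append delimiters so every line has n_columns fields."""
    padded = []
    for line in lines:
        missing = n_columns - len(line.split(delimiter))
        # Each extra delimiter adds one empty field, which genfromtxt reads as missing.
        padded.append(line + delimiter * max(missing, 0))
    return padded


ragged = ["a\t1\t2\t3", "b\t4", "c\t5\t6"]
padded = pad_rows(ragged, n_columns=4)

# genfromtxt accepts a list of lines; empty fields become nan by default
# (add filling_values=0 to get zeros instead). Column 0 holds the labels.
data = np.genfromtxt(padded, delimiter="\t", usecols=range(1, 4))

print(data.shape)  # (3, 3)
```

Lines with more fields than `n_columns` are left unchanged, so genfromtxt will still report them; the field-count check sketched earlier in the thread is one way to find those first.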