libpyhat package

Subpackages

Submodules

libpyhat.spectral_data module

class libpyhat.spectral_data.SpectralData(df, name=None, meta_label='meta', spect_label='wvl', comp_label='comp', geodata=None)[source]

Bases: object

This class is the native object used to store spectral data in PyHAT. Image cubes, point spectra, etc. will be translated into this object and this object will be passed around to PyHAT functionalities. Where necessary, those functionalities will translate the class into the necessary formats for scikit-learn and other packages or functions according to their respective API/interfacing requirements.

Parameters:

object : a pandas dataframe that has a particular multi-index structure

Notes: The structure of the pandas dataframe required by this class is as follows:

    | 'meta'        | 'wvl'         | 'comp'        |
    |cat|cat|…|cat  |wvl|wvl|…|wvl  |cat|cat|…|cat  |
  0 |val|val|…|val  |val|val|…|val  |val|val|…|val
  1 |val|val|…|val  |val|val|…|val  |val|val|…|val
  …
  N |val|val|…|val  |val|val|…|val  |val|val|…|val
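A minimal sketch, assuming pandas, of a DataFrame matching this two-level column structure. The label values ('meta', 'wvl', 'comp') match the class defaults; the category names, wavelengths, and data values are purely illustrative, and the final commented line shows where SpectralData would be instantiated:

```python
import pandas as pd

# Two-level columns: the top level groups columns into 'meta', 'wvl',
# and 'comp'; the second level holds category names or wavelengths.
columns = pd.MultiIndex.from_tuples([
    ("meta", "target_name"),
    ("wvl", 585.0),            # wavelengths as floats
    ("wvl", 600.0),
    ("comp", "MnO [ppm]"),
])
df = pd.DataFrame(
    [["rock_a", 0.12, 0.34, 410.0],
     ["rock_b", 0.56, 0.78, 395.0]],
    columns=columns,
)
# The row index counts spectra from 0 up to N:
df.index = range(len(df))

# data = libpyhat.spectral_data.SpectralData(df)  # as documented above
```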

Metadata categories can be strings, floats, or ints; no datatype is expected or enforced. In common practice, however, these categories are strings such as “target_name”, “latitude [degrees]”, etc.

An attempt will be made to convert all level-two header values to floats. This conversion is expected to fail for non-numerical strings, such as most metadata and composition categories. Wavelength values, however, are expected to be ints or floats; if this particular conversion fails, for example because a special character is included (e.g. ‘<125’ for intensities at wavelengths less than 125 wavelength units), an error will result. The rename-column functionality in PyHAT can help the user address this after the class is instantiated.
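PyHAT's own rename-column helper is referenced above but not shown here; the same repair can be sketched with pandas alone. The ‘<125’ header and the wavelengths below are hypothetical:

```python
import pandas as pd

# Hypothetical dataset whose second-level 'wvl' header contains a
# special character and therefore cannot be converted to float.
columns = pd.MultiIndex.from_tuples([("wvl", "<125"), ("wvl", 200.0)])
df = pd.DataFrame([[0.1, 0.2]], columns=columns)

# Rename the offending second-level header to a numeric value:
df = df.rename(columns={"<125": 125.0}, level=1)
```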

Composition category names can also be strings, floats, ints, and have no expected or enforced datatypes. However, common practice is that these categories are strings, such as “MnO [ppm]” and “Olivine [wt%]”.

Spectral intensities are expected to be numeric, and an attempt will be made to convert them to floats. Failure of this conversion generates a warning message. This can happen when a non-numeric value or string is present, such as an intensity that reads ‘<12.5’ or ‘~12.5’. The user can use class features to convert these intensities to numerical values.

To-do:

Introduce a class feature to convert composition or spectral intensity values to a numerical value of the user’s choice, e.g. ConvertLessThanToValue(data, value=’0’).
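ConvertLessThanToValue does not exist yet; until it does, one pandas-only way a user might perform the same substitution (the series values are illustrative):

```python
import pandas as pd

intensities = pd.Series(["12.5", "<12.5", "~12.5", "13.0"])

# Non-numeric entries such as '<12.5' become NaN rather than raising...
parsed = pd.to_numeric(intensities, errors="coerce")

# ...and are then replaced with a value of the user's choice, here 0.
converted = parsed.fillna(0.0)
```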

To-do:

Introduce this functionality: The indexes in the first column can be provided by the user, but if missing, will be assigned. They will be enforced to start from 0 and count up to the number of spectra, N.

To-do:

Explicit handling of identical combinations of 1st and 2nd level columns, whether tuples or otherwise, e.g. (‘meta’,’target_type’) and (‘meta’, ‘target_type’) both being in the same dataset.

To-do:

Handle the case where the columns are not tuples, nor in the native format.

Spectra datasets do not need all three expected top-level column headers (‘wvl’, ‘meta’, ‘comp’), but at least one of them must be present. If none are present, an exception is thrown and class instantiation is interrupted.

If the user provides data for a column type that is not in the expected list, this data will be dropped.

The user has the ability to set the required top-level columns (see __init__ args); however, certain PyHAT functionalities expect the presence of certain columns.
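The column rules above (at least one of ‘wvl’, ‘meta’, ‘comp’ must be present; data under unexpected top-level labels is dropped) can be sketched as follows. This mirrors the documented behavior but is not the class’s actual implementation; filter_top_level is a hypothetical name:

```python
import pandas as pd

EXPECTED = ("meta", "wvl", "comp")

def filter_top_level(df: pd.DataFrame, expected=EXPECTED) -> pd.DataFrame:
    """Keep only expected top-level columns; raise if none are present."""
    top = df.columns.get_level_values(0)
    present = [label for label in expected if label in top]
    if not present:
        raise ValueError(
            f"DataFrame must contain at least one of {expected} "
            "as a top-level column label."
        )
    # Columns under unexpected top-level labels are dropped.
    return df.loc[:, top.isin(present)]
```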

cal_tran(A, B, dataAmatchcol, dataBmatchcol, params, Aname, Bname)[source]
cal_tran_cv(B, dataAmatchcol, dataBmatchcol, paramgrid, Bname)[source]
closest_wvl(input_wvls)[source]
cluster(col, method, params, kws)[source]
combine_spectral_data(data2)[source]
copy_spectral_data(new_name)[source]
deriv()[source]
dim_red(col, method, params, kws, load_fit, ycol=None)[source]
endmember_identify(col, method, n_endmembers)[source]
enumerate_duplicates(col)[source]
get_wvls()[source]
interp(xnew)[source]
lookup(lookupdata, left_on, right_on)[source]
mask(maskfile, maskvar)[source]
multiply_vector(vectorfile)[source]
norm(ranges, col_var)[source]
outlier_identify(col, method, params)[source]
peak_area(peaks_mins_file)[source]
random_folds(nfolds)[source]
remove_baseline(method, segment, params)[source]
remove_duplicates()[source]
remove_empty_spectra()[source]
remove_rows(matching_values)[source]
remove_unnamed()[source]
scale(df_to_fit=None)[source]
shift(shift)[source]
stratified_folds(nfolds, col, tiebreaker, comp_label='comp')[source]
unmix(endmembers_df, method, params, normalize)[source]

Module contents