|
Cosan
1.0
Data Analytics Library
|
Raw Data container.
#include <CosanData.h>
Public Member Functions | |
| CosanRawData ()=default | |
| CosanRawData (const std::string &srcX, const std::string &srcY) | |
| Constructor: Read data X and Y from csv files and form raw data container. | |
| CosanRawData (const std::string &srcX) | |
| Constructor: Read data X from csv and form raw data container. | |
| void | SetInput (const std::string &srcX) |
| Update input X from csv file. | |
| void | SetTarget (const std::string &srcY) |
| Update target Y from csv file. | |
| void | ConcatenateData (const CosanMatrix< NumericType > &inputX) |
| Concatenate X with CosanMatrix<NumericType> inputX by adding new columns. | |
| void | UpdateData (const CosanMatrix< NumericType > &inputX) |
| Update X using CosanMatrix<NumericType> inputX. | |
| void | UpdateData (const CosanMatrix< NumericType > &inputX, const CosanMatrix< NumericType > &inputY) |
| Update X and Y using CosanMatrix<NumericType> inputX, inputY. | |
| void | UpdateCat (const std::vector< std::string > &inputX) |
| Update categorical vector svaluesX using std::vector<std::string> & inputX. | |
| void | UpdateCat (const std::vector< std::string > &inputX, const std::vector< std::string > &inputY) |
| Update categorical vectors svaluesX, svaluesY using std::vector<std::string> & inputX, inputY. | |
| CosanMatrix< NumericType > | GetInput () |
| Get a copy of CosanMatrix<NumericType> X. | |
| CosanMatrix< NumericType > | GetTarget () |
| Get a copy of CosanMatrix<NumericType> Y. | |
| const CosanMatrix< NumericType > & | GetInput () const |
| Get a const reference to const CosanMatrix<NumericType> X. | |
| const CosanMatrix< NumericType > & | GetTarget () const |
| Get a const reference to const CosanMatrix<NumericType> Y. | |
| std::tuple< gsl::index, gsl::index > | GetMissingNumber () |
| Get the missing-value counts as a tuple of two indices. | |
| virtual const std::string | GetName () const |
| Get the name of the object. | |
| const std::string & | GetSummaryMessageX () const |
| Get the summary message from reading the csv file of X. | |
| const std::string & | GetSummaryMessageY () const |
| Get the summary message from reading the csv file of Y. | |
| std::unordered_map< gsl::index, gsl::index > & | GetRawToNumIdx () |
| Map from raw data column index to numeric data matrix X column index. | |
| std::unordered_map< gsl::index, gsl::index > & | GetRawToCatIdx () |
| Map from raw data column index to categorical data column index. | |
| std::vector< std::vector< gsl::index > > | GetIdxpinfX () const |
| Get the positions of positive infinity in the original data X. | |
| std::vector< std::vector< gsl::index > > | GetIdxminfX () const |
| Get the positions of negative infinity in the original data X. | |
| std::vector< std::vector< gsl::index > > | GetIdxmissingX () const |
| Get the positions of missing values in the original data X. | |
| std::vector< std::vector< gsl::index > > | GetIdxpinfY () const |
| Get the positions of positive infinity in the original data Y. | |
| std::vector< std::vector< gsl::index > > | GetIdxminfY () const |
| Get the positions of negative infinity in the original data Y. | |
| std::vector< std::vector< gsl::index > > | GetIdxmissingY () const |
| Get the positions of missing values in the original data Y. | |
| std::set< gsl::index > | GetcolCatX () const |
| Get the column indices (in the original csv file of X) of the columns that are of categorical type. | |
| std::set< gsl::index > | GetcolCatY () const |
| Get the column indices (in the original csv file of Y) of the columns that are of categorical type. | |
| bool | GetcatY () const |
| True if Y is of categorical data type, false otherwise. | |
| gsl::index | GetrowsX () |
| Get the number of rows for X. | |
| gsl::index | GetrowsY () |
| Get the number of rows for Y. | |
| gsl::index | GetcolsX () |
| Get the number of columns for X. | |
| gsl::index | GetcolsY () |
| Get the number of columns for Y. | |
| std::vector< std::string > | GetsvaluesX () const |
| Get the vector of categorical data from X, ordered row first. | |
| std::vector< std::string > | GetsvaluesY () const |
| Get the vector of categorical data from Y, ordered row first. | |
| CosanMatrix< NumericType > | GetType () |
Public Member Functions inherited from Cosan::CosanBO | |
| CosanBO () | |
| Default constructor. | |
Protected Attributes | |
| CosanMatrix< NumericType > | X |
| Numeric data from the original CSV file for X. | |
| CosanMatrix< NumericType > | Y |
| Numeric data from the original CSV file for Y. | |
| CosanMatrix< NumericType > | __TYPE |
| std::string | SummaryMessageX |
| Loading message. | |
| std::string | SummaryMessageY |
| std::vector< std::vector< gsl::index > > | IdxpinfX |
| Positions of positive infinity, negative infinity, and missing values. | |
| std::vector< std::vector< gsl::index > > | IdxminfX |
| std::vector< std::vector< gsl::index > > | IdxmissingX |
| std::vector< std::vector< gsl::index > > | IdxpinfY |
| std::vector< std::vector< gsl::index > > | IdxminfY |
| std::vector< std::vector< gsl::index > > | IdxmissingY |
| std::set< gsl::index > | colCatX |
| Column indices in the original data that hold categorical data. | |
| std::set< gsl::index > | colCatY |
| bool | catY = false |
| True means the response variable Y is categorical data. | |
| gsl::index | rowsX = 0 |
| Number of rows of X. | |
| gsl::index | colsX = 0 |
| gsl::index | rowsY = 0 |
| Number of rows of Y. | |
| gsl::index | colsY = 0 |
| std::vector< std::string > | svaluesX |
| std::vector< std::string > | svaluesY |
Private Member Functions | |
| std::tuple< gsl::index, gsl::index, std::string > | _load_csv (const std::string &path, CosanMatrix< NumericType > &X, std::vector< std::vector< gsl::index >> &Idxpinf, std::vector< std::vector< gsl::index >> &Idxminf, std::vector< std::vector< gsl::index >> &Idxmissing, std::vector< std::string > &svalues, std::set< gsl::index > &colCat) |
| Load data from csv. | |
Private Attributes | |
| std::unordered_map< gsl::index, gsl::index > | _raw2numIdx |
| std::unordered_map< gsl::index, gsl::index > | _raw2catIdx |
Raw Data container.
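Apart from the default constructor, every constructor needs at least one input. To obtain a CosanRawData object, three constructors can be used: the default constructor, a constructor that reads X and Y from csv files, and a constructor that reads only X from a csv file.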
Definition at line 36 of file CosanData.h.
| CosanRawData () | default |
| CosanRawData (const std::string &srcX, const std::string &srcY) | inline |
Constructor: Read data X and Y from csv files and form raw data container.
| [in] | srcX | path to the csv file of data X; |
| [in] | srcY | path to the csv file of data Y. |
Definition at line 45 of file CosanData.h.
| CosanRawData (const std::string &srcX) | inline |
Constructor: Read data X from csv and form raw data container.
| [in] | srcX | path to the csv file of data X. |
Definition at line 59 of file CosanData.h.
| std::tuple< gsl::index, gsl::index, std::string > _load_csv (const std::string &path, CosanMatrix< NumericType > &X, std::vector< std::vector< gsl::index >> &Idxpinf, std::vector< std::vector< gsl::index >> &Idxminf, std::vector< std::vector< gsl::index >> &Idxmissing, std::vector< std::string > &svalues, std::set< gsl::index > &colCat) | inline private |
Load data from csv.
We accept data files in csv format, where each data set has dimension $n \times p$: n (the number of rows) is the number of samples and p (the number of columns) is the number of features. Entries are separated by "," and may be positive/negative infinity (when the user-specified NumericType is float, double or long double), missing values (either an empty entry between two contiguous commas "," or a NAN expression), or non-number strings. If the user-specified NumericType is float, double or long double, acceptable numeric expressions also include hexadecimal and the variants of decimal floating-point expressions. std::invalid_argument is thrown if an entry is not a numeric expression, unless the entry is of categorical type.
We determine each column's data type (numeric or categorical) from the first row. An entry is treated as numeric if it is a number (whether ordinal or numerical), and an entry that does not start with a numeric character is treated as categorical (also called nominal data). For an entry that starts with a numeric character but contains non-numeric characters, std::invalid_argument is thrown.
Definition at line 358 of file CosanData.h.
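The entry rules described for _load_csv can be illustrated with a small sketch. This is not the library's implementation: EntryKind and ClassifyEntry are hypothetical names, NumericType is fixed to double, and the sketch assumes std::stod as the parser because it accepts decimal and hexadecimal floating-point forms as well as inf and nan expressions.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical names for illustration; not part of the Cosan API.
enum class EntryKind { Numeric, PosInf, NegInf, Missing, Categorical };

// Classify a single csv entry according to the rules described for _load_csv.
inline EntryKind ClassifyEntry(const std::string &field) {
    if (field.empty()) {
        return EntryKind::Missing;  // empty entry between two contiguous commas
    }
    std::size_t consumed = 0;
    double value = 0.0;
    try {
        // std::stod parses decimal and hexadecimal floats, "inf" and "nan".
        value = std::stod(field, &consumed);
    } catch (const std::invalid_argument &) {
        return EntryKind::Categorical;  // entry does not start with a number
    }
    if (consumed != field.size()) {
        // Starts with a number but contains trailing non-numeric characters.
        throw std::invalid_argument("malformed numeric entry: " + field);
    }
    if (std::isnan(value)) {
        return EntryKind::Missing;  // NAN expression
    }
    if (std::isinf(value)) {
        return value > 0 ? EntryKind::PosInf : EntryKind::NegInf;
    }
    return EntryKind::Numeric;
}
```

In this sketch an entry such as 12abc raises std::invalid_argument, while an entry such as red is reported as categorical.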
| void ConcatenateData (const CosanMatrix< NumericType > &inputX) | inline |
Concatenate X with CosanMatrix<NumericType> inputX by adding new columns.
| [in] | inputX | const CosanMatrix<NumericType>& |
Definition at line 94 of file CosanData.h.
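Column-wise concatenation, as performed by ConcatenateData, can be sketched without the library. The Matrix alias and ConcatenateColumns function are hypothetical stand-ins for CosanMatrix and the member function, and the row-count check is an assumption about what a sensible implementation would require.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for CosanMatrix<double>: a vector of rows.
using Matrix = std::vector<std::vector<double>>;

// Append the columns of `right` to the columns of `left`, row by row.
inline Matrix ConcatenateColumns(const Matrix &left, const Matrix &right) {
    if (left.size() != right.size()) {
        // Assumption: both matrices must describe the same samples.
        throw std::invalid_argument("row counts differ");
    }
    Matrix out = left;
    for (std::size_t r = 0; r < out.size(); ++r) {
        out[r].insert(out[r].end(), right[r].begin(), right[r].end());
    }
    return out;
}
```

The number of rows is unchanged and the number of columns becomes the sum of both inputs.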
| bool GetcatY () const | inline |
True if Y is of categorical data type, false otherwise.
Definition at line 248 of file CosanData.h.
| std::set< gsl::index > GetcolCatX () const | inline |
Get the column indices (in the original csv file of X) of the columns that are of categorical type.
Definition at line 237 of file CosanData.h.
| std::set< gsl::index > GetcolCatY () const | inline |
Get the column indices (in the original csv file of Y) of the columns that are of categorical type.
Definition at line 243 of file CosanData.h.
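| gsl::index GetcolsX () | inline |
Get the number of columns for X.
| gsl::index GetcolsY () | inline |
Get the number of columns for Y.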
| std::vector< std::vector< gsl::index > > GetIdxminfX () const | inline |
Get the positions of negative infinity in the original data X.
Definition at line 212 of file CosanData.h.
| std::vector< std::vector< gsl::index > > GetIdxminfY () const | inline |
Get the positions of negative infinity in the original data Y.
Definition at line 227 of file CosanData.h.
| std::vector< std::vector< gsl::index > > GetIdxmissingX () const | inline |
Get the positions of missing values in the original data X.
Definition at line 217 of file CosanData.h.
| std::vector< std::vector< gsl::index > > GetIdxmissingY () const | inline |
Get the positions of missing values in the original data Y.
Definition at line 232 of file CosanData.h.
| std::vector< std::vector< gsl::index > > GetIdxpinfX () const | inline |
Get the positions of positive infinity in the original data X.
Definition at line 207 of file CosanData.h.
| std::vector< std::vector< gsl::index > > GetIdxpinfY () const | inline |
Get the positions of positive infinity in the original data Y.
Definition at line 222 of file CosanData.h.
| CosanMatrix< NumericType > GetInput () | inline |
Get a copy of CosanMatrix<NumericType> X.
Definition at line 141 of file CosanData.h.
| const CosanMatrix< NumericType > & GetInput () const | inline |
Get a const reference to const CosanMatrix<NumericType> X.
Definition at line 153 of file CosanData.h.
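GetInput and GetTarget each come as a pair of overloads: the non-const version returns a copy and the const version returns a const reference. The sketch shows how such a pair behaves, using a hypothetical RawHolder class with a std::vector in place of CosanMatrix. It is an illustration of the overload pattern, not the library's code.

```cpp
#include <utility>
#include <vector>

// Hypothetical stand-in for CosanRawData; stores X as a plain vector.
class RawHolder {
public:
    explicit RawHolder(std::vector<double> x) : X(std::move(x)) {}

    // Non-const overload: returns an independent copy of X.
    std::vector<double> GetInput() { return X; }

    // Const overload: returns a const reference, no copy is made.
    const std::vector<double> &GetInput() const { return X; }

private:
    std::vector<double> X;
};
```

Calling GetInput through a const object or const reference selects the reference-returning overload, so modifying a copy obtained from the non-const overload leaves the stored data untouched.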
| std::tuple< gsl::index, gsl::index > GetMissingNumber () | inline |
Get the missing-value counts as a tuple of two indices.
Definition at line 166 of file CosanData.h.
| virtual const std::string GetName () const | inline virtual |
Get the name of the object.
Reimplemented from Cosan::CosanBO.
Reimplemented in Cosan::CosanData< NumericType >.
Definition at line 176 of file CosanData.h.
| std::unordered_map< gsl::index, gsl::index > & GetRawToCatIdx () | inline |
Raw data column index to categorical data column index.
Definition at line 202 of file CosanData.h.
| std::unordered_map< gsl::index, gsl::index > & GetRawToNumIdx () | inline |
Raw data column index to numeric data matrix X column index.
Definition at line 197 of file CosanData.h.
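The two maps returned by GetRawToNumIdx and GetRawToCatIdx relate a column's position in the raw csv file to its position among the numeric or categorical columns. A minimal sketch of how such maps could be built follows. ColumnIndexMaps and BuildIndexMaps are hypothetical names, and the zero-based, left-to-right numbering is an assumption rather than something the source states. std::ptrdiff_t is used because gsl::index is an alias for it.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical container for the two index maps.
struct ColumnIndexMaps {
    std::unordered_map<std::ptrdiff_t, std::ptrdiff_t> rawToNum;
    std::unordered_map<std::ptrdiff_t, std::ptrdiff_t> rawToCat;
};

// isCategorical[i] tells whether raw column i was detected as categorical.
inline ColumnIndexMaps BuildIndexMaps(const std::vector<bool> &isCategorical) {
    ColumnIndexMaps maps;
    std::ptrdiff_t nextNum = 0;
    std::ptrdiff_t nextCat = 0;
    for (std::size_t raw = 0; raw < isCategorical.size(); ++raw) {
        const auto rawIdx = static_cast<std::ptrdiff_t>(raw);
        if (isCategorical[raw]) {
            maps.rawToCat[rawIdx] = nextCat++;  // next free categorical slot
        } else {
            maps.rawToNum[rawIdx] = nextNum++;  // next free numeric column
        }
    }
    return maps;
}
```

With raw columns (numeric, categorical, numeric), raw column 2 maps to numeric column 1 and raw column 1 maps to categorical column 0.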
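| gsl::index GetrowsX () | inline |
Get the number of rows for X.
| gsl::index GetrowsY () | inline |
Get the number of rows for Y.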
| const std::string & GetSummaryMessageX () const | inline |
Get the summary message from reading the csv file of X.
Definition at line 187 of file CosanData.h.
| const std::string & GetSummaryMessageY () const | inline |
Get the summary message from reading the csv file of Y.
Definition at line 192 of file CosanData.h.
| std::vector< std::string > GetsvaluesX () const | inline |
Get the vector of categorical data from X, ordered row first.
Definition at line 287 of file CosanData.h.
| std::vector< std::string > GetsvaluesY () const | inline |
Get the vector of categorical data from Y, ordered row first.
Definition at line 293 of file CosanData.h.
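The categorical values are returned as one flat vector ordered row first. Reading that as row-major storage (an assumption: all categorical entries of row 0, then those of row 1, and so on), a single entry can be looked up as sketched here. CategoricalAt and its parameters are hypothetical and not part of the Cosan API.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Look up the categorical entry at (row, catCol) in a flat, row-first vector.
// numCatCols is the number of categorical columns per row.
inline const std::string &CategoricalAt(const std::vector<std::string> &svalues,
                                        std::size_t numCatCols,
                                        std::size_t row,
                                        std::size_t catCol) {
    // at() throws std::out_of_range if the computed position is invalid.
    return svalues.at(row * numCatCols + catCol);
}
```

For two categorical columns, the entry at row 1, column 0 is the third element of the flat vector.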
| CosanMatrix< NumericType > GetTarget () | inline |
Get a copy of CosanMatrix<NumericType> Y.
Definition at line 147 of file CosanData.h.
| const CosanMatrix< NumericType > & GetTarget () const | inline |
Get a const reference to const CosanMatrix<NumericType> Y.
Definition at line 159 of file CosanData.h.
| CosanMatrix< NumericType > GetType () | inline |
Definition at line 298 of file CosanData.h.
| void SetInput (const std::string &srcX) | inline |
Update input X from csv file.
| [in] | srcX | path to the csv file of data X. |
Definition at line 74 of file CosanData.h.
| void SetTarget (const std::string &srcY) | inline |
Update target Y from csv file.
| [in] | srcY | path to the csv file of data Y. |
Definition at line 82 of file CosanData.h.
| void UpdateCat (const std::vector< std::string > &inputX) | inline |
Update categorical vector svaluesX using std::vector<std::string> & inputX.
| [in] | inputX | const std::vector<std::string>& |
Definition at line 125 of file CosanData.h.
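| void UpdateCat (const std::vector< std::string > &inputX, const std::vector< std::string > &inputY) | inline |
Update categorical vectors svaluesX, svaluesY using std::vector<std::string> & inputX, inputY.
| [in] | inputX | const std::vector<std::string>& |
| [in] | inputY | const std::vector<std::string>& |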
Definition at line 133 of file CosanData.h.
| void UpdateData (const CosanMatrix< NumericType > &inputX) | inline |
Update X using CosanMatrix<NumericType> inputX.
| [in] | inputX | const CosanMatrix<NumericType>& |
Definition at line 108 of file CosanData.h.
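| void UpdateData (const CosanMatrix< NumericType > &inputX, const CosanMatrix< NumericType > &inputY) | inline |
Update X and Y using CosanMatrix<NumericType> inputX, inputY.
| [in] | inputX | const CosanMatrix<NumericType>& |
| [in] | inputY | const CosanMatrix<NumericType>& |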
Definition at line 116 of file CosanData.h.
| CosanMatrix< NumericType > __TYPE | protected |
Definition at line 308 of file CosanData.h.
| std::unordered_map< gsl::index, gsl::index > _raw2catIdx | private |
Definition at line 339 of file CosanData.h.
| std::unordered_map< gsl::index, gsl::index > _raw2numIdx | private |
Definition at line 339 of file CosanData.h.
| bool catY = false | protected |
True means the response variable Y is categorical data.
Definition at line 325 of file CosanData.h.
| std::set< gsl::index > colCatX | protected |
Column indices in the original data that hold categorical data.
Definition at line 321 of file CosanData.h.
| std::set< gsl::index > colCatY | protected |
Definition at line 321 of file CosanData.h.
| gsl::index colsX = 0 | protected |
Definition at line 329 of file CosanData.h.
| gsl::index colsY = 0 | protected |
Definition at line 333 of file CosanData.h.
| std::vector< std::vector< gsl::index > > IdxminfX | protected |
Definition at line 316 of file CosanData.h.
| std::vector< std::vector< gsl::index > > IdxminfY | protected |
Definition at line 317 of file CosanData.h.
| std::vector< std::vector< gsl::index > > IdxmissingX | protected |
Definition at line 316 of file CosanData.h.
| std::vector< std::vector< gsl::index > > IdxmissingY | protected |
Definition at line 317 of file CosanData.h.
| std::vector< std::vector< gsl::index > > IdxpinfX | protected |
Positions of positive infinity, negative infinity, and missing values.
Definition at line 316 of file CosanData.h.
| std::vector< std::vector< gsl::index > > IdxpinfY | protected |
Definition at line 317 of file CosanData.h.
| gsl::index rowsX = 0 | protected |
Number of rows of X.
Definition at line 329 of file CosanData.h.
| gsl::index rowsY = 0 | protected |
Number of rows of Y.
Definition at line 333 of file CosanData.h.
| std::string SummaryMessageX | protected |
Loading message.
Definition at line 312 of file CosanData.h.
| std::string SummaryMessageY | protected |
Definition at line 312 of file CosanData.h.
| std::vector< std::string > svaluesX | protected |
Vector of categorical data from the original data, ordered row first.
Definition at line 337 of file CosanData.h.
| std::vector< std::string > svaluesY | protected |
Definition at line 337 of file CosanData.h.
| CosanMatrix< NumericType > X | protected |
Numeric data from the original CSV file for X.
Definition at line 303 of file CosanData.h.
| CosanMatrix< NumericType > Y | protected |
Numeric data from the original CSV file for Y.
Definition at line 307 of file CosanData.h.