Cosan  1.0
Data Analytics Library
Cosan::CosanRawData< NumericType > Class Template Reference

Raw Data container.

#include <CosanData.h>

Inheritance for Cosan::CosanRawData< NumericType >:
Inherits Cosan::CosanBO. Inherited by Cosan::CosanData< NumericType >.

Public Member Functions

 CosanRawData ()=default
 
 CosanRawData (const std::string &srcX, const std::string &srcY)
 Constructor: Read data X and Y from csv files and form raw data container.

 CosanRawData (const std::string &srcX)
 Constructor: Read data X from csv and form raw data container.

void SetInput (const std::string &srcX)
 Update input X from csv file.

void SetTarget (const std::string &srcY)
 Update target Y from csv file.

void ConcatenateData (const CosanMatrix< NumericType > &inputX)
 Concatenate X with CosanMatrix<NumericType> inputX by adding new columns.

void UpdateData (const CosanMatrix< NumericType > &inputX)
 Update X using CosanMatrix<NumericType> inputX.

void UpdateData (const CosanMatrix< NumericType > &inputX, const CosanMatrix< NumericType > &inputY)
 Update X and Y using CosanMatrix<NumericType> inputX, inputY.

void UpdateCat (const std::vector< std::string > &inputX)
 Update categorical vector svaluesX using std::vector<std::string> & inputX.

void UpdateCat (const std::vector< std::string > &inputX, const std::vector< std::string > &inputY)
 Update categorical vectors svaluesX, svaluesY using std::vector<std::string> & inputX, inputY.

CosanMatrix< NumericType > GetInput ()
 Get a copy of CosanMatrix<NumericType> X.

CosanMatrix< NumericType > GetTarget ()
 Get a copy of CosanMatrix<NumericType> Y.
 
const CosanMatrix< NumericType > & GetInput () const
 Get a const reference to CosanMatrix<NumericType> X.

const CosanMatrix< NumericType > & GetTarget () const
 Get a const reference to CosanMatrix<NumericType> Y.

std::tuple< gsl::index, gsl::index > GetMissingNumber ()
 Get the number of missing (NaN) values in X and Y.

virtual const std::string GetName () const
 Get the name of the object.

const std::string & GetSummaryMessageX () const
 Get the summary message from reading the csv file of X.

const std::string & GetSummaryMessageY () const
 Get the summary message from reading the csv file of Y.

std::unordered_map< gsl::index, gsl::index > & GetRawToNumIdx ()
 Map from raw data column index to column index in the numeric data matrix X.

std::unordered_map< gsl::index, gsl::index > & GetRawToCatIdx ()
 Map from raw data column index to categorical data column index.
 
std::vector< std::vector< gsl::index > > GetIdxpinfX () const
 Get the positions of positive infinity in the original data X.

std::vector< std::vector< gsl::index > > GetIdxminfX () const
 Get the positions of negative infinity in the original data X.

std::vector< std::vector< gsl::index > > GetIdxmissingX () const
 Get the positions of missing values in the original data X.

std::vector< std::vector< gsl::index > > GetIdxpinfY () const
 Get the positions of positive infinity in the original data Y.

std::vector< std::vector< gsl::index > > GetIdxminfY () const
 Get the positions of negative infinity in the original data Y.

std::vector< std::vector< gsl::index > > GetIdxmissingY () const
 Get the positions of missing values in the original data Y.

std::set< gsl::index > GetcolCatX () const
 Get the column indices (in the original csv file of X) of the categorical columns.

std::set< gsl::index > GetcolCatY () const
 Get the column indices (in the original csv file of Y) of the categorical columns.

bool GetcatY () const
 True if Y is categorical data type. False otherwise.

gsl::index GetrowsX ()
 Get the number of rows for X.

gsl::index GetrowsY ()
 Get the number of rows for Y.

gsl::index GetcolsX ()
 Get the number of columns for X.

gsl::index GetcolsY ()
 Get the number of columns for Y.

std::vector< std::string > GetsvaluesX () const
 Get the vector of categorical data from X, stored row first.

std::vector< std::string > GetsvaluesY () const
 Get the vector of categorical data from Y, stored row first.

CosanMatrix< NumericType > GetType ()

Public Member Functions inherited from Cosan::CosanBO
 CosanBO ()
 Default constructor.
 

Protected Attributes

CosanMatrix< NumericType > X
 Numeric data from the original CSV file for X.

CosanMatrix< NumericType > Y
 Numeric data from the original CSV file for Y.

CosanMatrix< NumericType > __TYPE

std::string SummaryMessageX
 Loading message.

std::string SummaryMessageY

std::vector< std::vector< gsl::index > > IdxpinfX
 Positions of positive infinity, negative infinity, and missing values.

std::vector< std::vector< gsl::index > > IdxminfX

std::vector< std::vector< gsl::index > > IdxmissingX

std::vector< std::vector< gsl::index > > IdxpinfY

std::vector< std::vector< gsl::index > > IdxminfY

std::vector< std::vector< gsl::index > > IdxmissingY

std::set< gsl::index > colCatX
 Column indices in the original data that hold categorical data.

std::set< gsl::index > colCatY

bool catY = false
 True means the response variable Y is categorical data.

gsl::index rowsX = 0
 Number of rows of X.

gsl::index colsX = 0

gsl::index rowsY = 0
 Number of rows of Y.

gsl::index colsY = 0

std::vector< std::string > svaluesX

std::vector< std::string > svaluesY
 

Private Member Functions

std::tuple< gsl::index, gsl::index, std::string > _load_csv (const std::string &path, CosanMatrix< NumericType > &X, std::vector< std::vector< gsl::index >> &Idxpinf, std::vector< std::vector< gsl::index >> &Idxminf, std::vector< std::vector< gsl::index >> &Idxmissing, std::vector< std::string > &svalues, std::set< gsl::index > &colCat)
 Load data from csv.
 

Private Attributes

std::unordered_map< gsl::index, gsl::index > _raw2numIdx
 
std::unordered_map< gsl::index, gsl::index > _raw2catIdx
 

Detailed Description

template<Numeric NumericType>
class Cosan::CosanRawData< NumericType >

Raw Data container.

Three constructors are available: the default constructor and two constructors that each take at least one csv file path:

CosanRawData(const std::string & srcX);
CosanRawData(const std::string & srcX, const std::string & srcY);

Definition at line 36 of file CosanData.h.

Constructor & Destructor Documentation

◆ CosanRawData() [1/3]

template<Numeric NumericType>
Cosan::CosanRawData< NumericType >::CosanRawData ( )
default

◆ CosanRawData() [2/3]

template<Numeric NumericType>
Cosan::CosanRawData< NumericType >::CosanRawData ( const std::string &  srcX,
const std::string &  srcY 
)
inline

Constructor: Read data X and Y from csv files and form raw data container.

Parameters
[in]  srcX  path to the csv file of data X;
[in]  srcY  path to the csv file of data Y.
Note
X and Y are from two separate csv files. For the data format of csv files, see Tutorial.

Definition at line 45 of file CosanData.h.

45  :CosanBO(){
46 // static_assert(std::is_arithmetic<NumericType>::value, "NumericType must be numeric");
47  if (std::is_same_v<NumericType, bool>){
48  throw std::invalid_argument(
49  "We do not accept bool at this moment. Try unsigned int, unsigned long, unsigned long long, int, "
50  "long, long, float, double ,long double.");
51  }
52  SetInput(srcX);
53  SetTarget(srcY);
54  }
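
The guard in the listing above tests a condition that is already known at compile time. As a minimal sketch of the same check in isolation, the hypothetical free function below (RejectBool and IsRejected are illustrative names, not part of Cosan) uses if constexpr so that the throwing branch is only instantiated for bool:

```cpp
#include <cassert>
#include <stdexcept>
#include <type_traits>

// Hypothetical free function, not part of Cosan: mirrors the constructor's
// guard. It throws for bool and does nothing for every other type.
template <typename NumericType>
void RejectBool() {
    if constexpr (std::is_same_v<NumericType, bool>) {
        throw std::invalid_argument("bool is not accepted as NumericType");
    }
}

// Returns true when RejectBool<NumericType>() throws std::invalid_argument.
template <typename NumericType>
bool IsRejected() {
    try {
        RejectBool<NumericType>();
    } catch (const std::invalid_argument&) {
        return true;
    }
    return false;
}

// Usage: only bool is rejected.
inline void RejectBoolExample() {
    assert(IsRejected<bool>());
    assert(!IsRejected<double>());
}
```

The observable behaviour matches the runtime check in the listing; whether the library should reject bool at run time or at compile time is a design choice this sketch does not settle.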

◆ CosanRawData() [3/3]

template<Numeric NumericType>
Cosan::CosanRawData< NumericType >::CosanRawData ( const std::string &  srcX)
inline

Constructor: Read data X from csv and form raw data container.

Parameters
[in]  srcX  path to the csv file of data X.

Definition at line 59 of file CosanData.h.

59  :CosanBO(){
60 // static_assert(std::is_arithmetic<NumericType>::value, "NumericType must be numeric");
61  if (std::is_same_v<NumericType, bool>){
62  throw std::invalid_argument(
63  "We do not accept bool at this moment. Try unsigned int, unsigned long, unsigned long long, int, "
64  "long, long, float, double ,long double.");
65  }
66  SetInput(srcX);}

Member Function Documentation

◆ _load_csv()

template<Numeric NumericType>
std::tuple<gsl::index,gsl::index,std::string> Cosan::CosanRawData< NumericType >::_load_csv ( const std::string &  path,
CosanMatrix< NumericType > &  X,
std::vector< std::vector< gsl::index >> &  Idxpinf,
std::vector< std::vector< gsl::index >> &  Idxminf,
std::vector< std::vector< gsl::index >> &  Idxmissing,
std::vector< std::string > &  svalues,
std::set< gsl::index > &  colCat 
)
inlineprivate

Load data from csv.

We accept data files in csv format where each data set is of dimension $n \times p$, where $n$ (the number of rows) is the number of samples and $p$ (the number of columns) is the number of features. Entries are separated by "," and may be positive/negative infinity (when the user-specified NumericType is float, double or long double), missing values (either an empty entry between two contiguous commas "," or a NAN expression), or non-numeric strings. If the user-specified NumericType is float, double or long double, acceptable numeric expressions also include hexadecimal and variants of decimal floating-point expressions (see this for more details). std::invalid_argument is thrown if an entry is not a valid numeric expression, unless the entry belongs to a categorical column.

We determine each column's data type (either numeric or categorical) from the first row. We treat an entry as numeric if it is a number (whether ordinal or numerical) and treat an entry that does not start with a numeric character as categorical (also called nominal data). For an entry that starts with a numeric character but contains non-numeric characters, std::invalid_argument is thrown.
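
The first-row rule described above can be sketched in isolation. The helper below is hypothetical (DetectCategoricalColumns is not part of Cosan) and assumes double as the numeric type, with std::stod standing in for the library's StringToNum:

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <set>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper, not part of Cosan: classify the columns of the first csv
// row and return the zero-based indices of the columns treated as categorical.
inline std::set<std::size_t> DetectCategoricalColumns(const std::string& firstRow) {
    std::set<std::size_t> categorical;
    std::istringstream row(firstRow);
    std::string cell;
    std::size_t col = 0;
    while (std::getline(row, cell, ',')) {
        if (!cell.empty()) {  // an empty cell is a missing numeric value
            std::size_t pos = 0;
            bool numeric = true;
            try {
                (void)std::stod(cell, &pos);  // throws when no number can be parsed
            } catch (const std::exception&) {
                numeric = false;
            }
            if (!numeric) {
                categorical.insert(col);  // e.g. "red"
            } else if (pos != cell.size()) {
                // e.g. "12ab": a numeric prefix followed by other characters
                throw std::invalid_argument("Incorrect numeric format: " + cell);
            }
        }
        ++col;
    }
    return categorical;
}

// Usage: only column 2 ("red") is categorical; "1.5", "" and "inf" are numeric.
inline void DetectCategoricalColumnsExample() {
    assert((DetectCategoricalColumns("1.5,,red,inf") == std::set<std::size_t>{2}));
}
```

One consequence of this rule is that an empty cell in the first row fixes its column as numeric, so a categorical column needs a non-empty first entry.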

Definition at line 358 of file CosanData.h.

360  {
361 
362  std::ifstream indata;
363  indata.open(path);
364  std::string line;
365  std::vector<NumericType> values;
366  // std::vector<std::string> svalues;
367  gsl::index rows = 0,cols = 0,col_idx=0;
368  // uint col_idx=0;
369  // std::vector<std::vector<uint>> Idxpinf,Idxminf,Idxmissing;
370  // std::set<uint> colCat;
371  NumericType result;
372  std::size_t pos;
373  std::string SummaryMessage;
374 
375  // stod -> "-23","-12E1","-+nan" (is a double type), "+\- inf","+\- infinity", "jklsgfd","1235lkjfg",
376 // first row: empty/nonempty: empty-> numerical nan,
377  // not empty-> can read numeric -> it is indeed a correct numeric format
378 // -> wrong format i.e. "1235lkjfg" throw error
379  // -> cannot read numeric-> then consider this as category
380 
381 // then set colCat, cols,
382  // numeric-format,
383  //
384  std::getline(indata, line);
385  std::stringstream lineStream(line);
386  std::string cell;
387  while(getline(lineStream, cell, ',')) {
388  if (cell.size()==0){
389  values.push_back(StringToNum<NumericType>(std::string("nan")));
390  Idxmissing.push_back(std::vector<gsl::index>({rows,col_idx}));
391  col_idx++;
392  cols=std::max(cols,col_idx);
393  continue;
394  }
395  try{
396  result = StringToNum<NumericType>(cell, &pos);
397  }catch(...){
398  svalues.push_back(cell);
399  colCat.insert(col_idx);
400  col_idx++;
401  cols=std::max(cols,col_idx);
402  continue;
403  }
404  if (pos!=cell.size()){
405  throw std::invalid_argument(
406  "Incorrect numeric format! Abort the program. The entry reads "+cell+
407  " and the position is ("+ std::to_string(rows)+","+ std::to_string(col_idx)+")");
408  }
409  values.push_back(result);
410  if (isinf(values.back())){
411  if (values.back()==std::numeric_limits<NumericType>::infinity()){
412  Idxpinf.push_back(std::vector<gsl::index>({rows,col_idx}));}
413  else {Idxminf.push_back(std::vector<gsl::index>({rows,col_idx}));}
414  }
415  else if (isnan(values.back())){
416  Idxmissing.push_back(std::vector<gsl::index>({rows,col_idx}));
417  }
418  col_idx++;
419  cols=std::max(cols,col_idx);
420  }
421  rows = 1;
422  col_idx = 0;
423 
424 
425  while (std::getline(indata, line)) {
426  // std::stringstream lineStream(line);
427  // std::string cell;
428  lineStream.str("");
429  lineStream.clear(); // Clear state flags.
430  lineStream<<line;
431  while(getline(lineStream, cell, ',')) {
432  if (cell.size()==0){
433  if (colCat.find(col_idx)==colCat.end()){
434  values.push_back(StringToNum<NumericType>(std::string("nan")));
435  }
436  else{
437  svalues.push_back("");
438  }
439  Idxmissing.push_back(std::vector<gsl::index>({rows,col_idx}));
440  col_idx++;
441  continue;
442  }
443  try{
444  result = StringToNum<NumericType>(cell, &pos);
445  }catch(...){
446  if (colCat.find(col_idx)!=colCat.end())
447  {
448  svalues.push_back(cell);
449  colCat.insert(col_idx);
450  col_idx++;
451  continue;}
452  else{
453  throw std::invalid_argument(
454  "Incorrect value type! Should be numeric but non-numeric input. The entry reads "+cell+
455  " and the position is ("+ std::to_string(rows)+","+ std::to_string(col_idx)+")");
456  }
457  }
458  if (pos!=cell.size()){
459  throw std::invalid_argument(
460  "Incorrect numeric format! Abort the program. The entry reads "+cell+
461  " and the position is ("+ std::to_string(rows)+","+ std::to_string(col_idx)+")");
462  }
463  values.push_back(result);
464  if (isinf(values.back())){
465  if (values.back()==std::numeric_limits<NumericType>::infinity()){
466  Idxpinf.push_back(std::vector<gsl::index>({rows,col_idx}));}
467  else {Idxminf.push_back(std::vector<gsl::index>({rows,col_idx}));}
468  }
469  else if (isnan(values.back())){
470  Idxmissing.push_back(std::vector<gsl::index>({rows,col_idx}));
471  }
472  col_idx++;
473  }
474  if (cols!=col_idx){
475  std::cout<<cols<<" "<<col_idx<<std::endl;
476  throw std::invalid_argument("Not all rows has same number of entry! First row has "+std::to_string(cols)+" columns but row "+std::to_string(rows)+" has "+std::to_string(col_idx)+" columns!" );
477  }
478  ++rows;
479  col_idx=0;
480  }
481  X.resize(rows,values.size()/rows);
482  gsl::index i =0,__cols = values.size()/rows;
483  for (auto &each :values ){
484  X(i/__cols,i%__cols) = each;
485  i++;
486  }
487 
488 // Eigen::Map<const CosanMatrix<NumericType>>(values.data(), rows, values.size()/rows);
489 
490  SummaryMessage+="Number of rows: "+std::to_string(rows)+"\n";
491  SummaryMessage+="Number of columns: "+std::to_string(cols)+"\n";
492  SummaryMessage+="Number of positive infinity values: "+std::to_string(Idxpinf.size())+". They are at " ;
493  for(auto each :Idxpinf){
494  SummaryMessage+="("+std::to_string(each[0])+","+std::to_string(each[1])+") ";
495  }
496  SummaryMessage+="\n";
497  SummaryMessage+="Number of negative infinity values: "+std::to_string(Idxminf.size())+". They are at ";
498  for(auto each :Idxminf){
499  SummaryMessage+="("+std::to_string(each[0])+","+std::to_string(each[1])+")"+" ";
500  }
501  SummaryMessage+="\n";
502  SummaryMessage+="Number of missing values: "+std::to_string(Idxmissing.size())+". They are at ";
503  for(auto each :Idxmissing){
504  SummaryMessage+="("+std::to_string(each[0])+","+std::to_string(each[1])+")"+" ";
505  }
506  SummaryMessage+="\n";
507  // for (auto fvalue:values) {std::cout<<fvalue<<std::endl;}
508  SummaryMessage+="Columns of categorical values: Column ";
509  for (auto idx:colCat) {
510  SummaryMessage+=std::to_string(idx)+" ";}
511  SummaryMessage+="\n";
512  gsl::index j = 0 ;
513  for (gsl::index i = 0;i<cols;i++){
514  if (colCat.find(i)==colCat.end()){
515  _raw2numIdx[i] = j;
516  j++;}
517  }
518  j = 0 ;
519  for (gsl::index i = 0;i<cols;i++){
520  if(colCat.find(i)!=colCat.end()){
521  _raw2catIdx[i]=j;
522  j++;
523  }
524  }
525 
526 
527  return {rows,cols,SummaryMessage};
528 
529 
530  }

◆ ConcatenateData()

template<Numeric NumericType>
void Cosan::CosanRawData< NumericType >::ConcatenateData ( const CosanMatrix< NumericType > &  inputX)
inline

Concatenate X using CosanMatrix<NumericType> input X. Add new columns.

Parameters
[in]  inputX  const CosanMatrix<NumericType>&; its columns are appended to X, so it must have the same number of rows as X.

Definition at line 94 of file CosanData.h.

94  {
95  if (GetrowsX()!=inputX.rows()){
96  throw std::invalid_argument(fmt::format("To concatenate, the number of rows from inputX should match with original X. Current nrow of X is {:}", GetrowsX() ));
97  }
98  for (gsl::index i = 0 ;i<inputX.cols();i++){
99  X.conservativeResize(X.rows(), X.cols()+1);
100  X.col(X.cols() - 1) = inputX.col(i);
101  }
102  }
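
To illustrate what column concatenation does to the stored data without depending on Eigen, here is a sketch that uses a hypothetical row-major DenseMatrix as a stand-in for CosanMatrix (DenseMatrix and ConcatenateColumns are illustrative names only). It applies the same row-count check as the listing above and returns a new matrix instead of resizing in place:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for CosanMatrix: a dense row-major matrix of doubles.
struct DenseMatrix {
    std::size_t rows = 0;
    std::size_t cols = 0;
    std::vector<double> data;  // element (r, c) is data[r * cols + c]

    double at(std::size_t r, std::size_t c) const {
        assert(r < rows && c < cols);
        return data[r * cols + c];
    }
};

// Return a matrix whose columns are those of x followed by those of extra.
inline DenseMatrix ConcatenateColumns(const DenseMatrix& x, const DenseMatrix& extra) {
    if (x.rows != extra.rows) {
        throw std::invalid_argument("To concatenate, the row counts must match.");
    }
    DenseMatrix out;
    out.rows = x.rows;
    out.cols = x.cols + extra.cols;
    out.data.reserve(out.rows * out.cols);
    for (std::size_t r = 0; r < out.rows; ++r) {
        for (std::size_t c = 0; c < x.cols; ++c) out.data.push_back(x.at(r, c));
        for (std::size_t c = 0; c < extra.cols; ++c) out.data.push_back(extra.at(r, c));
    }
    return out;
}
```

Building the result in one pass avoids resizing once per appended column, which is what the loop in the listing does; whether that matters depends on how many columns are added.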

◆ GetcatY()

template<Numeric NumericType>
bool Cosan::CosanRawData< NumericType >::GetcatY ( ) const
inline

True if Y is categorical data type. False otherwise.

Returns
bool

Definition at line 248 of file CosanData.h.

248 {return catY;}

◆ GetcolCatX()

template<Numeric NumericType>
std::set<gsl::index> Cosan::CosanRawData< NumericType >::GetcolCatX ( ) const
inline

Get the column indices (in the original csv file of X) of the categorical columns.

Returns
std::set<gsl::index>

Definition at line 237 of file CosanData.h.

237 {return colCatX;}

◆ GetcolCatY()

template<Numeric NumericType>
std::set<gsl::index> Cosan::CosanRawData< NumericType >::GetcolCatY ( ) const
inline

Get the column indices (in the original csv file of Y) of the categorical columns.

Returns
std::set<gsl::index>
Note
If Y is one dimension, the size of return should be one. Otherwise it is empty.

Definition at line 243 of file CosanData.h.

243 {return colCatY;}

◆ GetcolsX()

template<Numeric NumericType>
gsl::index Cosan::CosanRawData< NumericType >::GetcolsX ( )
inline

Get the number of columns for X.

Returns
gsl::index

Definition at line 270 of file CosanData.h.

270  {
271  colsX = X.cols();
272  return colsX;
273  }

◆ GetcolsY()

template<Numeric NumericType>
gsl::index Cosan::CosanRawData< NumericType >::GetcolsY ( )
inline

Get the number of columns for Y.

Returns
gsl::index

Definition at line 278 of file CosanData.h.

278  {
279  colsY = Y.cols();
280  return colsY;
281  }

◆ GetIdxminfX()

template<Numeric NumericType>
std::vector<std::vector<gsl::index> > Cosan::CosanRawData< NumericType >::GetIdxminfX ( ) const
inline

Get the positions of negative infinity in the original data X.

Returns
std::vector<std::vector<gsl::index>>, one {row, column} pair per occurrence

Definition at line 212 of file CosanData.h.

212 {return IdxminfX;}

◆ GetIdxminfY()

template<Numeric NumericType>
std::vector<std::vector<gsl::index> > Cosan::CosanRawData< NumericType >::GetIdxminfY ( ) const
inline

Get the positions of negative infinity in the original data Y.

Returns
std::vector<std::vector<gsl::index>>, one {row, column} pair per occurrence

Definition at line 227 of file CosanData.h.

227 {return IdxminfY;}

◆ GetIdxmissingX()

template<Numeric NumericType>
std::vector<std::vector<gsl::index> > Cosan::CosanRawData< NumericType >::GetIdxmissingX ( ) const
inline

Get the positions of missing values in the original data X.

Returns
std::vector<std::vector<gsl::index>>, one {row, column} pair per occurrence

Definition at line 217 of file CosanData.h.

217 {return IdxmissingX;}

◆ GetIdxmissingY()

template<Numeric NumericType>
std::vector<std::vector<gsl::index> > Cosan::CosanRawData< NumericType >::GetIdxmissingY ( ) const
inline

Get the positions of missing values in the original data Y.

Returns
std::vector<std::vector<gsl::index>>, one {row, column} pair per occurrence

Definition at line 232 of file CosanData.h.

232 {return IdxmissingY;}

◆ GetIdxpinfX()

template<Numeric NumericType>
std::vector<std::vector<gsl::index> > Cosan::CosanRawData< NumericType >::GetIdxpinfX ( ) const
inline

Get the positions of positive infinity in the original data X.

Returns
std::vector<std::vector<gsl::index>>, one {row, column} pair per occurrence

Definition at line 207 of file CosanData.h.

207 {return IdxpinfX;}

◆ GetIdxpinfY()

template<Numeric NumericType>
std::vector<std::vector<gsl::index> > Cosan::CosanRawData< NumericType >::GetIdxpinfY ( ) const
inline

Get the positions of positive infinity in the original data Y.

Returns
std::vector<std::vector<gsl::index>>, one {row, column} pair per occurrence

Definition at line 222 of file CosanData.h.

222 {return IdxpinfY;}
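
The six GetIdx* accessors above each return a list of {row, column} pairs, which is how _load_csv records positions while reading. As a rough sketch of that bookkeeping, the hypothetical function below (FindSpecialValues and SpecialPositions are illustrative names, not part of Cosan) scans a row-major buffer of doubles and sorts the special values into three such lists:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Position = std::vector<std::ptrdiff_t>;  // {row, column}

// Hypothetical result type: one list of positions per kind of special value.
struct SpecialPositions {
    std::vector<Position> pinf;     // positive infinity
    std::vector<Position> minf;     // negative infinity
    std::vector<Position> missing;  // NaN
};

// Scan a row-major buffer with `cols` columns and record special values.
inline SpecialPositions FindSpecialValues(const std::vector<double>& values,
                                          std::ptrdiff_t cols) {
    assert(cols > 0);
    SpecialPositions out;
    for (std::size_t i = 0; i < values.size(); ++i) {
        const auto idx = static_cast<std::ptrdiff_t>(i);
        const Position where{idx / cols, idx % cols};
        if (std::isinf(values[i])) {
            (values[i] > 0 ? out.pinf : out.minf).push_back(where);
        } else if (std::isnan(values[i])) {
            out.missing.push_back(where);
        }
    }
    return out;
}
```

Unlike the loader, this sketch only sees numeric values, so the column index here counts numeric columns rather than columns of the original csv file.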

◆ GetInput() [1/2]

template<Numeric NumericType>
CosanMatrix<NumericType> Cosan::CosanRawData< NumericType >::GetInput ( )
inline

Get a copy of CosanMatrix<NumericType> X.

Definition at line 141 of file CosanData.h.

141  {
142  return X;
143  }

◆ GetInput() [2/2]

template<Numeric NumericType>
const CosanMatrix<NumericType>& Cosan::CosanRawData< NumericType >::GetInput ( ) const
inline

Get a const reference to const CosanMatrix<NumericType> X.

Definition at line 153 of file CosanData.h.

153  {
154  return X;
155  }
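
Which GetInput overload is selected depends on whether the object is accessed through a const path. The sketch below uses a hypothetical Container class holding a std::vector<double> in place of CosanMatrix to show the difference: the non-const overload hands back a copy, the const overload a reference to the stored data.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical class, not part of Cosan, with the same pair of overloads.
class Container {
public:
    explicit Container(std::vector<double> x) : x_(std::move(x)) {}

    // Chosen for non-const objects: returns a copy.
    std::vector<double> GetInput() { return x_; }

    // Chosen for const objects and const references: returns a reference.
    const std::vector<double>& GetInput() const { return x_; }

private:
    std::vector<double> x_;
};

inline void GetInputExample() {
    Container data(std::vector<double>{1.0, 2.0});

    std::vector<double> copy = data.GetInput();  // non-const overload
    copy[0] = 99.0;                              // does not touch the stored data

    const Container& view = data;
    assert(view.GetInput()[0] == 1.0);           // stored value is unchanged

    // Both const calls refer to the same stored vector, so no copy is made.
    assert(&view.GetInput() == &std::as_const(data).GetInput());
}
```

To avoid copying a large matrix from a non-const object, one option is to go through a const reference or std::as_const, as in the last line of the example.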

◆ GetMissingNumber()

template<Numeric NumericType>
std::tuple<gsl::index,gsl::index> Cosan::CosanRawData< NumericType >::GetMissingNumber ( )
inline

Get the number of missing (NaN) values in X and Y.

Returns
std::tuple<number of missing values in X, number of missing values in Y>

Definition at line 166 of file CosanData.h.

166  {
167  return {X.array().isNaN().template cast<NumericType>().sum(),Y.array().isNaN().template cast<NumericType>().sum()};
168  }
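
The listing above counts entries for which isNaN() is true, so infinities are not included. A dependency-free sketch of the same count over a plain buffer, using the hypothetical helper CountMissing (not part of Cosan), might look like this:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of Cosan: count the NaN entries in a buffer.
inline std::ptrdiff_t CountMissing(const std::vector<double>& values) {
    return std::count_if(values.begin(), values.end(),
                         [](double v) { return std::isnan(v); });
}

// Usage: two NaN entries are counted; the infinity is not.
inline void CountMissingExample() {
    const std::vector<double> x{1.0, std::nan(""), INFINITY, std::nan("")};
    assert(CountMissing(x) == 2);
}
```

Applying the helper once to X and once to Y gives the two numbers that GetMissingNumber() packs into its tuple.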

◆ GetName()

template<Numeric NumericType>
virtual const std::string Cosan::CosanRawData< NumericType >::GetName ( ) const
inlinevirtual

Get the name of the object.

Returns
std::string

Reimplemented from Cosan::CosanBO.

Reimplemented in Cosan::CosanData< NumericType >.

Definition at line 176 of file CosanData.h.

176 {return "Raw Data Object.";}

◆ GetRawToCatIdx()

template<Numeric NumericType>
std::unordered_map<gsl::index,gsl::index>& Cosan::CosanRawData< NumericType >::GetRawToCatIdx ( )
inline

Raw data column index to categorical data column index.

Returns
std::unordered_map<gsl::index,gsl::index>

Definition at line 202 of file CosanData.h.

202 {return _raw2catIdx;}

◆ GetRawToNumIdx()

template<Numeric NumericType>
std::unordered_map<gsl::index,gsl::index>& Cosan::CosanRawData< NumericType >::GetRawToNumIdx ( )
inline

Raw data column index to numeric data matrix X column index.

Returns
std::unordered_map<gsl::index,gsl::index>

Definition at line 197 of file CosanData.h.

197 {return _raw2numIdx;}
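
The two index maps returned by GetRawToCatIdx() and GetRawToNumIdx() are filled at the end of _load_csv: raw columns are numbered separately within the numeric group and within the categorical group. The sketch below reproduces that numbering in a single pass; BuildColumnMaps and ColumnMaps are hypothetical names, and Index is written as std::ptrdiff_t on the assumption that gsl::index is an alias for it.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <unordered_map>

using Index = std::ptrdiff_t;  // assumed equivalent to gsl::index
using IndexMap = std::unordered_map<Index, Index>;

// Hypothetical result type holding both maps.
struct ColumnMaps {
    IndexMap rawToNum;  // raw column -> column in the numeric matrix
    IndexMap rawToCat;  // raw column -> position among categorical columns
};

// Number the numeric and the categorical raw columns independently, in order.
inline ColumnMaps BuildColumnMaps(Index cols, const std::set<Index>& colCat) {
    assert(cols >= 0);
    ColumnMaps maps;
    Index nextNum = 0;
    Index nextCat = 0;
    for (Index raw = 0; raw < cols; ++raw) {
        if (colCat.count(raw) == 0) {
            maps.rawToNum[raw] = nextNum++;
        } else {
            maps.rawToCat[raw] = nextCat++;
        }
    }
    return maps;
}
```

For a file with four columns where raw columns 1 and 3 are categorical, raw columns 0 and 2 map to numeric columns 0 and 1, and raw columns 1 and 3 map to categorical positions 0 and 1.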

◆ GetrowsX()

template<Numeric NumericType>
gsl::index Cosan::CosanRawData< NumericType >::GetrowsX ( )
inline

Get the number of rows for X.

Returns
gsl::index

Definition at line 254 of file CosanData.h.

254  {
255  rowsX=X.rows();
256  return rowsX;
257  }

◆ GetrowsY()

template<Numeric NumericType>
gsl::index Cosan::CosanRawData< NumericType >::GetrowsY ( )
inline

Get the number of rows for Y.

Returns
gsl::index

Definition at line 262 of file CosanData.h.

262  {
263  rowsY=Y.rows();
264  return rowsY;
265  }

◆ GetSummaryMessageX()

template<Numeric NumericType>
const std::string& Cosan::CosanRawData< NumericType >::GetSummaryMessageX ( ) const
inline

Get the summary message on reading csv file on X.

Returns
std::string

Definition at line 187 of file CosanData.h.

187 {return SummaryMessageX;}

◆ GetSummaryMessageY()

template<Numeric NumericType>
const std::string& Cosan::CosanRawData< NumericType >::GetSummaryMessageY ( ) const
inline

Get the summary message on reading csv file on Y.

Returns
std::string

Definition at line 192 of file CosanData.h.

192 {return SummaryMessageY;}

◆ GetsvaluesX()

template<Numeric NumericType>
std::vector<std::string> Cosan::CosanRawData< NumericType >::GetsvaluesX ( ) const
inline

Get the vector of categorical data from X. order: row first.

Returns
std::vector<std::string>
Note
it is a std::vector of std::string. Strings are stored row-first.

Definition at line 287 of file CosanData.h.

287 {return svaluesX;}
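
Because the strings are stored row first, a single categorical entry can be located from its row and its position among the categorical columns. The sketch below assumes the layout implied by the _load_csv listing (one string per categorical cell, appended row by row in column order) and takes the raw-to-categorical map as an argument; CategoricalAt is a hypothetical helper, not part of Cosan.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper, not part of Cosan: look up the categorical entry at
// (row, rawCol) in a row-first vector of strings.
inline const std::string& CategoricalAt(
    const std::vector<std::string>& svalues,
    const std::unordered_map<std::ptrdiff_t, std::ptrdiff_t>& rawToCat,
    std::ptrdiff_t row, std::ptrdiff_t rawCol) {
    assert(row >= 0);
    const auto catCols = static_cast<std::ptrdiff_t>(rawToCat.size());
    // at() throws std::out_of_range when rawCol is not a categorical column.
    const std::ptrdiff_t catCol = rawToCat.at(rawCol);
    return svalues.at(static_cast<std::size_t>(row * catCols + catCol));
}
```

With two categorical raw columns 1 and 3 and svalues equal to {"a", "x", "b", "y"}, the entry at row 1, raw column 3 is "y".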

◆ GetsvaluesY()

template<Numeric NumericType>
std::vector<std::string> Cosan::CosanRawData< NumericType >::GetsvaluesY ( ) const
inline

Get the vector of categorical data from Y. order: row first.

Returns
std::vector<std::string>
Note
it is a std::vector of std::string. Strings are stored row-first.

Definition at line 293 of file CosanData.h.

293 {return svaluesY;}

◆ GetTarget() [1/2]

template<Numeric NumericType>
CosanMatrix<NumericType> Cosan::CosanRawData< NumericType >::GetTarget ( )
inline

Get a copy of CosanMatrix<NumericType> Y.

Definition at line 147 of file CosanData.h.

147  {
148  return Y;
149  }

◆ GetTarget() [2/2]

template<Numeric NumericType>
const CosanMatrix<NumericType>& Cosan::CosanRawData< NumericType >::GetTarget ( ) const
inline

Get a const reference to const CosanMatrix<NumericType> Y.

Definition at line 159 of file CosanData.h.

159  {
160  return Y;
161  }

◆ GetType()

template<Numeric NumericType>
CosanMatrix<NumericType> Cosan::CosanRawData< NumericType >::GetType ( )
inline
Returns
an empty CosanMatrix<NumericType> data structure
Note
It is used to determine the CosanMatrix<NumericType> data type.

Definition at line 298 of file CosanData.h.

298 {return __TYPE;}

◆ SetInput()

template<Numeric NumericType>
void Cosan::CosanRawData< NumericType >::SetInput ( const std::string &  srcX)
inline

Update input X from csv file.

Parameters
[in]  srcX  path to the csv file of data X.

Definition at line 74 of file CosanData.h.

◆ SetTarget()

template<Numeric NumericType>
void Cosan::CosanRawData< NumericType >::SetTarget ( const std::string &  srcY)
inline

Update target Y from csv file.

Parameters
[in]  srcY  path to the csv file of data Y.

Definition at line 82 of file CosanData.h.

82  {
84  if (colCatY.size()!=0){
85  catY=true;
86  }
87  }

◆ UpdateCat() [1/2]

template<Numeric NumericType>
void Cosan::CosanRawData< NumericType >::UpdateCat ( const std::vector< std::string > &  inputX)
inline

Update categorical vector svaluesX using std::vector<std::string> & inputX.

Parameters
[in]  inputX  const std::vector<std::string>&

Definition at line 125 of file CosanData.h.

125  {
126  svaluesX = inputX;
127  }

◆ UpdateCat() [2/2]

template<Numeric NumericType>
void Cosan::CosanRawData< NumericType >::UpdateCat ( const std::vector< std::string > &  inputX,
const std::vector< std::string > &  inputY 
)
inline

Update categorical vector svaluesX,svaluesY using std::vector<std::string> & inputX,inputY.

Parameters
[in]  inputX  const std::vector<std::string>&
[in]  inputY  const std::vector<std::string>&

Definition at line 133 of file CosanData.h.

133  {
134  svaluesX = inputX;
135  svaluesY = inputY;
136  }

◆ UpdateData() [1/2]

template<Numeric NumericType>
void Cosan::CosanRawData< NumericType >::UpdateData ( const CosanMatrix< NumericType > &  inputX)
inline

Update X using CosanMatrix<NumericType> input X.

Parameters
[in]  inputX  const CosanMatrix<NumericType>&

Definition at line 108 of file CosanData.h.

108  {
109  X = inputX;
110  }

◆ UpdateData() [2/2]

template<Numeric NumericType>
void Cosan::CosanRawData< NumericType >::UpdateData ( const CosanMatrix< NumericType > &  inputX,
const CosanMatrix< NumericType > &  inputY 
)
inline

Update X and Y using CosanMatrix<NumericType> inputX,inputY.

Parameters
[in]  inputX  const CosanMatrix<NumericType>&
[in]  inputY  const CosanMatrix<NumericType>&

Definition at line 116 of file CosanData.h.

116  {
117  X = inputX;
118  Y = inputY;
119  }

Member Data Documentation

◆ __TYPE

template<Numeric NumericType>
CosanMatrix<NumericType> Cosan::CosanRawData< NumericType >::__TYPE
protected

Definition at line 308 of file CosanData.h.

◆ _raw2catIdx

template<Numeric NumericType>
std::unordered_map<gsl::index,gsl::index> Cosan::CosanRawData< NumericType >::_raw2catIdx
private

Definition at line 339 of file CosanData.h.

◆ _raw2numIdx

template<Numeric NumericType>
std::unordered_map<gsl::index,gsl::index> Cosan::CosanRawData< NumericType >::_raw2numIdx
private

Definition at line 339 of file CosanData.h.

◆ catY

template<Numeric NumericType>
bool Cosan::CosanRawData< NumericType >::catY = false
protected

True means the response variable Y is categorical data.

Definition at line 325 of file CosanData.h.

◆ colCatX

template<Numeric NumericType>
std::set<gsl::index> Cosan::CosanRawData< NumericType >::colCatX
protected

Column indices in the original data that hold categorical data.

Definition at line 321 of file CosanData.h.

◆ colCatY

template<Numeric NumericType>
std::set<gsl::index> Cosan::CosanRawData< NumericType >::colCatY
protected

Definition at line 321 of file CosanData.h.

◆ colsX

template<Numeric NumericType>
gsl::index Cosan::CosanRawData< NumericType >::colsX = 0
protected

Definition at line 329 of file CosanData.h.

◆ colsY

template<Numeric NumericType>
gsl::index Cosan::CosanRawData< NumericType >::colsY = 0
protected

Definition at line 333 of file CosanData.h.

◆ IdxminfX

template<Numeric NumericType>
std::vector<std::vector<gsl::index> > Cosan::CosanRawData< NumericType >::IdxminfX
protected

Definition at line 316 of file CosanData.h.

◆ IdxminfY

template<Numeric NumericType>
std::vector<std::vector<gsl::index> > Cosan::CosanRawData< NumericType >::IdxminfY
protected

Definition at line 317 of file CosanData.h.

◆ IdxmissingX

template<Numeric NumericType>
std::vector<std::vector<gsl::index> > Cosan::CosanRawData< NumericType >::IdxmissingX
protected

Definition at line 316 of file CosanData.h.

◆ IdxmissingY

template<Numeric NumericType>
std::vector<std::vector<gsl::index> > Cosan::CosanRawData< NumericType >::IdxmissingY
protected

Definition at line 317 of file CosanData.h.

◆ IdxpinfX

template<Numeric NumericType>
std::vector<std::vector<gsl::index> > Cosan::CosanRawData< NumericType >::IdxpinfX
protected

Positions of positive infinity, negative infinity, and missing values.

Definition at line 316 of file CosanData.h.

◆ IdxpinfY

template<Numeric NumericType>
std::vector<std::vector<gsl::index> > Cosan::CosanRawData< NumericType >::IdxpinfY
protected

Definition at line 317 of file CosanData.h.

◆ rowsX

template<Numeric NumericType>
gsl::index Cosan::CosanRawData< NumericType >::rowsX = 0
protected

number of rows.

Definition at line 329 of file CosanData.h.

◆ rowsY

template<Numeric NumericType>
gsl::index Cosan::CosanRawData< NumericType >::rowsY = 0
protected

Number of rows of Y.

Definition at line 333 of file CosanData.h.

◆ SummaryMessageX

template<Numeric NumericType>
std::string Cosan::CosanRawData< NumericType >::SummaryMessageX
protected

Loading message.

Definition at line 312 of file CosanData.h.

◆ SummaryMessageY

template<Numeric NumericType>
std::string Cosan::CosanRawData< NumericType >::SummaryMessageY
protected

Definition at line 312 of file CosanData.h.

◆ svaluesX

template<Numeric NumericType>
std::vector<std::string> Cosan::CosanRawData< NumericType >::svaluesX
protected

Get the vector of categorical data from original data. order: row first.

Definition at line 337 of file CosanData.h.

◆ svaluesY

template<Numeric NumericType>
std::vector<std::string> Cosan::CosanRawData< NumericType >::svaluesY
protected

Definition at line 337 of file CosanData.h.

◆ X

template<Numeric NumericType>
CosanMatrix<NumericType> Cosan::CosanRawData< NumericType >::X
protected

Numeric data from the original CSV file for X.

Definition at line 303 of file CosanData.h.

◆ Y

template<Numeric NumericType>
CosanMatrix<NumericType> Cosan::CosanRawData< NumericType >::Y
protected

Numeric data from the original CSV file for Y.

Definition at line 307 of file CosanData.h.


The documentation for this class was generated from the following file:
CosanData.h