Data imputation for features with missing values

Usage

data.imputation(Data, fun = "knn")

Arguments

Data

A matrix of genomic data, such as gene expression or miRNA expression data.
Rows represent genomic features and columns represent samples.

fun

A character value specifying the imputation method. The accepted values are:

  • "median". Each NA is replaced by the median of the observed values of that feature across all samples.

  • "mean". Each NA is replaced by the mean of the observed values of that feature across all samples.

  • "knn" (default). Missing values are imputed using the "impute" package. This is a common way to handle missing observations in microarray datasets.
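
To make the "median" and "mean" options concrete, here is a minimal base-R sketch of row-wise imputation as described above. It is an illustration under stated assumptions, not the package's actual implementation; `impute_rowwise` and the toy matrix are hypothetical names introduced only for this example.

```r
# Hypothetical helper (not part of CancerSubtypes): replace each NA with the
# mean or median of the observed values in the same row (feature).
impute_rowwise <- function(mat, stat = c("median", "mean")) {
  stat <- match.arg(stat)
  center <- if (stat == "median") stats::median else mean
  for (i in seq_len(nrow(mat))) {
    missing <- is.na(mat[i, ])
    if (any(missing)) {
      # A row with no observed values cannot be imputed this way and stays missing.
      mat[i, missing] <- center(mat[i, !missing])
    }
  }
  mat
}

# Toy matrix: 2 features (rows) x 3 samples (columns), one NA per feature
toy <- matrix(c(1, NA, 3,
                4, 6, NA),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("Gene 1", "Gene 2"), paste("Sample", 1:3)))
filled <- impute_rowwise(toy, "mean")
```

In this toy case the missing entry of "Gene 1" becomes 2 (the mean of 1 and 3) and the missing entry of "Gene 2" becomes 5 (the mean of 4 and 6).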

Value

The data matrix after imputation (without NAs).
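
A quick sanity check on the returned matrix can be sketched as follows. It assumes that imputation only fills in missing entries and leaves observed values and the matrix shape unchanged; `check_imputed` is a hypothetical helper for illustration and is not provided by the package.

```r
# Hypothetical helper (not part of CancerSubtypes): sanity-check an imputed matrix.
check_imputed <- function(original, imputed) {
  observed <- !is.na(original)
  stopifnot(
    identical(dim(original), dim(imputed)),                   # same shape
    !anyNA(imputed),                                          # no NAs remain
    isTRUE(all.equal(original[observed], imputed[observed]))  # observed values kept
  )
  invisible(TRUE)
}

# Toy input with one missing value, filled in by hand for the demonstration
before <- matrix(c(1, NA, 3, 4), nrow = 2)
after <- before
after[is.na(after)] <- 2.5
ok <- check_imputed(before, after)
```

The same call could be applied to the input and output of data.imputation, for example check_imputed(Data, result).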

References

Xu T, Le TD, Liu L, Su N, Wang R, Sun B, Colaprico A, Bontempi G, Li J. CancerSubtypes: an R/Bioconductor package for molecular cancer subtype identification, validation and visualization. Bioinformatics. 2017 Oct 1;33(19):3131-3133. doi: 10.1093/bioinformatics/btx378. PMID: 28605519.

Examples

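# Simulate a 50 x 20 matrix: features in rows, samples in columns
Data <- matrix(runif(1000), nrow = 50, ncol = 20)
geneName <- paste("Gene", 1:50, sep = " ")
sampleName <- paste("Sample", 1:20, sep = " ")
rownames(Data) <- geneName
colnames(Data) <- sampleName
# Set 60 randomly chosen entries to NA
index <- sample(1:1000, 60)
Data[index] <- NA
# Impute the missing values with the "knn" method
result <- data.imputation(Data, fun = "knn")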