OnlinePCA.jl (Command line tool)

All functions can also be run as command line tools from a shell, with the same options as in the OnlinePCA.jl Julia API.

After OnlinePCA.jl is installed, the command line tools are located in YOUR_HOME_DIR/.julia/v1.x/OnlinePCA/bin/.
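
To avoid retyping this long path, it can be kept in a shell variable. The following is a minimal sketch, not part of OnlinePCA.jl: the variable ONLINEPCA_BIN, the function onlinepca, and the DRY_RUN switch are hypothetical names, and v1.x is still a placeholder that must be replaced with the directory on your machine.

```shell
# Hypothetical convenience wrapper (not shipped with OnlinePCA.jl).
# ONLINEPCA_BIN: directory holding the command line tools; v1.x is a
# placeholder for the actual directory name on your system.
ONLINEPCA_BIN="${ONLINEPCA_BIN:-$HOME/.julia/v1.x/OnlinePCA/bin}"

# By default only print the command; set DRY_RUN=0 to execute it with julia.
onlinepca() {
    tool="$1"
    shift
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "julia $ONLINEPCA_BIN/$tool $*"
    else
        julia "$ONLINEPCA_BIN/$tool" "$@"
    fi
}

# Example: show the csv2bin call without running it
onlinepca csv2bin --csvfile Data.csv --binfile OUTDIR/Data.zst
```

Printing the command first makes it easy to check the path and options before starting a long computation.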

The tools are invoked as shown below.

Binarization (CSV file)

shell> julia YOUR_HOME_DIR/.julia/v1.x/OnlinePCA/bin/csv2bin \
--csvfile Data.csv \
--binfile OUTDIR/Data.zst

Binarization (Matrix Market <MM> file)

shell> julia YOUR_HOME_DIR/.julia/v1.x/OnlinePCA/bin/mm2bin \
--mmfile Data.mtx \
--binfile OUTDIR/Data.mtx.zst

Binarization (Binary COO <BinCOO> file)

shell> julia YOUR_HOME_DIR/.julia/v1.x/OnlinePCA/bin/bincoo2bin \
--bincoofile Data.bincoo \
--binfile OUTDIR/Data.bincoo.zst
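
The three binarization tools above differ only in the input format they accept. As a rough sketch, a script could choose the tool from the file extension; the function name pick_converter is hypothetical, and the extension-to-tool mapping simply follows the example file names used above.

```shell
# Hypothetical helper: choose the binarization tool from the file extension.
# The mapping mirrors the example file names (Data.csv, Data.mtx, Data.bincoo).
pick_converter() {
    case "$1" in
        *.csv)    echo "csv2bin" ;;
        *.mtx)    echo "mm2bin" ;;
        *.bincoo) echo "bincoo2bin" ;;
        *)        echo "unsupported input: $1" >&2; return 1 ;;
    esac
}

# Example: look up the tool for a Matrix Market file
pick_converter Data.mtx
```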

Summarization (CSV file)

shell> julia YOUR_HOME_DIR/.julia/v1.x/OnlinePCA/bin/sumr \
--binfile OUTDIR/Data.zst \
--outdir OUTDIR \
--pseudocount 1f0

Summarization (MM file)

shell> julia YOUR_HOME_DIR/.julia/v1.x/OnlinePCA/bin/sumr \
--binfile OUTDIR/Data.mtx.zst \
--outdir OUTDIR \
--pseudocount 1f0 \
--sparse_mode true

Filtering

shell> julia YOUR_HOME_DIR/.julia/v1.x/OnlinePCA/bin/filtering \
--binfile OUTDIR/Data.zst \
--featurelist OUTDIR/Feature_Means.csv \
--thr1 10 \
--direct1 "+" \
--outdir OUTDIR
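
Taken together, the binarization, summarization, and filtering commands form a preprocessing sequence in which each step reads a file written by the previous one. The sketch below only prints that sequence for a CSV input and executes nothing. The function name preprocess_plan is hypothetical, BIN and OUTDIR are placeholders, and it assumes that sumr writes Feature_Means.csv into OUTDIR, as the file paths in the filtering example suggest.

```shell
# Dry-run sketch of the preprocessing order; commands are printed, not run.
BIN="YOUR_HOME_DIR/.julia/v1.x/OnlinePCA/bin"   # placeholder path
OUTDIR="OUTDIR"                                 # placeholder output directory

preprocess_plan() {
    csv="$1"
    bin="$OUTDIR/Data.zst"
    # 1. Binarization: CSV file -> compressed binary file
    echo "julia $BIN/csv2bin --csvfile $csv --binfile $bin"
    # 2. Summarization: reads the binary file, writes statistics to OUTDIR
    echo "julia $BIN/sumr --binfile $bin --outdir $OUTDIR --pseudocount 1f0"
    # 3. Filtering: assumed to use a statistics file written in step 2
    echo "julia $BIN/filtering --binfile $bin --featurelist $OUTDIR/Feature_Means.csv --thr1 10 --direct1 \"+\" --outdir $OUTDIR"
}

preprocess_plan Data.csv
```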

Identifying Highly Variable Genes

shell> julia YOUR_HOME_DIR/.julia/v1.x/OnlinePCA/bin/hvg \
--binfile OUTDIR/Data.zst \
--rowmeanlist OUTDIR/Feature_Means.csv \
--rowvarlist OUTDIR/Feature_Vars.csv \
--rowcv2list OUTDIR/Feature_CV2s.csv \
--outdir OUTDIR

GD-PCA

shell> julia YOUR_HOME_DIR/.julia/v1.x/OnlinePCA/bin/gd \
--input Data.zst \
--outdir OUTDIR \
--scale ftt \
--pseudocount 1f0 \
--rowmeanlist Feature_FTTMeans.csv \
--colsumlist Sample_NoCounts.csv \
--dim 3 \
--stepsize 0.1 \
--numepoch 5 \
--scheduling "robbins-monro" \
--g 0.9 \
--epsilon 1.0e-8 \
--lower 0 \
--upper 1.0f+38 \
--expvar 0.1f0 \
--evalfreq 5000 \
--offsetFull 1f-20 \
--initW Eigen_vectors.csv \
--initV Loadings.csv \
--logdir OUTDIR \
--perm false \
--cper 1f0

SGD-PCA

shell> julia YOUR_HOME_DIR/.julia/v1.x/OnlinePCA/bin/sgd \
--input Data.zst \
--outdir OUTDIR \
--scale ftt \
--pseudocount 1f0 \
--rowmeanlist Feature_FTTMeans.csv \
--colsumlist Sample_NoCounts.csv \
--dim 3 \
--stepsize 0.1 \
--numepoch 5 \
--scheduling "robbins-monro" \
--g 0.9 \
--epsilon 1.0e-8 \
--lower 0 \
--upper 1.0f+38 \
--expvar 0.1f0 \
--evalfreq 5000 \
--offsetStoch 1f-6 \
--initW Eigen_vectors.csv \
--initV Loadings.csv \
--logdir OUTDIR \
--perm false \
--cper 1f0

Oja's method

shell> julia YOUR_HOME_DIR/.julia/v1.x/OnlinePCA/bin/oja \
--input Data.zst \
--outdir OUTDIR \
--scale ftt \
--pseudocount 1f0 \
--rowmeanlist Feature_FTTMeans.csv \
--rowvarlist Feature_FTTVars.csv \
--colsumlist Sample_NoCounts.csv \
--dim 3 \
--stepsize 0.1 \
--numepoch 3 \
--scheduling "robbins-monro" \
--g 0.9 \
--epsilon 1.0e-8 \
--lower 0 \
--upper 1.0f+38 \
--expvar 0.1f0 \
--evalfreq 5000 \
--offsetStoch 1f-6 \
--initW Eigen_vectors.csv \
--initV Loadings.csv \
--logdir OUTDIR \
--perm false \
--cper 1f0

CCIPCA

shell> julia YOUR_HOME_DIR/.julia/v1.x/OnlinePCA/bin/ccipca \
--input Data.zst \
--outdir OUTDIR \
--scale ftt \
--pseudocount 1f0 \
--rowmeanlist Feature_FTTMeans.csv \
--rowvarlist Feature_FTTVars.csv \
--colsumlist Sample_NoCounts.csv \
--dim 3 \
--stepsize 0.1 \
--numepoch 3 \
--lower 0 \
--upper 1.0f+38 \
--expvar 0.1f0 \
--evalfreq 5000 \
--offsetStoch 1f-15 \
--initW Eigen_vectors.csv \
--initV Loadings.csv \
--logdir OUTDIR \
--perm false \
--cper 1f0

RSGD-PCA

shell> julia YOUR_HOME_DIR/.julia/v1.x/OnlinePCA/bin/rsgd \
--input Data.zst \
--outdir OUTDIR \
--scale ftt \
--pseudocount 1f0 \
--rowmeanlist Feature_FTTMeans.csv \
--rowvarlist Feature_FTTVars.csv \
--colsumlist Sample_NoCounts.csv \
--dim 3 \
--stepsize 0.1 \
--numepoch 5 \
--scheduling "robbins-monro" \
--g 0.9 \
--epsilon 1.0e-8 \
--lower 0 \
--upper 1.0f+38 \
--expvar 0.1f0 \
--evalfreq 5000 \
--offsetStoch 1f-6 \
--initW Eigen_vectors.csv \
--initV Loadings.csv \
--logdir OUTDIR \
--perm false \
--cper 1f0

SVRG-PCA

shell> julia YOUR_HOME_DIR/.julia/v1.x/OnlinePCA/bin/svrg \
--input Data.zst \
--outdir OUTDIR \
--scale ftt \
--pseudocount 1f0 \
--rowmeanlist Feature_FTTMeans.csv \
--rowvarlist Feature_FTTVars.csv \
--colsumlist Sample_NoCounts.csv \
--dim 3 \
--stepsize 0.1 \
--numepoch 5 \
--scheduling "robbins-monro" \
--g 0.9 \
--epsilon 1.0e-8 \
--lower 0 \
--upper 1.0f+38 \
--expvar 0.1f0 \
--evalfreq 5000 \
--offsetFull 1f-20 \
--offsetStoch 1f-6 \
--initW Eigen_vectors.csv \
--initV Loadings.csv \
--logdir OUTDIR \
--perm false \
--cper 1f0

RSVRG-PCA

shell> julia YOUR_HOME_DIR/.julia/v1.x/OnlinePCA/bin/rsvrg \
--input Data.zst \
--outdir OUTDIR \
--scale ftt \
--pseudocount 1f0 \
--rowmeanlist Feature_FTTMeans.csv \
--rowvarlist Feature_FTTVars.csv \
--colsumlist Sample_NoCounts.csv \
--dim 3 \
--stepsize 0.1 \
--numepoch 5 \
--scheduling "robbins-monro" \
--g 0.9 \
--epsilon 1.0e-8 \
--lower 0 \
--upper 1.0f+38 \
--expvar 0.1f0 \
--evalfreq 5000 \
--offsetFull 1f-20 \
--offsetStoch 1f-6 \
--initW Eigen_vectors.csv \
--initV Loadings.csv \
--logdir OUTDIR \
--perm false \
--cper 1f0

Orthogonal Iteration (Power method)

shell> julia YOUR_HOME_DIR/.julia/v1.x/OnlinePCA/bin/orthiter \
--input Data.zst \
--outdir OUTDIR \
--scale ftt \
--pseudocount 1f0 \
--rowmeanlist Feature_FTTMeans.csv \
--rowvarlist Feature_FTTVars.csv \
--colsumlist Sample_NoCounts.csv \
--dim 3 \
--numepoch 10 \
--lower 0 \
--upper 1.0f+38 \
--expvar 0.1f0 \
--initW Eigen_vectors.csv \
--initV Loadings.csv \
--logdir OUTDIR \
--perm false \
--cper 1f0

Arnoldi method

shell> julia YOUR_HOME_DIR/.julia/v1.x/OnlinePCA/bin/arnoldi \
--input Data.zst \
--outdir OUTDIR \
--scale ftt \
--pseudocount 1f0 \
--rowmeanlist Feature_FTTMeans.csv \
--rowvarlist Feature_FTTVars.csv \
--colsumlist Sample_NoCounts.csv \
--dim 3 \
--numepoch 10 \
--perm false \
--cper 1f0

Lanczos method

shell> julia YOUR_HOME_DIR/.julia/v1.x/OnlinePCA/bin/lanczos \
--input Data.zst \
--outdir OUTDIR \
--scale ftt \
--pseudocount 1f0 \
--rowmeanlist Feature_FTTMeans.csv \
--rowvarlist Feature_FTTVars.csv \
--colsumlist Sample_NoCounts.csv \
--dim 3 \
--numepoch 10 \
--perm false \
--cper 1f0

Halko's method

shell> julia YOUR_HOME_DIR/.julia/v1.x/OnlinePCA/bin/halko \
--input Data.zst \
--outdir OUTDIR \
--scale ftt \
--pseudocount 1f0 \
--rowmeanlist Feature_FTTMeans.csv \
--rowvarlist Feature_FTTVars.csv \
--colsumlist Sample_NoCounts.csv \
--dim 3 \
--initW Eigen_vectors.csv \
--initV Loadings.csv \
--logdir OUTDIR \
--perm false \
--cper 1f0

Algorithm 971

shell> julia YOUR_HOME_DIR/.julia/v1.x/OnlinePCA/bin/algorithm971 \
--input Data.zst \
--outdir OUTDIR \
--scale ftt \
--pseudocount 1f0 \
--rowmeanlist Feature_FTTMeans.csv \
--rowvarlist Feature_FTTVars.csv \
--colsumlist Sample_NoCounts.csv \
--dim 3 \
--initW Eigen_vectors.csv \
--initV Loadings.csv \
--logdir OUTDIR \
--perm false \
--cper 1f0

Randomized Block Krylov Iteration

shell> julia YOUR_HOME_DIR/.julia/v1.x/OnlinePCA/bin/rbkiter \
--input Data.zst \
--outdir OUTDIR \
--scale ftt \
--pseudocount 1f0 \
--rowmeanlist Feature_FTTMeans.csv \
--rowvarlist Feature_FTTVars.csv \
--colsumlist Sample_NoCounts.csv \
--dim 3 \
--numepoch 10 \
--lower 0 \
--upper 1.0f+38 \
--expvar 0.1f0 \
--initW Eigen_vectors.csv \
--initV Loadings.csv \
--logdir OUTDIR \
--perm false \
--cper 1f0

Single-pass PCA type I

shell> julia YOUR_HOME_DIR/.julia/v1.x/OnlinePCA/bin/singlepass \
--input Data.zst \
--outdir OUTDIR \
--scale ftt \
--pseudocount 1f0 \
--rowmeanlist Feature_FTTMeans.csv \
--rowvarlist Feature_FTTVars.csv \
--colsumlist Sample_NoCounts.csv \
--dim 3 \
--noversamples 5 \
--niter 3 \
--initW Eigen_vectors.csv \
--initV Loadings.csv \
--logdir OUTDIR \
--perm false \
--cper 1f0

Single-pass PCA type II

shell> julia YOUR_HOME_DIR/.julia/v1.x/OnlinePCA/bin/singlepass2 \
--input Data.zst \
--outdir OUTDIR \
--scale ftt \
--pseudocount 1f0 \
--rowmeanlist Feature_FTTMeans.csv \
--rowvarlist Feature_FTTVars.csv \
--colsumlist Sample_NoCounts.csv \
--dim 3 \
--noversamples 5 \
--niter 3 \
--initW Eigen_vectors.csv \
--initV Loadings.csv \
--logdir OUTDIR \
--perm false \
--cper 1f0
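
Many of the PCA commands above share the same input and scaling options, so a script that compares solvers can state those options once. The following dry-run sketch prints one command per solver. The names COMMON and solver_plan and the per-solver output directories are hypothetical. Solver-specific options such as --stepsize or --niter are left out, which assumes the tools fall back to default values; add them as in the examples above if needed.

```shell
# Dry-run sketch: print one command per solver with shared options.
BIN="YOUR_HOME_DIR/.julia/v1.x/OnlinePCA/bin"   # placeholder path

# Options that appear in the examples of every solver used below.
COMMON="--input Data.zst --scale ftt --pseudocount 1f0 \
--rowmeanlist Feature_FTTMeans.csv --rowvarlist Feature_FTTVars.csv \
--colsumlist Sample_NoCounts.csv --dim 3 --perm false --cper 1f0"

solver_plan() {
    for solver in "$@"; do
        # Assumption: a separate output directory per solver keeps results apart.
        echo "julia $BIN/$solver $COMMON --outdir OUTDIR/$solver"
    done
}

solver_plan oja ccipca halko algorithm971
```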

Summarization for 10X-HDF5

shell> julia YOUR_HOME_DIR/.julia/v1.x/OnlinePCA/bin/tenxsumr \
--tenxfile Data.h5 \
--outdir OUTDIR \
--group mm10 \
--chunksize 5000

Algorithm 971 for 10X-HDF5

shell> julia YOUR_HOME_DIR/.julia/v1.x/OnlinePCA/bin/tenxpca \
--tenxfile Data.h5 \
--outdir OUTDIR \
--scale sqrt \
--rowmeanlist Feature_SqrtMeans.csv \
--rowvarlist Feature_SqrtVars.csv \
--colsumlist Sample_NoCounts.csv \
--dim 3 \
--noversamples 5 \
--niter 3 \
--chunksize 5000 \
--group mm10 \
--initW Eigen_vectors.csv \
--initV Loadings.csv \
--logdir OUTDIR \
--perm false \
--cper 1f0

Sparse Randomized SVD (Algorithm 971 for Binarized MM file)

shell> julia YOUR_HOME_DIR/.julia/v1.x/OnlinePCA/bin/sparse_rsvd \
--input Data.mtx.zst \
--outdir OUTDIR \
--scale ftt \
--rowmeanlist Feature_FTTMeans.csv \
--rowvarlist Feature_FTTVars.csv \
--colsumlist Sample_NoCounts.csv \
--dim 3 \
--noversamples 5 \
--niter 3 \
--chunksize 5000 \
--initW Eigen_vectors.csv \
--initV Loadings.csv \
--logdir OUTDIR \
--perm false \
--cper 1f0

Exact Out-of-Core PCA

shell> julia YOUR_HOME_DIR/.julia/v1.x/OnlinePCA/bin/exact_ooc_pca \
--input Data.mtx.zst \
--outdir OUTDIR \
--scale raw \
--pseudocount 1f0 \
--dim 3 \
--chunksize 5000 \
--sparse_mode true