Resources | Boston Children's Research

LS-GKM: A new gkm-SVM software for large-scale datasets

gkm-SVM is a popular machine-learning method that predicts cis-regulatory elements (CREs) from their DNA sequence. LS-GKM is an improvement on gkm-SVM, offering increased scalability and advanced features based on gapped k-mer kernels.
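To make the idea of gapped k-mer features concrete, the following is a minimal, illustrative sketch in plain Python. It is not LS-GKM's implementation or API. The function names (`gapped_kmer_counts`, `gkm_similarity`), the toy parameters `l=4` and `k=2`, and the `-` gap symbol are all assumptions chosen for illustration. It enumerates every window of length `l`, keeps `k` informative positions, masks the rest, and compares two sequences by a simple unnormalized dot product of the resulting counts.

```python
from collections import Counter
from itertools import combinations


def gapped_kmer_counts(seq, l=4, k=2, gap="-"):
    """Count gapped k-mers: each length-l window with k positions kept
    and the remaining l - k positions masked by the gap symbol."""
    seq = seq.upper()
    counts = Counter()
    # All ways to choose which k of the l positions stay informative.
    position_sets = [set(c) for c in combinations(range(l), k)]
    for start in range(len(seq) - l + 1):
        window = seq[start:start + l]
        for kept in position_sets:
            pattern = "".join(
                window[i] if i in kept else gap for i in range(l)
            )
            counts[pattern] += 1
    return counts


def gkm_similarity(seq_a, seq_b, l=4, k=2):
    """Unnormalized dot product of gapped k-mer count vectors
    (a toy stand-in for a gapped k-mer kernel value)."""
    a = gapped_kmer_counts(seq_a, l, k)
    b = gapped_kmer_counts(seq_b, l, k)
    return sum(a[p] * b[p] for p in a.keys() & b.keys())


if __name__ == "__main__":
    print(gapped_kmer_counts("ACGT"))
    print(gkm_similarity("ACGT", "ACGA"))
```

This brute-force enumeration is only practical for short sequences and small `l`. It is meant to show what the features look like, not how they are computed at scale.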

LS-GKM: GitHub

LS-GKM: Publication

gkmQC: gapped k-mer-SVM quality check and optimization

gkmQC is a sequence-based computational tool for assessing and refining the quality of chromatin accessibility data using gkm-SVM. It uses the overall “predictability” of the peaks/regions as a metric of data quality. It trains a support vector classifier (SVC) with gapped k-mer kernels (Ghandi et al., 2014; Lee, 2016) and learns DNA sequence features predictive of regulatory element activity. It can also be used to optimize the peak-calling threshold, which is particularly useful for rare cell types in single-cell ATAC-seq data.

gkmQC: GitHub

gkmQC: Publication

MTSA: MPRA Tag Sequence Analysis

MTSA is a sequence-based method for estimating the effects of tag sequences on gene expression in massively parallel reporter assay (MPRA) experiments. It trains a support vector regression (SVR) model with gapped k-mer kernels (gkm-kernels) (Ghandi et al., 2014; Lee, 2016) and learns sequence features that modulate gene expression. A basic tutorial on running MTSA is available at https://github.com/chlee-tabin/mtsa-tutorial.

MTSA: GitHub

MTSA: Publication

Dongwon Lee Lab Resources
