I'm an MSc Bioinformatics student at Queen Mary University of London. My thesis involves building a full multi-omics pipeline for rare disease genomics, covering everything from raw sequencing data through to differential expression, methylation analysis, and pathway enrichment.
I'm most interested in the engineering side of bioinformatics — building pipelines that are reproducible, well-documented, and actually run on real HPC infrastructure.
Languages: Python, R, Bash
Pipeline tools: Snakemake, SLURM, conda, Singularity
Sequencing: bulk RNA-seq, bisulfite sequencing, paired-end Illumina
R / Bioconductor: DESeq2, DMRcaller, clusterProfiler, GenomicRanges, BSseq
Python: pandas, scikit-learn, matplotlib, scanpy
Other: Git, GitHub Actions CI, HPC (Apocrita/QMUL)
I'm currently building a multi-omics pipeline on an HPC cluster: STAR and Bismark alignment against hg38, differential expression with DESeq2, differential methylation with DMRcaller, and GO/KEGG enrichment with clusterProfiler. The whole workflow is managed with Snakemake and validated with GitHub Actions CI.
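The way these stages depend on one another can be sketched as a small dependency graph. This is only an illustration, not the actual workflow: the stage names are hypothetical labels rather than real Snakemake rule names, and the sketch assumes enrichment runs on the differential expression results only. Intermediate steps such as read counting are left out.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical stage names, each mapped to the stages it depends on.
# Assumption: enrichment consumes only the differential expression output.
STAGES = {
    "star_align": set(),
    "bismark_align": set(),
    "deseq2_differential_expression": {"star_align"},
    "dmrcaller_differential_methylation": {"bismark_align"},
    "clusterprofiler_enrichment": {"deseq2_differential_expression"},
}


def execution_order(stages):
    """Return one valid run order in which every stage follows its dependencies."""
    return list(TopologicalSorter(stages).static_order())


order = execution_order(STAGES)
print(" -> ".join(order))
```

A workflow manager like Snakemake works out this ordering itself from each rule's declared inputs and outputs, so the RNA-seq and bisulfite branches can run independently on the cluster.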
Outside the thesis, I've built smaller projects on single-cell label transfer, variant pathogenicity classification, and phenotype-disease matching.
My interests are rare disease genomics, epigenomics, pipeline infrastructure, and translational applications in pharma. Long term, I want to work on the engineering and infrastructure side, building tools that make large-scale genomic analysis reproducible and scalable.