munaberhe/README.md

Hi, I'm Muna

I'm an MSc Bioinformatics student at Queen Mary University of London. My thesis involves building a full multi-omics pipeline for rare disease genomics, covering everything from raw sequencing data through to differential expression, methylation analysis, and pathway enrichment.

I'm most interested in the engineering side of bioinformatics — building pipelines that are reproducible, well-documented, and actually run on real HPC infrastructure.

What I work with

Languages: Python, R, bash
Pipeline tools: Snakemake, SLURM, conda, Singularity
Sequencing: bulk RNA-seq, bisulfite sequencing, paired-end Illumina
R / Bioconductor: DESeq2, DMRcaller, clusterProfiler, GenomicRanges, BSseq
Python: pandas, scikit-learn, matplotlib, scanpy
Other: Git, GitHub Actions CI, HPC (Apocrita/QMUL)

What I'm working on

I'm currently building a multi-omics pipeline on an HPC cluster: STAR and Bismark alignment on hg38, differential expression with DESeq2, differential methylation with DMRcaller, and GO/KEGG enrichment with clusterProfiler. Everything is managed with Snakemake and validated with GitHub Actions CI.
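
As a rough illustration of how these stages depend on each other, here is a minimal Python sketch. The stage names and dependency edges are assumptions made for illustration, not the actual Snakemake rules in the pipeline; Snakemake works out this ordering itself from each rule's inputs and outputs.

```python
from graphlib import TopologicalSorter

# Hypothetical stage names and dependencies (illustrative only, not the real Snakefile).
# Each key maps to the set of stages that must finish before it can run.
STAGES = {
    "star_align": set(),
    "bismark_align": set(),
    "deseq2_differential_expression": {"star_align"},
    "dmrcaller_differential_methylation": {"bismark_align"},
    "clusterprofiler_enrichment": {"deseq2_differential_expression"},
}


def execution_order(stages):
    """Return one valid run order in which every stage follows its dependencies."""
    return list(TopologicalSorter(stages).static_order())


order = execution_order(STAGES)
for stage in order:
    print(stage)
```

The two alignment branches have no edges between them, so a scheduler is free to run the RNA-seq and bisulfite branches in parallel on the cluster.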

Outside of the thesis I've built smaller projects around single-cell label transfer, variant pathogenicity classification, and phenotype-disease matching.

Interests

Rare disease genomics, epigenomics, pipeline infrastructure, and translational applications in pharma. Long term I want to work on the engineering and infrastructure side — building tools that make large-scale genomic analysis reproducible and scalable.

Pinned

  1. scRNA_label_transfer_benchmark (Public)

    Benchmarking kNN, RandomForest and Scanpy ingest for cell type label transfer on PBMC single-cell RNA-seq data.

    Python

  2. phenotype_disease_matching (Public)

    Toy benchmark of BM25, TF-IDF and an LLM baseline for phenotype–disease matching, supporting an MSc project on LLMs for genomic diagnosis.

    Python

  3. rnaseq_deseq2_pathway (Public)

    DESeq2 and clusterProfiler pipeline for differential expression and GO enrichment on the airway RNA-seq dataset.

    R

  4. variant_pathogenicity_classifier (Public)

    ClinVar-style toy project that trains and evaluates a RandomForest classifier to predict variant pathogenicity from gene, consequence, impact, PolyPhen-like score, and allele frequency.

    Python

  5. exomiser_llm_benchmark (Public)

    Experimental benchmark for rare disease prioritisation tools, contrasting algorithmic (Exomiser-like) and LLM-based approaches.

    Python

  6. Afolab62/Directed-Evolution-Portal (Public)

    Final Project Submission - BIO727P - Bioinformatics Software Development Group Project - 2025/26 Ventura -- Muna Berhe, Olaoluwa Afolabi, Virginia La Spina, Leora Tejal Mouli, Jasmee Navaratnarajah

    Python