Advice Needed: Evaluating Thesis Topics in RAG

I’m choosing a topic for my thesis and have attached two drafts. I would be grateful if anyone could review them and point out which one might not be promising, or suggest additional aspects to consider.

Thesis_topic.pdf (379.4 KB)

Copilot responds:

:star: High‑Level Verdict

Both topics are strong, but Topic 1 (Enhanced RAG + Semantic Cache) is much more feasible for a master’s thesis and has a clearer evaluation path.

Topic 2 (GraphRAG for Cybersecurity) is more ambitious and novel, but significantly heavier in data engineering, graph construction, and evaluation complexity.

If you want:

  • A strong, clean, finishable thesis → Topic 1

  • A research‑heavy, riskier, more novel thesis → Topic 2

:blue_book: Topic 1: Enhanced RAG with Semantic Search & Caching

:check_mark: Strengths

  • Very clear problem statement

  • Uses a well‑known dataset (Stack Overflow)

  • Straightforward pipeline: embeddings → FAISS → RAG → semantic cache

  • Easy to measure improvements (latency, cost, retrieval quality)

  • Strong engineering + research balance

  • Highly reproducible and publishable
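
To make the cache step of that pipeline concrete, here is a minimal sketch in plain Python. It is only an illustration under loud assumptions: `embed` is a toy bag-of-words stand-in for a real embedding model, the linear scan stands in for a FAISS index, and `rag_pipeline` is a hypothetical callable representing the retrieval + generation step. None of these names come from the thesis drafts.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words 'embedding' (assumption: a real system would call an embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    """Returns a stored answer when a new query is similar enough to an old one."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold
        self.entries = []  # list of (embedding, answer) pairs

    def get(self, query):
        q = embed(query)
        best_score, best_answer = 0.0, None
        # Linear scan; a vector index such as FAISS would replace this loop.
        for emb, answer in self.entries:
            score = cosine(q, emb)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None

    def put(self, query, answer):
        self.entries.append((embed(query), answer))


def answer_with_cache(query, cache, rag_pipeline):
    """Return (answer, cache_hit). rag_pipeline is a hypothetical retrieval + LLM callable."""
    cached = cache.get(query)
    if cached is not None:
        return cached, True
    answer = rag_pipeline(query)  # expensive path: retrieval + generation
    cache.put(query, answer)
    return answer, False
```

Timing the hit path against the miss path of `answer_with_cache` over a fixed query set is one simple way to get the latency comparison mentioned above.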

:check_mark: Why it’s promising

This topic is practical, scoped, and finishable. It aligns well with current industry interest in:

  • RAG optimization

  • Latency reduction

  • Semantic caching

  • LLM‑as‑Judge evaluation

:warning: Potential Weak Spots

  • Needs a clear novelty statement (e.g., “We evaluate how semantic caching reduces latency in technical QA RAG systems.”)

  • Must avoid becoming “just an implementation project” — the evaluation section helps prevent that.

:wrench: Suggestions to strengthen it

  • Add ablation studies:

    • No cache vs. cache

    • Different similarity thresholds

    • Different embedding models

  • Add cost analysis (tokens saved)

  • Add cache eviction strategies (LRU, semantic clustering)
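
The threshold ablation, the cost analysis, and LRU eviction could be prototyped roughly like this. This is a sketch, not a definitive design: `LRUCache`, `sweep_thresholds`, and the `tokens_per_llm_call` parameter are hypothetical names introduced for illustration, and the sweep assumes you have already recorded, for each test query, the similarity to its closest cached entry.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # drop the least recently used entry


def sweep_thresholds(best_scores, thresholds, tokens_per_llm_call):
    """For each threshold, report the cache hit rate and the LLM tokens avoided.

    best_scores: one similarity score per query (closest cached entry).
    tokens_per_llm_call: assumed average token cost of one uncached answer.
    """
    rows = []
    for t in thresholds:
        hits = sum(1 for s in best_scores if s >= t)
        rows.append({
            "threshold": t,
            "hit_rate": hits / len(best_scores),
            "tokens_saved": hits * tokens_per_llm_call,
        })
    return rows
```

A lower threshold raises the hit rate but also the chance of returning an answer to a different question, so each row should be reported alongside an answer-quality score (for example from the LLM‑as‑Judge evaluation) rather than on its own.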

:blue_book: Topic 2: Graph‑Based RAG for Cybersecurity

:check_mark: Strengths

  • More novel and research‑oriented

  • Multi‑hop reasoning is a hot topic

  • Cybersecurity domain adds real‑world relevance

  • Knowledge graph + LLM pipeline is academically strong

:check_mark: Why it’s interesting

This topic has higher research value and could lead to a publication if executed well.

:warning: Major Risks

Building a high‑quality cybersecurity knowledge graph is non‑trivial. It requires:

  • Entity extraction

  • Relation extraction

  • Graph schema design

  • Cypher query generation

  • Multi‑hop evaluation

Evaluation is also harder and less standardized. Overall, it’s significantly more time‑consuming.

:wrench: Suggestions to strengthen it

  • Narrow the domain (e.g., OWASP Top 10)

  • Use an existing MITRE ATT&CK graph instead of building everything manually

  • Reduce multi‑hop depth to 1–2 hops

  • Focus on evaluation of graph‑grounded reasoning
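
To illustrate what capping retrieval at 1–2 hops means in practice, here is a small sketch of bounded-depth graph traversal. The graph is a hypothetical toy example with made-up node names, not real MITRE ATT&CK data, and a plain Python dict stands in for a graph database queried with Cypher.

```python
from collections import deque

# Hypothetical toy graph: node -> list of (relation, target).
# Node names are illustrative only and do not come from ATT&CK.
GRAPH = {
    "group:example_actor": [("uses", "technique:phishing")],
    "technique:phishing": [
        ("mitigated_by", "mitigation:user_training"),
        ("delivers", "malware:example_loader"),
    ],
    "malware:example_loader": [("communicates_with", "infra:example_c2")],
}


def neighbors_within(graph, start, max_hops):
    """Breadth-first search returning {node: path} for nodes within max_hops edges."""
    paths = {start: []}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for relation, target in graph.get(node, []):
            if target not in paths:
                paths[target] = paths[node] + [(node, relation, target)]
                queue.append((target, depth + 1))
    return paths


def format_context(paths):
    """Render each retrieved path as one text line for an LLM prompt."""
    lines = []
    for path in paths.values():
        if path:  # skip the start node, which has an empty path
            lines.append(" -> ".join(f"{s} [{r}] {t}" for s, r, t in path))
    return lines
```

Calling `neighbors_within(GRAPH, "group:example_actor", 2)` reaches the technique, its mitigation, and the malware node, but stops before the infrastructure node at three hops. Keeping the paths explicit also gives the evaluation something to check: whether the generated answer is actually grounded in the retrieved edges.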

:1st_place_medal: Which Topic Is More Promising?

Topic 1 is the safer, cleaner, more achievable thesis.
Topic 2 is more ambitious but riskier.

If you’re early in your research journey, Topic 1 is the better recommendation.


Thank you for sharing! I also think Topic 2 is quite risky…