Human metagenomic data, two-stage or one-stage approach?


New member
I'm very interested to know your views on whether, when analysing human metagenomic data, a two-stage approach (aligning reads to the human genome first, with unaligned reads passed on to metagenomic analysis) is preferable to a one-stage approach.

My presumption is that a two-stage approach will be more computationally efficient, at the risk of subsuming some microbial reads into the human alignment (depending on the aligner's sensitivity). However, I assume that any microbial reads homologous enough to align to the human genome are unlikely to be very informative.

I will run a comparison myself, but that will take some time. In the meantime, I'm interested to hear what your views and experience may be.

Best wishes,



New member
I highly recommend the two-stage approach, filtering host reads prior to metagenome analysis. We have used the Huttenhower lab's KneadData for this, which can align the reads for you and split them into 'contaminated' and 'clean' bins, though you may prefer an alternative two-stage setup (e.g. a different aligner such as bwa mem). We also tend to follow this with additional contaminant marking in MEGAN, though we don't usually see much left by that stage.