ISSN: 0974-276X
Taoufik Bensellak, Ahmed Moussa
The primary objective of the target-decoy strategy is to estimate the False Discovery Rate (FDR) for reported peptide or protein matches during a protein database search. Various strategies, such as decoy database generation methods and search engine combinations, have been explored. Earlier research investigated the influence of decoy construction models and showed stochastic/statistical methods to be more promising and accurate than basic sequence reversal or shuffling methods.
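To make the quantity being estimated concrete, the following is a minimal sketch of one common target-decoy FDR estimate, assuming a concatenated target-decoy search in which each match carries a score and a decoy flag. The function name `estimate_fdr`, the `(score, is_decoy)` tuple layout, and the simple decoys-over-targets ratio are illustrative assumptions, not the estimator used in this paper.

```python
from typing import List, Tuple


def estimate_fdr(psms: List[Tuple[float, bool]], threshold: float) -> float:
    """Estimate FDR among matches scoring at or above `threshold`.

    `psms` is a list of (score, is_decoy) pairs (hypothetical layout).
    The estimate is the number of accepted decoy hits divided by the
    number of accepted target hits, capped at 1.0.
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    if targets == 0:
        # No accepted target hits: nothing is reported, so return 0.0 by convention.
        return 0.0
    return min(1.0, decoys / targets)


# Toy matches: higher score means a better match.
example_psms = [(10.0, False), (9.0, False), (8.0, True),
                (7.0, False), (6.0, True), (5.0, False)]
fdr_at_7 = estimate_fdr(example_psms, threshold=7.0)
```

With the toy data above, three target hits and one decoy hit pass the threshold of 7.0, so the estimate is one third. Other estimators (for example, for separate target and decoy searches) use different ratios.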
In this paper, we propose a novel decoy creation framework based on proteins’ significant biological signatures, patterns, and profiles, using stochastic models such as Markov models. The approach is flexible: decoy sequence generation can be adapted to digestion sites and can be peptide-based or protein-based.
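As a rough illustration of stochastic decoy generation in general, the sketch below trains a Markov chain on target sequences and samples a decoy of a requested length. It is only a generic example under stated assumptions: the function names `train_markov` and `sample_decoy`, the default order of 1, and the restart-on-dead-end rule are all hypothetical, and the sketch does not model the signatures, patterns, profiles, or digestion sites that the proposed framework relies on.

```python
import random
from collections import Counter, defaultdict


def train_markov(sequences, order=1):
    """Count start states and residue transitions for an order-`order` chain."""
    starts = Counter()
    transitions = defaultdict(Counter)
    for seq in sequences:
        if len(seq) <= order:
            continue  # too short to contribute a transition
        starts[seq[:order]] += 1
        for i in range(len(seq) - order):
            transitions[seq[i:i + order]][seq[i + order]] += 1
    return starts, transitions


def sample_decoy(starts, transitions, length, rng, order=1):
    """Sample one decoy sequence of exactly `length` residues."""
    start_states = list(starts)
    start_weights = list(starts.values())
    out = list(rng.choices(start_states, weights=start_weights)[0])
    while len(out) < length:
        counts = transitions.get("".join(out[-order:]))
        if not counts:
            # Dead end (state never seen with a successor): restart from a start state.
            out.extend(rng.choices(start_states, weights=start_weights)[0])
            continue
        residues = list(counts)
        weights = [counts[r] for r in residues]
        out.append(rng.choices(residues, weights=weights)[0])
    return "".join(out[:length])


# Toy target sequences (made up for illustration, not real E. coli proteins).
toy_targets = ["MKTAYIAKQR", "MKLVTAGQRK"]
starts, transitions = train_markov(toy_targets)
decoy = sample_decoy(starts, transitions, length=10, rng=random.Random(0))
```

Passing an explicit `random.Random` instance keeps decoy generation reproducible, which matters when the same decoy database must be regenerated for a benchmark.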
For benchmarking, we used a standard MS/MS data set of two well-known protein pools based on E. coli peptide fragments, comparing the proposed approach with standard methods in terms of false discovery rate and identification correctness.
Compared with default methods, the false discovery rate was quite high. The imbalance in the number of discovered patterns between the two pools resulted in improved accuracy and specificity for the sequences with the most signatures. For certain examined samples, the proposed method improved the correct and incorrect identification ratios by 12.3 percent and 7.7 percent, respectively.