NOTES:
This is the template that needs to be used for the Software Design Document. Sections 2-6 need to be completed. Don’t do the other sections as I’m going to go back and do them.
As I’ve explained before, the whole project is to make the software that runs KNN for the chosen dataset but can also run using other datasets, insert different values for k, different train/test split, different distance functions.
I had to create the requirements document, this design document and then I have to write a paper about it all, which I’m going to do myself.
I just need help completing the design document and then the code issue I mentioned.
As I mentioned in the messages, if possible, the code needs to be changed. I need to add some kind of exception to be thrown for the K value and the split float when the wrong thing is entered. It just needs to be similar to the one that’s thrown for the distance choice (image not shown).
There is no set number of pages that it needs to be. The professor actually told me that he doesn’t see the document being more than a few pages because the code and design are so simple. I included the example folder to help you. It has a few different examples that I’ve found. You did do one of these documents for me before and that is there as well (890 Fleet). It does not need to look like that one though, because of course that was a much larger project. Nor does it have to look like the examples. This one just needs to be simple and to the point for this code and just follow the sections in this template.
I need this at the latest by Saturday morning, October 20th. I am in Pacific time, so I’m pretty sure I’m behind you.
Mammographic Data Analysis Using KNN
Software Design Document
Fall 2018
Date: 10/21/2018
Table of Contents
1 INTRODUCTION
1.1 Purpose
1.2 Scope
1.3 Intended Audience
1.4 Reference Material
1.5 Definitions and Acronyms
1.6 Overview
2 SYSTEM OVERVIEW
3 SYSTEM ARCHITECTURE
3.1 Architectural Design
3.2 Decomposition Description
3.3 Exception Handling
4 DATA DESIGN
4.1 Data Description
4.2 Data Dictionary
5 COMPONENT DESIGN
6 HUMAN INTERFACE DESIGN
6.1 Overview of User Interface
6.2 Screen Images
7 REQUIREMENTS MATRIX
8 APPENDICES
The Software Design Document is a documentation tool used in the design process to provide details on how the software should be built. Within the SDD are graphical and narrative documentation of the software design. This includes but isn’t limited to use case models, collaborative models, object behavior models, sequence diagrams, and any other relevant requirement information.
This software design document outlines the architecture and design decisions for the software portion of the Mammographic Data Analysis Using KNN project. It gives the software development team a better understanding of the system’s design, including what is to be built and how it is expected to be built. The SDD provides the necessary information through a description of the details of the software and system to be built.
The KNN software will handle a single dataset at a time, consisting of the following attributes: Breast Imaging Reporting and Data System (BI-RADS), age, shape, margin, and density. The software will classify the data using a KNN algorithm with user-inputted K values and distance calculation selections. For the overall dataset, the output will be a percentage giving the accuracy obtained with the selected inputs. For each individual data point, the output will be a binary determination of malignancy. It is XXXXXXX XXXX XXX XXXXXX has XXXX XXXXXXX the SRS, since the document also defines XXXXXXXXXXXXXX XXXXXXX of the behavior based XX XXX specified XXXXXXXXXXXX.
Intended Audience
In contrast to XXX XXXXXXXX XXXXXXXXXXXX Specification (XXXX XX XXXXXXX for the user & XXXXXX) , the XXXXXXXX XX this document is XXXXXXX with XXXXXXXX development professionals in XXXX. The project will also be XXXX XX research XXXXXXXXXXXXX XX a XXXXXXXX for expanded XXXXXXXX.
<This section is optional. List any documents, if any, which were used as sources.>
SDD – Software Design Document
SRS – Software Requirements Specification
Overview
The XXXXXXXX is written according to XXX standards for XXXXXXXX XXXXXX Documentation explained in “XXXX Recommended XXXXXXXX for Software Design Documentation”. XXX XXXXXXXX XXXXXX XXXXXXXX XX XXXXXXX into 8 XXXXXXXX XXXX various XXXXXXXXXXX which are:
Introduction
System Overview
System Architecture
Data Design
Component Design
Human Interface Design
Requirements Matrix
Appendices
XXX XXXXXXXX XX a Python XXXXXXX line XXXXXXXXXXX. It XX built XX check the XXXXXXXX XX the KNN model XXXXX XX being generated. The system is capable XX taking XXXXXXXX XXXXXX from XXX XXXX XXXXXXXXX XXX XXXX XXXXXXXXX output based on the XXXXXXXXX XXXXXXXX.
There XXX five types of which XXX XXXXXXXXX:
Euclidean Distance: The Euclidean distance or Euclidean metric is the "ordinary" straight-line distance between two points in Euclidean space.
Manhattan Distance: The distance between two points measured along axes at right angles.
Minkowski Distance: The Minkowski distance of order p is a metric for p >= 1, as a result of the Minkowski inequality. When p < 1, the distance between (0,0) and (1,1) is 2^(1/p) > 2, but the point (0,1) is at a distance 1 from both of these points, so the triangle inequality fails.
Chebyshev Distance: Chebyshev distance, maximum metric, or L∞ metric is a metric defined on a vector space where the distance between two vectors is the greatest of their differences along any coordinate dimension.
Cosine Similarity: Cosine similarity is a measure of similarity between two non-zero vectors of an inner product space that measures the cosine of the angle between them.
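For reference, the five measures can be written out for two feature vectors. The symbols x, y, n (number of attributes) and p (Minkowski order) are notation introduced here for illustration and are not taken from the project code.

```latex
\begin{aligned}
d_{\text{Euclidean}}(x, y) &= \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \\
d_{\text{Manhattan}}(x, y) &= \sum_{i=1}^{n} \lvert x_i - y_i \rvert \\
d_{\text{Minkowski}}(x, y) &= \Bigl( \sum_{i=1}^{n} \lvert x_i - y_i \rvert^{p} \Bigr)^{1/p} \\
d_{\text{Chebyshev}}(x, y) &= \max_{1 \le i \le n} \lvert x_i - y_i \rvert \\
\cos(x, y) &= \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}}
\end{aligned}
```

With p = 1 the Minkowski distance reduces to the Manhattan distance, with p = 2 to the Euclidean distance, and as p grows without bound it approaches the Chebyshev distance. Cosine similarity is a similarity rather than a distance, so larger values mean closer points.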
A console menu XX generated XXXXX XXXXX the user to XXXXXX the distance XX be used for the XXXXXXXXXX XXX accordingly the accuracy, precision, recall, f-score of the model XX displayed as XXXXX output.
SYSTEM ARCHITECTURE
Architectural Design
Images Not Shown
XXX System consists XX two XXXX modules as XXXXXXXXX XXXXX:
Dataset
KNN Classifier
Dataset
XX machine learning, XXX XXXXX and construction XX XXXXXXXXXX that can learn XXXX and XXXX XXXXXXXXXXX XX data XX a XXXXXX XXXX. Such XXXXXXXXXX work XX making data-driven XXXXXXXXXXX or decisions, through building a mathematical model XXXX input data. XXX XXXXX data is XXXXXX a dataset. We XXX XXXXXX to do a similar XXXX XXXX KNN XXXXXXXXX.
Here, the data is XXXXXXX into X parts
Training Data: This is a set of examples used to fit the parameters of the model.
Test Data: This is a dataset used to provide an unbiased evaluation of a final model fit on the training dataset.
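The split into training and test data can be sketched as follows. This is a minimal illustration, assuming each row is assigned at random with probability equal to the split ratio; the function name `split_dataset`, the `seed` parameter and the sample rows are hypothetical and not taken from the project code.

```python
import random

def split_dataset(rows, split, seed=None):
    """Randomly assign each row to the training set or the test set.

    A row goes to the training set with probability `split`, so the final
    sizes are only approximately split : (1 - split).
    """
    rng = random.Random(seed)  # a fixed seed makes the split repeatable
    training_set, test_set = [], []
    for row in rows:
        if rng.random() < split:
            training_set.append(row)
        else:
            test_set.append(row)
    return training_set, test_set

# Made-up rows standing in for parsed dataset records.
rows = [[i, i % 2] for i in range(100)]
training_set, test_set = split_dataset(rows, split=0.67, seed=1)
print(len(training_set), len(test_set))
```

Because the assignment is random per row, two runs without a seed will generally produce different splits and therefore slightly different accuracy figures.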
KNN Classifier
In a k-NN classifier, the output is a class membership. An object is classified by a majority vote of its neighbors, with the object being assigned to the class most common among its k nearest neighbors (k is a positive integer, typically small).
To find the nearest neighbors, we have 5 different distance functions.
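The majority vote can be illustrated with a short sketch. It assumes each training row is a pair of (feature list, label) and uses Euclidean distance by default; the names `knn_predict` and `euclidean` and the sample points are made up for illustration and do not reproduce the project's functions.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Straight-line distance between two equal-length feature lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(training_set, features, k, distance=euclidean):
    """Return the most common label among the k nearest training rows.

    Each training row is assumed to be a (feature_list, label) pair.
    """
    ranked = sorted(training_set, key=lambda row: distance(row[0], features))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Made-up two-dimensional points with labels 0 and 1.
training = [
    ([1.0, 1.0], 0), ([1.2, 0.8], 0),
    ([5.0, 5.0], 1), ([5.5, 4.5], 1), ([4.8, 5.2], 1),
]
prediction = knn_predict(training, [5.1, 5.0], k=3)
print(prediction)
```

An odd k avoids tied votes when there are two classes, which is one reason small odd values are a common choice.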
XXX XXXXXX of XXXX XXXXXXXXXX XXXX contain performance metrics XX the XXXXX generated. XXXX XXXXXXXX
Accuracy
Precision
Recall
F-score
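As a hedged illustration of how these four metrics relate, the sketch below computes them from lists of actual and predicted labels. It assumes label 1 is the positive (malignant) class and that the F-score is the balanced F1 score; the function name `evaluate` and the sample labels are invented for this example.

```python
def evaluate(actual, predicted, positive=1):
    """Compute accuracy, precision, recall and F1 for a binary problem."""
    tp = fp = fn = correct = 0
    for a, p in zip(actual, predicted):
        if a == p:
            correct += 1
        if p == positive and a == positive:
            tp += 1          # predicted positive, really positive
        elif p == positive:
            fp += 1          # predicted positive, really negative
        elif a == positive:
            fn += 1          # predicted negative, really positive
    accuracy = correct / len(actual)
    # Guard against empty denominators when a class is never predicted.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_score": f_score}

# Made-up labels: 3 true positives, 1 false positive, 1 false negative.
actual = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = evaluate(actual, predicted)
print(metrics)
```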
Decomposition Description
X XXXXXXXX overview XX XXX XXXXXX can XX XXXX here XXXX the XXXX of XXXX XXXXXXX given XXXXX.
Images Not Shown
XXX XXXXXX can be decomposed into sub-XXXXXXX XXXXX perform XXXXX own tasks XXX contribute XX the overall XXXXXXXXXXX XX the system. The decomposed modules XXX given below.
Images Not Shown
XXXXXXXX XXXXXXXXXXX: XXXX module XXXXX XXX XXXXXXX XXXXXXXX XX the test XXXXXXXX, XXXXX on the distance XXXXXXXXX and outputs XXX XXXXXXXXXXX. We XXXX 5 XXXXXXXX XXXXXXXXXX that XXX XXXX XXXX XXXXX XXX Euclidean Distance, Manhattan Distance, Minkowski Distance, Chebyshev Distance and Cosine Similarity. XXX flow diagram below explains how XXX XXXXXXXX XXXXXXXXXX XXXXX:
Images Not Shown
XXX choose distance decision XXXXX XXXXXX an XXXXXXXXXXX XXXXXXXX XXXXXXXXX XXXXX XX XXX XXXXX input and XXX nearest XXXXXXXX is computed based XX X value.
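One way to express the distance choice is a lookup table from menu number to function. The sketch below is an assumption-laden illustration: the numbering 1 to 5, the snake_case names, the Minkowski order `p=3` and the conversion of cosine similarity into a distance (1 minus similarity, so that smaller still means closer) are choices made here and are not confirmed by the project code.

```python
import math

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan_distance(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def minkowski_distance(a, b, p=3):
    # p is an assumed order; p=1 is Manhattan and p=2 is Euclidean.
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def chebyshev_distance(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a, b):
    # 1 - cosine similarity; undefined (division by zero) for zero vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Assumed menu numbering; the real menu order may differ.
DISTANCES = {
    1: euclidean_distance,
    2: manhattan_distance,
    3: minkowski_distance,
    4: chebyshev_distance,
    5: cosine_distance,
}

def compute_distance(choice, a, b):
    """Look up the chosen distance function and apply it."""
    if choice not in DISTANCES:
        raise ValueError("distance choice must be between 1 and 5")
    return DISTANCES[choice](a, b)

print(compute_distance(1, [0, 0], [3, 4]))
```

A table like this keeps the neighbor search independent of the metric: adding a sixth measure means adding one function and one dictionary entry.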
The XXXXX process XXXXXXXXX XX broken XXXX into XXXXXXXX XXXXXXXXX which XX explained in XXXXXXXX below:
XXXXXXXX Name
|
|
loadData
|
XXXXX XXXXXXXXXXX:
The XXXX function XX XXXX to XXXXX XXX XXXXXXX XXXX the XXXXXXX or user XXXXXXX XXX path.
with XXXX(XXXXXXXX, 'XX') as csvfile:
lines = csv.reader(XXXXXXX)
dataset = list(lines)
XXX x in XXXXX(XXX(dataset)-1):
XXX y in range(4):
dataset[x][y] = XXXXX(dataset[x][y])
if random.random() < split:
trainingSet.XXXXXX(XXXXXXX[x])
else:
XXXXXXX.XXXXXX(XXXXXXX[x])
XXXX XXXXXXX XXX XXXXXXX csv data XXX XXXXXXXX dataset XX a XXXXXX list, then XXXXXXXXX it XXXX XXX XXXXXXXX datasets XXXXX on split ratio.
XXXXXX attributes:
dataset: XXX dataset in XXXX of associative XXXXXX.
XXXXXXXX: XXX XXX file path
split: XXX split XXXXX which XX XXXX to split the data.
trainingSet: Subset of XXXXXXX, XXXXX according to split ratio
testSet: XXXXXX of dataset, split according to XXXXX XXXXX
|
euclideanDistance
|
Brief Description:
This function returns Euclidean Distance
for x in range(length):
    distance += pow((float(instance1[x]) - float(instance2[x])), 2)
distance = math.sqrt(distance)
XXX XXXXX code XXXXXXX XXXXXXXXXX the Euclidean XXXXXXXX
Method XXXXXXXXXX:
XXXXXXXX: XXXX of XXXXXXXX XX XX XXXXXXXXXX
Instance1: XXXX XXXXXXXX
Instance2: XXXXXXXX instance
|
XXXXXXXXXXXXXXXXX
|
XXXXX XXXXXXXXXXX:
XXXX XXXXXXXX XXXXXXX Manhattan XXXXXXXX
for x in range(length):
    distance += abs((float(instance1[x]) - float(instance2[x])))
XXX above code XXXXXXX calculates the Manhattan Distance
XXXXXX XXXXXXXXXX:
XXXXXXXX: XXXX XX XXXXXXXX to XX XXXXXXXXXX
XXXXXXXXX: XXXX XXXXXXXX
XXXXXXXXX: XXXXXXXX instance
|
XXXXXXXXXXXXXXXXX
|
XXXXX Description:
XXXX function returns XXXXXXXXX Distance
for x in range(length):
    distance += pow(abs(float(instance1[x]) - float(instance2[x])), length)
XXX above code snippet XXXXXXXXXX XXX XXXXXXXXX Distance.
Method Attributes:
Distance: XXXX of distance to XX XXXXXXXXXX
XXXXXXXXX: test XXXXXXXX
XXXXXXXXX: training XXXXXXXX
|
XXXXXXXXXXXXXXXXX
|
Brief Description:
XXXX function XXXXXXX Chebyshev XXXXXXXX
for x in range(length):
    dist = abs((float(instance1[x]) - float(instance2[x])))
    if distance < dist:
        distance = dist
XXX XXXXX code snippet calculates XXX XXXXXXXXX Distance.
XXXXXX XXXXXXXXXX:
XXXXXXXX: list XX XXXXXXXX to XX XXXXXXXXXX
Instance1: XXXX XXXXXXXX
XXXXXXXXX: XXXXXXXX XXXXXXXX
|
cosineSimilarity
|
Brief Description:
This XXXXXXXX returns XXXXXX Similarity
XXX x in range(XXXXXX):
m=XXXXX(instance1[x])
y=XXXXX(instance2[x])
total=total+(m*y)
XXXX=dis1+(m*m)
XXXX=XXXX+(y*y)
XXXX=(math.XXXX(XXXX))
XXXX=(math.sqrt(dis2))
XXXXXX=(XXXXX/(XXXX*XXXX))
The XXXXX code XXXXXXX XXXXXXXXXX the Cosine Similarity.
XXXXXX Attributes:
XXXXXX: list of XXXXXXXX XX XX calculated
XXXXXXXXX: test instance
XXXXXXXXX: training instance
|
XXXXXXXXXXXXX
|
Brief Description:
This XXXXXXXX XX XXXXXX most XXXXXXX XXXXXXXXX XXXXX on X XXXXXX and distance XXXXXX
distances.sort(key=operator.itemgetter(1))
neighbors = []
for x in range(k):
    neighbors.append(distances[x][0])
Method attributes:
XXXXXX: choice of distance formula
trainingSet: training XXXXXXX
testInstance: test data instance
distance: XXXX of distances
XXXXXXXXXX: list XX distance which are k apart
X: X XXXXX
|
getResponse
|
XXXXX XXXXXXXXXXX:
XXXX XXXXXXX XXXXXXXX XX used to predict results XXXX training dataset
for x in range(len(neighbors)):
    response = neighbors[x][-1]
    if response in classVotes:
        classVotes[response] += 1
    else:
        classVotes[response] = 1
sortedVotes = sorted(classVotes.items(), key=operator.itemgetter(1), reverse=True)
Code XXXXXXX XXX XXXXXXXXXX values
XXXXXX attributes:
XXXXXXXXX: A list of distances XX XXXXXXXXX
XXXXXXXXXX: X XXXX of XXXX XXXXXX XXXXXXX XX neighbors
sortedVotes: X XXXXXX order XX XXXXXXXXXX
|
XXXXXXXXXXX
|
XXXXX XXXXXXXXXXX:
XXXX utility function XXXXXXXXXX accuracy, precision, XXXXXX and F-score by XXXXXXXXXX XXXXXX.
XXXX XXXXXXX a decision matrix XXX XXXXX XX the XXXXXXXX matrix output is XXXXXXXXX
Formulas are used her XX calculate XXXXXXXXX, recall and F-score, XXXX XXXXXXX XX XXXXX XXXXX
Precision = true_positive/(true_positive+false_positive)
Recall = true_positive/(true_positive+false_negative)
Accuracy = correct / float(len(testSet))
XXXXXX attributes:
XXXXXXX: Test dataset
XXXXXXXXXXX: XXXXXXXXX values
|
XXXX
|
Brief Description:
The place XXXXX XXX XXXXXXXXX happens step by XXXX. XXX order of XXX XXXXXXXXX is XXXXXXXXX in XXX XXXX diagram above.
Exceptions Handled:
Validating proper K values
Validating proper split ratio
Validating proper distance choice
|
Exception Handling
There XXX XXXXX exceptions XXXX XXX handled, two XXXXXXXXXX XXX XX XXXXXX proper X XXXXXX XXX XXXXX ratios. The XXXXX XX to handle XXXXXX of XXXXXXXX XXXXXXXXXX.
We have XXXXXXXXXXXX both XXXX of exception handling by using try XXXXX block XXX XXXXXXX XXXXX it.
XXX handling X values, we XXXX XXXX XXX and catch XXXXX XXXXX validates XXX XXXXX by XXXXXXXX XXX ValueError’s.
Whenever any value other than an integer is entered, the code tries to convert it and a ValueError is thrown, which in turn displays "That was not a number, please try again."
Otherwise the value will be stored in k.
# Keep asking until the input converts to an integer.
while True:
    try:
        k = int(input('Enter the value of K (Integer) : '))
        break
    except ValueError:
        print("That was not a number, please try again.")
For handling split XXXXX XXXXX condition is used to XXXXXXXX XXX input to be in XXXXXXX X to X. XX any value XXX in XXXXXXX 0 XXX X XX XXXXXXX XXX code will XXX them XX re-XXXXX.
while split < 0 or split > 1:
    print('invalid split please try again')
    split = float(input('Enter the value for split ratio (Float 0-1): '))
XXXX, XXX handling choices XX XXXXXXXXXX we XXXX XXXX XXXXX XXXXX, XX XXX value XXX in XXXXXXX 1 and 5 XX entered, XXX XXXX will XXX XXXX to re-enter.
while choice < 1 or choice > 5:
    print('invalid choice please try again')
    choice = int(input('Enter the choice XXXX : '))
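The three checks above share one pattern: convert the text, catch the ValueError, then test the range. A minimal sketch of a shared helper is shown below. The helper name `prompt_until_valid`, the `read` and `show` parameters (added so the sketch can run with scripted answers), the rule that K must be at least 1, and the wording of the choice prompt are all assumptions made for illustration.

```python
def prompt_until_valid(prompt, convert, is_valid, hint, read=input, show=print):
    """Ask repeatedly until the text converts and passes the range check."""
    while True:
        try:
            value = convert(read(prompt))
        except ValueError:
            # Raised by int() or float() when the text is not a number.
            show("That was not a number, please try again.")
            continue
        if is_valid(value):
            return value
        show(hint)

def prompt_k(read=input, show=print):
    # Assumption: K must be a whole number of at least 1.
    return prompt_until_valid("Enter the value of K (Integer) : ", int,
                              lambda k: k >= 1,
                              "invalid K please try again", read, show)

def prompt_split(read=input, show=print):
    return prompt_until_valid("Enter the value for split ratio (Float 0-1): ",
                              float, lambda s: 0 <= s <= 1,
                              "invalid split please try again", read, show)

def prompt_choice(read=input, show=print):
    # Placeholder prompt text; the real menu wording is not reproduced here.
    return prompt_until_valid("Enter the distance choice (1-5) : ", int,
                              lambda c: 1 <= c <= 5,
                              "invalid choice please try again", read, show)

# Scripted answers stand in for keyboard input so the sketch runs unattended.
answers = iter(["abc", "0", "5"])
k = prompt_k(read=lambda prompt: next(answers), show=lambda message: None)
print(k)
```

In the command line program the defaults `input` and `print` would be used, so `prompt_k()`, `prompt_split()` and `prompt_choice()` could be called with no arguments.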
DATA DESIGN
Data Description
XXX XXXXX XXXX source contains a XXXX XXXXXXX which XX then XXXXX into X XXXXX, one is called the training dataset XXX the other is XXXX XXXX set. We use XXX training XXXXXXX XX XXXXX XXX construct a XXX XXXXX, once XXX XXXXX is XXXXXXXXX we XXX XXX XXXX data XX predict XXX XXXXXXX and find accuracy, XXXXXXXXX, XXXXXX and f-score.
XXX XXXXXXX is expected XX XX in the form XX XXX (comma separated XXXXXX). XX XX use XXX python library in the XXXXXXX to XXXX the XXX XXXXXX line by XXXX and XXXXX it into a XXXX data XXXXXXXXX. This list is XXXX XXXXXXX XXXXX XXXX XXX XX described above.
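Reading the CSV data into a list can be sketched with Python's standard `csv` module. This is an illustration only: the function name `parse_rows`, the `feature_count` parameter, the assumption that the class label is the last column, and the decision to skip rows containing non-numeric markers such as `?` are hypothetical, and the sample rows are made up.

```python
import csv
import io

def parse_rows(csvfile, feature_count):
    """Read CSV rows, converting the first `feature_count` fields to floats.

    Rows that are empty or contain a non-numeric feature are skipped.
    Any remaining fields (for example a class label) are kept as text.
    """
    rows = []
    for line in csv.reader(csvfile):
        if not line:
            continue
        try:
            features = [float(value) for value in line[:feature_count]]
        except ValueError:
            continue  # assumed policy: drop rows with missing values
        rows.append(features + line[feature_count:])
    return rows

# Made-up sample in the assumed layout: five features, then a label.
sample = io.StringIO("5,67,3,5,3,1\n4,43,1,1,?,1\n4,28,1,1,3,0\n")
dataset = parse_rows(sample, feature_count=5)
print(dataset)
```

With a file on disk, the same function could be called inside `with open(path, newline='') as csvfile:`, which is the opening mode the `csv` module documentation recommends.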
Data Dictionary
Accuracy: double
Dataset: list (The list generated by parsing the csv file)
Filename: string (The path of the csv file)
F-score: double
K value: integer
Precision: double
Recall: double
Split ratio: double (must be in between 0-1)
HUMAN INTERFACE DESIGN
The user interface for the application is command line based.
The inputs that are expected by the application are:
A path to dataset (optional)
Split ratio
K value
Choice of Distance algorithm
Output generated will consist of:
Accuracy
Precision
Recall
F-score
Screen Images
[Five screen images not shown]
REQUIREMENTS MATRIX
Provide a XXXXXXXXXXXXXX XXXX traces components and data structures XX the requirements in XXXX XXX XXXXXXXX. XXX a XXXXXXX XXXXXX to show which XXXXXX components satisfy XXXX XX the functional requirements XXXX XXX XXX. XXXXX XX XXX functional XXXXXXXXXXXX XX the XXXXXXX/XXXXX that you gave XXXX in the XXX.
Req. # | Description | Component(s) / Method(s)
FR1.1 | System prompts for dataset XXXX input | loadData
FR1.2 | System prompts for test split ratio input | main
FR1.3 | System prompts for K value input | main
FR1.4 | System prompts for calculation method input | main
FR2.1 | System throws exception for invalid split ratio | main
FR2.2 | System throws exception for invalid K value | main
FR2.3 | System throws exception for invalid calculation method | main
FR3.1 | Includes Euclidean Distance | euclideanDistance
FR3.2 | Includes Manhattan Distance | manhattanDistance
FR3.3 | Includes Minkowski Distance | minkowskiDistance
FR3.4 | Includes Chebyshev Distance | chebyshevDistance
FR3.5 | Includes Cosine Similarity | cosineSimilarity
FR4.1 | System outputs predicted and actual results for each datapoint | findNeighbors, XXXXXXXXXXX, main
FR4.2 | System outputs accuracy of prediction | getAccuracy, main
FR4.3 | System outputs precision of prediction | getAccuracy, main
FR4.4 | System outputs recall of prediction | getAccuracy, main
FR4.5 | System outputs f-score of prediction | main
APPENDICES
This section is optional.
Appendices may be included, either directly or by reference, to provide supporting details that could aid in the understanding of the Software Design Document.