NOTES:
This is the template that needs to be used for the Software Design Document. Sections 2-6 need to be completed. Don’t do the other sections, as I’m going to go back and do them.
As I’ve explained before, the whole project is to make the software that runs KNN for the chosen dataset but can also run using other datasets, insert different values for k, different train/test split, different distance functions.
I had to create the requirements document, this design document and then I have to write a paper about it all, which I’m going to do myself.
I just need help completing the design document and then the code issue I mentioned.
As I mentioned in the messages, if possible, the code needs to be changed. I need to add some kind of exception to be thrown for the K value and the split float when the wrong thing is entered. It just needs to be similar to the one that’s thrown for the distance choice:
Images Not Shown
There is no set number of pages that it needs to be. The professor actually told me that he doesn’t see the document being more than a few pages because the code and design are so simple. I included the example folder to help you. It has a few different examples that I’ve found. You did do one of these documents for me before and that is there as well (890 Fleet). It does not need to look like that one though, because of course that was a much larger project. Nor does it have to look like the examples. This one just needs to be simple and to the point for this code and just follow the sections in this template.
I need this at the latest Saturday Morning, October 20th, I am in pacific time so I’m pretty sure I’m behind you.
Mammographic Data Analysis Using KNN
Software Design Document
Fall 2018
Date: 10/21/2018
Table of Contents
1 INTRODUCTION
1.1 Purpose
1.2 Scope
1.3 Intended Audience
1.4 Reference Material
1.5 Definitions and Acronyms
1.6 Overview
2 SYSTEM OVERVIEW
3 SYSTEM ARCHITECTURE
3.1 Architectural Design
3.2 Decomposition Description
3.3 Exception Handling
4 DATA DESIGN
4.1 Data Description
4.2 Data Dictionary
5 COMPONENT DESIGN
6 HUMAN INTERFACE DESIGN
6.1 Overview of User Interface
6.2 Screen Images
7 REQUIREMENTS MATRIX
8 APPENDICES
The Software Design Document is a documentation tool used in the design process to provide details on how the software should be built. Within the SDD are graphical and narrative documentation of the software design. This includes but isn’t limited to use case models, collaborative models, object behavior models, sequence diagrams, and any other relevant requirement information.
This software design document outlines the architecture and design decisions for the software portion of the Mammographic Data Analysis Using KNN project. It gives the software development team a better understanding of the system’s design, including what is to be built and how it is expected to be built. The SDD provides this information through a detailed description of the software and system to be built.
The KNN software will handle a single dataset at a time, consisting of the following attributes: Breast Imaging Reporting and Data System (BI-RADS), age, shape, margin, and density. The software will parse the data using a KNN algorithm with user-inputted K values and distance calculation selections. For the overall dataset, the output will be a percentage referring to the accuracy of the selected inputs. For each individual point in the array, the output will be a binomial determination of malignancy. It XX XXXXXXX that the reader has read through the XXX, since the XXXXXXXX XXXX XXXXXXX XXXXXXXXXXXXXX details XX the XXXXXXXX based XX XXX specified requirements.
XX XXXXXXXX to the XXXXXXXX Requirements XXXXXXXXXXXXX (XXXX XX written XXX XXX XXXX &XXX; client) , the majority of XXXX XXXXXXXX XX XXXXXXX XXXX software XXXXXXXXXXX professionals in XXXX. The project XXXX XXXX XX used by XXXXXXXX organizations XX a XXXXXXXX for expanded research.
<XXXX section XX XXXXXXXX.
XXXX XXX documents, if any, which XXXX used XX XXXXXXX&XX;
Definitions and Acronyms
SDD – Software Design Document
SRS – Software Requirements Specification
XXX document is written according to the XXXXXXXXX XXX Software Design XXXXXXXXXXXXX XXXXXXXXX in “IEEE XXXXXXXXXXX XXXXXXXX XXX XXXXXXXX Design XXXXXXXXXXXXX”. The XXXXXXXX XXXXXX XXXXXXXX XX XXXXXXX XXXX X XXXXXXXX XXXX XXXXXXX subsections XXXXX are:
Introduction
System Overview
System Architecture
Data Design
Component Design
Human Interface Design
Requirements Matrix
Appendices
The software is a Python command-line application. It is built to check the accuracy of the KNN model that it generates. The system takes a dataset from the requested path and generates output based on the selected algorithm.
There are XXXX XXXXX of XXXXX are supported:
Euclidean Distance: The Euclidean distance or Euclidean metric is the "ordinary" straight-line distance between two points in Euclidean space.
Manhattan Distance: The distance between two points measured along axes at right angles.
Minkowski Distance: The Minkowski distance of order p is a metric for p >= 1, as a result of the Minkowski inequality. When p < 1, the distance between (0,0) and (1,1) is 2^(1/p) > 2, but the point (0,1) is at a distance 1 from both of these points, so the triangle inequality no longer holds.
Chebyshev Distance: Chebyshev distance, maximum metric, or L∞ metric is a metric defined on a vector space where the distance between two vectors is the greatest of their differences along any coordinate dimension.
Cosine Similarity: Cosine similarity is a measure of similarity between two non-zero vectors of an inner product space that measures the cosine of the angle between them.
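As a rough illustration of the five measures listed above, the following Python sketch computes each one for two equal-length numeric vectors. The function names and the default order p=3 for the Minkowski distance are assumptions made for this example and are not taken from the project code.

```python
import math


def euclidean(a, b):
    # Straight-line distance: square root of the summed squared differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    # Sum of absolute differences along each axis.
    return sum(abs(x - y) for x, y in zip(a, b))


def minkowski(a, b, p=3):
    # Generalised form; p=1 gives Manhattan and p=2 gives Euclidean.
    # The default p=3 is an arbitrary choice for this sketch.
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def chebyshev(a, b):
    # Largest absolute difference along any single coordinate.
    return max(abs(x - y) for x, y in zip(a, b))


def cosine_similarity(a, b):
    # Cosine of the angle between the vectors; both must be non-zero.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

Note that cosine similarity grows as vectors become more alike, whereas the four distances shrink, so a neighbor search would need to rank it in the opposite direction (or use 1 minus the similarity).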
A XXXXXXX XXXX is generated which gives the XXXX to XXXXXX XXX distance to XX used for XXX prediction and XXXXXXXXXXX XXX XXXXXXXX, XXXXXXXXX, XXXXXX, f-XXXXX of XXX XXXXX XX displayed XX final XXXXXX.
SYSTEM XXXXXXXXXXXX
XXXXXXXXXXXXX Design
Images Not Shown
The System XXXXXXXX of two XXXX XXXXXXX XX XXXXXXXXX XXXXX:
Dataset
KNN Classifier
Dataset
XX machine learning, the XXXXX XXX construction XX XXXXXXXXXX XXXX XXX XXXXX XXXX and make predictions XX data XX a common XXXX. Such algorithms XXXX by making data-XXXXXX predictions or decisions, XXXXXXX building a mathematical XXXXX from input data. The XXXXX data is called a XXXXXXX. XX are XXXXXX to do a XXXXXXX task XXXX KNN XXXXXXXXX.
XXXX, XXX XXXX is XXXXXXX XXXX X XXXXX
Training data: This is a set of examples used to fit the parameters of the model.
Test data: This is a dataset used to provide an unbiased evaluation of a final model fit on the training dataset.
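A minimal sketch of such a random split is given here, assuming the rows are already held in a Python list. The function name split_dataset and the optional rng parameter (used only to make runs repeatable) are hypothetical and introduced for illustration.

```python
import random


def split_dataset(rows, split, rng=None):
    """Randomly assign each row to the training or the test set.

    split is the approximate fraction of rows sent to the training set.
    """
    rng = rng or random.Random()
    training_set, test_set = [], []
    for row in rows:
        # rng.random() is uniform in [0, 1), so about split * len(rows)
        # rows end up in the training set.
        if rng.random() < split:
            training_set.append(row)
        else:
            test_set.append(row)
    return training_set, test_set
```

Because each row is assigned independently, the resulting sizes only approximate the requested ratio; passing a seeded random.Random instance makes the split reproducible.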
KNN XXXXXXXXXX
XX X-NN classifier, the output is a class membership. An XXXXXX XX classified by a XXXXXXXX XXXX of XXX neighbors, with the object being XXXXXXXX to the class XXXX common XXXXX XXX k nearest XXXXXXXXX (X XX a XXXXXXXX integer, typically XXXXX).
XX XXXX the nearest XXXXXXXXX, XXXX we XXXX 5 different XXXXXXXX XXXXXXXXX.
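The majority-vote idea can be sketched as follows. This is an illustrative outline only: it assumes each training row stores its class label in the last column, uses Euclidean distance for simplicity, and the name knn_predict is hypothetical.

```python
import math
from collections import Counter


def knn_predict(training_set, test_features, k):
    """Return the most common label among the k nearest training rows."""

    def distance(row):
        # Compare the feature columns only; the last column is the label.
        return math.sqrt(sum((float(a) - float(b)) ** 2
                             for a, b in zip(row[:-1], test_features)))

    # Sort training rows by distance and keep the k closest.
    neighbors = sorted(training_set, key=distance)[:k]
    votes = Counter(row[-1] for row in neighbors)
    return votes.most_common(1)[0][0]
```

With two classes, an odd k avoids tied votes; in this sketch a tie would be resolved in favor of the label encountered first among the neighbors.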
XXX XXXXXX of this classifier will XXXXXXX XXXXXXXXXXX metrics of the XXXXX generated. This XXXXXXXX
Accuracy
Precision
Recall
F-score
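As an illustration of how these four metrics relate to the confusion-matrix counts, here is a small sketch. It assumes binary labels with 1 as the positive (malignant) class and uses the balanced F1 form of the f-score; both are assumptions of this example, as is the function name.

```python
def classification_metrics(actual, predicted, positive=1):
    """Compute accuracy, precision, recall and F1 from two label lists."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1

    accuracy = (tp + tn) / float(len(actual))
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / float(tp + fp) if (tp + fp) else 0.0
    recall = tp / float(tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_score": f_score}
```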
XXXXXXXXXXXXX Description
X complete XXXXXXXX of XXX XXXXXX XXX XX seen XXXX with XXX help of XXXX diagram XXXXX below.
Images Not Shown
XXX XXXXXX XXX XX XXXXXXXXXX XXXX XXX-modules which perform XXXXX own tasks XXX XXXXXXXXXX XX XXX XXXXXXX XXXXXXXXXXX of the XXXXXX. XXX decomposed modules are XXXXX XXXXX.
Images Not Shown
XXXXXXXX XXXXXXXXXXX: This XXXXXX XXXXX XXX XXXXXXX XXXXXXXX of XXX test XXXXXXXX, XXXXX on XXX XXXXXXXX XXXXXXXXX XXX outputs XXX predictions. We XXXX 5 distance algorithms that XXX used here which are XXXXXXXXX distance,
XXXXXXXXX Distance, XXXXXXXXX XXXXXXXX, XXXXXXXXX Distance XXX Cosine Similarity. XXX flow diagram XXXXX XXXXXXXX how the XXXXXXXX Prediction works:
Images Not Shown
XXX choose XXXXXXXX decision XXXXX select an appropriate distance XXXXXXXXX based on XXX XXXXX XXXXX XXX XXX nearest XXXXXXXX XX XXXXXXXX XXXXX XX K value.
The whole XXXXXXX described is XXXXXX down XXXX multiple functions which is XXXXXXXXX in XXXXXXXX below:
Function Name
|
|
XXXXXXXX
|
Brief XXXXXXXXXXX:
The load function is used XX parse the XXXXXXX from the XXXXXXX or XXXX defined csv path.
with open(filename, 'rt') as csvfile:
    lines = csv.reader(csvfile)
    dataset = list(lines)
    for x in range(len(dataset)-1):
        # convert the first four columns of the row to float
        for y in range(4):
            dataset[x][y] = float(dataset[x][y])
        # assign the row to the training or the test set at random
        if random.random() < split:
            trainingSet.append(dataset[x])
        else:
            testSet.append(dataset[x])
Code XXXXXXX for XXXXXXX XXX XXXX XXX XXXXXXXX dataset XX a python XXXX, XXXX splitting it into XXX separate datasets based on split XXXXX.
XXXXXX XXXXXXXXXX:
XXXXXXX: Csv dataset in XXXX of associative XXXXXX.
XXXXXXXX: XXX csv file XXXX
XXXXX: XXX XXXXX ratio XXXXX XX used XX XXXXX the data.
trainingSet: XXXXXX of dataset, XXXXX XXXXXXXXX to split XXXXX
XXXXXXX: XXXXXX XX dataset, split XXXXXXXXX to XXXXX XXXXX
|
XXXXXXXXXXXXXXXXX
|
Brief XXXXXXXXXXX:
This XXXXXXXX returns XXXXXXXXX XXXXXXXX
for x in range(length):
    distance += pow((float(instance1[x]) - float(instance2[x])), 2)
distance = math.sqrt(distance)
XXX above XXXX snippet calculates the XXXXXXXXX XXXXXXXX
Method Attributes:
XXXXXXXX: XXXX of XXXXXXXX to XX calculated
XXXXXXXXX: test XXXXXXXX
XXXXXXXXX: training XXXXXXXX
|
manhattanDistance
|
Brief XXXXXXXXXXX:
XXXX XXXXXXXX returns Manhattan Distance
for x in range(length):
    distance += abs((float(instance1[x]) - float(instance2[x])))
XXX above code snippet calculates XXX Manhattan XXXXXXXX
XXXXXX Attributes:
XXXXXXXX: list of distance to XX calculated
XXXXXXXXX: XXXX instance
Instance2: training instance
|
minkowskiDistance
|
XXXXX Description:
This function XXXXXXX XXXXXXXXX Distance
for x in range(length):
    distance += pow(abs(float(instance1[x])-float(instance2[x])), length)
The above XXXX snippet calculates the XXXXXXXXX Distance.
XXXXXX XXXXXXXXXX:
Distance: list of distance to XX calculated
XXXXXXXXX: test XXXXXXXX
XXXXXXXXX: training XXXXXXXX
|
chebyshevDistance
|
XXXXX Description:
XXXX XXXXXXXX returns XXXXXXXXX XXXXXXXX
for x in range(length):
    dist = abs((float(instance1[x]) - float(instance2[x])))
    # keep the largest per-coordinate difference seen so far
    if distance < dist:
        distance = dist
The above XXXX XXXXXXX calculates the Chebyshev XXXXXXXX.
Method XXXXXXXXXX:
XXXXXXXX: list of distance to be calculated
Instance1: XXXX XXXXXXXX
Instance2: XXXXXXXX instance
|
cosineSimilarity
|
Brief Description:
This function XXXXXXX Cosine Similarity
for x in range(length):
    m = float(instance1[x])
    y = float(instance2[x])
    total = total + (m*y)
    dis1 = dis1 + (m*m)
    dis2 = dis2 + (y*y)
dis1 = math.sqrt(dis1)
dis2 = math.sqrt(dis2)
result = total / (dis1*dis2)
XXX above code XXXXXXX XXXXXXXXXX the XXXXXX XXXXXXXXXX.
XXXXXX Attributes:
Result: XXXX of distance XX be calculated
Instance1: XXXX instance
XXXXXXXXX: XXXXXXXX instance
|
XXXXXXXXXXXXX
|
Brief Description:
This XXXXXXXX XX return most similar XXXXXXXXX XXXXX on K values XXX distance XXXXXX
distances.sort(key=operator.itemgetter(1))
neighbors = []
for x in range(k):
    neighbors.append(distances[x][0])
XXXXXX attributes:
XXXXXX: XXXXXX XX distance XXXXXXX
XXXXXXXXXXX: XXXXXXXX XXXXXXX
testInstance: XXXX XXXX XXXXXXXX
distance: XXXX of distances
XXXXXXXXXX: list of XXXXXXXX XXXXX are X XXXXX
X: K XXXXX
|
XXXXXXXXXXX
|
XXXXX XXXXXXXXXXX:
This XXXXXXX function XX XXXX to XXXXXXX results from training dataset
for x in range(len(neighbors)):
    response = neighbors[x][-1]
    if response in classVotes:
        classVotes[response] += 1
    else:
        classVotes[response] = 1
sortedVotes = sorted(classVotes.items(), key=operator.itemgetter(1), reverse=True)
Code XXXXXXX for predicting XXXXXX
Method attributes:
neighbors: X list XX XXXXXXXXX XX neighbors
XXXXXXXXXX: A list of vote values XXXXXXX XX XXXXXXXXX
XXXXXXXXXXX: X XXXXXX order of classVotes
|
getAccuracy
|
Brief Description:
This utility XXXXXXXX calculates accuracy, precision, recall XXX X-score by XXXXXXXXXX XXXXXX.
XXXX creates a XXXXXXXX XXXXXX and based XX the decision matrix output is generated
Formulas are used here to calculate precision, recall and F-score. The code snippet is shown below:
Precision = XXXXXXXXXXXXX/(true_positive+false_positive)
XXXXXX = true_positive/(XXXXXXXXXXXXX+false_negative)
XXXXXXXX = XXXXXXX / float(len(XXXXXXX))
Method XXXXXXXXXX:
XXXXXXX: XXXX dataset
predictions: Predicted XXXXXX
|
main
|
XXXXX Description:
The XXXXX where XXX execution XXXXXXX XXXX XX XXXX. The order XX the execution is XXXXXXXXX in the XXXX diagram XXXXX.
Exceptions Handled:
Validating proper X XXXXXX
Validating XXXXXX Split ratio
Validating proper distance XXXXXX
|
XXXXXXXXX Handling
There are XXXXX XXXXXXXXXX that are XXXXXXX, two XXXXXXXXXX XXX XX XXXXXX proper X XXXXXX and Split XXXXXX. XXX other XX to XXXXXX XXXXXX of distance XXXXXXXXXX.
We XXXX XXXXXXXXXXXX both ways of exception XXXXXXXX by XXXXX XXX catch XXXXX and XXXXXXX using it.
For handling X XXXXXX, we XXXX used try XXX catch XXXXX XXXXX XXXXXXXXX the input XX XXXXXXXX all XXXXXXXXXX’s.
Whenever a value other than an integer is entered, the conversion to int fails and raises a ValueError, which in turn displays "This was not a number, please try again."
Otherwise the XXXXX XXXX XX XXXXXX in k.
while True:
    try:
        k = int(input('Enter the value of k (Integer) : '))
        break
    except ValueError:
        print("This was not a number, please try again.")
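The try/except pattern above can be wrapped in a small helper that also rejects values below 1, which int() alone would accept. The sketch below is one possible shape under stated assumptions: the function name prompt_for_k and the injectable input_fn and output_fn parameters (added so the helper can be exercised without a terminal) are hypothetical, and the lower bound of 1 is an assumption rather than a documented requirement.

```python
def prompt_for_k(input_fn=input, output_fn=print):
    """Keep asking until the user supplies an integer k >= 1."""
    while True:
        raw = input_fn('Enter the value of k (Integer) : ')
        try:
            k = int(raw)
        except ValueError:
            # Raised by int() for text such as 'abc' or '2.5'.
            output_fn('This was not a number, please try again.')
            continue
        if k < 1:
            # Assumed rule: at least one neighbor is needed to vote.
            output_fn('k must be at least 1, please try again.')
            continue
        return k
```

Called as prompt_for_k() it reads from the keyboard; in a test, input_fn can be replaced by a function that returns scripted answers.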
For the split ratio, a while statement validates that the input lies between 0 and 1. If a value outside that range is entered, the user is asked to re-enter it.
while split < 0 or split > 1:
    print('Invalid split please try again')
    split = float(input('Enter the value for split ratio (Range 0-1): '))
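The while block above re-prompts for out-of-range numbers, but float() itself raises a ValueError when the text is not numeric. One way to cover both cases with a thrown exception is a small validator such as the sketch below. The name parse_split is hypothetical, and excluding the end points 0 and 1 (so that neither the training nor the test set is intentionally empty) is an assumption of this example.

```python
def parse_split(raw):
    """Convert user text to a split ratio, raising ValueError if invalid."""
    try:
        split = float(raw)
    except ValueError:
        raise ValueError('Split ratio must be a number, got %r' % (raw,)) from None
    # The chained comparison is also False for NaN, so NaN is rejected too.
    if not 0.0 < split < 1.0:
        raise ValueError('Split ratio must be between 0 and 1, got %r' % (raw,))
    return split
```

The caller can then wrap parse_split(input(...)) in a try/except ValueError inside a loop, printing the exception message and asking again, in the same way the K value is handled.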
Likewise, the choice of distance algorithm is validated with a while block: if a value outside the range 1 to 5 is entered, the user is asked to re-enter it.
while choice < 1 or choice > 5:
    print('Invalid choice please try again')
    choice = int(input('Enter the choice here : '))
DATA XXXXXX
The input XXXX XXXXXe XXXXXXXX a huge dataset XXXXX XX XXXX split into X parts, XXX XX XXXXXX the training dataset XXX XXX other is XXXX XXXX XXX. XX XXX XXX training XXXXXXX to XXXXX XXX construct a XXX XXXXX, once the XXXXX XX XXXXXXXXX XX XXX the test data to predict XXX results and find XXXXXXXX, XXXXXXXXX, recall XXX f-XXXXX.
The dataset is XXXXXXXX to be in the XXXX of XXX (XXXXX XXXXXXXXX XXXXXX). XX XX use XXX XXXXXX library in XXX XXXXXXX to read the XXX XXXXXX line by XXXX and XXXXX it into a list data XXXXXXXXX. XXXX list XX then XXXXXXX split into XXX XX XXXXXXXXX above.
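A hedged sketch of the CSV parsing step is shown here. It assumes five numeric feature columns followed by one class-label column, and that rows with missing or non-numeric values (for example a '?' placeholder) should be skipped; the function name load_rows and that skipping policy are assumptions of this example, not the documented behavior.

```python
import csv
import io


def load_rows(csv_file, n_features=5):
    """Read CSV records into a list of [feature..., label] rows."""
    rows = []
    for record in csv.reader(csv_file):
        # Skip blank lines and rows with an unexpected column count.
        if len(record) != n_features + 1:
            continue
        try:
            features = [float(value) for value in record[:n_features]]
        except ValueError:
            # Non-numeric entry such as '?': drop the whole row.
            continue
        rows.append(features + [record[n_features].strip()])
    return rows


# Usage with an in-memory file; a real run would pass open(path, newline='').
sample = io.StringIO("5,67,3,5,3,1\n4,43,1,1,?,1\n\n4,28,1,1,3,0\n")
parsed = load_rows(sample)
```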
Accuracy: double
Dataset: list (XXX list generated by XXXXXXX XXX XXX file)
Filename: string (XXX path XX XXX XXX XXXX)
F-XXXXX: double
X XXXXX: integer
Precision: XXXXXX
Recall: double
Split ratio: XXXXXX (must be in between X-1)
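For readers who prefer to see the dictionary as types, the entries above could be grouped as in the sketch below. The class and field names are hypothetical; Python has no separate double type, so float (a double-precision value) stands in for it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunSettings:
    """User-supplied inputs for one run (names are illustrative)."""
    filename: Optional[str]  # path to the CSV file, or None for the default
    k: int                   # K value
    split: float             # split ratio, expected between 0 and 1


@dataclass
class Metrics:
    """Outputs reported at the end of a run."""
    accuracy: float
    precision: float
    recall: float
    f_score: float
```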
HUMAN XXXXXXXXX DESIGN
XXXXXXXX XX XXXX XXXXXXXXX
The User XXXXXXXXX XXX XXX XXXXXXXXXXX XX command line XXXXX.
The inputs expected by the application are:
A path to dataset (optional)
Split ratio
K value
Choice of distance algorithm
Output generated will consist of:
Accuracy
Precision
Recall
F-score
Images Not Shown
Provide a XXXXXXXXXXXXXX XXXX XXXXXX XXXXXXXXXX XXX XXXX structures XX XXX XXXXXXXXXXXX in your SRS document. Use a tabular XXXXXX XX show which system components satisfy XXXX XX the functional requirements from XXX SRS. XXXXX XX the XXXXXXXXXX requirements XX the numbers/XXXXX XXXX you XXXX them in XXX XXX.
Req. #
|
Description
|
Component(s) / XXXXXX(s)
|
FR1.X
|
XXXXXX prompts XXX XXXXXXX path input
|
XXXXXXXX
|
XXX.2
|
System prompts XXX XXXX XXXXX XXXXX input
|
XXXX
|
XXX.3
|
XXXXXX prompts XXX X XXXXX XXXXX
|
XXXX
|
FR1.X
|
XXXXXX prompts XXX calculation method input
|
XXXX
|
XXX.X
|
XXXXXX XXXXXX exception XXX invalid XXXXX ratio
|
XXXX
|
XXX.2
|
System XXXXXX XXXXXXXXX XXX XXXXXXX X XXXXX
|
XXXX
|
XXX.X
|
XXXXXX throws exception XXX XXXXXXX XXXXXXXXXXX method
|
XXXX
|
FR3.1
|
XXXXXXXX Euclidean XXXXXXXX
|
XXXXXXXXXXXXXXXXX
|
FR3.X
|
XXXXXXXX Manhattan Distance
|
manhattanDistance
|
FR3.3
|
XXXXXXXX Minkowski Distance
|
minkowskiDistance
|
FR3.4
|
Includes Chebyshev Distance
|
chebyshevDistance
|
XXX.5
|
Includes Cosine Similarity
|
cosineSimilarity
|
XXX.1
|
System XXXXXXX predicted XXX actual results XXX XXXX datapoint
|
findNeighbors, XXXXXXXXXXX, main
|
XXX.2
|
System outputs accuracy of XXXXXXXXXX
|
getAccuracy, XXXX
|
XXX.3
|
System outputs XXXXXXXXX XX XXXXXXXXXX
|
getAccuracy, XXXX
|
XXX.4
|
System outputs recall of XXXXXXXXXX
|
XXXXXXXXXXX, main
|
FR4.X
|
System outputs f-score XX XXXXXXXXXX
|
main
|
XXXXXXXXXX
XXXX XXXXXXX XX optional.
Appendices may be included, either directly or by reference, to provide supporting details that could aid in the understanding of the Software Design Document.